Wednesday, June 01, 2016

Google's war on folders

In the beginning, the search model for the Internet was (nested) folders:
Yahoo = "Yet Another Hierarchical Officious Oracle"
Remember, the intent is to find something in a vast space of candidates. The simplest way is to give that thing a label (the 'filename') and put it in a container which groups similar items by some salient property (the 'folder'). Folders can be nested ad libitum.

Folder-nesting is essentially set-inclusion, sometimes termed an 'isa-hierarchy'; a related idea is that of 'inheritance'.

Given an isa-hierarchy (a tree structure), the simplest language to describe it labels the branches in order from the root, so that the language syntax is pretty much isomorphic to the semantic structure.
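
To make the isomorphism concrete, here is a minimal sketch in Python. The tree, the file name 'river.jpg' and the helper names `resolve` and `paths` are all invented for illustration; the point is only that each path segment names one branch, so parsing the path and walking the tree are the same operation.

```python
# Hypothetical folder tree: nested dicts are folders, strings stand in for file contents.
tree = {
    "My-Pictures": {
        "Holiday-in-the-Dordogne-July-2014": {
            "river.jpg": "<image bytes>",
        },
    },
}


def resolve(tree, path):
    """Follow one branch label per path segment, starting from the root."""
    node = tree
    for label in path.split("/"):
        node = node[label]  # raises KeyError if the branch does not exist
    return node


def paths(tree, prefix=""):
    """Enumerate every leaf as a path string: one string per root-to-leaf walk."""
    for label, child in tree.items():
        full = f"{prefix}/{label}" if prefix else label
        if isinstance(child, dict):
            yield from paths(child, full)
        else:
            yield full
```

Note the cost of the scheme: every item has exactly one path, so you must know (or guess) the single chain of folders it was filed under.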

It was a small shock when Gmail introduced arbitrary labels to replace folders. The semantic space became somewhat more complex, as the set of all subsets of a collection of labels constitutes a lattice.
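
A rough sketch of the label model, again in Python and again with invented data: the message ids, label names and the functions `search` and `powerset` are assumptions for illustration, not a description of how Gmail actually stores or queries mail.

```python
from itertools import combinations

# Hypothetical mailbox: each message carries a set of labels instead of living in one folder.
messages = {
    "msg1": {"work", "travel"},
    "msg2": {"work"},
    "msg3": {"travel", "receipts"},
}


def search(messages, wanted):
    """Return the ids of messages that carry every label in `wanted`."""
    wanted = set(wanted)
    return sorted(mid for mid, labels in messages.items() if wanted <= labels)


def powerset(labels):
    """All subsets of `labels`: the elements of the lattice, ordered by inclusion."""
    items = sorted(labels)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]
```

In this lattice the meet of two label sets is their intersection and the join is their union. Asking for more labels moves you up the lattice and narrows the result; unlike a folder path, the same message is reachable from many starting points.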

A simple Google search on keywords is not dissimilar to a search on labels (plus a fancy ordering relation wherein lies the magic).

Then came Google Photos, where all those carefully-named folders on your hard-drive ('My-Pictures/Holiday-in-the-Dordogne-July-2014') vanished, and Google used metadata and AI scene recognition to organise your photos along many dimensions: people, places, things-in-the-picture, etc.

In real life and in ordinary conversation, people do not restrict themselves to labels to refer to things.
"That famous joke by that dead middle class comedian"
doesn't get a useful hit from Google today, although many British people might hazard a guess.
"That famous joke by Bob Monkhouse"
nails it, top of the list - but it's really labels at work here.

As Google augments its bottom-up, neural-net phrase-recognisers with stable and ubiquitous semantic/pragmatic models, the clunky, manual world of the folder hierarchy will finally be put to rest.

Google tried to bring a version of search to the personal hard-drive once before, but it never really took off: too much noise and not enough signal in the results.

But next time around?

Folders: archaic, inflexible, labour-intensive .. and doomed.
