In his talk "Ontology is overrated" (audio, text), Clay Shirky claims that top-down classification such as ontology building is an old-style solution to a new kind of problem. In particular, it belongs to a world where the resources are physical entities (like books in a library), which therefore shape the solution.
But on the web, he claims, there are no shelves, and ontologies are not a modern approach.
I'll try to summarize that talk and tell you what I think of it. For instance, I wonder whether ontologies really need to be a top-down model in the first place.
For anyone wondering what we are talking about here anyway, you can read here or make do with my very short description: an ontology is a hierarchical structure of concepts, connected by relations like the subconcept-of relation (where, for example, "car" is a superconcept of "porsche" and "ferrari") or the part-of relation. It can be used for logical reasoning, for example by agents that work out that the "cowboy trousers" some vendor offers on the web might fit your search for "blue jeans".
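To make the agent example concrete, here is a minimal sketch, assuming an ontology reduced to a single subconcept-of relation stored as a child-to-parent map. The concept names come from the text, but placing "cowboy trousers" under "blue jeans" (and the extra "trousers", "clothing" and "vehicle" concepts) is a hypothetical modelling choice, not part of any real ontology. The point is that following the relation transitively is what lets a machine conclude that the offer fits the search.

```python
# Hypothetical toy ontology: each concept maps to its direct superconcept
# via the subconcept-of relation. The clothing chain is an illustrative
# assumption, not taken from any real ontology.
SUBCONCEPT_OF: dict[str, str] = {
    "porsche": "car",
    "ferrari": "car",
    "car": "vehicle",
    "cowboy trousers": "blue jeans",  # assumed modelling choice
    "blue jeans": "trousers",
    "trousers": "clothing",
}


def superconcepts(concept: str) -> list[str]:
    """Return every superconcept of `concept`, nearest first (transitive closure)."""
    chain: list[str] = []
    parent = SUBCONCEPT_OF.get(concept)
    while parent is not None and parent not in chain:  # stop on accidental cycles
        chain.append(parent)
        parent = SUBCONCEPT_OF.get(parent)
    return chain


def matches(offer: str, query: str) -> bool:
    """An offer fits a search if it is the queried concept or one of its subconcepts."""
    return offer == query or query in superconcepts(offer)


print(superconcepts("porsche"))                  # ['car', 'vehicle']
print(matches("cowboy trousers", "blue jeans"))  # True
print(matches("porsche", "trousers"))            # False
```

A real ontology would add further relations, such as part-of, and usually allow several parents per concept; this sketch only follows one subconcept-of chain.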

Shirky begins with the observation that all top-down classifications introduce a bias into their model of the world. For example, things are classified as equally important as other things, although that doesn't reflect the reality of most people (Shirky's example: in the Library of Congress you'll find "Asia" and "Balkan Peninsula" both as subconcepts of "History").
Shirky says that this problem often arises when you have to categorize physical entities, like books. A book has to be in one place and not another; in the virtual world of data this doesn't hold: "there is no shelf", Shirky claims. So why use an old, error-prone method in a new world?
As another example, Shirky compares Yahoo's forced categorization scheme (old solution) with Google's overnight success, which relied only on bottom-up, user-generated annotation: links and word occurrences (new solution). Yahoo didn't notice that links alone suffice to describe a structure; moreover, links describe that structure far more usefully than their categorization scheme did. The striking difference is that Yahoo (and the ontologists they hired) saw a need to categorize in advance, while Google waits to see what the user wants in particular and then does its best to structure its data accordingly.
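Shirky's claim that links alone describe structure can be illustrated with a toy link-based ranking in the spirit of the published PageRank idea: a page is important if important pages link to it. This is a minimal sketch, not Google's actual system; the page names are hypothetical, and the damping factor of 0.85 is simply the commonly cited default.

```python
def link_rank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Toy PageRank-style power iteration over a hypothetical link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with uniform importance
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on, split evenly over its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its importance over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical pages: "a" and "b" both link to "c", and "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(link_rank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

No page is categorized in advance: "c" comes out on top only because two pages point to it, and "b", which nobody links to, keeps just the baseline share.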

He then claims that ontologies are only a good idea in very controllable environments, where you find expert users and authoritative judgements. One problem he addresses is so-called "voodoo categorization", meaning that a thing is classified solely from the point of view of an authority: SUVs are called 'light trucks' because the government said so. This "voodoo categorization" has of course been better described by the philosopher John Searle as perlocutionary acts.

According to Shirky, the signal loss caused by the unifying force of top-down classification is enormous. For example, people tagging things with "movie" might not want to be put in the same bucket as people tagging them with "cinema" (you might have to be a native speaker of English to understand that). Moreover, classification means prediction, and the future is unstable: his example is that all books on the shelf "East Germany" are now wrongly classified.

Shirky proposes to see categorization more in terms of market logic: it happens out of individual interest but results in high group value. That's what tagging is all about. Shirky even shows some graphs to illustrate how bottom-up classification data is structured.
Bottom-up categorization (or user-centered, if you prefer that term) then leads to a more probabilistic scheme, as opposed to the old binary scheme (in or out, nothing in between), which makes a profound difference. Changes in the world can be reflected much better, especially since such changeable categorization data can also be viewed on a time scale.
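As a minimal sketch of what "probabilistic instead of binary" could mean, assume that an item's membership in a category is the share of its taggers who applied that tag, optionally restricted to a time window. The events, item and tag names below are hypothetical; the point is that the answer is a degree between 0 and 1 that can drift over time, rather than a fixed in-or-out shelf assignment.

```python
from typing import Optional

# A hypothetical tagging event: (user, item, tag, year).
Event = tuple[str, str, str, int]


def membership(events: list[Event], item: str, tag: str, year: Optional[int] = None) -> float:
    """Degree (0..1) to which `item` belongs to the category `tag`.

    Computed as the share of users who tagged the item (optionally only in
    `year`) and used `tag` among their tags for it.
    """
    taggers: set[str] = set()
    with_tag: set[str] = set()
    for user, tagged_item, used_tag, when in events:
        if tagged_item != item or (year is not None and when != year):
            continue
        taggers.add(user)
        if used_tag == tag:
            with_tag.add(user)
    return len(with_tag) / len(taggers) if taggers else 0.0


# Hypothetical events: the "web2.0" label fades for the same site over time.
events = [
    ("u1", "site-1", "web2.0", 2006),
    ("u2", "site-1", "web2.0", 2006),
    ("u3", "site-1", "blog", 2006),
    ("u4", "site-1", "blog", 2008),
    ("u5", "site-1", "blog", 2008),
    ("u6", "site-1", "web2.0", 2008),
    ("u7", "site-1", "news", 2008),
]
print(membership(events, "site-1", "web2.0", 2006))  # 2 of 3 taggers
print(membership(events, "site-1", "web2.0", 2008))  # 1 of 4 taggers
print(membership(events, "site-1", "web2.0"))        # 3 of 7 overall
```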

So that's what Shirky's talk is about. What do I have to say about it?

I think Shirky is right when he says that...
  • ...bottom-up, user-centered data is the way to go. People want their data, and they want data that reflects what other people around them think.
  • ...probabilistic models work better than hard-coded, binary decisions, at least in a world that constantly changes.
  • ...in a lot of cases, it makes little sense to classify before you know what the classification is for.

I think Shirky misses the point when he says...

  • ...why we need ontologies in the first place: I think there are "old" problems and "new" problems we should distinguish between. Tagging is good for "old" problems that are already being solved: somebody stores stuff on the internet, somebody blogs about stuff on the internet, and some people try to find stuff that interests them. It's always one person doing something, helped by what other people did before. What poses new challenges is several people working together electronically: it's about collaboration, communication, trading. Such tasks involve even more automation, and they require reaching agreement on concepts. I don't see how tags are enough for that: they have almost no relations, and you cannot reason on them in a meaningful, logical way.
  • ...people will always have a hard time with standards. In fact, some standards are widely accepted. A lot are not, I know, but there are always some standards, be it a law or a web standard, that many people agree on, and these make people a lot more effective when they cooperate. Ontologies that define a standard, say, for insurance policies, will emerge. Most of them will not be widely accepted, but some will.
  • ...we should not mind-read. Building applications is all about the human mind: an application is always a kind of general model that, hopefully, many users can agree on. And, psychology tells us, the human mind also classifies hierarchically. Even Google sometimes uses top-down classification. The tagging approach models parts of the human mind, important ones like gossip, but none that really solve the new problems mentioned above.
  • ...there is a clear distinction between ontologies and tags. Several of his arguments don't draw as sharp a line between them as Shirky claims. Take "voodoo categorization": perlocutionary acts are a reality and happen with tags as well. The same holds for predictions about the future. Likewise, the arguments about probabilistic versus binary classification and the market concept (individual interest, high group value) do not strike at the core of the ontology concept. They only concern the way ontologies have been used until today (or until a few years ago; I've seen approaches that try to bring ontologies to all of those areas).
So, in my view, it all boils down to top-down versus bottom-up. That is where Shirky is certainly right and hits the point. The thing is, I cannot see why ontologies can only be useful in top-down contexts... We need to sit down and rethink ontologies in the light of bottom-up approaches.
There are a lot of problems along the way, for example this one: how can two parties reach agreement on what they think they are talking about? There will never be an ontology that covers everything and that everyone agrees on. Agreement on an ontology will be a question of context. Put formally, this problem touches on the well-known grounding problem of artificial intelligence.

To conclude, an idea of mine: most of the time it's good to combine things. Can't we use tags to build ontologies? Could we harvest tagging sites like Flickr for overlaps? For example: if a lot of things are tagged with 'OsX' and 'Operating System', and a lot of other things with 'Windows' and 'Operating System', there seems to be a relation between 'OsX' and 'Windows'. Maybe some processing could conclude that the similarity most likely comes from both being subconcepts of 'Operating System'...
Well, I bet it's a really complicated problem, so I'll rethink that...
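One way to attempt this is a simple co-occurrence "subsumption" heuristic, sketched here under stated assumptions: tag A is proposed as broader than tag B if most items tagged B are also tagged A, but not the other way round. The threshold of 0.8, the minimum support of 2 and the sample items are hypothetical choices, and a real harvest would need far more data and cleanup (spelling variants, spam, near-synonyms such as "movie" and "cinema").

```python
from collections import Counter
from itertools import combinations


def subsumptions(
    tagged_items: dict[str, list[str]], threshold: float = 0.8, min_support: int = 2
) -> list[tuple[str, str]]:
    """Propose (broader, narrower) tag pairs from tag co-occurrence.

    `broader` subsumes `narrower` if P(broader | narrower) >= threshold
    while P(narrower | broader) < threshold.
    """
    tag_count: Counter = Counter()
    pair_count: Counter = Counter()
    for tags in tagged_items.values():
        unique = sorted(set(tags))  # count each tag once per item
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    proposals = []
    for (a, b), both in pair_count.items():
        if both < min_support:  # ignore rare co-occurrences
            continue
        for broad, narrow in ((a, b), (b, a)):
            p_broad_given_narrow = both / tag_count[narrow]
            p_narrow_given_broad = both / tag_count[broad]
            if p_broad_given_narrow >= threshold and p_narrow_given_broad < threshold:
                proposals.append((broad, narrow))
    return sorted(proposals)


# Hypothetical harvested items and their tags.
items = {
    "pic1": ["OsX", "Operating System", "apple"],
    "pic2": ["OsX", "Operating System"],
    "pic3": ["Windows", "Operating System"],
    "pic4": ["Windows", "Operating System", "microsoft"],
    "pic5": ["Operating System", "linux"],
}
print(subsumptions(items))
# [('Operating System', 'OsX'), ('Operating System', 'Windows')]
```

On this toy data both 'OsX' and 'Windows' end up under 'Operating System', which is the sibling relation guessed at above; whether it holds up on real, noisy tag data is exactly the complicated part.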
  on 31 Mar 2008, 15:18, from Jan

I've just reviewed Weinberger's »Everything is Miscellaneous« at the Brasserie. He argues in a very similar fashion. The problems I have with what I understand of Shirky (and in some respects Weinberger) are these:

They are both all for the modern: community-driven, bottom-up miscellany. Why can't top-down be a concept incorporated into bottom-up? They don't like the binary stuff any more (»A book can go either on this OR on that shelf«), yet it always has to be »top-down« OR »bottom-up«. Take reddit, for instance. Once a week someone will post »Subreddits are a castrated tagging system«. I agree. But why can't you have a handful of carefully selected subreddits AND let the users tag away? And then maybe have a smart team of programmers do as you proposed: harvest the tags and add them to subreddits according to probability?
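The hybrid described here, a few curated subreddits plus free tagging, could be sketched like this, assuming each post sits in one curated subreddit and carries any number of user tags. A tag is attached to a subreddit only when enough posts back it up and most of them sit in that subreddit; everything else stays a free tag. The data, the 0.6 threshold and the three-post minimum are hypothetical, and nothing here reflects a feature reddit actually offers.

```python
from collections import Counter, defaultdict


def map_tags_to_subreddits(
    posts: list[tuple[str, list[str]]], threshold: float = 0.6, min_posts: int = 3
) -> dict[str, str]:
    """Attach free user tags to curated subreddits by conditional probability.

    A tag is mapped to the subreddit s maximising P(s | tag), but only if the
    tag was used on at least `min_posts` posts and that probability reaches
    `threshold`; otherwise it stays an unmapped, free-floating tag.
    """
    by_tag = defaultdict(Counter)
    for subreddit, tags in posts:
        for tag in set(tags):  # count a tag once per post
            by_tag[tag][subreddit] += 1
    mapping = {}
    for tag, counts in by_tag.items():
        total = sum(counts.values())
        subreddit, hits = counts.most_common(1)[0]
        if total >= min_posts and hits / total >= threshold:
            mapping[tag] = subreddit
    return mapping


# Hypothetical posts: (curated subreddit, user tags).
posts = [
    ("programming", ["python", "tutorial"]),
    ("programming", ["python"]),
    ("science", ["python", "snakes"]),
    ("programming", ["python", "pep8"]),
    ("science", ["tutorial", "physics"]),
    ("science", ["tutorial"]),
    ("programming", ["tutorial"]),
]
print(map_tags_to_subreddits(posts))  # {'python': 'programming'}
```

"tutorial" stays unmapped because it splits evenly between the two subreddits, which is the point of letting probability rather than a fixed rule decide.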

The second, massive problem I have with the Shirky-Weinberger approach is its lack of flexibility. They seem to argue that top-down is dead, so everything should be bottom-up now. But bottom-up only works with probability, and probability only works with sheer mass. If you search for »Golden Gate« on Flickr, you will miss some pictures showing the bridge because they might be tagged »San Francisco« or »SF«. It will not matter greatly, because you still get thousands of Golden Gate pictures. But what if you don't have millions of people willing to tag because you are the Whitley Bay public library?
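The "sheer mass" argument can be made quantitative with standard statistics. This sketch computes a Wilson score interval for a proportion, for example the share of bridge pictures tagged »Golden Gate«. The 60% figure and the sample sizes are hypothetical; what matters is how wide the interval is for a handful of taggers compared with thousands.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: any proportion is possible
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (center - half_width, center + half_width)


# Hypothetical: 60% of taggers label a bridge photo "Golden Gate".
for successes, n in ((6, 10), (6_000, 10_000)):
    low, high = wilson_interval(successes, n)
    print(f"{n:>6} taggers: {low:.2f} .. {high:.2f}")
```

With 10 taggers the interval spans roughly 0.31 to 0.83; with 10,000 it narrows to about 0.59 to 0.61, so a small library simply cannot estimate such tag probabilities reliably.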

  on 05 Nov 2011, 9:52, from THahn

  True, "Google is overrated": it's a mass phenomenon that only works in environments with a vast number of "clones" (i.e. identical operating systems, cars, animals) that all share the same blueprint. That's why you are "surprised" whenever Google "finds" something that was exactly what you were looking for. It's not because tagging works, it's because there are so many cloned systems, individuals, animals...

"Bottom-up" only works because of these myriads of identical copies out there!

