Here is a useful primer on what some people (perhaps not the best advised) are calling Web3.0.
The fundamental principle of semantifying data is that information becomes more easily found and understood by computers. Mix that with AI and you’ve got some very, very powerful, useful tools for information gathering, processing and decision making!
So why is Google – the information lynchpin of the Internet, and thus, of modern society – not THE focus of attention in all this hype about Web3.0?
This is a company with around five THOUSAND(1) computer scientists devoted to improving their search engine (~35,000 man hours a day). SURELY they’re building some amazing semantic IP that will help cement their dominance.
A big debate in the semantic field at the moment is whether the best approach is ‘top-down’ or ‘bottom-up’:
- Bottom-up: when information is created, it is annotated with machine-readable tags. Technologies like RDF, OWL and microformats (and, to a basic extent, XML) do this. Bottom-up semantics got a big boost this week when Yahoo announced it was adding RDF descriptors to its pages.
- Top-down: when a Google machine finds a document on the web, it reads it and understands the information. That’s very, very advanced computer science (according to my housemate), but that way, when a machine reads a page about Gash, it figures out whether the page is talking about a physical injury, a woman, or a vagina. That’s important if your kid is using Google to learn about first aid. An example of a top-down semantic tool is Dapper.net.
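To make the contrast concrete, here is a minimal Python sketch of both approaches. Everything in it is an assumption for illustration: the sample HTML, the microformat-style class names (`fn`, `org`), the clue-word lists and the function names are all made up, and the top-down half is a toy keyword-overlap guess, nowhere near what a real search engine would do.

```python
import re
from html.parser import HTMLParser

# --- Bottom-up: the page author has already tagged the data. ---
# Hypothetical page using microformat-style class names.
ANNOTATED_PAGE = (
    '<div class="vcard"><span class="fn">Jane Doe</span> works at '
    '<span class="org">Example Corp</span></div>'
)


class ClassTextExtractor(HTMLParser):
    """Collect the text found directly inside each element that has a class."""

    def __init__(self):
        super().__init__()
        self._stack = []   # class name (or None) of every open element
        self.fields = {}   # class name -> text seen directly inside it

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Credit the text to the innermost open element, if it has a class.
        if self._stack and self._stack[-1]:
            name = self._stack[-1]
            self.fields[name] = self.fields.get(name, "") + data


extractor = ClassTextExtractor()
extractor.feed(ANNOTATED_PAGE)
fields = extractor.fields
print(fields["fn"], "/", fields["org"])

# --- Top-down: no tags, so the machine has to guess from context. ---
# Hypothetical clue words for two senses of an ambiguous word.
SENSE_CLUES = {
    "injury": {"wound", "bleeding", "bandage", "stitches", "pressure"},
    "person": {"she", "her", "said", "met"},
}


def guess_sense(text, sense_clues=SENSE_CLUES):
    """Pick the sense whose clue words overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & clues) for sense, clues in sense_clues.items()}
    return max(scores, key=scores.get)


FIRST_AID_TEXT = "Clean the gash, apply pressure to stop the bleeding, then cover with a bandage."
print(guess_sense(FIRST_AID_TEXT))
```

The bottom-up half is a dozen lines of lookup because the site did the hard part. The top-down half only looks short because the clue lists are hand-written; building that knowledge automatically, for every word on the web, is where the real work is.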
Bottom-up requires everyone on the Web to ‘play ball’ and change their site. There are big discussions about what format to use, etc. But Google’s withdrawal from these debates suggests that it’s working on top-down semantics and doesn’t need to weigh in on what people do to their sites.
- Google knows that humans are frankly crap at describing and organising things. That’s why Google search worked in the first place, and human-edited directories (like DMOZ, which I once was an editor for, or early-days Yahoo) didn’t. It went out and found pages, and decided their relative importance, so humans don’t have to. Likewise, with Gmail, it pioneered the folder-less email service – you just search for the email you need when you want it, rather than sorting every message into folders beforehand.
- For all this talk of Web3.0, Google is actually quite far down the road with understanding the closeness of a website’s content to what you searched for, and discarding irrelevant results. It doesn’t have to change a THING about the Internet, or the way Internet users behave, to incorporate better top-down semantics into its search algorithm. Google.com will still look the same; the only difference is that you will be able to use full sentences when you search, to better describe what you want it to find; e.g. “pages about animals like my goldfish” (would return results about angel fish, clown fish, etc).
- If Google encourages bottom-up, it means each website does the heavy lifting; and any jackass coder can build a tool to leverage that, without too much difficulty. But with top-down, Google retains scarcity/monopoly power, because nobody (except Microsoft) can match the manpower needed to build that kind of IP. Top-down semantics are a technical challenge for Google. But bottom-up semantics would challenge Google’s business. It has the workforce to deal with technical challenges better than anyone. But marketplace evolution? Trickier.
If you take it as given that Google will succeed at whichever semantic approach it chooses, and you accept my reasoning that it can only opt for top-down semantics, and you accept that Google is a major Internet trendsetter (e.g. what Gmail did for inbox storage allowances), you reach the following plausible conclusion:
Google will settle the semantic web debate once and for all, kill bottom-up initiatives dead in the water, and build a top-down semantic web search engine that will cement the big G’s position as a market leader in web search.
That’s a warning to investors and coders who are interested in any bottom-up (and even, to an extent, top-down) semantic web startup. And if it settles the debate, perhaps man hours won’t be wasted on the wrong approach to organising information on the web. Far better the Dapper approach.
(1) 16,805 total employees (source: http://www.google.com/press/pressrel/revenues_q407.html) times “We’re so serious about improving search that more than a third of our people are working on it” (http://graemethickins.typepad.com/graeme_blogs_here/2008/03/googles-annual.html)