Archive for March 13th, 2008

Top-Down: A New Approach to the Semantic Web

Written by Alex Iskold / September 20, 2007 4:22 PM / 17 Comments

Earlier this week we wrote about the classic approach to the semantic web and the difficulties with that approach. The original vision of a layer on top of the current web, which annotates information in a way that is “understandable” by computers, is compelling, but there are technical, scientific and business issues that have been difficult to address.

One of the technical difficulties that we outlined was the bottom-up nature of the classic semantic web approach. Specifically, each web site needs to annotate information in RDF, OWL, etc. in order for computers to be able to “understand” it.

As things stand today, there is little reason for web site owners to do that. The tools that would leverage the annotated information do not exist, and no clear business or consumer value has been articulated. As a result, there is no incentive for sites to invest money in being compatible with the semantic web of the future.

But there are alternative approaches. We will argue that a more pragmatic, top-down approach to the semantic web not only makes sense, but is already well on the way toward becoming a reality. Many companies have been leveraging existing, unstructured information to build vertical, semantic services. Unlike the original vision, which is rather academic, these emergent solutions are driven by business and market potential.

In this post, we will look at the solution that we call the top-down approach to the semantic web. We call it that because instead of requiring developers to change or augment the web, this approach leverages and builds on top of the current web as is.

Why Do We Need The Semantic Web?

The complexity of the original vision of the semantic web and the lack of clear consumer benefits make the whole project unrealistic. The simple question, “Why do we need computers to understand semantics?”, remains largely unanswered.

While some of us think that building AI is cool, the majority of people think that AI is a little bit silly, or perhaps even unsettling. And they are right. AI for the sake of AI does not make any sense. If we are talking about building intelligent machines, and if we need to spend money and energy annotating all the information in the world for them, then there needs to be a very clear benefit.

Stated the way it is, the semantic web becomes a vision in search of a reason. What if the problem was restated from the consumer point of view? Here is what we are really looking forward to with the semantic web:

  • Spend less time searching
  • Spend less time looking at things that do not matter
  • Spend less time explaining what we want to computers

A consumer focus and a clear benefit for businesses need to be there in order for the semantic web vision to be embraced by the marketplace.

What If The Problem Is Not That Hard?

If all we are trying to do is to help people improve their online experiences, perhaps the full “understanding” of semantics by computers is not even necessary. The best online search tool today is Google, which is an algorithm based, essentially, on statistical frequency analysis and not semantics. Solutions that attempt to improve on Google by focusing on generalized semantics have so far struggled to do so.

The truth is that the understanding of natural language by computers is a really hard problem. We have the language ingrained in our genes. We learn language as we grow up. We learn things iteratively. We have the chance to clarify things when we do not understand them. None of this is easily replicated with computers.

But what if solving that problem is not even necessary to build the first generation of semantic tools? What if, instead of trying to teach computers natural language, we hard-wired into computers the concepts of everyday things like books, music, movies, restaurants, stocks and even people? Would that help us be more productive and find things faster?

Simple Semantics: Nouns And Verbs

When we think about a book we think about a handful of things – title and author, maybe genre and the year it was published. Typically, though, we couldn’t care less about the publisher, edition and number of pages. Similarly, recipes provoke thoughts about cuisine and ingredients, while movies make us think about the plot, director, and stars.

When we think of people, we also think about a handful of things: birthday, where they live, how we’re related to them, etc. The profiles found on popular social networks are great examples of simple semantics based around people.

Books, people, recipes and movies are all examples of nouns. The things that we do on the web around these nouns, such as looking up similar books, finding more people who work for the same company, getting more recipes from the same chef and looking up pictures of movie stars, are similar to verbs in everyday language. These are contextual actions that are based on the understanding of the noun.

What if semantic applications hard-wired understanding and recognition of the nouns and then also hard-wired the verbs that make sense? We are actually well on our way to doing just that. Vertical search engines like Spock, Retrevo and ZoomInfo, the page annotating technology from ClearForest, Dapper, and the Map+ extension for Firefox are just a few examples of top-down semantic web services.
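To make the noun-and-verb idea concrete, here is a minimal Python sketch of what such hard-wiring might look like. The Noun class, the NOUNS registry and the actions_for function are hypothetical names invented for this illustration; the attributes and actions are simply the ones mentioned above, and none of the services listed is known to be built this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """An everyday concept with the handful of attributes people care about."""
    name: str
    attributes: tuple
    verbs: tuple


# Hypothetical registry: each noun is hard-wired together with the
# contextual actions (verbs) that make sense for it.
NOUNS = {
    "book": Noun("book", ("title", "author", "genre", "year"),
                 ("look up similar books",)),
    "person": Noun("person", ("birthday", "location", "relationship"),
                   ("find people at the same company",)),
    "recipe": Noun("recipe", ("cuisine", "ingredients"),
                   ("get more recipes from the same chef",)),
    "movie": Noun("movie", ("plot", "director", "stars"),
                  ("look up pictures of the stars",)),
}


def actions_for(noun_type):
    """Return the verbs available for a recognized noun, or () if unknown."""
    noun = NOUNS.get(noun_type)
    return noun.verbs if noun else ()
```

Once an application has recognized that a page is about a movie, a call such as actions_for("movie") gives it the short list of actions worth offering. Anything outside the registry yields no actions at all, which is the price of hard-wiring.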

The Top-Down Semantic Web Service

The essence of a top-down semantic web service is simple – leverage existing web information, apply specific, vertical semantic knowledge and then redeliver the results via a consumer-centric application. Consider the vertical search engine Spock, which scans the web for information about people. It knows how to recognize names in HTML pages and it also looks for common information about people that all people have – birthdays, locations, marital status, etc. In addition, Spock “understands” that people relate to each other. If you look up Bush, then Clinton will show up as a predecessor. If you look up Steve Jobs, then Bill Gates will come up as a rival.

In other words, Spock takes simple, everyday semantics about people and applies it to the information that already exists online. The result? A unique and useful vertical search engine for people. Further, note that Spock does not require the information to be re-annotated in RDF and OWL. Instead, the company builds adapters that use heuristics to get the data. The engine does not actually have a full understanding of semantics about people, however. For example, it does not know that people like different kinds of ice cream, but it doesn’t need to. The point is that by focusing on simple semantics, Spock is able to deliver a useful end-user service.
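We do not know how Spock's adapters are actually implemented, but the general idea of a heuristic adapter can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the extract_person function, the two regular expressions, and the rule that a name is two or three capitalized words. The sample person is made up.

```python
import re

# Assumption: a name is two or three capitalized words in a row.
NAME_RE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+){1,2})\b")
# Assumption: birthdays appear as "born [on] Month D, YYYY".
BORN_RE = re.compile(r"\bborn (?:on )?([A-Z][a-z]+ \d{1,2}, \d{4})")
TAG_RE = re.compile(r"<[^>]+>")


def extract_person(html):
    """Pull a name and a birthday out of an unannotated HTML fragment."""
    text = TAG_RE.sub(" ", html)  # crude tag stripping, not a real HTML parser
    name = NAME_RE.search(text)
    born = BORN_RE.search(text)
    return {
        "name": name.group(1) if name else None,
        "birthday": born.group(1) if born else None,
    }


# Made-up sample page fragment.
sample = "<p>Jane Example was born on March 3, 1970.</p>"
person = extract_person(sample)
```

For the sample fragment, the sketch finds the name "Jane Example" and the birthday "March 3, 1970" without any RDF or OWL annotation on the page. It also shows the weakness of the approach: a page that phrases the same facts differently would yield nothing.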

Another, much simpler, example is the Map+ add-on for Firefox. This application recognizes addresses and provides a map popup using Yahoo! Maps. It is precisely the simplicity of this application that conveys the power of simple semantics. The add-on “knows” what addresses look like. Sure, sometimes it makes mistakes, but most of the time it tags addresses in online documents properly. So it leverages existing information and then provides direct end-user utility by mashing it up with Yahoo! Maps.
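The following Python sketch shows the kind of pattern matching an address recognizer could use. It is not the Map+ implementation, which we have not seen. The ADDRESS_RE pattern, the tag_addresses function and the maps.example.com URL are all placeholders invented for this example, and the pattern only covers simple US-style street addresses.

```python
import re
from urllib.parse import quote_plus

# Assumption: an address is a house number, one to three capitalized words,
# and a common street suffix.
ADDRESS_RE = re.compile(
    r"\b\d{1,5} (?:[A-Z][a-z]+ ){1,3}"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd)\b"
)

# Placeholder map service, not a real endpoint.
MAP_BASE = "https://maps.example.com/?q="


def tag_addresses(text):
    """Return (address, map_url) pairs for every address-like span in text."""
    return [
        (match.group(0), MAP_BASE + quote_plus(match.group(0)))
        for match in ADDRESS_RE.finditer(text)
    ]


hits = tag_addresses("Meet us at 123 Main Street at noon.")
false_hits = tag_addresses("In 2007 Wall Street rallied.")
```

The first call tags "123 Main Street" and builds a link for it. The second call shows the sort of mistake mentioned above: the pattern also tags "2007 Wall Street", because a year followed by a street name looks exactly like an address to a rule this simple.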

The Challenges Facing The Top-Down Approach

Despite being effective, the somewhat simplistic top-down approach has several problems. First, it is not really the semantic web as it is defined; instead, it’s a group of semantic web services and applications that create utility by leveraging simple semantics. So the proponents of the classic approach would protest, and they would be right. Another issue is that these services do not always get semantics right because of ambiguities. Because the recognition is algorithmic and not based on an underlying RDF representation, it is not perfect.

It seems to me that it is better to have simpler solutions that work 90% of the time than complex ones that never arrive. The key questions here are: How exactly are mistakes handled? And, is there a way for the user to correct the problem? The answers will be left up to the individual application. In life we are used to other people being unpredictable, but with computers, at least in theory, we expect things to work the same every time.

Yet another issue is that these simple solutions may not scale well. If the underlying unstructured data changes, can the algorithms be changed quickly enough? This is always an issue with things that sit on top of other things without an API. Of course, if more web sites had APIs, as we have previously suggested, the top-down semantic web would be much easier and more certain.


While the original vision of the semantic web is grandiose and inspiring, in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of a specific and simple consumer focus makes it mostly an academic exercise. In the meantime, existing data is being leveraged by applying simple heuristics and making assumptions about particular verticals. What we have dubbed top-down semantic web applications have been appearing online and improving end-user experiences by leveraging semantics to deliver real, tangible services.

Will the bottom-up semantic web ever happen? Possibly. But at the moment, the precise path to get there is not quite clear. In the meantime, we can all enjoy a better online experience and get to where we need to go faster thanks to simple top-down semantic web services.


5 TrackBacks

Listed below are links to blogs that reference this entry: Top-Down: A New Approach to the Semantic Web.
TrackBack URL for this entry: http://www.readwriteweb.com/cgi-bin/mt/mt-tb.cgi/1638
Summary: The original vision of the semantic web as a layer on top of the current web, annotated in a way that computers can “understand,” is certainly grandiose and intriguing. Yet, for the past decade it has been a kind… Read More
Alex Iskold’s ‘Semantic Web: Difficulties with the Classic Approach’ for Read/Write Web was one of the posts rolled up into yesterday’s outpouring here on Nodalities. He’s been busy during the (my) night, and I woke this morning to ‘Top-Down:… Read More
Yesterday brought an enlightening post by Alex Iskold, entitled “Top-Down: A New Approach to the Semantic Web“: “While the original vision of the semantic web is grandiose and inspiring in practice it has been difficult to achieve bec… Read More
Here is a summary of the week’s Web Tech action on Read/WriteWeb. Note that you can subscribe to the Weekly Wrapups, either via the special RSS feed or by email. Web News Yahoo! Drops $350m on Zimbra; an Open Source,… Read More
In theory the semantic web is fantastic, that is, re-describing all the information that already exists on the web in an attempt to make computers understand the meaning of things. In short, it would be one more layer on the web with meta-information… Read More



  • Hi Alex. The top-down approach alone is not enough to reach the Semantic Web. It’s not even enough to reach the half-hearted attempt at the Semantic Web that you describe. I believe both the bottom-up and top-down approaches will be needed to reach the goal. At this time we’re faced with too few people attempting either approach. Top-down isn’t even fully feasible yet, whereas a bottom-up approach can at least be done with currently available technology.

    I fully disagree with your statement that the complexity of original vision of the semantic web and lack of clear consumer benefits makes the whole project unrealistic.

    Posted by: James | September 20, 2007 4:58 PM

  • I think the use of Microformats does provide some actual practical usage of a form of machine-readable semantic formatting for content. Ok, it’s maybe not quite “the semantic web” people envision but it does have some usefulness.

    I also read a blog post yesterday by Peter Krantz entitled “RDFa – Implications for Accessibility” which talks about the W3C’s RDFa HTML extensions as opposed to Microformats as a means to include machine readable data.

    I wasn’t sure if I should be writing ‘semantic web’ with a capital S as some people seem to use that as suggesting something more than just the concept of ‘semantic standards compliant HTML / XHTML’!

    Posted by: Rick Curran | September 20, 2007 5:10 PM

  • If you’re talking about the Semantic Web (the W3C attempt, and the actual Semantic Web) then you use caps for the S and the W. If you’re talking about types of webs (reactive web, proactive web, semantic web) you would not use caps (proper noun vs noun).

    Posted by: James | September 20, 2007 5:29 PM

  • My bullshit meter went off after the first paragraph. You are so far off dude. You should take some time to “understand” it before you try to write about it. Otherwise you are just making noise.

    Your article is just noise.

    Where do I check that this article is not useful?

    Posted by: Ken Ewell | September 20, 2007 6:07 PM

  • Good article, especially about the top down vs. bottom up. I am working on a very specific problem – make it easy for teachers to create lessons – and search is not an answer! We are working on an overlay – a top down semantic web, which not only includes normal metadata but also more domain specific, contextual information. Some thoughts at http://doubleclix.wordpress.com/

    Posted by: Krishna Sankar | September 20, 2007 6:08 PM

  • The bottom-up and top-down approaches are not mutually exclusive, so there is no point in trying to pit one against the other. And indeed, why wait until the perfect vision is implemented? If some value can be delivered now by cutting corners, and more value later by investing in a more formal approach in parallel, then surely everyone wins. If a top-down service is able to cheaply extract facts from the Web now, then surely, it should be able to easily translate these facts into predicates (and map them to ontologies) so as to plug into bottom-up machinery (rules, proofs, etc.) as it becomes available. It’s all good.

    Posted by: Jean-Michel Decombe | September 20, 2007 6:21 PM

  • I must admit I was very disappointed to see an article on this topic with such a wide audience not take the opportunity to increase the visibility of microformats.

    In the perfect world, publishing platforms (WordPress, CMSs in general, etc.) would allow the publisher to easily mark certain parts of their content with semantic value, using microformats.

    Then, all modern browsers should be able to recognize them and provide the users with some useful actions. Add hCards to your address book, events to your calendar of choice, etc. The list goes on and on.

    From where I stand, we’re not that far. Firefox 3, MS IE8 and Apple have all shown interest in this matter. Let’s all hold hands and see what they have in store for us.

    Sir Tim Berners-Lee is much more than a dreamer. He is, as we all know, a visionary. Thank you, Sir.

    I hate spamming, but if you’re interested in these matters visit microformats dot org for more info and/or click my name for a fresh screencast showing how this works for the users.


    Posted by: André Luís | September 20, 2007 6:24 PM

  • How many stacked straw men does it take to reach the Moon? Apparently, both not too many, and quite a few.

    This week’s dust up about “what is the semantic Web?” is but a mote in the eye of history, and even within very recent history (say, 1-2 years) at that.

    The real story behind everyone desiring to state the obvious about easy things and hard things relating to information federation is that commercial prospects must be near at hand. I take this as good news.

    It will be interesting to see whose silks get dirtied as this jockeying continues out of the gate.

    Posted by: Mike Bergman | September 20, 2007 7:17 PM

  • I get the same sense of things Mike. The heated debating back and forth shows not only that the Semantic Web is taking in larger numbers of followers, but that we’re nearing a time when we can put what we’ve researched to practical use.

    The interesting thing to me will be the kind of products and services that will emerge. To me it’s not entirely clear yet what markets will be most profitable for Semantic Web technology (and semantic technologies in general).

    I hear that there is a lot of money in the market for a system that radically simplifies data exchange in the enterprise, but consumer products and services I’m not sure about. I don’t think there will be a market for “Semantic Web browsers.” I’m sure Firefox 4 will accommodate any such needs, and I would hope that becomes the case.

    I need to do my research on what the current “Semantic Web companies” are up to.

    Posted by: James | September 20, 2007 8:53 PM

  • I lost my comment on your last post somewhere, so I blogged it.

    All I’d add here is that most of the systems you describe are effectively domain-specific data silos. Unless there’s Web-based interop, these things are merely on the Web, not of the Web. Semantic Web technologies are designed for truly Web-based data integration, they are essentially an evolution of the link.

    Mike’s comment above creased me up – especially since you only have to look at his collection of Sweet Tools to see the “bottom-up Semantic Web” is coming along just fine, thank you 🙂 He does have a point – all this is really about is moving from a Web of Documents to a more general Web of Data.

    For a continual update, subscribe to Planet RDF, or even This Week’s Semantic Web. Coders might also be interested in the Developers Guide to Semantic Web Toolkits for different Programming Languages. As well as tools and applications, there’s more and more linked data appearing on the Web all the time…

    Posted by: Danny | September 21, 2007 3:38 AM

  • some good points though I think you might use fewer words. e.g. the goals

    Spend less time searching
    Spend less time looking at things that do not matter
    Spend less time explaining what we want to computers

    are in short: “spend less time on things you do not like”.

    still, excellent post. as always 😉

    Posted by: Peter P | September 21, 2007 5:02 AM

  • Alex,
    You make good points. We need both top-down and bottom-up approaches.

    Isn’t GRDDL (http://www.w3.org/2004/01/rdxh/spec) a generic approach to gather information from documents?

    Simile projects and RDFizers are worth a look (http://simile.mit.edu/wiki/RDFizers)

    I think semantic web components – a way to describe the components that make up web applications, may be another approach to build bottom-up web.

    We do need a general framework of resource description as a common vocabulary whether our approach is top-down or bottom-up.

    We do need more dialog and I am glad that you started it with this post.


    Posted by: Dorai Thodla | September 21, 2007 5:23 AM

  • Peter P’s last comment pretty much sums up what everyone wants from new web technology (regardless of whether or not it falls under the semantic web umbrella), doesn’t it?

    And I quote:

    “Spend less time searching
    Spend less time looking at things that do not matter
    Spend less time explaining what we want to computers”

    In general, people want to spend less time doing the boring stuff and get right to the good/relevant/interesting stuff (if I could add a picture, I’d totally post the “This is relevant to my interests” lolcat right here, because really, what topic couldn’t benefit from a little lolcat levity?).

    *by the way, since I know that many RWW readers are of the entrepreneurial type, if anyone is working on a project or has an idea that accomplishes the above missions, check out the Knight News Challenge – http://newschallenge.org.

    Posted by: Jackie | September 21, 2007 11:04 AM

  • I have just retired after spending years trying to play a small part in controlling a corporate intranet with rules as basic as “use HTML”. It degenerated into a collection of thousands of PDFs (with internal links) and even Word documents posted straight to the Intranet. In spite of supplied templates and document management tools, the information suppliers saw the Intranet as if it were a paper filing cabinet. HTML combined with proper use of CSS goes a long way towards basic structure, but even when given the tools, information suppliers will not see the reason to use them.

    Posted by: Albert Mispel | September 21, 2007 1:23 PM

  • Good effort, but there is very little new here. Lots of work has been done in the area of semantic integration which understands that an inference architecture will always result in false associations that typically require lots of manual refinements (customizations) of ontologies.

    Semantic integration (where mission critical systems are involved) is a case of the good being the enemy of the perfect. If only we could return lots of choices and let the user pick. Google has it easy.

    Posted by: Pano | September 21, 2007 6:57 PM

  • yea google… they will get this …

    Posted by: Nature Wallpaper | September 21, 2007 11:34 PM

  • Thanks a lot for this post and the previous one on semantic web. Really interesting. I was wondering whether you will address what you said about computers not being able to understand human language, later on. I think this is one of the fundamental problems with semantic web. That said, I do agree with you that we should do what we can, even if that means we cannot get any further than the “simple semantic web”. More comments on my blog.

    Posted by: Samuel Driessen | December 18, 2007 12:36 PM
