
Posts Tagged ‘Web’

Top-Down: A New Approach to the Semantic Web

Written by Alex Iskold / September 20, 2007 4:22 PM / 17 Comments


Earlier this week we wrote about the classic approach to the semantic web and the difficulties with that approach. While the original vision of a layer on top of the current web, one that annotates information in a way that is “understandable” by computers, is compelling, there are technical, scientific and business issues that have been difficult to address.

One of the technical difficulties that we outlined was the bottom-up nature of the classic semantic web approach. Specifically, each web site needs to annotate information in RDF, OWL, etc. in order for computers to be able to “understand” it.

As things stand today, there is little reason for web site owners to do that. The tools that would leverage the annotated information do not exist, and there has been no clearly articulated business or consumer value. That means there is no incentive for sites to invest money in becoming compatible with the semantic web of the future.

But there are alternative approaches. We will argue that a more pragmatic, top-down approach to the semantic web not only makes sense, but is already well on the way toward becoming a reality. Many companies have been leveraging existing, unstructured information to build vertical, semantic services. Unlike the original vision, which is rather academic, these emergent solutions are driven by business and market potential.

In this post, we will look at the solution that we call the top-down approach to the semantic web, because instead of requiring developers to change or augment the web, this approach leverages and builds on top of the current web as-is.

Why Do We Need The Semantic Web?

The complexity of the original vision of the semantic web and the lack of clear consumer benefits make the whole project unrealistic. The simple question, “Why do we need computers to understand semantics?”, remains largely unanswered.

While some of us think that building AI is cool, the majority of people think that AI is a little bit silly, or perhaps even unsettling. And they are right. AI for the sake of AI does not make any sense. If we are talking about building intelligent machines, and if we need to spend money and energy annotating all the information in the world for them, then there needs to be a very clear benefit.

Stated the way it is, the semantic web becomes a vision in search of a reason. What if the problem was restated from the consumer point of view? Here is what we are really looking forward to with the semantic web:

 

  • Spend less time searching
  • Spend less time looking at things that do not matter
  • Spend less time explaining what we want to computers

 

A consumer focus and a clear benefit for businesses need to be there in order for the semantic web vision to be embraced by the marketplace.

What If The Problem Is Not That Hard?

If all we are trying to do is help people improve their online experiences, perhaps full “understanding” of semantics by computers is not even necessary. The best online search tool today is Google, which is based, essentially, on statistical frequency analysis rather than semantics. Solutions that attempt to improve on Google by focusing on generalized semantics have so far struggled to do so.

The truth is that the understanding of natural language by computers is a really hard problem. We have the language ingrained in our genes. We learn language as we grow up. We learn things iteratively. We have the chance to clarify things when we do not understand them. None of this is easily replicated with computers.

But what if that is not even necessary in order to build the first generation of semantic tools? What if, instead of trying to teach computers natural language, we hard-wired into them the concepts of everyday things like books, music, movies, restaurants, stocks and even people? Would that help us be more productive and find things faster?

Simple Semantics: Nouns And Verbs

When we think about a book, we think about a handful of things – title and author, maybe genre and the year it was published. Typically, though, we couldn’t care less about the publisher, edition and number of pages. Similarly, recipes provoke thoughts about cuisine and ingredients, while movies make us think about the plot, director, and stars.

When we think of people, we also think about a handful of things: birthday, where they live, how we’re related to them, etc. The profiles found on popular social networks are great examples of simple semantics built around people.

Books, people, recipes and movies are all examples of nouns. The things that we do on the web around these nouns, such as looking up similar books, finding more people who work for the same company, getting more recipes from the same chef and looking up pictures of movie stars, are similar to verbs in everyday language. These are contextual actions that are based on an understanding of the noun.
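
To make the distinction concrete, here is a minimal sketch, in Python and with invented names, of a “noun” with a handful of hard-wired attributes and a couple of “verbs” defined around it. It illustrates the idea only; it is not any particular product’s data model.

```python
from dataclasses import dataclass

# A "noun" with a handful of hard-wired attributes -- the things people
# actually care about when they think of a book. Field names are invented.
@dataclass
class Book:
    title: str
    author: str
    genre: str
    year: int

# "Verbs": actions that only make sense because the application already
# knows what a Book is. These are illustrative placeholders, not a real API.
def similar_books(book: Book, catalog: list[Book]) -> list[Book]:
    """Naive similarity: same author or same genre."""
    return [b for b in catalog
            if b is not book and (b.author == book.author or b.genre == book.genre)]

def more_by_author(author: str, catalog: list[Book]) -> list[Book]:
    return [b for b in catalog if b.author == author]

catalog = [
    Book("Dune", "Frank Herbert", "sci-fi", 1965),
    Book("Dune Messiah", "Frank Herbert", "sci-fi", 1969),
    Book("Neuromancer", "William Gibson", "sci-fi", 1984),
]
print([b.title for b in similar_books(catalog[0], catalog)])
# ['Dune Messiah', 'Neuromancer'] -- both share an author or a genre with Dune
```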

What if semantic applications hard-wired understanding and recognition of the nouns and then also hard-wired the verbs that make sense for them? We are actually well on our way to doing just that. Vertical search engines like Spock, Retrevo and ZoomInfo, the page-annotating technology from ClearForest, Dapper, and the Map+ extension for Firefox are just a few examples of top-down semantic web services.

The Top-Down Semantic Web Service

The essence of a top-down semantic web service is simple – leverage existing web information, apply specific, vertical semantic knowledge and then redeliver the results via a consumer-centric application. Consider the vertical search engine Spock, which scans the web for information about people. It knows how to recognize names in HTML pages, and it also looks for the kinds of information that all people have in common: birthdays, locations, marital status, etc. In addition, Spock “understands” that people relate to each other. If you look up Bush, then Clinton will show up as a predecessor. If you look up Steve Jobs, then Bill Gates will come up as a rival.

In other words, Spock takes simple, everyday semantics about people and applies them to the information that already exists online. The result? A unique and useful vertical search engine for people. Further, note that Spock does not require the information to be re-annotated in RDF or OWL. Instead, the company builds adapters that use heuristics to extract the data. The engine does not actually have a full understanding of the semantics of people, however. For example, it does not know that people like different kinds of ice cream, but it doesn’t need to. The point is that by focusing on simple semantics, Spock is able to deliver a useful end-user service.

Another, much simpler, example is the Map+ add-on for Firefox. This application recognizes addresses and provides a map popup using Yahoo! Maps. It is precisely the simplicity of this application that conveys the power of simple semantics. The add-on “knows” what addresses look like. Sure, it sometimes makes mistakes, but most of the time it tags addresses in online documents properly. So it leverages existing information and then provides direct end-user utility by mashing it up with Yahoo! Maps.
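
As a rough sketch of the same pattern, the snippet below uses a deliberately crude regular expression as a stand-in for the add-on’s address recognizer and turns each hit into a map-lookup link. The regex and the URL format are assumptions made for illustration, not Map+’s actual implementation.

```python
import re
from urllib.parse import quote_plus

# Very rough US street-address pattern: number, street name, street type.
# An illustrative stand-in for the add-on's real (and surely smarter) heuristics.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+[A-Z][\w.]*(?:\s+[A-Z][\w.]*)*\s+"
    r"(?:St|Street|Ave|Avenue|Blvd|Boulevard|Rd|Road|Dr|Drive)\b"
)

def tag_addresses(text: str) -> list[tuple[str, str]]:
    """Return (address, map-lookup URL) pairs found in free text."""
    results = []
    for match in ADDRESS_RE.finditer(text):
        address = match.group(0)
        # Hypothetical map query URL; the real add-on used Yahoo! Maps.
        url = "https://maps.example.com/?q=" + quote_plus(address)
        results.append((address, url))
    return results

print(tag_addresses("Our office is at 1600 Pennsylvania Ave, stop by."))
```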

The Challenges Facing The Top-Down Approach

Despite being effective, the somewhat simplistic top-down approach has several problems. First, it is not really the semantic web as it is defined; instead, it is a group of semantic web services and applications that create utility by leveraging simple semantics. So the proponents of the classic approach would protest, and they would be right. Another issue is that these services do not always get the semantics right, because of ambiguities. Because the recognition is algorithmic and not based on an underlying RDF representation, it is not perfect.

It seems to me that it is better to have simpler solutions that work 90% of the time than complex ones that never arrive. The key questions here are: How exactly are mistakes handled? And, is there a way for the user to correct the problem? The answers will be left up to the individual application. In life we are used to other people being unpredictable, but with computers, at least in theory, we expect things to work the same every time.

Yet another issue is that these simple solutions may not scale well. If the underlying unstructured data changes, can the algorithms be changed quickly enough? This is always an issue with things that sit on top of other things without an API. Of course, if more web sites had APIs, as we have previously suggested, the top-down semantic web would be much easier and more certain.

Conclusion

While the original vision of the semantic web is grandiose and inspiring, in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of a specific and simple consumer focus makes it mostly an academic exercise. In the meantime, existing data is being leveraged by applying simple heuristics and making assumptions about particular verticals. What we have dubbed top-down semantic web applications have been appearing online, improving end-user experiences by leveraging semantics to deliver real, tangible services.

Will the bottom-up semantic web ever happen? Possibly. But at the moment the precise path to get there is not quite clear. In the meantime, we can all enjoy a better online experience and get where we need to go faster thanks to simple top-down semantic web services.


Social Graph & Beyond: Tim Berners-Lee’s Graph is The Next Level

Written by Richard MacManus / November 22, 2007 5:55 PM / 12 Comments


Tim Berners-Lee, inventor of the World Wide Web, today published a blog post about what he terms the Graph, which is similar (if not identical) to his Semantic Web vision. Referencing both Brad Fitzpatrick’s influential post earlier this year on Social Graph, and our own Alex Iskold’s analysis of Social Graph concepts, Berners-Lee went on to position the Graph as the third main “level” of computer networks. First there was the Internet, then the Web, and now the Graph – which Sir Tim labeled (somewhat tongue in cheek) the Giant Global Graph!

Note that Berners-Lee wasn’t specifically talking about the Social Graph, which is the term Facebook has been heavily promoting, but something more general. In a nutshell, this is how Berners-Lee envisions the 3 levels (a.k.a. layers of abstraction):

1. The Internet: links computers
2. Web: links documents
3. Graph: links relationships between people and/or documents — “the things documents are about” as Berners-Lee put it.

The Graph is all about connections and re-use of data. Berners-Lee wrote that Semantic Web technologies will enable this:

“So, if only we could express these relationships, such as my social graph, in a way that is above the level of documents, then we would get re-use. That’s just what the graph does for us. We have the technology — it is Semantic Web technology, starting with RDF OWL and SPARQL. Not magic bullets, but the tools which allow us to break free of the document layer.”

Sir Tim also notes that as we go up each level, we lose more control but gain more benefits: “…at each layer — Net, Web, or Graph — we have ceded some control for greater benefits.” The benefits are what happens when documents and data are connected – for example being able to re-use our personal and friends data across multiple social networks, which is what Google’s OpenSocial aims to achieve.

What’s more, says Berners-Lee, the Graph has major implications for the Mobile Web. He said that longer term “thinking in terms of the graph rather than the web is critical to us making best use of the mobile web, the zoo of wildly differing devices which will give us access to the system.” The following scenario sums it up very nicely:

“Then, when I book a flight it is the flight that interests me. Not the flight page on the travel site, or the flight page on the airline site, but the URI (issued by the airlines) of the flight itself. That’s what I will bookmark. And whichever device I use to look up the bookmark, phone or office wall, it will access a situation-appropriate view of an integration of everything I know about that flight from different sources. The task of booking and taking the flight will involve many interactions. And all throughout them, that task and the flight will be primary things in my awareness, the websites involved will be secondary things, and the network and the devices tertiary.”

Conclusion

I’m very pleased Tim Berners-Lee has appropriated the concept of the Social Graph and married it to his own vision of the Semantic Web. What Berners-Lee wrote today goes way beyond Facebook, OpenSocial, or social networking in general. It is about how we interact with data on the Web (whether it be mobile or PC or a device like the Amazon Kindle) and the connections that we can take advantage of using the network. This is also why Semantic Apps are so interesting right now, as they take data connection to the next level on the Web.

Overall, unlike Nick Carr, I’m not concerned whether mainstream people accept the term ‘Graph’ or ‘Social Graph’. It really doesn’t matter, so long as the web apps that people use enable them to participate in this ‘next level’ of the Web. That’s what Google, Facebook, and a lot of other companies are trying to achieve.

Incidentally, it’s great to see Tim Berners-Lee ‘re-using’ concepts like the Social Graph, or simply taking inspiration from them. He never really took to the Web 2.0 concept, perhaps because it became too hyped and commercialized, but the fact is that the Consumer Web has given us many innovations over the past few years. Everything from Google to YouTube to MySpace to Facebook. So even though Sir Tim has always been about graphs (as he noted in his post, the Graph is essentially the same as the Semantic Web), it’s fantastic he is reaching out to the ‘web 2.0’ community and citing people like Brad Fitzpatrick and Alex Iskold.

Related: check out Alex Iskold’s Social Graph: Concepts and Issues for an overview of the theory behind Social Graph. This is the post Tim Berners-Lee referenced. Also check out Alex’s latest post today: R/WW Thanksgiving: Thank You Google for Open Social (Or, Why Open Social Really Matters).


Semantic Travel Search Engine UpTake Launches

Written by Josh Catone / May 14, 2008 6:00 AM / 8 Comments


According to a comScore study done last year, booking travel over the Internet has become something of a nightmare for people. It’s not that using any of the booking engines is difficult, it’s just that there is so much information out there that planning a vacation is overwhelming. According to the comScore study, the average online vacation plan comes together through 12 travel-related searches and visits to 22 different web sites over the course of 29 days. Semantic search startup UpTake (formerly Kango) aims to make that process easier.

UpTake is a vertical search engine that has assembled what it says is the largest database of US hotels and activities — over 400,000 of them — from more than 1,000 different travel sites. Using a top-down approach, UpTake looks at its database of over 20 million reviews, opinions, and descriptions of hotels and activities in the US and semantically extracts information about those destinations. You can think of it as Metacritic for the travel vertical, but rather than just arriving at an aggregate rating (which it does), UpTake also attempts to figure out some basic concepts about a hotel or activity based on what it learns from the information it reads: is the hotel family-friendly, would it be good for a romantic getaway, is it eco-friendly, and so on.

“UpTake matches a traveler with the most useful reviews, photos, etc. for the most relevant hotels and activities through attribute and sentiment analysis of reviews and other text, the analysis is guided by our travel ontology to extract weighted meta-tags,” said President Yen Lee, who was co-founder of the CitySearch San Francisco office and a former GM of Travel at Yahoo!

What UpTake isn’t, is a booking engine like Expedia, a meta price search engine like Kayak, or a travel community. UpTake is strictly about aggregation of reviews and semantic analysis and doesn’t actually do any booking. According to the company only 14% of travel searches start at a booking engine, which indicates that people are generally more interested in doing research about a destination before trying to locate the best prices. Many listings on the site have a “Check Rates” button, however, which gets hotel rates from third party partner sites — that’s actually how UpTake plans to make money.

The way UpTake works is by applying its specially created travel ontology, which contains concepts, relationships between those concepts, and rules about how they fit together, to the 20 million reviews in its database. The ontology allows UpTake to extract meaning from structured or semi-structured data by telling its search engine things like “a pool is a type of hotel amenity and kids like pools.” That means hotels with pools score some points when evaluating if a hotel is “kid friendly.” The ontology also knows, though, that a nude pool might be inappropriate for kids, and thus that would take points away when evaluating for kid friendliness.
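
As a toy rendering of the kind of rule described above (not UpTake’s actual ontology or code), amenity-to-theme relationships can be written as weighted pairs and summed into a theme score:

```python
# A toy ontology fragment: (amenity, theme) pairs with invented weights.
ONTOLOGY_RULES = {
    ("pool", "kid_friendly"): 2,          # "kids like pools"
    ("nude pool", "kid_friendly"): -5,    # clearly inappropriate for kids
    ("crib", "kid_friendly"): 3,
    ("spa", "romantic_getaway"): 2,
    ("candlelit dining", "romantic_getaway"): 3,
}

def theme_score(hotel_amenities: list[str], theme: str) -> int:
    """Sum the rule weights for every amenity that bears on the given theme."""
    return sum(weight
               for (amenity, t), weight in ONTOLOGY_RULES.items()
               if t == theme and amenity in hotel_amenities)

print(theme_score(["pool", "crib"], "kid_friendly"))      # 5
print(theme_score(["nude pool", "spa"], "kid_friendly"))  # -5
```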

A simplified example ontology is depicted below.

In addition to figuring out where destinations fit into vacation themes — like romantic getaway, family vacation, girls’ getaway, or outdoor — the site also does sentiment matching to determine whether users liked a particular hotel or activity. The search engine looks for sentiment words such as “like,” “love,” “hate,” “cramped,” or “good view,” and knows what they mean, how they relate to the theme of the hotel, and how people felt about it. It figures that information into the score it assigns each destination.
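
Here is a minimal sketch of that sentiment-matching step, with an invented lexicon. A real engine would handle negation, phrasing, and theme-specific meanings; this only shows how sentiment words can be folded into a score:

```python
# Invented sentiment lexicon: phrase -> score contribution.
SENTIMENT_LEXICON = {
    "love": 2, "like": 1, "good view": 1,
    "hate": -2, "cramped": -1, "dirty": -2,
}

def review_sentiment(review: str) -> int:
    """Naive scoring: sum the weights of lexicon entries present in the text."""
    text = review.lower()
    return sum(weight for phrase, weight in SENTIMENT_LEXICON.items()
               if phrase in text)

reviews = [
    "We love this place, great pool and a good view of the bay.",
    "Rooms were cramped and frankly a bit dirty.",
]
print([review_sentiment(r) for r in reviews])  # [3, -3]
```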

Conclusion

Yesterday, we looked at semantic, natural language processing search engine Powerset and found in some quick early testing that the results weren’t that much different from Google’s. “If Google remains ‘good enough,’ Powerset will have a hard time convincing people to switch,” we wrote. But while semantic search may feel rather clunky for the broader global web, it makes a lot of sense in specific verticals. The ontology is a lot more focused, and the site also isn’t trying to answer specific questions, but rather attempting to semantically determine general concepts, such as romanticness or overall quality. The upshot is that the results are tangible and useful.

I asked Yen Lee what UpTake thought about the top-down vs. the traditional bottom-up approach. Lee told me that he thinks the top-down approach is a great way to lead into the bottom-up Semantic Web. Lee thinks that top-down efforts to derive meaning from unstructured and semi-structured data, as well as efforts such as Yahoo!’s move to index semantic markup, will provide an incentive for content publishers to start using semantic markup on their data. Lee said that many of UpTake’s partners have already begun to ask how to make it easier for the site to read and understand their content.

Vertical search engines like UpTake might also provide the consumer face for the Semantic Web that can help sell it to consumers. Being able to search millions of reviews and opinions and have a computer understand how they relate to the type of vacation you want to take is the sort of palpable evidence needed to sell the Semantic Web idea. As these technologies get better, and data becomes more structured, then we might see NLP search engines like Powerset start to come up with better results than Google (though don’t think for a minute that Google would sit idly by and let that happen…).

What do you think of UpTake? Let us know in the comments below.


Semantic Web: Making Advertising More Relevant to Consumers

Written by Lidija Davis / October 17, 2008 1:10 AM / 35 Comments


Amiad Solomon, CEO of Peer39, kicked off the Web 3.0 Conference & Expo in Santa Clara, CA on Thursday with a keynote discussing the Semantic Web and how it relates to advertising. He told the audience that this is one of the key business opportunities in the Web 3.0 era. “I believe the simplest definition of Web 3.0 is the monetization and commercialization of Web 2.0,” he said.

To fully appreciate how Web 3.0 can offer better advertising solutions, Solomon suggested that we start by analyzing the Web’s transformations since Tim Berners-Lee and Robert Cailliau wrote the official proposal for the World Wide Web in 1990.

The Evolution of the Web According to Solomon

Web 1.0 was basic connection via the Internet, where information flowed one way and was rarely updated. Web 1.0 ended in 2001 with the crash of the dot-com era, which some estimate cost in excess of $5 trillion. The Web 1.0 lesson: Cash, not content, is king.

Web 2.0 marked the beginning of the ‘two-sided Internet,’ where we started using the Internet to talk to one another. This interactivity generated billions of dollars in data – virtually for free. The Web 2.0 lesson: Sustainable revenues are possible.

Web 3.0 offers detailed data exchange to every point on the Internet, a ‘machine in the middle,’ with three main characteristics:

1. Smart internetworking

The Internet itself will get smarter and become a gathering tool to execute relatively complex tasks and analyze collective online behavior.

2. Seamless applications

Web 3.0 theories suggest that all applications will fit together: a continuation of open source in which all applications will be able to communicate. APIs will read data from any platform and provide a single point of reference.

3. Distributed databases

Web 3.0 will need somewhere to store very complex and memory-intensive information. It will require ontologies to establish relationships between information sources, search millions of nodes, and scan billions of data records at once.

How Does This Make Money?

“This is where the semantic Web comes in,” Solomon explained. “Businesses finally understand the Internet, and recognize that advertising is a good business model – if you can make it work.”

According to Solomon, there are two approaches to advertising currently in use: contextual advertising and behavioral targeting.

Contextual advertising systems scan website text for keywords that trigger the system to send predetermined ads. Used on search engine results pages, contextual systems show ads based on users’ search words; unfortunately, these ads aren’t always relevant, since words can have several meanings. While the errors occasionally result in humor, and are good for a laugh, they expose a serious weakness of contextual ads: companies investing in them are wasting advertising budgets and undermining brand promotion and sentiment.

Behavioral targeting systems collect information on a person’s Web browsing history, usually by way of cookies. Given the European Union’s Directive 2002/58 on privacy and electronic communications, and pending US legislation restricting the use of cookies, behavioral targeting campaigns via cookies can no longer be seen as a valuable investment. Additionally, home computers are often shared, and if cookies are enabled, users see ads directed by other users’ cookies. Again, badly targeted advertising can be a nuisance for the user and a waste of advertising dollars.

The Way of the Future: Semantic Advertising

Successful advertising means showing the right product to the right person at the right time. The semantic Web puts data into semantic formats on the fly, and targets ads based on the meaning of data with a high degree of accuracy.
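
As a hedged illustration of the difference from bare keyword matching (the categories and the classifier below are invented, not any real ad system), a semantic targeter shows an ad only when the page’s inferred topic, rather than a stray keyword hit, matches the ad’s intended audience:

```python
# Invented example data: each ad declares the topic it is meant for,
# and each page gets a topic from some upstream semantic classifier.
ADS = {
    "Hooters franchise openings": "casual_dining",
    "Running shoes sale": "fitness",
}

def classify_page(text: str) -> str:
    """Stand-in for a semantic classifier; a real one would disambiguate
    word senses instead of just spotting keywords."""
    text = text.lower()
    if "feminism" in text or "gender equality" in text:
        return "social_commentary"
    if "marathon" in text or "training plan" in text:
        return "fitness"
    return "general"

def ads_for_page(text: str) -> list[str]:
    topic = classify_page(text)
    return [ad for ad, target in ADS.items() if target == topic]

page = "A long essay on feminism and representation in advertising."
print(ads_for_page(page))  # [] -- no Hooters ad on an essay about feminism
```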

This is good news for the user – no more embarrassing keyword results, no more Hooters ads on sites about feminism, and an end to annoying cookies.

Do you agree that the Semantic Web will bring even more effective advertising to the Web?

ReadWriteWeb is a media sponsor of the Web 3.0 Conference & Expo



Report: Semantic Web Companies Are, or Will Soon Begin, Making Money

Written by Marshall Kirkpatrick / October 3, 2008 5:13 PM / 14 Comments


Semantic Web entrepreneur David Provost has published a report about the state of business in the Semantic Web, and it’s a good read for anyone interested in the sector. It’s titled On the Cusp: A Global Review of the Semantic Web Industry. We also mentioned it in our post Where Are All The RDF-based Semantic Web Apps?

The Semantic Web is a collection of technologies that makes the meaning of content online understandable by machines. After surveying 17 Semantic Web companies, Provost concludes that semantic science is being productized, differentiated, invested in by mainstream players, and increasingly sought after in the business world.

Provost aims to use real-world examples to articulate the value proposition of the Semantic Web in accessible, non-technical language. That there are enough examples available for him to do this is great. His conclusions don’t always seem as well supported by his evidence as he’d like – but the profiles he writes of 17 Semantic Web companies are very interesting to read.

What are these companies doing? Provost writes:

“..some companies are beginning to focus on specific uses of Semantic technology to create solutions in areas like knowledge management, risk management, content management and more. This is a key development in the Semantic Web industry because until fairly recently, most vendors simply sold development tools.”

 

The report surveys companies ranging from the innovative but unlaunched Anzo for Excel from Cambridge Semantics, to well-known big players like Dow Jones Client Solutions and RWW sponsor Reuters Calais Initiative, to relatively unknown big players like the already very commercialized Expert System. 10 of the companies were from the US, 6 from Europe and 1 from South Korea.

Above: Chart from Provost’s report.

We’ve been wanting to learn more about “under the radar” but commercialized semantic web companies ever since doing a briefing with Expert System a few months ago. We had never heard of the Italian company before, but they believe they have a richer, deeper semantic index than anyone else online. They told us their database at the time contained 350k English words and 2.8m relationships between them, including geographic representations. They power Microsoft’s spell checker and the Natural Language Processing (NLP) in the BlackBerry. They also sell NLP software to the US military and Department of Homeland Security, which didn’t seem like anything to brag about to us, but presumably makes up a significant part of the $12 million+ in revenue they told Provost they made last year.

And some people say the Semantic Web only exists inside the laboratories of Web 3.0 eggheads!

Shortcomings of the Report

Provost writes that “the vendors [in] this report have all the appearances of thriving, emerging technology companies and they have shown their readiness to cross borders, continents, and oceans to reach customers.” You’d think they turned water into wine. Those are strong words for a study in which only 4 of 17 companies were willing to report their revenue and several hadn’t launched products yet.

The logic here is sometimes pretty amazing.

The above examples [there were two discussed – RWW] are just a brief sampling of the commercial success that the Semantic Web has been experiencing. In broad terms, it’s easy to point out the longevity of many companies in this industry and use that as a proxy for commercial success [wow – RWW]. With more time (and space in this report), additional examples could be described but the most interesting prospect pertains to what the industry landscape will look like in twelve months. [hmmm…-RWW]

 

In fact, while Provost has glowingly positive things to say about all the companies he surveyed, the absence of engagement with any of their shortcomings makes the report read more like marketing material than an objective take on what’s supposed to be world-changing technology.

This is a Fun Read

The fact is, though, that Provost writes a great introduction to many companies working to sell software in a field still too widely believed to be ephemeral. The stories of each of the 17 companies profiled are fun to read and many of Provost’s points of analysis are both intuitive and thought provoking.

He says the sector is “on the cusp” of major penetration into existing markets currently served by non-semantic software. Provost argues that the Semantic Web struggles to explain itself because the World Wide Web is so intensely visual and semantics are not. He says that reselling business partners in specific distribution channels are combining their domain knowledge with the science of the software developers to bring these tools to market. He tells a great, if unattributed, story about what Linked Data could mean to the banking industry.

We hadn’t heard of several of the companies profiled in the report, and a handful of them had never been mentioned by the 34 semantic web specialist blogs we track, either.

There’s something here for everyone. You can read the full report here.


Web 3.0: Is It About Personalization?

Written by Josh Catone / February 5, 2008 2:00 AM / 52 Comments


On the UK’s Guardian newspaper site today, writer Jemima Kiss suggested that Web 3.0 will be about recommendation. “If web 2.0 could be summarized as interaction, web 3.0 must be about recommendation and personalization,” she wrote. Using Last.fm and Facebook’s Beacon as examples, Kiss painted a picture of a web where personalized recommendation services can feed us information on new music, new products, and where to eat. It’s a marketer’s dream, and it’s really not far off from the definitions we’ve come up with in the past here on ReadWriteWeb.

We’ve written about web 3.0 and attempted to define it many, many times here over the past year. One of the common themes between almost all of the posts is that Web 3.0 and the vision of the Semantic Web are joined at the hip.

Last April, we held a contest asking readers for their web 3.0 definitions. Our favorite came from Robert O’Brien, who defined Web 3.0 as a “decentralized asynchronous me.”

“Web 1.0: Centralized Them. Web 2.0: Distributed Us. Web 3.0: Decentralized Me,” he wrote. “[Web 3.0 is] about me when I don’t want to participate in the world. It’s about me when I want to have more control of my environment particularly who I let in. When my attention is stretched who/what do I pay attention to and who do I let pay attention to me. It is more effective communication for me!”

What O’Brien was getting at is basically what Kiss was getting at: personalization and recommendation. And that’s the promise of the Semantic Web. The easiest way to sell the Semantic Web vision to consumers is to talk about how it can make their lives easier. When machines understand things in human terms, and can apply that knowledge to your attention data, we’ll have a web that knows what we want and when we want it.

ReadWriteWeb contributor Sramana Mitra put it another way on this blog last February, when she said that web 3.0 will be about adding context to personalization. “Personalization has remained limited to some unsatisfactory efforts by the MyYahoo team, their primary disadvantage being the lack of a starting Context,” she wrote. “In Web 3.0, I predict, we are going to start seeing roll-ups. We will see a trunk that emerges from the Context, be it film (Netflix), music (iTunes), cooking / food, working women, single parents, … and assembles the Web 3.0 formula that addresses the whole set of needs of a consumer in that Context.” Or in other words, web 3.0 will be about feeding you the information that you want, when you want it (in the proper context).

Of course, the versioning of the Internet is kind of silly and probably shouldn’t keep going, but it is a fun way to look to the future and predict what might be coming our way. What do you think of Kiss’s idea about web 3.0 being about recommendation and personalization?


The semantic elephant in the room – Google will settle the “top down vs. bottom up” debate for us

Here is a useful primer on what some people (perhaps not the best advised) are calling Web3.0.

The fundamental principle of semantifying data is that information becomes more easily found and understood by computers. Mix that with AI and you’ve got some very, very powerful, useful tools for information gathering, processing and decision making!

So why is Google – the information lynchpin of the Internet, and thus of modern society – not THE focus of attention in all this hubbub about Web3.0?

This is a company with around five THOUSAND(1) computer scientists devoted to improving their search engine (~35,000 man hours a day). SURELY they’re building some amazing semantic IP that will help cement their dominance.

A big debate in the semantic field at the moment is whether the best approach is ‘top-down’ or ‘bottom-up’.

  1. Bottom-up: when information is created, it is annotated with machine-readable tags. Technologies like RDF, OWL and microformats (and, to a basic extent, XML) do this. Bottom-up semantics got a big boost this week when Yahoo announced it was adding RDF descriptors to its pages. (A minimal sketch of this kind of annotation appears just after this list.)
  2. Top-down: when a Google machine finds a document on the web, it reads it and understands the information. That’s very, very advanced computer science (according to my housemate), but that way, when a machine reads a page about Gash, it figures out whether the page is talking about a physical injury, a woman, or a vagina. That’s important if your kid is using Google to learn about first aid. An example of a top-down semantic tool is Dapper.net.
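
For the bottom-up option above, here is a minimal sketch of what publisher-side annotation can look like, using the rdflib Python library to emit a few RDF triples about a page’s subject. The URIs and vocabulary choices are illustrative assumptions, not a prescribed scheme:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Illustrative vocabulary and URIs -- not prescribed by any particular standard.
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
EX = Namespace("http://example.org/schema/")

page = URIRef("http://example.org/posts/first-aid-for-cuts")
topic = URIRef("http://example.org/topics/laceration")

g = Graph()
g.bind("foaf", FOAF)
g.bind("ex", EX)
g.add((page, FOAF.primaryTopic, topic))   # what the page is about
g.add((topic, RDF.type, EX.Injury))       # and what kind of thing that is
g.add((topic, FOAF.name, Literal("gash (the injury sense)")))

# A crawler that reads RDF no longer has to guess which sense of "gash" applies.
print(g.serialize(format="turtle"))
```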

Bottom-up requires everyone on the Web to ‘play ball’ and change their sites. There are big discussions about what format to use, etc. But Google’s withdrawal from these debates suggests that it’s working on top-down semantics and doesn’t need to weigh in on what people do to their sites.

  1. Google knows that humans are frankly crap at describing and organising things. That’s why Google search worked in the first place, and why human-edited directories (like DMOZ, which I was once an editor for, or early-days Yahoo) didn’t: Google went out and found pages and decided their relative importance, so humans don’t have to. Likewise, with Gmail, it pioneered the folder-less email service – you just search for the email you need; you don’t sort it into folders each time you want it.
  2. For all this talk of Web3.0, Google is actually quite far down the road with understanding the closeness of a website’s content to what you searched for, and discarding irrelevant results. It doesn’t have to change a THING about the Internet, or the way Internet users behave, to incorporate better top-down semantics into its search algorithm. Google.com will still look the same; the only difference is that you will be able to use full sentences when you search, to better describe what you want it to find, e.g. “pages about animals like my goldfish” (which would return results about angelfish, clownfish, etc.)
  3. If Google encourages bottom-up, it means each website does the heavy lifting; and any jackass coder can build a tool to leverage that, without too much difficulty. But with top-down, Google retains scarcity/monopoly power, because nobody (except Microsoft) can match the manpower needed to build that kind of IP. Top-down semantics are a technical challenge for Google. But bottom-up semantics would challenge Google’s business. It has the workforce to deal with technical challenges better than anyone. But marketplace evolution? Trickier.

If you take it as given that Google will succeed at whichever semantic approach it chooses, and you accept my reasoning that it can only opt for top-down semantics, and you accept that Google is a major Internet trendsetter (e.g. what Gmail did for inbox storage allowances), you reach the following plausible conclusion:

Google will settle the semantic web debate once and for all, kill bottom-up initiatives dead in the water, and build a top-down semantic web search engine that will cement the big G’s position as a market leader in web search.

That’s a warning to investors and coders who are interested in any bottom-up (and even, to an extent, top-down) semantic web startup. And if it settles the debate, perhaps man-hours won’t be wasted on the wrong approach to organising information on the web. Far better the Dapper approach.

(1) 16,805 total employees (source: http://www.google.com/press/pressrel/revenues_q407.html) times “We’re so serious about improving search that more than a third of our people are working on it” (http://graemethickins.typepad.com/graeme_blogs_here/2008/03/googles-annual.html)

