Archive for January 14th, 2008


September 18, 2007

The Semantic Web, Collective Intelligence and Hyperdata

I’m posting this in response to a recent post by Tim O’Reilly which focused on disambiguating what the Semantic Web is and is not, as well as the subject of Collective Intelligence. I generally agree with Tim’s post, but I do have some points I would add by way of clarification. In particular, in my opinion, the Semantic Web is all about collective intelligence, on several levels. I would also suggest that the term “hyperdata” is a possibly useful way to express what the Semantic Web is really all about.

What Makes Something a Semantic Web Application?

I agree with Tim that the term “Semantic Web” refers to the use of a particular set of emerging W3C open standards. These standards include RDF, OWL, SPARQL, and GRDDL. A key requirement for an application to have “Semantic Web inside,” so to speak, is that it makes use of, or is compatible with, at the very least, basic RDF. An alternative definition is that for an application to be “Semantic Web” it must make at least some use of an ontology, using a W3C standard for doing so.

Semantic Versus Semantic Web

Many applications and services claim to be “semantic” in one manner or another, but that does not mean they are “Semantic Web.” Semantic applications include any applications that can make sense of meaning, particularly in language such as unstructured text, or structured data in some cases. By this definition, all search engines today are somewhat “semantic” but few would qualify as “Semantic Web” apps.

The Difference Between “Data On the Web” and a “Web of Data”

The Semantic Web is principally about working with data in a new and hopefully better way, and making that data available on the Web, if desired, in an open fashion such that other applications can understand and reuse it more easily. We call this idea “The Data Web”: the notion is that we are transforming the Web from a distributed file server into something that is more like a distributed database.

Instead of the basic objects being web pages, they are actually pieces of data (triples) and records formed from them (sets, trees, graphs or objects comprised of triples). There can be any number of triples within a Web page, and there can also be triples on the Web that do not exist within Web pages at all — they can come directly from databases for example.

One might respond to this by noting that there is already a lot of data on the Web, in XML and other formats — how is the Semantic Web different from that? What is the difference between “Data on the Web” and the idea of “The Data Web?”

The best answer to this question that I have heard was something that Dean Allemang said at a recent Semantic Web SIG in Palo Alto. Dean said, “Sure there is data on the Web, but it’s not actually a web of data.” The difference is that in the Semantic Web paradigm, the data can be linked to other data in other places: it’s a web of data, not just data on the Web.

I call this concept of interconnected data, “Hyperdata.” It does for data what hypertext did for text. I’m probably not the originator of this term, but I think it is a very useful term and analogy for explaining the value of the Semantic Web.

Another way to think of it is that the current Web is a big graph of interconnected nodes, where the nodes are usually HTML documents, but in the Semantic Web we are talking about a graph of interconnected data statements that can be as general or specific as you want. A data record is a set of data statements about the same subject, and they don’t have to live in one place on the network — they could be spread over many locations around the Web.

A statement to the effect of “Sue lives in Palo Alto” could exist on site A, refer to a URI for a statement defining Sue on site B, a URI for a statement that defines “lives in” on site C, and a URI for a statement defining “Palo Alto” on site D. That’s a web of data. What’s cool is that anyone can potentially add statements to this web of data; it can be completely emergent.
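To make the “Sue lives in Palo Alto” case concrete, here is a minimal sketch in plain Python. It is not real RDF tooling: every URI below (the site-b, site-c and site-d hosts and the term names) is a made-up placeholder, and a statement is modeled as a simple (subject, predicate, object) tuple.

```python
from collections import defaultdict

# Hypothetical URIs: each part of the statement is defined on a different site.
SUE = "http://site-b.example/people/sue"
LIVES_IN = "http://site-c.example/terms/livesIn"
NAME = "http://site-c.example/terms/name"
PALO_ALTO = "http://site-d.example/places/palo-alto"

# Statements published by different sites, each a (subject, predicate, object) triple.
site_a = [(SUE, LIVES_IN, PALO_ALTO)]
site_b = [(SUE, NAME, "Sue")]
site_d = [(PALO_ALTO, NAME, "Palo Alto")]


def merge(*sources):
    """Combine triples from any number of sources into one graph (a set)."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def record(graph, subject):
    """Collect every statement about one subject, wherever it was published."""
    result = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            result[p].append(o)
    return dict(result)


graph = merge(site_a, site_b, site_d)
sue = record(graph, SUE)
print(sue[LIVES_IN])  # ['http://site-d.example/places/palo-alto']
```

The “record” for Sue only exists after the merge. No single site holds it, which is the sense in which the data is a web rather than a pile of files.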

The Semantic Web is Built by and for Collective Intelligence

This is where I think Tim and others who think about the Semantic Web may be missing an essential point. The Semantic Web is in fact highly conducive to “collective intelligence.” It doesn’t require that machines add all the statements using fancy AI. In fact, in a next-generation folksonomy, when tags are created by human users, manually, they can easily be encoded as RDF statements. And by doing this you get lots of new capabilities, like being able to link tags to concepts that define their meaning, and to other related tags.

Humans can add tags that become semantic web content. They can do this manually or software can help them. Humans can also fill out forms that generate RDF behind the scenes, just as filling out a blog posting form generates HTML, XML, ATOM etc. Humans don’t actually write all that code, software does it for them, yet blogging and wikis for example are considered to be collective intelligence tools.
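As a rough illustration of a form generating statements behind the scenes, the sketch below turns one tagging action into triples and prints them in N-Triples style. The `http://example.org/` namespace and the property names (`taggedBy`, `taggedResource`, `label`, `means`) are invented for this example and are not taken from any published tag ontology. Treating any object that starts with “http” as a URI is a simplifying assumption.

```python
EX = "http://example.org/"  # hypothetical namespace, not a published vocabulary


def tag_triples(user, page_url, tag, concept_uri=None):
    """Turn one manual tagging action into (subject, predicate, object) triples."""
    tagging = f"{EX}tagging/{user}/{tag}"
    triples = [
        (tagging, EX + "taggedBy", EX + "user/" + user),
        (tagging, EX + "taggedResource", page_url),
        (tagging, EX + "label", tag),
    ]
    if concept_uri is not None:
        # Optional link from the plain-text tag to a concept that defines its meaning.
        triples.append((tagging, EX + "means", concept_uri))
    return triples


def to_ntriples(triples):
    """Serialize in N-Triples style; objects starting with 'http' are treated as URIs."""
    lines = []
    for s, p, o in triples:
        if o.startswith("http"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


triples = tag_triples(
    "sue",
    "http://example.org/posts/42",
    "jaguar",
    concept_uri="http://example.org/concepts/jaguar-animal",
)
print(to_ntriples(triples))
```

The user only typed “jaguar” into a form. The link to a concept is what lets software tell the animal from the car, and it is the part a plain folksonomy lacks.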

So the concept of folksonomy and tagging is truly orthogonal to the Semantic Web. They are not mutually exclusive at all. In fact the Semantic Web — or at least “Semantic Web Lite” (RDF + only basic use of OWL + basic SPARQL) is capable of modelling and publishing any data in the world in a more open way.

Any application that uses data could do everything it does using these technologies. Every single form of social, user-generated content and community could, and probably will, be implemented using RDF in one manner or another within the next decade or so. And in particular, RDF and OWL + SPARQL are ideal for social networking services — the data model is a much better match for the structure of the data and the network of users and the kinds of queries that need to be done.


This notion that somehow the Semantic Web is not about folksonomy needs to be corrected. For example, take Metaweb’s Freebase. Freebase is what I call a “folktology”: an emergent, community-generated ontology. Users collaborate to add to the ontology and to the knowledge base that is populated within it. That’s a wonderful example of collective intelligence, user-generated content, and semantics. (Technically, to my knowledge, they are not using RDF for this, but from what I can see their data model is functionally equivalent, and I would expect at least a SPARQL interface from them eventually.)

But that’s not all — check out TagCommons and this Tag Ontology discussion, and also the SKOS ontology — all of which are working on semantic ways of characterizing simple tags in order to enrich folksonomies and enable better collective intelligence.

There are at least two other places where the Semantic Web naturally leverages and supports collective intelligence. The first is the fact that people and software can generate triples (people could do it by hand, but generally they will do it by filling out Web forms or answering questions or dialog boxes etc.) and these triples can live all over the Web, yet interconnect or intersect (when they are about the same subjects or objects).

I can create data about a piece of data you created, for example to state that I agree with it, or that I know something else about it. You can create data about my data. Thus a data-set can be generated in a distributed way — it’s not unlike a wiki for example. It doesn’t have to work this way, but at least it can if people do this.
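One simple way to picture “data about data” is to give every statement its own identifier, so that later statements can point at it. The sketch below does that with an in-memory dictionary. The `stmt:` ids, the `ex:` names and the date are all hypothetical, and this is only an analogy for how RDF lets statements refer to other statements, not an implementation of it.

```python
import itertools

_ids = itertools.count(1)
store = {}  # statement id -> (author, subject, predicate, object)


def state(author, subject, predicate, obj):
    """Record a statement and return an id that other statements can refer to."""
    stmt_id = f"stmt:{next(_ids)}"
    store[stmt_id] = (author, subject, predicate, obj)
    return stmt_id


def referring_to(target):
    """Return every statement that mentions `target` as its subject or object."""
    return {sid: s for sid, s in store.items() if target in (s[1], s[3])}


# You publish a fact; I publish two statements that refer to your statement.
yours = state("you", "ex:Sue", "ex:livesIn", "ex:PaloAlto")
mine = state("me", "ex:me", "ex:agreesWith", yours)
extra = state("me", yours, "ex:confirmedOn", "2007-09-18")

for sid, (author, subject, predicate, obj) in referring_to(yours).items():
    print(sid, author, subject, predicate, obj)
```

Neither author needs write access to the other’s data. Each only needs a stable identifier to point at.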

The second point is that OWL, the ontology language, is designed to support an infinite number of ontologies — there doesn’t have to be just one big ontology to “rule them all.” Anyone can make a simple or complex ontology and start to then make data statements that refer to it. Ontologies can link to or include other ontologies, or pieces of them, to create bigger distributed ontologies that cover more things.

This is like mashing up not only the data but also the schemas. Both of these are examples of collective intelligence. In the case of ontologies, this is already happening: many ontologies already make use of other ontologies such as Dublin Core and FOAF.
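Here is a toy model of schema mashups. The FOAF and Dublin Core namespace URIs and the few term names listed are real, but the dictionary layout, the `imports` field and the ski-club ontology are assumptions made up for this sketch. Real OWL ontologies declare imports in their own syntax and contain many more terms.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
MY = "http://example.org/ski-club#"  # hypothetical ontology

# Each ontology lists a few of its own terms and the ontologies it builds on.
ontologies = {
    FOAF: {"imports": [], "terms": {FOAF + "Person", FOAF + "name", FOAF + "knows"}},
    DC: {"imports": [], "terms": {DC + "title", DC + "creator"}},
    MY: {"imports": [FOAF, DC], "terms": {MY + "Resort", MY + "hasSkied"}},
}


def visible_terms(uri, seen=None):
    """Collect an ontology's own terms plus those of everything it imports."""
    seen = set() if seen is None else seen
    if uri in seen:  # guard against import cycles
        return set()
    seen.add(uri)
    terms = set(ontologies[uri]["terms"])
    for imported in ontologies[uri]["imports"]:
        terms |= visible_terms(imported, seen)
    return terms


terms = visible_terms(MY)
print(len(terms))  # 7
```

The small ontology defines only two terms of its own, yet data written against it can also describe people and documents, because it reuses what others already published.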

The point here is that there is in fact a natural and very beneficial fit between the technologies of the Semantic Web and what Tim O’Reilly defines Web 2.0 to be about (essentially collective intelligence). In fact the designers of the underlying standards of the Semantic Web specifically had “collective intelligence” in mind when they came up with these ideas. They were specifically trying to rectify several problems in the closed, data-silo world of old fashioned databases. The big motivation was to make data more integrated, to enable applications to share data more easily, and to be able to build data with other data, and to build schemas with other schemas. It’s all about enabling connections and network effects.

Now, whether people end up using these technologies to do interesting things that enable human-level collective intelligence (as opposed to just software level collective intelligence) is an open question. At least some companies such as my own Radar Networks and Metaweb, and Talis (thanks, Danny), are directly focused on this, and I think it is safe to say this will be a big emerging trend. RDF is a great fit for social and folksonomy-based applications.

Web 3.0 and the concept of “Hyperdata”

Where Tim defines Web 2.0 as being about collective intelligence generally, I would define Web 3.0 as being about “connective intelligence.” It’s about connecting data, concepts, applications and ultimately people. The real essence of what makes the Web great is that it enables a global hypertext medium in which collective intelligence can emerge. In the case of Web 3.0, which begins with the Data Web and will evolve into the full-blown Semantic Web over a decade or more, the key is that it enables a global hyperdata medium (not just hypertext).

As I mentioned above, hyperdata is to data what hypertext is to text. Hyperdata is a great word: it is so simple and yet makes a big point. It’s about data that links to other data. That’s what RDF and the Semantic Web are really all about. Reasoning is NOT the main point (but is a nice future side-effect…). The main point is about growing a web of data.

Just as the Web enabled a huge outpouring of collective intelligence via an open global hypertext medium, the Semantic Web is going to enable a similarly huge outpouring of collective knowledge and cognition via a global hyperdata medium. It’s the Web, only better.




Nova: This is helpful. Similarly, I’m wondering what your thoughts are about Pierre Levy’s recent papers:

Opening the Semantic Space in the Service of Collective Intelligence: http://www.reciis.cict.fiocruz.br/index.php/reciis/article/viewPDFInterstitial/43/38

Elements of Semantic Engineering: http://www.ieml.org/text/semantic_space.pdf

Thanks for that little addition Nova – you made my CEO and CTO very happy 😉

Another great post:
– Web 2.0 (in fact still 1.0) vs Web 3.0 (The true 2.0)
– Hypertext VS Hyperdata
– Data on the web VS The dataweb


What’s next for the Internet

Nova Spivack is racing to bring meaning and order to the chaos of the Internet. And he’s not alone. Business 2.0 reports.

Business 2.0 Magazine
By Michael V. Copeland, Business 2.0 Magazine senior writer

(Business 2.0 Magazine) -- After taking one of the first Internet companies, EarthWeb, public in 1998, Nova Spivack joined some friends at a weedy airstrip deep inside the new Russia for a trip into Earth’s stratosphere.

Having space travel on your resume is de rigueur for Internet entrepreneurs these days, but this was 1999, and not even the Russian pilots were sure how the flight would turn out. As Spivack was being strapped into a MiG-25 and prepped for his trip at Mach 3, about 20 miles straight up, he looked around for an ejection button or lever in case things went south.

To build a smarter Web, Nova Spivack is finding ways for machines to do more of the work.

John Giannandrea, Danny Hillis, and Robert Cook founded Metaweb Technologies with the goal of building a semantic Web structure — think of it as a semantic Wikipedia — for all the world’s knowledge.


There wasn’t one. “‘Don’t worry about eet,’ the pilot told me,” says Spivack, mustering his best Russian accent. “At the speed you will be going, even if you could eject, first your body would explode into vapor, then the vapor would freeze into ice crystals, and then the crystals would burn up on reentry.”

With that, they taxied down the runway for a quick ride to the edge of space.

Spivack returned in one piece ready to launch more startups, but the image of his body exploded into ice crystals and skittering into the stratosphere never left him. And in fact, it’s not a bad metaphor — in reverse — for what his newest venture is trying to do.

If you think of the World Wide Web as a cloud of largely undifferentiated information, the mission of the company he’s about to unveil, Radar Networks, is to take that cloud and impose order on it. Not just any order, but a very special kind known to experts by one of the hottest buzzwords in computer science today: the semantic Web.

For all the wonders that today’s Web can deliver to your fingertips — the Norwegian word for ice cream, a seat on the next flight to Paris, the best price for a Clash CD — it has a fundamental flaw.

It’s basically a compendium of billions of text documents designed to be read by humans. You can search it for keywords, but the results aren’t much use until you sort through them to find the page that has the info you want.

To take the Web to the next level — to move from Web 2.0 to Web 3.0 — the information in those documents will have to be turned into data that a machine can read and evaluate on its own. Only then will computers be able to take over tasks we now do by hand: find the nearest restaurant, book the best flight, buy the cheapest CD.

Think of it as the difference between two dimensions and three dimensions. “People will see the Web start to become smarter,” Spivack says. “Eventually it will have some reasoning capabilities built into it.”

We’ll get to how that happens in a bit. For Spivack, however, the semantic Web begins now with the data engine and user applications he and his team are prepping for launch — and ends somewhere in the future with artificially intelligent software agents handling all the online drudgery of your business and professional life.

Radar Networks isn’t the only company exploring the potential of the semantic Web. It’s a disruptive technology with the power to unseat today’s Internet titans, especially search engine giants like Google and Yahoo, and it’s being vigorously pursued by startups like Garlik, Metaweb Technologies, Powerset, and ZoomInfo, as well as big corporations like Citigroup, Eli Lilly, Kodak, Oracle, and Google and Yahoo themselves.

One estimate pegs the market for products and services stemming from semantic Web technologies at $50 billion by 2010, up from about $7 billion today.

But for all the entrepreneurs ready to spin gold out of the semantic Web, there are as many skeptics convinced that it’s a pipe dream — a fancy name for a problem that will never be fully solved. Spivack, with the confidence of a man who has been to space without a safety net, is determined to prove them wrong.

Radar Networks is housed in a renovated warehouse not far from the ballpark where the San Francisco Giants play. Inside, massive redwood timbers span the high ceilings alongside thick clusters of data cable. A Nintendo Wii and a shiny new De’Longhi espresso machine are the only outward signs of anything being done here but mind-bending work.

There are 20 people at the company now, but there’s space for 50, and with just a bit less than $10 million in venture funding, Spivack and his senior executives are busy hiring.

The background of the Radar team includes deep expertise in statistics, bioinformatics, and artificial intelligence. Radar’s chief architect, Jim Wissner, is a Java ace. Chris Jones, director of products and operations, is a design and user interface whiz. CTO Lew Tucker got his start by mapping neural transmitters in the brain. Tucker and Spivack go back to the late ’80s, when they both worked for Danny Hillis at Thinking Machines.

Given all the firepower assembled at Radar Networks, you get the sense that this is not your typical Web startup. And it’s not. The task the company has set for itself — bringing the power of the semantic Web to the Internet — is not easy to describe. Even the man who invented the Web, Tim Berners-Lee, needs a little room to explain why it’s important.

The term “semantic Web” first gained prominence in a 2001 article by Berners-Lee and two coauthors, James Hendler and Ora Lassila, in Scientific American. In it they described software agents roaming across the Web, making travel arrangements and doctor’s appointments and muting the stereo when the telephone rings.

It was a great vision, but it couldn’t be achieved with today’s Internet.

For the semantic Web to work, online information needs to be made readable by machines. Services like Google do a great job of sifting through all those webpages, but it’s up to people to recognize the things they want when they see them in the results. It’s also up to people to combine information to, say, plan a long-overdue ski trip.

The Web just isn’t very smart yet; one webpage is the same as any other. It might have a higher Google ranking, but there’s no distinction based on meaning.

The semantic Web in the Berners-Lee vision acts more like a series of connected databases, where all information resides in a structured form. Within that structure is a layer of description that adds meaning that the computer can understand. (“Semantics” is the branch of linguistics concerned with meaning.)

On the semantic Web, a person — Nova Spivack, for example — isn’t just a name that comes up on webpages when you google him. He’s a fully described object endowed with certain well-defined properties: a date of birth, a job title, a home address, specific hobbies, the fact that he is the grandson of legendary management thinker Peter Drucker.

People on the semantic Web have unambiguous connections to the places they work, the people they’re related to, their friends, their calendars, and the things they’re interested in. Being able to connect those properties in seen and unseen ways is what gives the semantic Web its power.
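A small sketch can show what “connecting those properties” means in practice. The facts below are the ones stated in this article, written as (subject, predicate, object) triples. The short names such as `worksAt` are invented labels, not terms from a real vocabulary, and the wildcard matcher is only a loose stand-in for what a query language like SPARQL does.

```python
graph = {
    ("NovaSpivack", "worksAt", "RadarNetworks"),
    ("LewTucker", "worksAt", "RadarNetworks"),
    ("NovaSpivack", "grandsonOf", "PeterDrucker"),
    ("NovaSpivack", "interestedIn", "Skiing"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def colleagues(graph, person):
    """Two-hop query: people who share a workplace with `person`."""
    found = set()
    for _, _, employer in match(graph, s=person, p="worksAt"):
        for other, _, _ in match(graph, p="worksAt", o=employer):
            if other != person:
                found.add(other)
    return found


print(colleagues(graph, "NovaSpivack"))  # {'LewTucker'}
```

Nobody ever stated that the two men are colleagues. The connection falls out of two separate facts that share an object, which is the “unseen” kind of link described above.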

Consider this scenario: Say you want to arrange a dinner at an upcoming conference. Today you might go through your address book and ping folks by e-mail to see who’s attending. Then you probably send out e-mail invitations to dinner. You go back and forth with the group on the place and time, somehow you all agree, and then somebody makes a reservation. Files fly back and forth, with humans at the center.

In the semantic Web, your software agent will “know” in advance what’s involved in arranging a dinner. Instead of you sending out a flurry of e-mails, the agent could cull the conference attendees and make a list of potential invitees.

It might also look through your address book to see which of your friends live in the city where the conference is being held. Once a list of potential dinner guests has been approved by you, the agent would negotiate the date and time with everyone else’s agents via a calendar database, pick a restaurant from another database based on availability and your personal preferences, make the reservation, and send out directions. In a GPS-enabled world, it could even let you know how far a guest who is running late has to go.
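The negotiation step can be reduced to a few lines once calendars and restaurants are structured data. The sketch below is a deliberately naive stand-in for such an agent. The guests, time slots, restaurant names and the first-match selection rule are all invented for illustration.

```python
def common_slots(calendars):
    """Intersect each guest's free slots; `calendars` maps guest -> set of slots."""
    slots = None
    for free in calendars.values():
        slots = set(free) if slots is None else slots & set(free)
    return sorted(slots or [])


def pick_restaurant(restaurants, slot, party_size, cuisine):
    """Return the first restaurant open at `slot` that fits the party and preference."""
    for r in restaurants:
        if slot in r["open_slots"] and r["seats"] >= party_size and r["cuisine"] == cuisine:
            return r["name"]
    return None


# Hypothetical data standing in for the calendar and restaurant databases.
calendars = {
    "you": {"Tue 19:00", "Wed 19:00", "Thu 20:00"},
    "guest_a": {"Wed 19:00", "Thu 20:00"},
    "guest_b": {"Mon 19:00", "Wed 19:00"},
}
restaurants = [
    {"name": "Trattoria One", "cuisine": "italian", "seats": 2, "open_slots": {"Wed 19:00"}},
    {"name": "Trattoria Two", "cuisine": "italian", "seats": 6, "open_slots": {"Wed 19:00"}},
]

slots = common_slots(calendars)
choice = pick_restaurant(restaurants, slots[0], len(calendars), "italian") if slots else None
print(slots, choice)  # ['Wed 19:00'] Trattoria Two
```

The logic is trivial. The hard part, and the reason the service does not exist yet, is getting everyone’s calendars and every restaurant’s tables into a form a program can read.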

Of course, it’s been six years since Berners-Lee put his vision out there, and you still can’t get that sort of service. Tagging is a start, and services like Flickr offer a sort of crude Web 2.0 version of the semantic Web. Google Base is another stab at bringing semantic technologies to the wider Web, serving as a place where anyone can enter data and have it searched, but it doesn’t use the semantic approach from start to finish.

Bringing a true semantic Web to the world is a chicken-and-egg problem. Until there’s enough data rendered in computer-readable form (resource description framework, or RDF, is the leading standard) with enough metadata attached to it to make it meaningful, nobody is going to be able to create any interesting services.

The agents of the semantic Web need the raw ingredients before they can make their soufflés.

But you can do some interesting things within subsets of the Web. Large pharmaceutical companies like Eli Lilly have been experimenting with adding a semantic layer on top of their drug discovery databases to help scientists see connections between drug molecules and diseases.

Amazon.com is keen on using semantic technologies to help customers search its databases. Kodak wants semantic tagging to help photographers organize their snapshots online. The CIA has been loading its databases of overseas phone taps into semantic “mills” to make it easier to sift for connections between people, places, and incidents, hoping to spot terror threats before it’s too late.

“But how do you make this thing really useful for ordinary people?” asks Radar CTO Tucker. “Not everyone is a CIA analyst.”

Spivack’s answer grew out of conversations he had with Drucker in the summer of 2001, about four years before the professor’s passing. “We would meet for two hours a day and talk about organizations vs. organisms,” Spivack says. Drucker was particularly interested in what he called the intelligence of organizations. “My grandfather helped me think about group minds,” Spivack says. “How groups get more intelligent, and how connections play into that.”

Since bringing the semantic Web into the world is a chicken-and-egg proposition, Radar Networks has built both the chicken and the egg. The chicken is the underlying engine the team has created that not only turns data into simple but meaningful digital objects via RDF but also scales up to hold hundreds of millions of objects that can be searched, swapped, and connected to one another. The egg is the user application that rides on top of it all.

The first consumer app Radar plans to launch is a sort of personal data organizer. It will allow you to bring in e-mail, contacts, photos, video, music (anything digital, really) from anywhere on the Web, turn it into RDF, and access it in one place.

Semantic tags are added manually, or automatically if the item is a photo from Flickr or a video from YouTube. “We add a new level of order to connect and interact with these things at a higher level than is possible today,” Spivack says. “We are letting you build a little semantic Web for your project, your group, or your interest.”

When it’s done, it should be like the best wiki you’ve ever used. To illustrate, Spivack flips open his computer and pulls up his own Radar-enabled page. On it are groups of people he knows and interests he’s pursuing, including the space industry, alternative energy, physics, Internet-related technology, and skiing. In each of these categories are objects that Spivack has collected and tagged or, if it is a topic that has multiple people included, that they have collected and tagged.

In the skiing topic, for example, Spivack has posed a question: Where should we go skiing? One of the responses is Alta, Utah. When Spivack clicks on that item, the Radar engine goes out and finds all the things in the Radar Networks database related to Alta. It “knows” that Alta, in this case, refers to a place (as opposed to the Spanish word for “high”), so there are hotel suggestions. There are also photos, videos, trail maps, and comments from people in his group who have skied there before.
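How Radar’s engine does this is not described here, so the following is only a guess at the general idea: if every object carries a type, a lookup can pick the place called Alta rather than the Spanish word, then gather whatever is linked to that place. All ids, field names and sample entries are hypothetical.

```python
# Hypothetical typed objects; the field names are invented for this sketch.
entities = [
    {"id": "place/alta-utah", "label": "Alta", "type": "Place", "region": "Utah"},
    {"id": "word/es/alta", "label": "alta", "type": "Word", "language": "es", "meaning": "high"},
    {"id": "hotel/1", "label": "Example Lodge", "type": "Hotel", "located_in": "place/alta-utah"},
    {"id": "photo/1", "label": "Powder day", "type": "Photo", "located_in": "place/alta-utah"},
]


def resolve(label, wanted_type):
    """Pick the entity with this label (case-insensitive) and the expected type."""
    for e in entities:
        if e["label"].lower() == label.lower() and e["type"] == wanted_type:
            return e
    return None


def related(entity_id):
    """Group everything linked to the entity by its type."""
    groups = {}
    for e in entities:
        if e.get("located_in") == entity_id:
            groups.setdefault(e["type"], []).append(e["label"])
    return groups


alta = resolve("Alta", "Place")
print(related(alta["id"]))  # {'Hotel': ['Example Lodge'], 'Photo': ['Powder day']}
```

A keyword search would return both senses of “alta” and leave the sorting to the reader. Here the question “Where should we go skiing?” supplies the expected type, and the type does the sorting.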

In a sense, what Radar allows Spivack to do is build a database around any question, project, or interest he may have and then start looking at it from different perspectives: cost, distance from San Francisco, snow conditions in March, nearby restaurants, what his friends liked about a particular resort.

And if they liked Alta, what other places did they like? “You start to see new ways to look at the information,” Spivack says. “What gets me excited is what we can do when we have billions of objects and 10 million people using them.”

For that to happen, of course, people need to start adding their own digital stuff to the mix. The digital life organizer is the bait Spivack and his team are using to try to draw them in. The team will also open Radar Networks to outside developers to write their own applications. Those might involve travel, food, or a better way to manage large projects.

Radar hopes to be the engine powering all that, providing a massive, meaning-filled Web of data that can be infinitely poked and prodded and leveraged. The company will make its money from advertising and premium subscriptions; the basic service will be free.

But don’t expect a sci-fi software agent that takes care of your every whim — Spivack is quick to say that’s not what Radar is launching. “Those people who think we will be offering Hal 9000 when this goes public in October will be disappointed,” he says. “We’ve had the problem of overpromising in this industry; a lot of us who were working on semantic Web technologies early on saw the potential and got a little excited. It has taken much longer to realize than we thought. One thing Web 2.0 has taught everybody is that simpler is better. Find something useful and iterate on that.”

Tom Coates, whose day job at Yahoo involves working on just these issues, thinks the Web 2.0 crowd is already taking care of the problem. He points to tagging and microformats that add some of the same metadata to webpages that semantic technologies offer.

“I call it the dirty semantic Web,” Coates says from his London office. “It may not be the pristine Berners-Lee view of the world, but it is headed in the right direction.”

On a lark, Coates and a colleague created a site called Astronewsology that demonstrates the power of a semantic approach by combining news reports and horoscopes. Using it, you can search the news by the sign Capricorn and see whether that day’s horoscope had any bearing on what happened to people born between Dec. 22 and Jan. 20. Coates’s point is that you can extract meaning from the data without adopting the exacting standards proposed by Berners-Lee.

Things get even more interesting when the data starts to become interconnected.

“It’s in the combination that the real power of this comes out,” Coates says. “The mashup is an early example of the Web that is to come. Semantic technologies have not taken off as much as we’d hoped because people are finding more utility in other Web 2.0 technologies at the moment. The goal is the most important thing: reusable, repurposable, and reconnectable data. How we get there is not as important.”

The shift to a semantic Web is still in its very early days. Spivack envisions a time line of five to seven years. But the shift is clearly under way. James Hendler, one of the coauthors of that seminal Scientific American article, sees the same dynamics he did when the Web was first forming.

“Those of us who were involved saw little islands of the Web being created,” he says. “To most people, the Web seemed to happen overnight, because they hadn’t seen the first six to eight years of effort. We’re in that early phase of the semantic Web.”

Radar Networks, Google Base, and even Flickr are the first islands to pop into public view. Larger islands are being formed by corporations and government agencies. Many more will rise.

Spivack is counting on those islands to eventually coalesce. That’s when the potential becomes reality. That’s when we can all kick back and let our software agents go out and bring some order to the chaos of our digital lives.

Michael V. Copeland is a senior writer at Business 2.0.