Author: Marc Fawzi
License: Attribution-NonCommercial-ShareAlike 3.0
Announcements:
Semantic Web Developers:
Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):
- Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)
Click here for more info and a list of related articles…
Forward
Two years after I published this article it has received over 200,000 hits and we now have several startups attempting to apply Semantic Web technology to Wikipedia and knowledge wikis in general, including Wikipedia founder’s own commercial startup as well as a startup that was recently purchased by Microsoft.
Recently, after seeing how Wikipedia’s governance is so flawed, I decided to write about a way to decentralize and democratize Wikipedia.
Versión española
Article
(Article was last updated at 10:15am EST, July 3, 2006)
Wikipedia 3.0: The End of Google?
The Semantic Web (or Web 3.0) promises to “organize the world’s information” in a dramatically more logical way than Google can ever achieve with their current engine design. This is specially true from the point of view of machine comprehension as opposed to human comprehension.The Semantic Web requires the use of a declarative ontological language like OWL to produce domain-specific ontologies that machines can use to reason about information and make new conclusions, not simply match keywords.
However, the Semantic Web, which is still in a development phase where researchers are trying to define the best and most usable design models, would require the participation of thousands of knowledgeable people over time to produce those domain-specific ontologies necessary for its functioning.
Machines (or machine-based reasoning, aka AI software or ‘info agents’) would then be able to use those laboriously –but not entirely manually– constructed ontologies to build a view (or formal model) of how the individual terms within the information relate to each other. Those relationships can be thought of as the axioms (assumed starting truths), which together with the rules governing the inference process both enable as well as constrain the interpretation (and well-formed use) of those terms by the info agents to reason new conclusions based on existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent info agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.
Thus, and as stated, in the Semantic Web individual machine-based agents (or a collaborating group of agents) will be able to understand and use information by translating concepts and deducing new information rather than just matching keywords.
Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among your virtual AI-enhanced workforce each having access to different domain specific comprehension space and all communicating with each other to build a collective consciousness.
You’ll be able to ask your info agent or agents to find you the nearest restaurant that serves Italian cuisine, even if the restaurant nearest you advertises itself as a Pizza joint as opposed to an Italian restaurant. But that is just a very simple example of the deductive reasoning machines will be able to perform on information they have.
Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents who are specialized in different domains of knowledge to produce a collective consciousness (using the Borg metaphor) that encompasses all human knowledge. The collective “mind” of those agents-as-the-Borg will be the Ultimate Answer Machine, easily displacing Google from this position, which it does not truly fulfill.
The problem with the Semantic Web, besides that researchers are still debating which design and implementation of the ontology language model (and associated technologies) is the best and most usable, is that it would take thousands or tens of thousands of knowledgeable people many years to boil down human knowledge to domain specific ontologies.
However, if we were at some point to take the Wikipedia community and give them the right tools and standards to work with (whether existing or to be developed in the future), which would make it possible for reasonably skilled individuals to help reduce human knowledge to domain-specific ontologies, then that time can be shortened to just a few years, and possibly to as little as two years.
The emergence of a Wikipedia 3.0 (as in Web 3.0, aka Semantic Web) that is built on the Semantic Web model will herald the end of Google as the Ultimate Answer Machine. It will be replaced with “WikiMind” which will not be a mere search engine like Google is but a true Global Brain: a powerful pan-domain inference engine, with a vast set of ontologies (a la Wikipedia 3.0) covering all domains of human knowledge, that can reason and deduce answers instead of just throwing raw information at you using the outdated concept of a search engine.
Notes
After writing the original post I found out that a modified version of the Wikipedia application, known as “Semantic” MediaWiki has already been used to implement ontologies. The name that they’ve chosen is Ontoworld. I think WikiMind would have been a cooler name, but I like ontoworld, too, as in “it descended onto the world,” since that may be seen as a reference to the global mind a Semantic-Web-enabled version of Wikipedia could lead to.
Google’s search engine technology, which provides almost all of their revenue, could be made obsolete in the near future. That is unless they have access to Ontoworld or some such pan-domain semantic knowledge repository such that they tap into their ontologies and add inference capability to Google search to build formal deductive intelligence into Google.
But so can Ask.com and MSN and Yahoo…
I would really love to see more competition in this arena, not to see Google or any one company establish a huge lead over others.
The question, to rephrase in Churchillian terms, is wether the combination of the Semantic Web and Wikipedia signals the beginning of the end for Google or the end of the beginning. Obviously, with tens of billions of dollars at stake in investors’ money, I would think that it is the latter. No one wants to see Google fail. There’s too much vested interest. However, I do want to see somebody out maneuver them (which can be done in my opinion.)
Clarification
Please note that Ontoworld, which currently implements the ontologies, is based on the “Wikipedia” application (also known as MediaWiki), but it is not the same as Wikipedia.org.
Likewise, I expect Wikipedia.org will use their volunteer workforce to reduce the sum of human knowledge that has been entered into their database to domain-specific ontologies for the Semantic Web (aka Web 3.0) Hence, “Wikipedia 3.0.”
Response to Readers’ Comments
The argument I’ve made here is that Wikipedia has the volunteer resources to produce the needed Semantic Web ontologies for the domains of knowledge that it currently covers, while Google does not have those volunteer resources, which will make it reliant on Wikipedia.
Those ontologies together with all the information on the Web, can be accessed by Google and others but Wikipedia will be in charge of the ontologies for the large set of knowledge domains they currently cover, and that is where I see the power shift.
Google and other companies do not have the resources in man power (i.e. the thousands of volunteers Wikipedia has) who would help create those ontologies for the large set of knowledge domains that Wikipedia covers. Wikipedia does, and is positioned to do that better and more effectively than anyone else. Its hard to see how Google would be able create the ontologies for all domains of human knowledge (which are continuously growing in size and number) given how much work that would require. Wikipedia can cover more ground faster with their massive, dedicated force of knowledgeable volunteers.
I believe that the party that will control the creation of the ontologies (i.e. Wikipedia) for the largest number of domains of human knowledge, and not the organization that simply accesses those ontologies (i.e. Google), will have a competitive advantage.
There are many knowledge domains that Wikipedia does not cover. Google will have the edge there but only if people and organizations that produce the information also produce the ontologies on their own, so that Google can access them from its future Semantic Web engine. My belief is that it would happen but very slowly, and that Wikipedia can have the ontologies done for all the domain of knowledge that it currently covers much faster, and then they would have leverage by the fact that they would be in charge of those ontologies (aka the basic layer for AI enablement.)
It still remains unclear, of course, whether the combination of Wikipedia and the Semantic Web herald the beginning of the end for Google or the end of the beginning. As I said in the original part of the post, I believe that it is the latter, and the question I pose in the title of this post, in this context, is not more than rhetorical. However, I could be wrong in my judgment and Google could fall behind Wikipedia as the world’s ultimate answer machine.
After all, Wikipedia makes “us” count. Google doesn’t. Wikipedia derives its power from “us.” Google derives its power from its technology and inflated stock price. Who would you count on to change the world?
Response to Basic Questions Raised by the Readers
Reader divotdave asked a few questions, which I thought to be very basic in nature (i.e. important.) I believe more people will be pondering about the same issues, so I’m to including here them with the replies.
Question:
How does it distinguish between good information and bad? How does it determine which parts of the sum of human knowledge to accept and which to reject?
Reply:
It wouldn’t have to distinguish between good vs bad information (not to be confused with well-formed vs badly formed) if it was to use a reliable source of information (with associated, reliable ontologies.) That is if the information or knowledge to be sought can be derived from Wikipedia 3.0 then it assumes that the information is reliable.
However, with respect to connecting the dots when it comes to returning information or deducing answers from the sea of information that lies beyond Wikipedia then your question becomes very relevant. How would it distinguish good information from bad information so that it can produce good knowledge (aka comprehended information, aka new information produced through deductive reasoning based on exiting information.)
Question:
Who, or what as the case may be, will determine what information is irrelevant to me as the inquiring end user?
Reply:
That is a good question and one which would have to be answered by the researchers working on AI engines for Web 3.0
There will be assumptions made as to what you are inquiring about. Just as when I saw your question I had to make assumption about what you really meant to ask me, AI engines would have to make an assumption, pretty much based on the same cognitive process humans use, which is the topic of a separate post, but which has been covered by many AI researchers.
Question:
Is this to say that ultimately some over-arching standard will emerge that all humanity will be forced (by lack of alternative information) to conform to?
Reply:
There is no need for one standard, except when it comes to the language the ontologies are written in (e.g OWL, OWL-DL, OWL Full etc.) Semantic Web researchers are trying to determine the best and most usable choice, taking into consideration human and machine performance in constructing –and exclusive in the latter case– interpreting those ontologies.
Two or more info agents working with the same domain-specific ontology but having different software (different AI engines) can collaborate with each other.
The only standard required is that of the ontology language and associated production tools.
Addendum
On AI and Natural Language Processing
I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.
On the Debate about the Nature and Definition of AI
The embedding of AI into cyberspace will be done at first with relatively simple inference engines (that use algorithms and heuristics) that work collaboratively in P2P fashion and use standardized ontologies. The massively parallel interactions between the hundreds of millions of AI Agents that will run within the millions of P2P AI Engines on users’ PCs will give rise to the very complex behavior that is the future global brain.
Related:
- Web 3.0 Update
- All About Web 3.0 <– list of all Web 3.0 articles on this site
- P2P 3.0: The People’s Google
- Reality as a Service (RaaS): The Case for GWorld <– 3D Web + Semantic Web + AI
- For Great Justice, Take Off Every Digg
- Google vs Web 3.0
- People-Hosted “P2P” Version of Wikipedia
- Beyond Google: The Road to a P2P Economy
Update on how the Wikipedia 3.0 vision is spreading:
Update on how Google is co-opting the Wikipedia 3.0 vision:
Web 3D Fans:
Here is the original Web 3D + Semantic Web + AI article:
The above mentioned Web 3D + Semantic Web + AI vision which preceded the Wikipedia 3.0 vision received much less attention because it was not presented in a controversial manner. This fact was noted as the biggest flaw of social bookmarking site digg which was used to promote this article.
Web 3.0 Developers:
Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):
- Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)
Jan 7, ‘07: The following Evolving Trends post discusses the current state of semantic search engines and ways to improve the paradigm:
- Designing a Better Web 3.0 Search Engine
June 27, ‘06: Semantic MediaWiki project, enabling the insertion of semantic annotations (or metadata) into the content:
- http://semantic-mediawiki.org/wiki/Semantic_MediaWiki (see note on Wikia below)
Wikipedia’s Founder and Web 3.0
(more…)