July 19, 2006
Editing and Addendum by Marc Fawzi
A lot of buzz about Web 3.0 and Wikipedia 3.0 has been generagted lately by Marc Fawzi through this blog, so I’ve decided that for my first post here I’d like to dive into this idea and take a look at how to build a Semantic Content Management System (CMS). I know this blog has had a more of a visionary, psychological and sociological theme (i.e., the vision for the future and the Web’s effect on society, human relationships and the individual himself), but I’d like to show the feasibility of this vision by providing some technical details.
We want a CMS capable of building a knowledge base (that is a set of domain-specific ontologies) with formal deductive reasoning capabilities.
- A semantic CMS framework.
- An ontology API.
- An inference engine.
- A framework for building info-agents.
The general idea would be something like this:
- Users use a semantic CMS like Semantic MediaWiki to enter information as well as semantic annotations (to establish semantic links between concepts in the given domain on top of the content) This typically produces an informal ontology on top of the information, which, when combined with domain inference rules and the query structures (for the particular schema) that are implemented in an independent info agent or built into the CMS, would give us a Domain Knowledge Database. (Alternatively, we can have users enter information into a non-semantic CMS to create content based on a given doctype or content schema and then front-end it with an info agent that works with a formal ontology of the given domain, but we would then need to perform natural language processing, including using statistical semantic models, since we would lose the certainty that would normally be provided by the semantic annotations that, in a Semantic CMS, would break down the natural language in the information to a definite semantic structure.)
- Another set of info agents adds to our knowledge base inferencing-based querying services for information on the Web or other domain-specific databases. User entered information plus information obtained from the web makes up our Global Knowledge Database.
- We provide a Web-based interface for querying the inference engine.
Each doctype or schema (depending on the CMS of your choice) will have a more or less direct correspondence with our ontologies (i.e. one schema or doctype maps with one ontology). The sum of all the content of a particular schema makes up a knowledge-domain which when transformed into a semantic language like (RDF or more specifically OWL) and combined with the domain inference rules and the query structures (for the particular schema) constitute our knowledge database. The choice of CMS is not relevant as long as you can query its contents while being able to define schemas. What is important is the need for an API to access the ontology. Luckily projects like JENA fills this void perfectly providing both an RDF and an OWL API for Java.
In addition, we may want an agent to add or complete our knowledge base using available Web Services (WS). I’ll assume you’re familiarized with WS so I won’t go into details.
Now, the inference engine would seem like a very hard part. It is. But not for lack of existing technology: the W3C already have a recommendation language for querying RDF (viz. a semantic language) known as SPARQL (http://www.w3.org/TR/rdf-sparql-query/) and JENA already has a SPARQL query engine.
The difficulty lies in the construction of ontologies which would have to be formal (i.e. consistent, complete, and thoroughly studied by experts in each knowledge-domain) in order to obtain powerful deductive capabilities (i.e. reasoning).
We already have technology powerful enough to build projects such as this: solid CMS, standards such as RDF, OWL, and SPARQL as well as a stable framework for using them such as JENA. There are also many frameworks for building info-agents but you don’t necessarily need a specialized framework, a general software framework like J2EE is good enough for the tasks described in this post.
In the diagram below, the domain-specific ontologies (OWL 1 … N) could be all built by Wikipedia (see Wikipedia 3.0) since they already have the largest online database of human knowledge and the domain experts among their volunteers to build the ontologies for each domain of human knowledge. One possible way is for Wikipedia will build informal ontologies using Semantic MediaWiki (as Ontoworld is doing for the Semantic Web domain of knowledge) but Wikipedia may wish to wait until they have the ability to build formal ontologies, which would enable more powerful machine-reasoning capabilities.
[Note: The ontologies simply allow machines to reason about information. They are not information but meta-information. They have to be formally consistent and complete for best results as far as machine-based reasoning is concerned.]
However, individuals, teams, organizations and corporations do not have to wait for Wikipedia to build the ontologies. They can start building their own domain-specific ontologies (for their own domains of knowledge) and use Google, Wikipedia, MySpace, etc as sources of information. But as stated in my latest edit to Eric’s post, we would have to use natural language processing in that case, including statistical semantic models, as the information won’t be pre-semanticized (or semantically annotated), which makes the task more dificult (for us and for the machine …)
What was envisioned in the Wikipedia 3.0: The End of Google? article was that since Wikipedia has the volunteer resources and the world’s largest database of human knowledge then it will be in the powerful position of being the developer and maintainer of the ontologies (including the semantic annotations/statements embedded in each page) which will become the foundation for intelligence (and “Intelligent Findability”) in Web 3.0.
This vision is also compatible with the vision for P2P AI (or P2P 3.0), where people will run P2P inference engines on their PCs that communicate and collaborate with each other and that tap into information form Google, Wikipedia, etc, which will ultimately push Google and central search engines down to the commodity layer (eventually making them a utility business just like ISPs.)
- Wikipedia 3.0: The End of Google? June 26, 2006
- Wikipedia 3.0: El fin de Google (traducción) July 12, 2006
- Web 3.0: Basic Concepts June 30, 2006
- P2P 3.0: The People’s Google July 11, 2006
- Why Net Neutrality is Good for Web 3.0 July 15, 2006
- Intelligence (Not Content) is King in Web 3.0 July 17, 2006
- Web 3.0 Blog Application July 18, 2006
- Semantic MediaWiki July 12, 2006
- Get Your DBin July 12, 2006
Enjoyed this analysis? You may share it with others on:
Semantic Web, Web strandards, Trends, OWL, innovation, Startup, Google, GData, inference engine, AI, ontology, Semantic Web, Web 2.0, Web 2.0, Web 3.0, Web 3.0, Google Base, artificial intelligence, AI, Wikipedia, Wikipedia 3.0, Ontoworld, Wikipedia AI, Info Agent, Semantic MediaWiki, DBin, P2P 3.0, P2P AI, AI Matrix, P2P Semantic Web inference Engine, semantic blog, intelligent findability, JENA, SPARQL, RDF, OWL