Posts Tagged ‘RDF’

Semantic Web Patterns: A Guide to Semantic Technologies

Written by Alex Iskold / March 25, 2008 3:20 PM / 32 Comments


In this article, we’ll analyze the trends and technologies that power the Semantic Web. We’ll identify the patterns that are beginning to emerge, classify the different trends, and peek into what the future holds.

In a recent interview Tim Berners-Lee pointed out that the infrastructure to power the Semantic Web is already here. ReadWriteWeb’s founder, Richard MacManus, even picked it to be the number one trend in 2008. And rightly so. Not only are the bits of infrastructure now in place, but we are also seeing startups and larger corporations working hard to deliver end user value on top of this sophisticated set of technologies.

The Semantic Web means many things to different people, because there are a lot of pieces to it. To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence – computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.

The disagreement is not accidental, because the technology and concepts are broad. Much is possible and much is to be imagined.

1. Bottom-Up and Top-Down

We have written a lot about the different approaches to the Semantic Web – the classic bottom-up approach and the new top-down one. The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically. Both approaches are making good progress.

A big win for the bottom-up approach was the recent announcement from Yahoo! that its search engine is going to support RDF and microformats. This is a win-win-win for publishers, Yahoo!, and customers – publishers now have an incentive to annotate information because Yahoo! Search will take advantage of it, and users will then see better, more precise results.

Another recent win for the bottom-up approach was the announcement of the Semantify web service from Dapper (previous coverage). This offering will enable publishers to add semantic annotations to existing web pages. The more tools like Semantify that pop up, the easier it will be for publishers to annotate pages. Automatic annotation tools combined with the incentive to annotate the pages is going to make the bottom-up approach more compelling.

But even if the tools and incentives exist, making the bottom-up approach widespread is difficult. Today, the magic of Google is that it can understand information as-is, without asking people to fully comply with W3C standards or SEO optimization techniques. Similarly, top-down semantic tools are focused on dealing with imperfections in existing information. Among them are natural language processing tools that do entity extraction – such as the Calais and TextWise APIs, which recognize people, companies, places, etc. in documents; vertical search engines, like ZoomInfo and Spock, which mine the web for people; technologies like Dapper and BlueOrganizer, which recognize objects in web pages; and Yahoo! Shortcuts, Snap, and SmartLinks, which recognize objects in text and links.

[Disclosure: Alex Iskold is founder and CEO of AdaptiveBlue, which makes BlueOrganizer and SmartLinks.]

Top-down technologies are racing forward despite imperfect information. And, of course, they benefit from the bottom-up annotations as well. The more annotations there are, the more precise top-down technologies will get – because they will be able to take advantage of structured information as well.

2. Annotation Technologies: RDF, Microformats, and Meta Headers

Within the bottom-up approach, there are several choices for annotating data. They are not equally powerful; in fact, each is a tradeoff between simplicity and completeness. The most comprehensive approach is RDF – a powerful, graph-based language for declaring things, their attributes, and the relationships between them. Simplistically, one can think of RDF as a language for expressing facts like: Alex IS human (a type expression), Alex HAS a brain (an attribute expression), and Alex IS the father of Alice, Lilly, and Sofia (a relationship expression). RDF is powerful, but because it is highly recursive, precise, and mathematically sound, it is also complex.
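The three kinds of statements above can be sketched as subject–predicate–object triples. This is a toy illustration in plain Python, not real RDF tooling (no URIs, no RDF library); it only shows the shape of the data and how a triple store answers pattern queries:

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple,
# mirroring the type / attribute / relationship expressions from the text.
triples = [
    ("Alex", "is-a", "Human"),        # type expression
    ("Alex", "has", "Brain"),         # attribute expression
    ("Alex", "father-of", "Alice"),   # relationship expressions
    ("Alex", "father-of", "Lilly"),
    ("Alex", "father-of", "Sofia"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Who is Alex the father of?
children = [o for (_, _, o) in query(triples, "Alex", "father-of")]
print(children)  # ['Alice', 'Lilly', 'Sofia']
```

Real RDF adds globally unique identifiers (URIs) for every subject, predicate, and object, which is what makes independently built databases interoperable.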

At present, most use of RDF is for interoperability. For example, the medical community uses RDF to describe genomic databases. Because the information is normalized, the databases that were previously silos can now be queried together and correlated. In general, in addition to semantic soundness, the major benefit of RDF is interoperability and standardization, particularly for enterprises, as we will discuss below.

Microformats offer a simpler approach, adding semantics to existing HTML documents using agreed-upon class attributes. The metadata is compact and embedded inside the actual HTML. Popular microformats include hCard, which describes personal and company contact information; hReview, which adds meta information to review pages; and hCalendar, which describes events.

Microformats are gaining popularity because of their simplicity, but they are still quite limited. There is no way to describe type hierarchies, which the classic semantic community would say is critical. The other issue is that microformats are somewhat cryptic, because the focus is on keeping the annotations to a minimum. This, in turn, raises the question of whether embedding metadata into the view (HTML) is a good idea at all: what happens if the underlying data changes after someone makes a copy of the HTML document? Nevertheless, microformats are currently used by Flickr, Eventful, and LinkedIn; and many other companies are looking to adopt them, particularly because of the recent Yahoo! announcement.
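To make the mechanism concrete, here is a hypothetical hCard snippet and a minimal extractor built on Python’s standard-library HTML parser. The `vcard`, `fn`, and `org` class names are real hCard property names; the person and company in the snippet are just illustrative:

```python
from html.parser import HTMLParser

# Microformat semantics ride on ordinary HTML class attributes.
HTML = """
<div class="vcard">
  <span class="fn">Alex Iskold</span>
  <span class="org">AdaptiveBlue</span>
</div>
"""

class HCardParser(HTMLParser):
    """Collect the text inside elements whose class is an hCard property."""
    PROPS = {"fn", "org", "tel", "url"}

    def __init__(self):
        super().__init__()
        self.current = None  # hCard property we are currently inside
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        hits = self.PROPS.intersection(classes)
        if hits:
            self.current = hits.pop()

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()
            self.current = None

parser = HCardParser()
parser.feed(HTML)
print(parser.card)  # {'fn': 'Alex Iskold', 'org': 'AdaptiveBlue'}
```

This also illustrates the fragility noted above: the metadata lives inside the view, so anything that rewrites the HTML can silently break or orphan it.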

An even simpler approach is to put metadata into the meta headers. This approach has been around for a while, and it is a shame that it has not been widely adopted. As an example, the New York Times recently launched extended annotations for its news pages. This approach works great for pages that are focused on a single topic or thing. For example, a news page can be described with a set of keywords, a geo location, a date and time, people, and categories. Another example is book pages: O’Reilly.com has been putting book information into the meta headers, describing the author, ISBN, and category of each book.
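The book-page case might look like the sketch below; the meta names and values are invented for illustration (not O’Reilly’s actual markup), and the extractor again uses only the standard library:

```python
from html.parser import HTMLParser

# Hypothetical book metadata in <meta> headers, in the spirit of the
# O'Reilly pages described above (names and values are illustrative).
HTML = """
<head>
  <meta name="author" content="Jane Doe">
  <meta name="isbn" content="978-0-000-00000-0">
  <meta name="category" content="Programming">
</head>
"""

class MetaParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

p = MetaParser()
p.feed(HTML)
print(p.meta["isbn"])  # 978-0-000-00000-0
```

Because the metadata describes the page as a whole, this works well for single-topic pages but cannot annotate individual items within a page the way microformats or RDF can.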

Despite their differences, these approaches are also somewhat complementary, and each of them is helpful. The more annotations there are in web pages, and the more widely the standards are implemented, the more discoverable and powerful the information becomes.

3. Consumer and Enterprise

Yet another dimension of the conversation about the Semantic Web is the split between consumer and enterprise applications. In the consumer arena we have been looking for a Killer App – something that delivers tangible and simple consumer value. People simply do not care that a product is built on the Semantic Web; all they are looking for is utility.

Up until recently, the challenge has been that the Semantic Web is focused on rather academic issues – like annotating information to make it machine readable. The promise was that once the information is annotated and the web becomes one big giant RDF database, then exciting consumer applications will come. The skeptics, however, have been pointing out that first there needs to be a compelling use case.

Some consumer applications based on the Semantic Web include generic and vertical search, contextual shortcuts and previews, personal information management systems, and semantic browsing tools. All of these applications are in their early days and have a long way to go before becoming truly compelling for the average web user. Still, even if these applications succeed, consumers will not be interested in knowing about the underlying technology – so there is really no marketing play for the Semantic Web in the consumer space.

Enterprises are a different story, for a couple of reasons. First, enterprises are much more used to techno-speak. To them, utilizing semantic technologies translates into being intelligent, and that, in turn, is good marketing. ‘Our products are better and smarter because we use the Semantic Web’ sounds like a good value proposition for the enterprise.

But even above the marketing speak, RDF solves a problem of data interoperability and standards. This “Tower of Babel” situation has been in existence since the early days of software. Forget semantics; just a standard protocol, a standard way to pass around information between two programs, is hugely valuable in the enterprise.

RDF offers a way to communicate using an XML-based language that, on top of that, has sound mathematical foundations to enable semantics. This sounds great, and even the complexity of RDF is not going to stop enterprises from using it. However, there is another problem that might – scalability. Unlike relational databases, which have been around for ages and have been optimized and tuned, XML-based databases are still not widespread. In general, the problem lies in their scale and querying capabilities. Like the object-oriented database technologies of the late nineties, XML-based databases hold a lot of promise, but we have yet to see them in action in a big way.

4. Semantic APIs

With the rise of Semantic Web applications, we are also seeing the rise of Semantic APIs. In general, these web services take unstructured information as input and find entities and relationships. One way to think of these services is as mini natural-language-processing tools that are only concerned with a subset of the language.

The first example is the Open Calais API from Reuters, which we have covered in two articles here and here. This service accepts raw text and returns information about the people, places, and companies found in the document. The output includes not only the list of found matches, but also the places in the document where each match was found. Behind Calais is a powerful natural language processing technology developed by Clear Forest (now owned by Reuters), which relies on algorithms and databases to extract entities from text. According to Reuters, Calais is extensible, and it is just a matter of time before new entity types are added.
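The input/output shape of such a service can be sketched with a deliberately naive stand-in. This is not the Calais API – real services use trained NLP models, not a lookup table – but it shows the contract: raw text in, typed entities with document offsets out. The entity lists here are invented for illustration:

```python
import re

# A naive sketch of what an entity-extraction service does: take raw
# text, return entity types, matched text, and character offsets.
# (Real services like Calais use statistical NLP, not known-name lists.)
KNOWN = {
    "person": ["Tim Berners-Lee", "Alex Iskold"],
    "company": ["Reuters", "Yahoo"],
}

def extract_entities(text):
    found = []
    for etype, names in KNOWN.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                found.append({"type": etype, "text": name, "offset": m.start()})
    return sorted(found, key=lambda e: e["offset"])

doc = "Tim Berners-Lee spoke while Reuters covered the event."
for e in extract_entities(doc):
    print(e)
```

The offsets matter in practice: they let a client highlight or link the entities in place, which is exactly what contextual tools discussed later in this article do.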

Another example is the SemanticHacker API from TextWise, which is offering a one million dollar prize for the best commercial semantic web application developed on top of it. This API classifies information in documents into categories called semantic signatures. Given a document, it outputs the entities or topics that the document is about. It is kind of like Calais, but it also delivers a topical hierarchy, where the actual objects are the leaves.

Another semantic API is offered by Dapper – a web service that facilitates the extraction of structure from unstructured HTML pages. Dapper works by enabling users to define the attributes of an object based on bits of the page. For example, a book publisher might define where the information about the author, ISBN, and number of pages sits on a typical book page; the Dapper application would then create a recognizer for any page on the publisher’s site and enable access to it via a REST API.

While this seems backwards from an engineering point of view, Dapper’s technology is remarkably useful in the real world. In a typical scenario, for web sites that do not have clean APIs to access their information, even non-technical people can build an API in minutes with Dapper. This is a powerful way of quickly turning web sites into web services.

5. Search Technologies

Perhaps the first significant blow to the Semantic Web has been its inability, thus far, to improve search. The premise that semantic understanding of pages leads to vastly better search has yet to be validated. The two main contenders, Hakia and PowerSet, have made some progress, but not enough. The problem is that Google’s algorithm, which is based on statistical analysis, deals just fine with semantic entities like people, cities, and companies. When asked ‘What is the capital of France?’, Google returns a good enough answer.

There is a growing realization that marginal improvement in search might not be enough to beat Google, and to declare search the killer app for the Semantic Web. Likely, understanding semantics is helpful but not sufficient to build a better search engine. A combination of semantics, innovative presentation, and memory of who the user is, will be necessary to power the next generation search experience.

Alternative approaches also attempt to overlay semantics on top of the search results. Even Google ventures into verticals by partitioning the results into different categories. The consumer can then decide which type of answer they are interested in.

Yet search is a game that is far from won and a lot of semantic companies are really trying to raise the bar. There may be another twist to the whole search play – contextual technologies, as well as semantic databases, could lead to qualitatively better results. And so we turn to these next.

6. Contextual Technologies

We are seeing an increasing number of contextual tools entering the consumer market. Contextual navigation does not just improve search, but rather shortcuts it. Applications like Snap or Yahoo! Shortcuts or SmartLinks “understand” the objects inside text and links and bring relevant information right into the user’s context. The result is that the user does not need to search at all.

Thinking about this more deeply, one realizes that contextual tools leverage semantics in a much more interesting way. Instead of trying to parse what a user types into the search box, contextual technologies rely on analyzing the content. So the meaning is derived in a much more precise way – or rather, there is less guessing. The contextual tools then offer the users relevant choices, each of which leads to a correct result. This is fundamentally different from trying to pull the right results from a myriad of possible choices resulting from a web search.

We are also seeing an increasing number of contextual technologies make their way into the browser. Top-down semantic technologies need to work without publishers doing anything; and so to infer context, contextual technologies integrate into the browser. Firefox’s recommended extensions page features a number of contextual browsing solutions – Interclue, ThumbStrips, Cooliris, and BlueOrganizer (from my own company).

The common theme among these tools is the recognition of information and the creation of specific micro contexts for the users to interact with that information.

7. Semantic Databases

Semantic databases are another breed of semantic applications focused on annotating web information to be more structured. Twine, a product of Radar Networks and currently in private beta, focuses on building a personal knowledge base. Twine works by absorbing unstructured content in various forms and building a personal database of people, companies, things, locations, etc. The content is sent to Twine via bookmarklet or via email or manually. The technology needs to evolve more, but one can see how such databases can be useful once the kinks are worked out. One of the very powerful applications that could be built on top of Twine, for example, is personalized search – a way to filter the results of any search engine based on a particular individual.

It is worth noting that Radar Networks has spent a lot of time getting the infrastructure right. The underlying representation is RDF and is ready to be consumed by other semantic web services. But a big chunk of the core algorithms – the ones that deal with entity extraction – is being commoditized by Semantic Web APIs. Reuters offers this as an API call, for example, so moving forward, Twine won’t need to be concerned with how to do that.

Another big player in the semantic databases space is a company called Metaweb, which created Freebase. In its present form, Freebase is just a fancier and more structured version of Wikipedia – with RDF inside and less information in total. The overall goal of Freebase, however, is to build a Wikipedia equivalent of the world’s information. Such a database would be enormously powerful because it could be queried exactly – much like relational databases. So once again the promise is to build much better search.
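What “queried exactly” means can be shown with a toy structured store. This is not Freebase’s actual query language (Freebase uses its own JSON-based queries); the schema and records below are invented to contrast exact structured lookup with keyword search:

```python
# Toy illustration of exact querying over structured records - the
# property that distinguishes a Freebase-style database from keyword
# search. (Schema and data are invented for illustration.)
records = [
    {"type": "city", "name": "Paris", "capital_of": "France"},
    {"type": "city", "name": "Lyon", "capital_of": None},
    {"type": "city", "name": "Rome", "capital_of": "Italy"},
]

def exact_query(records, **pattern):
    """Return records whose fields match every key/value in the pattern."""
    return [r for r in records
            if all(r.get(k) == v for k, v in pattern.items())]

# "What is the capital of France?" as a structured query, not keywords:
print(exact_query(records, type="city", capital_of="France"))
# [{'type': 'city', 'name': 'Paris', 'capital_of': 'France'}]
```

Unlike a keyword engine, such a query has exactly one correct answer set and no ranking problem – which is why a comprehensive structured database would be so powerful, and why keeping it current is the hard part.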

But the problem is, how can Freebase keep up with the world? Google indexes the Internet daily and grows together with the web. Freebase currently allows editing of information by individuals and has bootstrapped by taking in parts of Wikipedia and other databases, but in order to scale this approach, it needs to perfect the art of continuously taking in unstructured information from the world, parsing it, and updating its database.

The problem of keeping up with the world is common to all database approaches, which are effectively silos. In the case of Twine, there needs to be continuous influx of user data, and in the case of Freebase there needs to be influx of data from the web. These problems are far from trivial and need to be solved successfully in order for the databases to be useful.


With any new technology it is important to define and classify things. The Semantic Web offers an exciting promise: improved information discoverability, automation of complex searches, and innovative web browsing. Yet the Semantic Web means different things to different people. Indeed, its definition differs between the enterprise and consumer spaces, and there are different means to a common end – top-down vs. bottom-up, and microformats vs. RDF. In addition to these patterns, we are observing the rise of semantic APIs and contextual browsing tools. All of these are in their early days, but they hold great promise to fundamentally change the way we interact with information on the web.

What do you think about Semantic Web Patterns? What trends are you seeing and which applications are you waiting for? And if you work with semantic technologies in the enterprise, please share your experiences with us in the comments below.


Evolving Trends

January 7, 2007

Designing a better Web 3.0 search engine

This post discusses the significant drawbacks of current quasi-semantic search engines (e.g., hakia.com, ask.com, et al.) and examines the potential future intersection of Wikipedia, Wikia Search (the recently announced search-engine-in-development by Wikipedia’s founder), a future semantic version of Wikipedia (aka Wikipedia 3.0), and Google’s PageRank algorithm, to shed some light on how to design a better semantic search engine (aka a Web 3.0 search engine).

Query Side Improvements

Semantic “understanding” of search queries (or questions) determines the quality of the relevant search results (or answers).

However, current quasi-semantic search engines like hakia and ask.com can barely understand the user’s queries, because they’ve chosen free-form natural language as the query format. Reasoning about natural language search queries can be accomplished by: a) Artificial General Intelligence, or b) statistical semantic models (which introduce some inaccuracy in constructing internal semantic queries). A better approach at this early stage may be to guide the user through selecting a domain of knowledge and staying consistent within the semantics of that domain.

The proposed approach implies an interactive search process rather than a one-shot search query. Once the search engine confirms the user’s “search direction,” it can formulate an ontology (on the fly) that specifies a range of concepts the user can supply in formulating the semantic search query. Only a minimal amount of input would be needed to arrive at the desired result (or answer), the end point being determined by the user when they declare “I’ve found it!”
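The interactive process described above can be sketched as a narrow-by-concept loop. Everything here – the domains, ontologies, and documents – is invented for illustration; the point is the mechanic of refining within one domain’s semantics until the user declares success:

```python
# Minimal sketch of the directed, multi-step search proposed above:
# the user picks a domain, then narrows by concepts from that domain's
# ontology until one result remains. (All data is illustrative.)
ONTOLOGIES = {
    "medicine": {"concepts": ["virus", "vaccine", "trial"]},
    "finance": {"concepts": ["bond", "equity", "derivative"]},
}
DOCUMENTS = [
    {"title": "Flu vaccine trial results", "domain": "medicine",
     "concepts": {"vaccine", "trial"}},
    {"title": "New flu virus strain", "domain": "medicine",
     "concepts": {"virus"}},
    {"title": "Bond yields rise", "domain": "finance", "concepts": {"bond"}},
]

def search_step(domain, chosen_concepts):
    """One refinement step: results consistent with the chosen concepts."""
    return [d for d in DOCUMENTS
            if d["domain"] == domain and chosen_concepts <= d["concepts"]]

# Step 1: user selects the 'medicine' domain -> two candidate results.
# Step 2: user adds the 'vaccine' concept    -> one result ("found it").
step1 = search_step("medicine", set())
step2 = search_step("medicine", {"vaccine"})
print(len(step1), step2[0]["title"])
```

Each step also produces explicit feedback: the engine learns not just which result was chosen, but which sequence of concept choices led there.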

Information Side Improvements

We are beginning to see search engines that claim they can semantic-ize arbitrary unstructured “Wild Wild Web” information. Wikipedia pages, constrained to the Wikipedia knowledge management format, may be easier to semantic-ize on the fly. However, at this early stage, a better approach may be to use human-directed crawling that associates the information sources with clearly defined domains/ontologies. An explicit publicized preference for those information sources (including a future semantic version of Wikipedia, a la Wikipedia 3.0) that have embedded semantic annotations (using, e.g., RDFa http://www.w3.org/TR/xhtml-rdfa-primer/ or microformats http://microformats.org) will lead to improved semantic search.

How can we adapt the currently successful Google PageRank algorithm (for ranking information sources) to semantic search?

One answer is that we would need to design a ‘ResourceRank’ algorithm (referring to RDF resources) to manage the semantic search engines’ “attention bandwidth.” A less radical option may be to design a ‘FragmentRank’ algorithm, which would rank at the page-component level (e.g., paragraph, image, Wikipedia page section).
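The FragmentRank idea is a direct transfer of PageRank: the same power-iteration ranking, but over a link graph whose nodes are page fragments rather than whole pages. Below is a standard, simplified PageRank implementation applied to an invented fragment graph (node names and links are illustrative, not a proposed format):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {node: [nodes it links to]}. Returns node -> rank score."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in nodes}
        for node, outs in links.items():
            if not outs:  # dangling node: spread its rank everywhere
                for t in nodes:
                    new[t] += damping * rank[node] / n
            else:
                for t in outs:
                    new[t] += damping * rank[node] / len(outs)
        rank = new
    return rank

# Fragment-level link graph: nodes are page components, not pages.
fragments = {
    "pageA#intro": ["pageB#section2"],
    "pageA#fig1": ["pageB#section2"],
    "pageB#section2": ["pageA#intro"],
}
ranks = pagerank(fragments)
best = max(ranks, key=ranks.get)
print(best)  # pageB#section2
```

The open question FragmentRank raises is where the fragment-level links come from – semantic annotations could supply them, which is one way the bottom-up and search threads of this post connect.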


  1. Wikipedia 3.0: The End of Google?
  2. Search By meaning


  1. See relevant links under comments

Posted by Marc Fawzi and ToxicWave



web 3.0, semantic web, ontology, reasoning, artificial intelligence, AI, hakia, ask.com, pagerank, google, semantic search, RDFa, ResourceRank, RDF, Semantic Mediawiki, Microformats


  1. I found the following links at http://wiki.ontoworld.org/index.php/SemWiki2006

    1) http://wiki.ontoworld.org/wiki/Harvesting_Wiki_Consensus_-_Using_Wikipedia_Entries_as_Ontology_Elements
    “The English version of Wikipedia contains now more than 850,000 entries and thus the same amount of URIs plus a human-readable description. While this collection is on the lower end of ontology expressiveness, it is likely the largest living ontology that is available today. In this paper, we (1) show that standard Wiki technology can be easily used as an ontology development environment for named classes, reducing entry barriers for the participation of users in the creation and maintenance of lightweight ontologies, (2) prove that the URIs of Wikipedia entries are surprisingly reliable identifiers for ontology concepts, and (3) demonstrate the applicability of our approach in a use case.”

    2) http://wiki.ontoworld.org/wiki/Extracting_Semantic_Relationships_between_Wikipedia_Categories
    “We suggest that semantic information can be extracted from Wikipedia by analyzing the links between categories. The results can be used for building a semantic schema for Wikipedia which could improve its search capabilities and provide contributors with meaningful suggestions for editing the Wikipedia pages. We analyze relevant measures for inferring the semantic relationships between page categories of Wikipedia.”

    3) http://wiki.ontoworld.org/wiki/From_Wikipedia_to_Semantic_Relationships:_a_Semi-automated_Annotation_Approach

    Comment by SeH.999 — January 7, 2007 @ 8:45 pm

  2. Thanks for the relevant links.


    Comment by evolvingtrends — January 7, 2007 @ 9:02 pm

  3. What if you had an AI which used stochastic models and had feedback mechanisms so that it could use evolutionary programming to learn which results were best? Combining Yahoo and Google (people and robots)…?

    Comment by Sam Jackson — January 8, 2007 @ 2:18 pm

  4. > What if you had an AI which used stochastic models…

    in a way, the data set (wikipedia pages + wild-wild-web pages) is itself stochastic.

    re feedback mechanism: if google knows what search results you visit, then they can feedback visited pages into pagerank. but in a directed, multi-step search process, the way the user narrows results is explicit, yielding a _much richer_ feedback loop. not just in terms of which results are chosen, but in the _particular way_ sets of results answer the search ‘problem’.

    re evolutionary programming: useful (along with neural networks) as a possible method that the search-engine uses to optimize its operating parameters, in the crawl or result-fetching stages.

    merging/unifying the crawl and results processes together, you can imagine a human supervised-learning process where the engine learns how to crawl _and_ fetch/present results for randomly-generated, historical, or real-time queries. this way, everyone that uses the engine unknowingly trains it.

    “Using the knowledge linked to by URL u, I can answer search ‘directions’ according to Ontology o”

    Comment by SeH.999 — January 8, 2007 @ 8:30 pm

  5. My line of thought precisely. Although I wonder if that would open it up to a whole new realm of blackhat SEO with click farms in china or on zombie armies? Something for Google et al to try to work out, I guess.

    Comment by Sam Jackson — January 8, 2007 @ 9:23 pm

  6. Google has no future.

    Money does not buy the future. It only glues you to the present, and the present becomes the past.

    The future is not for sale. It’s for those who can claim it.

    Money obeys the future, not vice versa.


    Comment by evolvingtrends — January 9, 2007 @ 4:02 am

  7. Well, there’s a saying that goes: money talks, bullshit walks.

    However, the problem with Google is bigger than money can fix.

    Google is stuck with a technology and a business model that are less optimal than what is possible today (never mind what will be possible in two or three years), so they either distribute all their profits as dividends and start over with Google 3.0 using a new technology and a new business model (i.e. disrupt themselves) or submit to the fact that their technology and business model are, like all technologies and business models, not immune to disruption.

    But that’s just one view. Another view could be that they will last forever or for a very long time. They may very well last forever or a very long time but definitely not as the dominant search engine. Anyone who thinks so is contradicting nature and idolizing Google.

    Nature is all about survival of the fittest.

    Google’s technology and business model are not the fittest, by design.

    Who will undermine Google?

    That’s the $300B question.

    My answer is: Google itself.

    It’s like being on a seesaw over a cliff. For now, the mountain side is weighed down by mass misconception and by the competitors’ sub-mediocre execution.

    Speaking of execution, let me inject the word “Saddam” here so Google starts associating this blog with Saddam execution videos. Do you see how dumb Google is???

    It’s not about semantic vs. non-semantic design. It’s about bad design vs. good design. You can undermine a bad design a lot easier than a good design.

    It’s time to come up with a good one!

    There are private companies competing with NASA (the organization that put a man on the moon 38 years ago) and they’re succeeding at it … Why shouldn’t we have an X Prize for the first company to come up with a P2P search engine that beats Google (i.e. The People’s Google)?

    Time for breakfast, again.

    P.S. I do have to believe in breakfast in order to exist.

    Comment by evolvingtrends — January 9, 2007 @ 11:57 am

  8. I agree with your vision. But there are many technical difficulties. For example, on-the-fly ontology generation is a very hard problem. Especially if you want to run it on the user side, I doubt whether it would work. We will have new search models (other than Google and Yahoo) for the Semantic Web. But the time is not ready for the revolution yet.

    Anyway, I believe your thoughts are great. I will soon post a new article about web evolution. I think you might be interested in reading it. 😉

    Comment by Yihong Ding — January 9, 2007 @ 1:26 pm

  9. No one can say the “time is not ready,” especially not a semantic web researcher. The time is always ready. The question is whether or not we’re ready. I believe we are 🙂 …

    Things are already in motion.

    Comment by evolvingtrends — January 10, 2007 @ 5:59 am

  10. > But there are many technical difficulties. For example, on-the-fly ontology generation is a very hard problem.

    Any elementary algorithm can generate on-the-fly ontologies; the question is how useful, reusable, and accurate they are.

    Thinking along the lines of “fluid ontologies,” “fluid knowledge,” or “evolving ontologies” may be a killer app for the semantic web, because rigidly binding OWL (or OWL-like) ontologies to data yields a relatively narrow range of expression.

    > But the time is not ready for the revolution yet.

    The time has always been “ready for the revolution yet”, but it has never been ready for people to state that it hasn’t. 😉

    Comment by SeH.999 — January 11, 2007 @ 4:38 pm

  11. http://blog.wired.com/monkeybites/2007/01/wikiseek_launch.html
    Tuesday, 16 January 2007
    SearchMe Launches Wikiseek, A Wikipedia Search Engine
    Topic: search

    The search engine company SearchMe has launched a new service, Wikiseek, which indexes and searches the contents of Wikipedia and those sites which are referenced within Wikipedia. Though not officially a part of Wikipedia, TechCrunch reports that Wikiseek was “built with Wikipedia’s assistance and permission.”

    Because Wikiseek only indexes Wikipedia and sites that Wikipedia links to, the results are less subject to the spam and SEO schemes that can clutter up Google and Yahoo search listings.

    According to the Wikiseek pages, the search engine “utilizes Searchme’s category refinement technology, providing suggested search refinements based on user tagging and categorization within Wikipedia, making results more relevant than conventional search engines.”

    Along with search results Wikiseek displays a tag cloud which allows you to narrow or broaden your search results based on topically related information.

    Wikiseek offers a Firefox search plugin as well as a Javascript-based extension that alters actual Wikipedia pages to add a Wikiseek search button (see screenshot below). Hopefully similar options will be available for other browsers in the future.

    SearchMe is using Wikiseek as a showcase product and is donating a large portion of the advertising revenue generated by Wikiseek back to Wikipedia. The company also claims to have more niche search engines in the works.

    If Wikiseek is any indication, SearchMe will be one to watch. The interface has the simplicity of Google, but searches are considerably faster — lightning fast, in fact. Granted, Wikiseek is indexing far fewer pages than Google or Yahoo. But if speed is a factor, niche search engines like Wikiseek may pose a serious threat to the giants like Google and Yahoo.

    Steve Rubel of Micro Persuasion has an interesting post about the growing influence of Wikipedia and how it could pose a big threat to Google in the near future. Here are some statistics from his post:

    The number of Wikipedians who have edited ten or more articles continues its hockey stick growth. In October 2006 that number climbed to 158,000 people. Further, media citations rose 300% last year, according to data compiled using Factiva. Last year Wikipedia was cited 11,000 times in the press. Traffic is on the rise too. Hitwise says that Wikipedia is the 20th most visited domain in the US.

    Although Wikiseek itself will probably not pose a serious threat to the search giants, Wikipedia founder Jimmy Wales does intend to compete with them at some point. Few details have emerged, but he has announced an as-yet-unavailable new search engine, dubbed Search Wikia, which aims to be a people-powered alternative to Google.

    With numbers like the ones cited above, Wikipedia may indeed pose a threat to Google, Yahoo and the rest.

    Comment by Tina — January 16, 2007 @ 7:39 pm

  12. Copying the Wikipedia 3.0 vision in a half assed way is more about leveraging the hype to make a buck than moving us forward.

    However, I’d give any effort a huge benefit of the doubt just for trying.


    Comment by evolvingtrends — January 17, 2007 @ 2:17 am

  13. […] Jan 7, ‘07: Also make sure to check out “Designing a Better Web 3.0 Search Engine.” […]

    Pingback by Wikipedia 3.0: The End of Google? « Evolving Trends — March 2, 2007 @ 10:31 pm

  14. […] turned up a short counter-point blog post about their approach by Marc Fawzi and […]

    Pingback by Blank (Media) Slate » Blog Archive » Promise of a Better Search with Hakia — March 9, 2007 @ 5:33 pm

  15. […] Now see this Evolving Trends article that preceded the description from the above. Designing a Better Web 3.0 Search Engine. […]

    Pingback by Hakia, Google, Wikia (Revision 2) « Evolving Trends — September 26, 2007 @ 10:08 pm

Read Full Post »

    Evolving Trends

    July 20, 2006

    Google dont like Web 3.0 [sic]

    (this post was last updated at 9:50am EST, July 24, ‘06)

    Why am I not surprised?

    Google exec challenges Berners-Lee

    The idea is that the Semantic Web will allow people to run AI-enabled P2P search engines that will collectively be more powerful than Google can ever be. This would relegate Google to just another source of information, especially as Wikipedia [not Google] is positioned to lead the creation of domain-specific ontologies, which are the foundation for machine reasoning [about information] in the Semantic Web.

    Additionally, we could see content producers (including bloggers) creating informal ontologies on top of the information they produce, using a standard language like RDF. This would have the same effect on P2P AI search engines and on Google’s anticipated slide into the commodity layer (unless, of course, Google develops something like GWorld).

    In summary, any attempt to arrive at widely adopted Semantic Web standards would significantly lower the value of Google’s investment in the current non-semantic Web by commoditizing “findability.” It would allow intelligent info agents to be built that collaborate with each other to find answers more effectively than the current version of Google, using “search by meaning” as opposed to “search by keyword” — and more cost-efficiently than any future AI-enabled version of Google, using disruptive P2P AI technology.

    For more information, see the articles below.


    1. Wikipedia 3.0: The End of Google?
    2. Wikipedia 3.0: El fin de Google (traducción)
    3. All About Web 3.0
    4. Web 3.0: Basic Concepts
    5. P2P 3.0: The People’s Google
    6. Intelligence (Not Content) is King in Web 3.0
    7. Web 3.0 Blog Application
    8. Towards Intelligent Findability
    9. Why Net Neutrality is Good for Web 3.0
    10. Semantic MediaWiki
    11. Get Your DBin

    Somewhat Related

    1. Unwisdom of Crowds
    2. Reality as a Service (RaaS): The Case for GWorld
    3. Google 2.71828: Analysis of Google’s Web 2.0 Strategy
    4. Is Google a Monopoly?
    5. Self-Aware e-Society


    1. In the Hearts of the Wildmen

    Posted by Marc Fawzi



    Semantic Web, Web standards, Trends, OWL, innovation, Startup, Evolution, Google, GData, inference, inference engine, AI, ontology, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, collective consciousness, Ontoworld, Wikipedia AI, Info Agent, Semantic MediaWiki, DBin, P2P 3.0, P2P AI, AI Matrix, P2P Semantic Web inference Engine, semantic blog, intelligent findability, RDF

    Read Full Post »

    Evolving Trends

    July 19, 2006

    Towards Intelligent Findability

    (This post was last updated at 12:45pm EST, July 22, ‘06)

    By Eric Noam Rodriguez (versión original en español CMS Semántico)

    Editing and Addendum by Marc Fawzi

    A lot of buzz about Web 3.0 and Wikipedia 3.0 has been generated lately by Marc Fawzi through this blog, so I’ve decided that for my first post here I’d like to dive into this idea and take a look at how to build a Semantic Content Management System (CMS). I know this blog has had more of a visionary, psychological, and sociological theme (i.e., the vision for the future and the Web’s effect on society, human relationships, and the individual himself), but I’d like to show the feasibility of this vision by providing some technical details.



    We want a CMS capable of building a knowledge base (that is, a set of domain-specific ontologies) with formal deductive reasoning capabilities. To build it, we will need:



    1. A semantic CMS framework.
    2. An ontology API.
    3. An inference engine.
    4. A framework for building info-agents.



    The general idea would be something like this:

    1. Users use a semantic CMS like Semantic MediaWiki to enter information as well as semantic annotations (establishing semantic links between concepts in the given domain, on top of the content). This typically produces an informal ontology over the information, which, when combined with domain inference rules and the query structures (for the particular schema) implemented in an independent info agent or built into the CMS, gives us a Domain Knowledge Database. (Alternatively, users could enter information into a non-semantic CMS, creating content based on a given doctype or content schema, and we could front-end it with an info agent that works with a formal ontology of the given domain. In that case, however, we would need to perform natural language processing, including using statistical semantic models, since we would lose the certainty normally provided by semantic annotations, which in a semantic CMS break the natural language down to a definite semantic structure.)
    2. Another set of info agents adds inference-based querying services to our knowledge base, drawing on information from the Web or from other domain-specific databases. User-entered information plus information obtained from the Web makes up our Global Knowledge Database.
    3. We provide a Web-based interface for querying the inference engine.

    Each doctype or schema (depending on the CMS of your choice) will have a more or less direct correspondence with our ontologies (i.e., one schema or doctype maps to one ontology). The sum of all the content of a particular schema makes up a knowledge domain which, when transformed into a semantic language like RDF (or, more specifically, OWL) and combined with the domain inference rules and the query structures (for the particular schema), constitutes our knowledge database. The choice of CMS is not relevant as long as you can query its contents while being able to define schemas. What is important is the need for an API to access the ontology. Luckily, a project like JENA fills this void perfectly, providing both an RDF and an OWL API for Java.
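    To make the schema-to-ontology mapping concrete, here is a minimal sketch in OWL (Turtle syntax). The class and property names (a hypothetical “Recipe” doctype under an invented `http://example.org/recipes#` namespace) are made up for illustration and are not part of any existing ontology:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/recipes#> .

# One CMS doctype ("Recipe") maps to one ontology class.
ex:Recipe     a owl:Class .
ex:Ingredient a owl:Class .

# Each field of the doctype becomes a property of the class.
ex:hasIngredient a owl:ObjectProperty ;
    rdfs:domain ex:Recipe ;
    rdfs:range  ex:Ingredient .

ex:cookingTimeMinutes a owl:DatatypeProperty ;
    rdfs:domain ex:Recipe .
```

    A fragment like this can be loaded into JENA’s OWL API as a model, at which point the CMS content entered under the “Recipe” schema becomes instances the inference engine can reason over.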

    In addition, we may want an agent to add to or complete our knowledge base using available Web Services (WS). I’ll assume you’re familiar with WS, so I won’t go into details.


    Now, the inference engine would seem like a very hard part. It is. But not for lack of existing technology: the W3C already has a recommendation language for querying RDF (viz. a semantic language) known as SPARQL (http://www.w3.org/TR/rdf-sparql-query/), and JENA already has a SPARQL query engine.
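    For a flavor of what “search by meaning” looks like at this layer, here is a sketch of a SPARQL SELECT query of the kind JENA’s query engine can execute. The `ex:` terms (a recipe ontology under an invented `http://example.org/recipes#` namespace) are hypothetical, chosen just for this example:

```sparql
PREFIX ex: <http://example.org/recipes#>

# Find every recipe that uses garlic and takes under 30 minutes.
SELECT ?recipe ?time
WHERE {
  ?recipe a ex:Recipe ;
          ex:hasIngredient ex:Garlic ;
          ex:cookingTimeMinutes ?time .
  FILTER (?time < 30)
}
ORDER BY ?time
```

    Note that the query matches on semantic structure (classes and properties), not on keywords — which is exactly the difference between querying a knowledge base and searching text.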

    The difficulty lies in the construction of ontologies which would have to be formal (i.e. consistent, complete, and thoroughly studied by experts in each knowledge-domain) in order to obtain powerful deductive capabilities (i.e. reasoning).


    We already have technology powerful enough to build projects such as this: solid CMSs; standards such as RDF, OWL, and SPARQL; and a stable framework for using them, such as JENA. There are also many frameworks for building info agents, but you don’t necessarily need a specialized one; a general software framework like J2EE is good enough for the tasks described in this post.

    All we need to move forward with delivering on the Web 3.0 vision (see 1, 2, 3) is the will of the people and your imagination.


    In the diagram below, the domain-specific ontologies (OWL 1 … N) could all be built by Wikipedia (see Wikipedia 3.0), since it already has the largest online database of human knowledge and, among its volunteers, the domain experts to build the ontologies for each domain of human knowledge. One possible path is for Wikipedia to build informal ontologies using Semantic MediaWiki (as Ontoworld is doing for the Semantic Web domain of knowledge), but Wikipedia may wish to wait until it has the ability to build formal ontologies, which would enable more powerful machine-reasoning capabilities.

    [Note: The ontologies simply allow machines to reason about information. They are not information but meta-information. They have to be formally consistent and complete for best results as far as machine-based reasoning is concerned.]

    However, individuals, teams, organizations, and corporations do not have to wait for Wikipedia to build the ontologies. They can start building their own domain-specific ontologies (for their own domains of knowledge) and use Google, Wikipedia, MySpace, etc. as sources of information. But as stated in my latest edit to Eric’s post, we would then have to use natural language processing, including statistical semantic models, as the information won’t be pre-semanticized (or semantically annotated), which makes the task more difficult (for us and for the machine …)

    What was envisioned in the Wikipedia 3.0: The End of Google? article was that since Wikipedia has the volunteer resources and the world’s largest database of human knowledge, it will be in the powerful position of being the developer and maintainer of the ontologies (including the semantic annotations/statements embedded in each page), which will become the foundation for intelligence (and “Intelligent Findability”) in Web 3.0.

    This vision is also compatible with the vision for P2P AI (or P2P 3.0), where people will run P2P inference engines on their PCs that communicate and collaborate with each other and tap into information from Google, Wikipedia, etc., ultimately pushing Google and central search engines down to the commodity layer (eventually making them a utility business, just like ISPs).



    1. Wikipedia 3.0: The End of Google? June 26, 2006
    2. Wikipedia 3.0: El fin de Google (traducción) July 12, 2006
    3. Web 3.0: Basic Concepts June 30, 2006
    4. P2P 3.0: The People’s Google July 11, 2006
    5. Why Net Neutrality is Good for Web 3.0 July 15, 2006
    6. Intelligence (Not Content) is King in Web 3.0 July 17, 2006
    7. Web 3.0 Blog Application July 18, 2006
    8. Semantic MediaWiki July 12, 2006
    9. Get Your DBin July 12, 2006



    Semantic Web, Web standards, Trends, OWL, innovation, Startup, Google, GData, inference engine, AI, ontology, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, Ontoworld, Wikipedia AI, Info Agent, Semantic MediaWiki, DBin, P2P 3.0, P2P AI, AI Matrix, P2P Semantic Web inference Engine, semantic blog, intelligent findability, JENA, SPARQL, RDF


    Read Full Post »
