Posts Tagged ‘yahoo’

Search War: Yahoo! Opens Its Search Engine to Attack Google With An Army of Verticals

Written by Marshall Kirkpatrick / July 9, 2008 9:00 PM / 15 Comments


Yahoo! is taking a bold step tonight: opening up its index and search engine to any outside developers who want to incorporate Yahoo! Search’s content and functionality into search engines on their own sites. The company that sees just over 20% of the searches performed each day believes that the new program, called BOSS (Build Your Own Search Service), could create a cadre of small search engines that in aggregate will outstrip their own market share and leave Google with less than 50% of the search market.

It’s an ambitious and exciting idea. It could also become very profitable when Yahoo! later enables the inclusion of Yahoo! search ads on sites using the BOSS APIs. BOSS will include access to Yahoo! web, news and image searches.

Partner Relationships

Websites wishing to leverage the BOSS APIs will be allowed to blend in their own ranking input and change the presentation of results. There are no requirements for attribution to Yahoo! and there’s no limit on the number of queries that can be performed.
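
To make "blending in your own ranking input" concrete, here is a minimal sketch in Python. It does not call the BOSS APIs; the `Result` type, the `blend` function, the 50/50 weighting and the example URLs are all assumptions made up for illustration. The idea is simply that a site takes the upstream result order, mixes in a relevance score of its own, and re-sorts.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    engine_rank: int  # 1-based position in the upstream engine's result list


def blend(results, site_scores, weight=0.5):
    """Re-rank upstream results using a site's own 0..1 relevance scores.

    The upstream position is turned into a score in (0, 1] and mixed with
    the site's score; URLs the site knows nothing about get a site score of 0.
    """
    def combined(r):
        engine_score = 1.0 / r.engine_rank
        own_score = site_scores.get(r.url, 0.0)
        return (1 - weight) * engine_score + weight * own_score

    return sorted(results, key=combined, reverse=True)


# Hypothetical upstream results and hypothetical site-specific scores.
upstream = [
    Result("http://example.com/a", 1),
    Result("http://example.com/b", 2),
    Result("http://example.com/c", 3),
]
own = {"http://example.com/c": 1.0, "http://example.com/b": 0.2}

reranked = [r.url for r in blend(upstream, own)]
print(reranked)
```

In this toy run the page the site itself rates highest moves to the top even though the upstream engine ranked it third. A real integration would likely need a more careful way of normalizing the two signals.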

At launch Yahoo! BOSS will see live integrations with at least three other companies. Hakia will integrate their semantic parsing with the Yahoo! index and search, social browser plug-in Me.dium will use the data it’s collected to offer a social search tied to the Yahoo! index, and real-time sentiment search engine Summize was included in the BOSS demo – augmenting Yahoo News search results with related Twitter messages.

More extensive customization and integration with large media companies will be performed with assistance from Yahoo! and ad-free access to the APIs will be made available to the Computer Science departments of academic institutions.

Me.dium captures 20m URLs daily and will use BOSS to show social relevance in addition to link-weight in search.

Does Anyone Really Care About Niche Vertical Search Engines?

We asked Yahoo! just that, although we believe that alternative search engines can be pretty exciting. Nonetheless, we think it’s a valid question.

Bill Michels, Senior Director of the Open Search Platform, told us that niche search engines often aren’t very good because they have access to a very limited index of content. It’s expensive to index the whole web. Likewise, Michels said that there are a substantial number of large organizations that have a huge amount of content but don’t have world-class search technology.

In both cases, Yahoo! BOSS is intended to level the playing field and blow the Big 3 wide open. We agree that it’s very exciting to imagine thousands of new Yahoo! powered niche search engines proliferating. Could Yahoo! plus the respective strengths and communities of all these new players challenge Google? We think they could.

Hakia will parse the Yahoo! index for semantic meaning and data type.

What’s Not Included?

The BOSS APIs are in beta for now, so they may be expanded with time – but for now there are still a few crown jewels in the company’s plans that won’t be opened up. We asked about Yahoo’s indexing of the semantic web and were told that would not be a part of BOSS. We asked about the Inbox 2.0 strategy and the company’s plans to rewire for social graph and data portability paradigms. We were told that those were “other programs.”

We hope that there’s not a fundamental disconnect there that will lead to lost opportunities and a lack of focus. It is clear, though, that BOSS falls well within the company’s overall technical strategy of openness. When it comes to web standards, openness and support for the ecosystem of innovation – there may be no other major vendor online as strong as Yahoo! is today. These are times of openness, where some believe that no single vendor’s technology and genius alone can match the creativity of an empowered open market of developers. Yahoo! is positioning itself as leader of this movement.

Let’s see what they can do with an army of Yahoo! powered search engines. Let the games begin!


Yahoo to Enable Custom Semantic Search Engines

Written by Marshall Kirkpatrick / February 11, 2009 9:14 AM / 2 Comments


Yahoo is bringing together two of its most interesting projects today, Yahoo BOSS (Build Your Own Search Service) and SearchMonkey, its semantic indexing and search result enhancement service. There were a number of different parts of the announcement – but the core of the story is simple.

Developers will now be able to build their own search engines using the Yahoo! index and search processing infrastructure via BOSS and include the semantic markup added to pages in both results parsing and the display of those results. There’s considerable potential here for some really dazzling results.

We wrote about the genesis of Search Monkey here this Spring; it’s an incredibly ambitious project. The end result of it is rich search results, where additional dynamic data from marked up fields can also be displayed on the search results page itself. So searching for a movie will show not just web pages associated with that movie, but additional details from those pages, like movie ratings, stars, etc. There are all kinds of possibilities for all kinds of data.

Is anyone using Yahoo! BOSS yet? Anyone who will be able to leverage Search Monkey for a better experience right away? Yahoo is encouraging developers to tag their projects bossmashup in Delicious. As you can see for yourself, there are a number of interesting proofs of concept there but not a whole lot of products. Of the products that are there, very few seem terribly compelling to us so far.

We must admit that the most compelling BOSS implementation so far is over at the site of our competitors TechCrunch. Their new blog network search implementation of BOSS is beautiful – you can see easily, for example, that TechCrunch network blogs have used the word ReadWriteWeb 7 times in the last 6 months. (In case you were wondering.)

Speaking of TechCrunch, that site’s Mark Hendrickson covered the Yahoo BOSS/Search Monkey announcement today as well, and having worked closely on the implementation there he’s got an interesting perspective on it. He points out that the new pricing model, free up to 10,000 queries a day, will likely only impact a handful of big sites – not BOSS add-ons like TechCrunch search or smaller projects.

The other interesting part of the announcement is that BOSS developers will now be allowed to use 3rd party ads on their pages leveraging BOSS – not just Yahoo ads. That’s hopeful.

Can Yahoo do it? Can these two projects brought together lead to awesome search mashups all over the web? We’ve had very high hopes in the past. Now the proof will be in the pudding.



Report: Semantic Web Companies Are, or Will Soon Begin, Making Money

Written by Marshall Kirkpatrick / October 3, 2008 5:13 PM / 14 Comments


Semantic Web entrepreneur David Provost has published a report about the state of business in the Semantic Web and it’s a good read for anyone interested in the sector. It’s titled On the Cusp: A Global Review of the Semantic Web Industry. We also mentioned it in our post Where Are All The RDF-based Semantic Web Apps?.

The Semantic Web is a collection of technologies that makes the meaning of content online understandable by machines. After surveying 17 Semantic Web companies, Provost concludes that Semantic science is being productized, differentiated, invested in by mainstream players and increasingly sought after in the business world.

Provost aims to use real-world examples to articulate the value proposition of the Semantic Web in accessible, non-technical language. That there are enough examples available for him to do this is great. His conclusions don’t always seem as well supported by his evidence as he’d like – but the profiles he writes of 17 Semantic Web companies are very interesting to read.

What are these companies doing? Provost writes:

“..some companies are beginning to focus on specific uses of Semantic technology to create solutions in areas like knowledge management, risk management, content management and more. This is a key development in the Semantic Web industry because until fairly recently, most vendors simply sold development tools.”

 

The report surveys companies ranging from the innovative but unlaunched Anzo for Excel from Cambridge Semantics, to well-known big players like Dow Jones Client Solutions and RWW sponsor Reuters Calais Initiative, to relatively unknown big players like the already very commercialized Expert System. 10 of the companies were from the US, 6 from Europe and 1 from South Korea.

Above: Chart from Provost’s report.

We’ve been wanting to learn more about “under the radar” but commercialized semantic web companies ever since doing a briefing with Expert System a few months ago. We had never heard of the Italian company before, but they believe they already have a richer, deeper semantic index than anyone else online. They told us their database at the time contained 350k English words and 2.8m relationships between them, including geographic representations. They power Microsoft’s spell checker and the Natural Language Processing (NLP) in the Blackberry. They also sell NLP software to the US military and Department of Homeland Security, which didn’t seem like anything to brag about to us but presumably makes up a significant part of the $12 million+ in revenue they told Provost they made last year.

And some people say the Semantic Web only exists inside the laboratories of Web 3.0 eggheads!

Shortcomings of the Report

Provost writes that “the vendors [in] this report have all the appearances of thriving, emerging technology companies and they have shown their readiness to cross borders, continents, and oceans to reach customers.” You’d think they turned water into wine. Those are strong words for a study in which only 4 of 17 companies were willing to report their revenue and several hadn’t launched products yet.

The logic here is sometimes pretty amazing.

The above examples [there were two discussed – RWW] are just a brief sampling of the commercial success that the Semantic Web has been experiencing. In broad terms, it’s easy to point out the longevity of many companies in this industry and use that as a proxy for commercial success [wow – RWW]. With more time (and space in this report), additional examples could be described but the most interesting prospect pertains to what the industry landscape will look like in twelve months. [hmmm…-RWW]

 

In fact, while Provost has glowingly positive things to say about all the companies he surveyed, the absence of engagement with any of their shortcomings makes the report read more like marketing material than any objective take on what’s supposed to be world-changing technology.

This is a Fun Read

The fact is, though, that Provost writes a great introduction to many companies working to sell software in a field still too widely believed to be ephemeral. The stories of each of the 17 companies profiled are fun to read and many of Provost’s points of analysis are both intuitive and thought provoking.

He says the sector is “on the cusp” of major penetration into existing markets currently served by non-semantic software. Provost argues that the Semantic Web struggles to explain itself because the World Wide Web is so intensely visual and semantics are not. He says that reselling business partners in specific distribution channels are combining their domain knowledge with the science of the software developers to bring these tools to market. He tells a great, if unattributed, story about what Linked Data could mean to the banking industry.

We hadn’t heard of several of the companies profiled in the report, and a handful of them had never been mentioned by the 34 semantic web specialist blogs we track, either.

There’s something here for everyone. You can read the full report here.


Google: “We’re Not Doing a Good Job with Structured Data”

Written by Sarah Perez / February 2, 2009 7:32 AM / 9 Comments


During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google’s Alon Halevy admitted that the search giant has “not been doing a good job” presenting the structured data found on the web to its users. By “structured data,” Halevy was referring to the databases of the “deep web” – those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google’s Deep Web Search

Halevy, who heads the “Deep Web” search initiative at Google, described the “Shallow Web” as containing about 5 million web pages while the “Deep Web” is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google’s automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web – dubbed “vertical searching” – Halevy also referenced two other types of Deep Web Search: semantic search and product search.

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether or not Google’s current search engine technology is going to be adept at doing all the different types of Deep Web indexing or if they will need to come up with something new. As of now, Google uses the Big Table database and MapReduce framework for everything search related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data level synonyms discovery. These challenges are addressed by Infobright’s technology, said Esterkin, but “Google will have to solve these problems the hard way.”

Also mentioned during the speech was how Google plans to organize “aspects” of search queries. The company wants to be able to separate exploratory queries (e.g., “Vietnam travel”) from ones where a user is in search of a particular fact (“Vietnam population”). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. “Kosmix will give you an ‘aspect,’ but it’s attached to an information source. In our case, all the aspects might be just Web search results, but we’d organize them differently.”

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it’s clear that no single company yet dominates. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, “Google” has become synonymous with web search, just like “Kleenex” is a tissue, “Band-Aid” is an adhesive bandage, and “Xerox” is a way to make photocopies. Once that psychological mark has been made on our collective psyche and the habit formed, people tend to stick with what they know, regardless of who does it better. That’s a bit troublesome – if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until Google either duplicates or acquires the invention.

Still, it’s far too soon to write Google off yet. They clearly have a lead when it comes to search and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope).



Semantic Web Patterns: A Guide to Semantic Technologies

Written by Alex Iskold / March 25, 2008 3:20 PM / 32 Comments

 


In this article, we’ll analyze the trends and technologies that power the Semantic Web. We’ll identify patterns that are beginning to emerge, classify the different trends, and peek into what the future holds.

In a recent interview Tim Berners-Lee pointed out that the infrastructure to power the Semantic Web is already here. ReadWriteWeb’s founder, Richard MacManus, even picked it to be the number one trend in 2008. And rightly so. Not only are the bits of infrastructure now in place, but we are also seeing startups and larger corporations working hard to deliver end user value on top of this sophisticated set of technologies.

The Semantic Web means many things to different people, because there are a lot of pieces to it. To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence – computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.

The disagreement is not accidental, because the technology and concepts are broad. Much is possible and much is to be imagined.

1. Bottom-Up and Top-Down

We have written a lot about the different approaches to the Semantic Web – the classic bottom-up approach and the new top-down one. The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically. Both approaches are making good progress.

A big win for the bottom-up approach was the recent announcement from Yahoo! that their search engine is going to support RDF and microformats. This is a win-win-win for publishers, for Yahoo!, and for customers – publishers now have an incentive to annotate information because Yahoo! Search will be taking advantage of it, and users will then see better, more precise results.

Another recent win for the bottom-up approach was the announcement of the Semantify web service from Dapper (previous coverage). This offering will enable publishers to add semantic annotations to existing web pages. The more tools like Semantify that pop up, the easier it will be for publishers to annotate pages. Automatic annotation tools combined with the incentive to annotate the pages is going to make the bottom-up approach more compelling.

But even if the tools and incentive exist, making the bottom-up approach widespread is difficult. Today, the magic of Google is that it can understand information as is, without asking people to fully comply with W3C standards or SEO optimization techniques. Similarly, top-down semantic tools are focused on dealing with imperfections in existing information. Among them are the natural language processing tools that do entity extraction – such as the Calais and TextWise APIs that recognize people, companies, places, etc. in documents; vertical search engines, like ZoomInfo and Spock, which mine the web for people; technologies like Dapper and BlueOrganizer, which recognize objects in web pages; and Yahoo! Shortcuts, Snap and SmartLinks, which recognize objects in text and links.

[Disclosure:] Alex Iskold is founder and CEO of AdaptiveBlue, which makes BlueOrganizer and SmartLinks.

Top-down technologies are racing forward despite imperfect information. And, of course, they benefit from the bottom-up annotations as well. The more annotations there are, the more precise top-down technologies will get – because they will be able to take advantage of structured information as well.

2. Annotation Technologies: RDF, Microformats, and Meta Headers

Within the bottom-up approach to annotation of data, there are several choices for annotation. They are not equally powerful, and in fact each approach is a tradeoff between simplicity and completeness. The most comprehensive approach is RDF – a powerful, graph-based language for declaring things, and attributes and relationships between things. In a simplistic way, one can think of RDF as the language that allows expressing truths like: Alex IS human (type expression), Alex HAS a brain (attribute expression), and Alex IS the father of Alice, Lilly, and Sofia (relationship expression). RDF is powerful, but because it is highly recursive, precise, and mathematically sound, it is also complex.
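
To make the three kinds of statements concrete, here is a small sketch in plain Python. It is not RDF syntax and uses no RDF library; the tuple layout, the predicate names (`type`, `has`, `fatherOf`) and the `match` helper are assumptions chosen for illustration. It only shows the underlying idea: every fact is a subject-predicate-object triple, and a query is a pattern with wildcards.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    ("Alex", "type", "Human"),        # type expression
    ("Alex", "has", "Brain"),         # attribute expression
    ("Alex", "fatherOf", "Alice"),    # relationship expressions
    ("Alex", "fatherOf", "Lilly"),
    ("Alex", "fatherOf", "Sofia"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Who is Alex the father of?"
children = [o for _, _, o in match(triples, s="Alex", p="fatherOf")]
print(children)

# "What is the father of Lilly?" (follow the same relationship backwards)
fathers = [s for s, _, _ in match(triples, p="fatherOf", o="Lilly")]
print(fathers)
```

Real RDF adds globally unique identifiers (URIs) for subjects and predicates, which is much of what makes data from different sources line up.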

At present, most use of RDF is for interoperability. For example, the medical community uses RDF to describe genomic databases. Because the information is normalized, the databases that were previously silos can now be queried together and correlated. In general, in addition to semantic soundness, the major benefit of RDF is interoperability and standardization, particularly for enterprises, as we will discuss below.

Microformats offer a simpler approach by adding semantics to existing HTML documents using specific CSS class names. The metadata is compact and is embedded inside the actual HTML. Popular microformats are hCard, which describes personal and company contact information, hReview, which adds meta information to review pages, and hCalendar, which is used to describe events.

Microformats are gaining popularity because of their simplicity, but they are still quite limiting. There is no way to describe type hierarchies, which the classic semantic community would say is critical. The other issue is that microformats are somewhat cryptic, because the focus is to keep the annotations to a minimum. This, in turn, brings up another question of whether embedding metadata into the view (HTML) is a good idea. The question is: what happens if the underlying data changes when someone makes a copy of the HTML document? Nevertheless, despite these issues, microformats are gaining popularity because they are simple. Microformats are currently used by Flickr, Eventful, and LinkedIn; and many other companies are looking to adopt microformats, particularly because of the recent Yahoo! announcement.
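
As a rough illustration of how little machinery a consumer needs, the sketch below pulls a few hCard properties out of a snippet using only Python's standard `html.parser`. The class names `vcard`, `fn`, `org` and `tel` come from hCard; the `HCardParser` class, the sample markup and the contact details are made up, and the parser assumes well-nested markup with no void elements such as `<br>`. It is nowhere near a complete hCard implementation.

```python
from html.parser import HTMLParser

# A small subset of hCard property class names.
HCARD_PROPERTIES = {"fn", "org", "tel"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        if prop and data.strip():
            self.card[prop] = self.card.get(prop, "") + data.strip()


html = (
    '<div class="vcard">'
    '<span class="fn">Alex Example</span>, '
    '<span class="org">Example Corp</span>, '
    '<span class="tel">555-0100</span>'
    '</div>'
)

parser = HCardParser()
parser.feed(html)
parser.close()
card = parser.card
print(card)
```

The same HTML still renders as an ordinary line of contact details in a browser, which is the appeal: the annotation rides along inside markup the publisher already has.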

An even simpler approach is to put meta data into the meta headers. This approach has been around for a while and it is a shame that it has not been widely adopted. As an example, the New York Times recently launched extended annotations for its news pages. The benefit of this approach is that it works great for pages that are focused on a topic or a thing. For example, a news page can be described with a set of keywords, geo location, date, time, people, and categories. Another example would be for book pages. O’Reilly.com has been putting book information into the meta headers, describing the author, ISBN, and category of the book.
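
Reading such headers takes only a few lines. The sketch below uses Python's standard `html.parser` to collect `<meta name="..." content="...">` pairs from a page. The `MetaParser` class, the sample page and the field names (`author`, `isbn`, `keywords`) are assumptions for illustration; they are not the actual fields used by the New York Times or O’Reilly.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name") and a.get("content") is not None:
            self.meta[a["name"].lower()] = a["content"]


# Hypothetical book page with made-up values.
page = """<html><head>
<title>A Book Page</title>
<meta name="author" content="Jane Doe">
<meta name="isbn" content="0000000000">
<meta name="keywords" content="semantic web, rdf">
</head><body><p>Book description goes here.</p></body></html>"""

parser = MetaParser()
parser.feed(page)
parser.close()
meta_info = parser.meta
print(meta_info)
```

Because the metadata describes the page as a whole, this works best for pages about a single thing, such as one book or one news story.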

Despite the fact that all these approaches are different, they are also somewhat complementary; and each of them is helpful. The more annotations there are in web pages, the more standards are implemented, and the more discoverable and powerful the information becomes.

3. Consumer and Enterprise

Yet another dimension of the conversation about the Semantic Web is the focus on consumer and enterprise applications. In the consumer arena we have been looking for a Killer App – something that delivers tangible and simple consumer value. People simply do not care that a product is built on the Semantic Web; all they are looking for is utility and usefulness.

Up until recently, the challenge has been that the Semantic Web is focused on rather academic issues – like annotating information to make it machine readable. The promise was that once the information is annotated and the web becomes one big giant RDF database, then exciting consumer applications will come. The skeptics, however, have been pointing out that first there needs to be a compelling use case.

Some consumer applications based on the Semantic Web: generic and vertical search, contextual shortcuts and previews, personal information management systems, semantic browsing tools. All of these applications are in their early days and have a long way to go before being truly compelling for the average web user. Still, even if these applications succeed, consumers will not be interested in knowing about the underlying technology – so there is really no marketing play for the Semantic Web in the consumer space.

Enterprises are a different story for a couple of reasons. First, enterprises are much more used to techno speak. To them utilizing semantic technologies translates into being intelligent and that, in turn, is good marketing. ‘Our products are better and smarter because we use the Semantic Web’ sounds like a good value proposition for the enterprise.

But even above the marketing speak, RDF solves a problem of data interoperability and standards. This “Tower of Babel” situation has been in existence since the early days of software. Forget semantics; just a standard protocol, a standard way to pass around information between two programs, is hugely valuable in the enterprise.

RDF offers a way to communicate using an XML-based language, which on top of that has sound mathematical elements to enable semantics. This sounds great, and even the complexity of RDF is not going to stop enterprises from using it. However, there is another problem that might stop it – scalability. Unlike relational databases, which have been around for ages and have been optimized and tuned, XML-based databases are still not widespread. In general, the problem is in the scale and querying capabilities. Like object-oriented database technologies of the late nineties, XML-based databases hold a lot of promise, but we are yet to see them in action in a big way.

4. Semantic APIs

With the rise of Semantic Web applications, we are also seeing the rise of Semantic APIs. In general, these web services take as an input unstructured information and find entities and relationships. One way to think of these services is mini natural language processing tools, which are only concerned with a subset of the language.

The first example is the Open Calais API from Reuters that we have covered in two articles here and here. This service accepts raw text and returns information about people, places, and companies found in the document. The output not only returns the list of found matches, but also specifies places in the document where the information is found. Behind Calais is a powerful natural language processing technology developed by Clear Forest (now owned by Reuters), which relies on algorithms and databases to extract entities out of text. According to Reuters, Calais is extensible, and it is just a matter of time before new entities will be added.
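
The shape of that output (entity, type, and position in the text) can be sketched in a few lines of Python. To be clear, this is not the Calais API and says nothing about how Calais works internally: the `GAZETTEER` lookup table, the `extract_entities` function and the sample sentence are all made up, and a fixed word list is a far cruder technique than real natural language processing.

```python
import re

# Hypothetical lookup table of known entity names and their types.
GAZETTEER = {
    "Reuters": "Company",
    "Yahoo": "Company",
    "Paris": "Place",
    "Tim Berners-Lee": "Person",
}


def extract_entities(text):
    """Return (entity, type, start_offset) for each gazetteer hit, in text order."""
    hits = []
    for name, kind in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((name, kind, m.start()))
    return sorted(hits, key=lambda h: h[2])


text = "Tim Berners-Lee spoke in Paris about Reuters and Yahoo."
entities = extract_entities(text)
for name, kind, start in entities:
    print(f"{kind:8} {name!r} at offset {start}")
```

Returning offsets rather than just names is what lets a calling application highlight or link the exact spot in the document where each entity occurs.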

Another example is the SemanticHacker API from TextWise, which is offering a one million dollar prize for the best commercial semantic web application developed on top of it. This API classifies information in documents into categories called semantic signatures. Given a document, it outputs entities or topics that the document is about. It is kind of like Calais, but also delivers a topical hierarchy, where the actual objects are leaves.

Another semantic API is offered by Dapper – a web service which facilitates the extraction of structure from unstructured HTML pages. Dapper works by enabling users to define attributes of an object based on the bits of the page. For example, a book publisher might define where the information about author, ISBN and number of pages is on a typical book page and the Dapper application would then create a recognizer for any page on the publisher site and enable access to it via a REST API.
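
The book-page example can be sketched as follows. This is only an illustration of the "define the fields once, then apply to every page" idea, not Dapper's actual mechanism: the `FIELDS` patterns, the `make_recognizer` function, the element ids and the sample page are all hypothetical, and regular expressions are a fragile way to read HTML outside of a toy like this.

```python
import re

# Hypothetical field definitions for one publisher's page template:
# each field maps to a pattern locating it in a typical book page.
FIELDS = {
    "author": r'<span id="author">(.*?)</span>',
    "isbn": r'<span id="isbn">(.*?)</span>',
    "pages": r'<span id="pages">(\d+)</span>',
}


def make_recognizer(fields):
    """Build a function that extracts the defined fields from any page."""
    compiled = {name: re.compile(p, re.S) for name, p in fields.items()}

    def recognize(html):
        record = {}
        for name, pattern in compiled.items():
            m = pattern.search(html)
            record[name] = m.group(1).strip() if m else None
        return record

    return recognize


recognize_book = make_recognizer(FIELDS)

# Made-up page that follows the assumed template.
page = """
<h1>Some Book</h1>
<span id="author"> Jane Doe </span>
<span id="isbn">0000000000</span>
<span id="pages">320</span>
"""
book = recognize_book(page)
print(book)
```

Once a recognizer like this exists, exposing it behind a URL is what turns a site with no API into something other programs can query.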

While this seems backwards from an engineering point of view, Dapper’s technology is remarkably useful in the real world. In a typical scenario, for web sites that do not have clean APIs to access their information, even non-technical people can build an API in minutes with Dapper. This is a powerful way of quickly turning web sites into web services.

5. Search Technologies

Perhaps the first significant blow to the Semantic Web has been the inability thus far to improve search. The premise that semantic understanding of pages leads to vastly better search has yet to be validated. The two main contenders, Hakia and PowerSet, have made some progress, but not enough. The problem is that Google’s algorithm, which is based on statistical analysis, deals just fine with semantic entities like people, cities, and companies. When asked What is the capital of France? Google returns a good enough answer.

There is a growing realization that marginal improvement in search might not be enough to beat Google, and to declare search the killer app for the Semantic Web. Likely, understanding semantics is helpful but not sufficient to build a better search engine. A combination of semantics, innovative presentation, and memory of who the user is, will be necessary to power the next generation search experience.

Alternative approaches also attempt to overlay semantics on top of the search results. Even Google ventures into verticals by partitioning the results into different categories. The consumer can then decide which type of answer they are interested in.

Yet search is a game that is far from won and a lot of semantic companies are really trying to raise the bar. There may be another twist to the whole search play – contextual technologies, as well as semantic databases, could lead to qualitatively better results. And so we turn to these next.

6. Contextual Technologies

We are seeing an increasing number of contextual tools entering the consumer market. Contextual navigation does not just improve search, but rather shortcuts it. Applications like Snap or Yahoo! Shortcuts or SmartLinks “understand” the objects inside text and links and bring relevant information right into the user’s context. The result is that the user does not need to search at all.

Thinking about this more deeply, one realizes that contextual tools leverage semantics in a much more interesting way. Instead of trying to parse what a user types into the search box, contextual technologies rely on analyzing the content. So the meaning is derived in a much more precise way – or rather, there is less guessing. The contextual tools then offer the users relevant choices, each of which leads to a correct result. This is fundamentally different from trying to pull the right results from a myriad of possible choices resulting from a web search.

We are also seeing an increasing number of contextual technologies make their way into the browser. Top-down semantic technologies need to work without publishers doing anything; and so to infer context, contextual technologies integrate into the browser. Firefox’s recommended extensions page features a number of contextual browsing solutions – Interclue, ThumbStrips, Cooliris, and BlueOrganizer (from my own company).

The common theme among these tools is the recognition of information and the creation of specific micro contexts for the users to interact with that information.

7. Semantic Databases

Semantic databases are another breed of semantic applications focused on annotating web information to be more structured. Twine, a product of Radar Networks and currently in private beta, focuses on building a personal knowledge base. Twine works by absorbing unstructured content in various forms and building a personal database of people, companies, things, locations, etc. The content is sent to Twine via bookmarklet or via email or manually. The technology needs to evolve more, but one can see how such databases can be useful once the kinks are worked out. One of the very powerful applications that could be built on top of Twine, for example, is personalized search – a way to filter the results of any search engine based on a particular individual.

It is worth noting that Radar Networks has spent a lot of time getting the infrastructure right. The underlying representation is RDF and is ready to be consumed by other semantic web services. But a big chunk of the core algorithms, the ones that are dealing with entity extraction, are being commoditized by Semantic Web APIs. Reuters offers this as an API call, for example, and so moving forward, Twine won’t need to be concerned with how to do that.

Another big player in the semantic databases space is a company called Metaweb, which created Freebase. In its present form, Freebase is just a fancier and more structured version of Wikipedia – with RDF inside and less information in total. The overall goal of Freebase, however, is to build a Wikipedia equivalent of the world’s information. Such a database would be enormously powerful because it could be queried exactly – much like relational databases. So once again the promise is to build much better search.
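What "queried exactly" means can be shown with a small Python sketch over subject-predicate-object facts, the shape of data that RDF uses. This is not Freebase's data model or query language; the `FACTS` list and the `query` helper are assumptions made purely for illustration.

```python
# Hypothetical structured facts, stored as (subject, predicate, object) triples.
FACTS = [
    ("Paris", "capital_of", "France"),
    ("Paris", "type", "City"),
    ("Berlin", "capital_of", "Germany"),
    ("Berlin", "type", "City"),
]


def query(subject=None, predicate=None, obj=None):
    """Return all facts matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in FACTS
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "What is the capital of France?" becomes an exact lookup rather than a keyword match.
capital = query(predicate="capital_of", obj="France")
cities = query(predicate="type", obj="City")
```

The answer is either in the database or it is not, which is exactly why the coverage and freshness of the data matter so much.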

But the problem is, how can Freebase keep up with the world? Google indexes the Internet daily and grows together with the web. Freebase currently allows editing of information by individuals and has bootstrapped by taking in parts of Wikipedia and other databases, but in order to scale this approach, it needs to perfect the art of continuously taking in unstructured information from the world, parsing it, and updating its database.

The problem of keeping up with the world is common to all database approaches, which are effectively silos. In the case of Twine, there needs to be continuous influx of user data, and in the case of Freebase there needs to be influx of data from the web. These problems are far from trivial and need to be solved successfully in order for the databases to be useful.

Conclusion

With any new technology it is important to define and classify things. The Semantic Web is offering an exciting promise: improved information discoverability, automation of complex searches, and innovative web browsing. Yet the Semantic Web means different things to different people. Indeed, its definition in the enterprise and consumer spaces is different, and there are different means to a common end – top-down vs. bottom-up and microformats vs. RDF. In addition to these patterns, we are observing the rise of semantic APIs and contextual browsing tools. All of these are in their early days, but hold a big promise to fundamentally change the way we interact with information on the web.

What do you think about Semantic Web Patterns? What trends are you seeing and which applications are you waiting for? And if you work with semantic technologies in the enterprise, please share your experiences with us in the comments below.


2009 Predictions and Recommendations for Web 2.0 and Social Networks

Christopher Rollyson

Volatility, Uncertainty and Opportunity—Move Crisply while Competitors Are in Disarray

Now that the Year in Review 2008 has summarized key trends, we are in excellent position for 2009 prognostications, so welcome to Part II. As all experienced executives know, risk and reward are inseparable twins, and periods of disruption elevate both, so you will have much more opportunity to produce uncommon value than normal.

This is a high-stakes year in which we can expect surprises. Web 2.0 and social networks can help because they increase flexibility and adaptiveness. Alas, those who succeed will have to challenge conventional thinking considerably, which is not a trivial exercise in normal times. The volatility that many businesses face will make it more difficult because many of their clients and/or employees will be distracted. It will also make it easier because some of them will perceive that extensive change is afoot, and Web 2.0 will blend in with the cacophony. Disruption produces unusual changes in markets, and the people that perceive the new patterns and react appropriately emerge as new leaders.

2009 Predictions

These are too diverse to be ranked in any particular order. Please share your reactions and contribute those that I have missed.

  1. The global financial crisis will continue to add significant uncertainty in the global economy in 2009 and probably beyond. I have no scientific basis for this, but there are excellent experts of every flavor on the subject, so take your pick. I believe that we are off the map, and anyone who says that he’s sure of a certain outcome should be considered with a healthy skepticism.
    • All I can say is that my friends, clients and sources in investment and commercial banking tell me it’s not over yet, and uncertainty is the only certainty until further notice. This has not yet fully leached out.
    • Western governments, led by the U.S., are probably prolonging the pain because governments usually get bailouts wrong. However, voters don’t have the stomachs for hardship, so we are probably trading short-term “feel good” efforts for a prolonged adjustment period.
  2. Widespread social media success stories in 2009 in the most easily measurable areas such as talent management, business development, R&D and marketing.
    • 2008 saw a significant increase in enterprise executives’ experimentation with LinkedIn, Facebook, YouTube and enterprise (internal) social networks. These will begin to bear fruit in 2009, after which a “mad rush of adoption” will ensue.
    • People who delay adoption will pay dearly in terms of consulting fees, delayed staff training and retarded results.
  3. Internal social networks will largely disappoint. Similar to intranets, they will produce value, but few enterprises are viable long-term without seamlessly engaging the burgeoning external world of experts.
    In general, the larger and more disparate an organization’s audience is, the more value it can create, but culture must encourage emergent, cross-boundary connections, which is where many organizations fall down.

 

  • If you’re a CIO who’s banking heavily on your behind-the-firewall implementation, just be aware that you need to engage externally as well.
  • Do it fast because education takes longer than you think.
  • There are always more smart people outside than inside any organization.
  • Significant consolidation among white label social network vendors, so use your customary caution when signing up partners.
    • Due diligence and skill portability will help you to mitigate risks. Any vendor worth their salt will use standardized SOA-friendly architecture and feature sets. As I wrote last year, Web 2.0 is not your father’s software, so focus on people and process more than technology.
    • If your vendor hopeful imposes process on your people, run.
  • No extensive M&A among big branded sites like Facebook, LinkedIn and Twitter although there will probably be some. The concept of the social ecosystem holds that nodes on pervasive networks can add value individually. LinkedIn and Facebook have completely different social contexts. “Traditional” executives tend to view disruptions as “the new thing” that they want to put into a bucket (“let them all buy each other, so I only have to learn one!”). Wrong. This is the new human nervous system, and online social venues, like their offline counterparts, want specificity because they add more value that way. People hack together the networks to which they belong based on their goals and interests.
    • LinkedIn is very focused on the executive environment, and they will not buy Facebook or Twitter. They might buy a smaller company. They are focused on building an executive collaboration platform, and a large acquisition would threaten their focus. LinkedIn is in the initial part of its value curve, they have significant cash, and they’re profitable. Their VCs can smell big money down the road, so they won’t sell this year.
    • Twitter already turned down Facebook, and my conversations with them lead me to believe that they love their company; and its value is largely undiscovered as of yet. They will hold out as long as they can.
    • Facebook has staying power past 2009. They don’t need to buy anyone of import; they are gaining global market share at a fast clip. They already enable customers to build a large part of the Facebook experience, and they have significant room to innovate. Yes, there is a backlash in some quarters against their size. I don’t know Mark Zuckerberg personally, and I don’t have a feeling for his personal goals.
    • I was sad to see that Dow Jones sold out to NewsCorp and, as a long-time Wall Street Journal subscriber, I am even more dismayed now. This will prove a quintessential example of value destruction. The Financial Times currently fields a much better offering. The WSJ is beginning to look like MySpace! As for MySpace itself, I don’t have a firm bead on it but surmise that it has a higher probability of major M&A than the aforementioned: its growth has stalled, Facebook continues to gain, and Facebook uses more Web 2.0 processes, so I believe it will surpass MySpace in terms of global audience.
    • In being completely dominant, Google is the Wal-Mart of Web 2.0, and I don’t have much visibility into their plans, but I think they could make significant waves in 2009. They are very focused on applying search innovation to video, which is still in the initial stages of adoption, so YouTube is not going anywhere.
    • I am less familiar with Digg, Xing, Bebo, Cyworld. Of course, Orkut is part of the Googleverse.
  • Significant social media use by the Obama Administration. It has the knowledge, experience and support base to pursue fairly radical change. Moreover, the degree of change will be in synch with the economy: if there is a significant worsening, expect the government to engage people to do uncommon things.
    • Change.gov is the first phase, in which supporters and any other interested people are invited to contribute thoughts, stories and documents to the transition team. It aims to keep people engaged and to serve the government on a volunteer basis.
    • The old way of doing things was to hand out form letters that you would mail to your representative. Using Web 2.0, people can organize almost instantly, and results are visible in real-time. Since people are increasingly online somewhere, the Administration will invite them from within their favorite venue (MySpace, Facebook…).
    • Obama has learned that volunteering provides people with a sense of meaning and importance. Many volunteers become evangelists.
  • Increasing citizen activism against companies and agencies, a disquieting prospect but one that I would not omit from your scenario planning (ask yourself, “How could people come together and magnify some of our blemishes?” more here). To wit:
    • In 2007, an electronic petition opposing pay-per-use road tolls in the UK reached 1.8 million signatories, stalling a major government initiative. Although this did not primarily employ social media, it is indicative of the phenomenon.
    • In Q4 2008, numerous citizen groups organized Facebook groups (25,000 signatures in a very short time) to oppose television and radio taxes, alarming the Swiss government. Citizens are organizing to stop paying obligatory taxes—and to abolish the agency that administers the tax system. Another citizen initiative recently launched on the Internet collected 60,000 signatures to oppose biometric passports. German links. French links.
    • In the most audacious case, Ahmed Maher is using Facebook to try to topple the government of Egypt. According to Wired’s Cairo Activists Use Facebook to Rattle Regime, activists have organized several large demonstrations and have a Facebook group of 70,000 that’s growing fast.
  • Executive employment will continue to feel pressure, and job searches will get increasingly difficult for many, especially those with “traditional” jobs that depend on Industrial Economy organization.
    • In tandem with this, there will be more opportunities for people who can “free-agent” themselves in some form.
    • In 2009, an increasing portion of executives will have success at using social networks to diminish their business development costs, and their lead will subsequently accelerate the leeching of enterprises’ best and brightest, many of whom could have more flexibility and better pay as independents. This is already manifest as displaced executives choose never to go back.
    • The enterprise will continue to unbundle. I have covered this extensively on the Transourcing website.
  • Enterprise clients will start asking for “strategy” to synchronize social media initiatives. Web 2.0 is following the classic adoption pattern: thus far, most enterprises have been using a skunk works approach to their social media initiatives, or they’ve been paying their agencies to learn while delivering services.
    • In the next phase, beginning in 2009, CMOs, CTOs and CIOs will sponsor enterprise level initiatives, which will kick off executive learning and begin enterprise development of social media native skills. After 1-2 years of this, social media will be spearheaded by VPs and directors.
    • Professional services firms (PwC, KPMG, Deloitte…) will begin scrambling to pull together advisory practices after several of their clients ask for strategy help. These firms’ high costs do not permit them to build significantly ahead of demand.
    • Marketing and ad agencies (Leo Burnett, Digitas…) will also be asked for strategy help, but they will be hampered by their desires to maintain the outsourced model; social media is not marketing, even though it will displace certain types of marketing.
    • Strategy houses (McKinsey, BCG, Booz Allen…) will also be confronted by clients asking for social media strategy; their issue will be that it is difficult to quantify, and the implementation piece is not in their comfort zone, reducing revenue per client.
    • Boutiques will emerge to develop seamless strategy and implementation for social networks. This is needed because Web 2.0 and social networks programs involve strategy, but implementation involves little technology when compared to Web 1.0. As I’ll discuss in an imminent article, it will involve much more interpersonal mentoring and program development.
  • Corporate spending on Enterprise 2.0 will be very conservative, and pureplay and white label vendors (and consultants) will need to have strong business cases.
    • CIOs have better things to spend money on, and they are usually reacting to business unit executives who are still getting their arms around the value of Web 2.0, social networks and social media.
    • Enterprise software vendors will release significant Web 2.0 bolt-on improvements to their platforms in 2009. IBM is arguably out in front with Lotus Connections, with Microsoft Sharepoint fielding a solid solution. SAP and Oracle will field more robust solutions this year.
  • The financial crunch will accelerate social network adoption among those focused on substance rather than flash; this is akin to the dotbomb of 2001-2004, when no one wanted to do the Web as an end in itself anymore; it flushed out the fluffy offers (as well as some really good ones).
    • Social media can save money: how much did it cost the Obama campaign in time and money to raise $500 million? Extremely little.
    • People like to get involved and contribute, when you can frame the activity as important and you provide the tools to facilitate meaningful action. Engagement raises profits and can decrease costs. Engaged customers, for example, tend to leave less often than apathetic customers.
    • Social media is usually about engaging volunteer contributors; if you get it right, you will get a lot of help for little cash outlay.
    • Social media presents many new possibilities for revenue, but to see them, look outside existing product silos. Focus on customer experience by engaging customers, not with your organization, but with each other. Customer-customer communication is excellent for learning about experience.
  • Microblogging will completely mainstream even though Twitter is still quite emergent and few solid business cases exist.
    • Twitter and its peers (Plurk, Jaiku, Pownce {just bought by Six Apart and closed}, Kwippy, Tumblr) are unique for two reasons: they incorporate mobility seamlessly, and they chunk communications small; this leads to a great diversity of “usage context”.
    • Note that Dell sold $1 million on Twitter in 2008, using it as a channel for existing business.
    • In many businesses, customers will begin expecting your organization to be on Twitter; this year it will rapidly cease to be a novelty.

    2009 Recommendations

    Web 2.0 will affect business and culture far more than Web 1.0 (the internet), which was about real-time information access and transactions via a standards-based network and interface. Web 2.0 enables real-time knowledge and relationships, so it will profoundly affect most organizations’ stakeholders (clients, customers, regulators, employees, directors, investors, the public…). It will change how all types of buying decisions are made.

    As an individual and/or an organization leader, you have the opportunity to adopt more quickly than your peers and increase your relevance to stakeholders as their Web 2.0 expectations of you increase. 2009 will be a year of significant adoption, and I have kept this list short, general and actionable. I have assumed that your organization has been experimenting with various aspects of Web 2.0, that some people have moderate experience. Please feel free to contact me if you would like more specific or advanced information or suggestions. Recommendations are ranked in importance, the most critical at the top.

    1. What: Audit your organization’s Web 2.0 ecosystem, and conduct your readiness assessment. Why: Do this to act with purpose, mature your efforts past experimentation and increase your returns on investment.
      • The ecosystem audit will tell you what stakeholders are doing, and in what venues. Moreover, a good one will tell you trends, not just numbers. In times of rapid adoption, knowing trends is critical, so you can predict the future. Here’s more about audits.
      • The readiness assessment will help you to understand how your value proposition and resources align with creating and maintaining online relationships. The audit has told you what stakeholders are doing, now you need to assess what you can do to engage them on an ongoing basis. Here’s more about readiness assessments.
    2. What: Select a top executive to lead your organization’s adoption of Web 2.0 and social networks. Why: Web 2.0 is changing how people interact, and your organizational competence will be affected considerably, so applying it to your career and business is very important.
      • This CxO should be someone with a track record for innovation and a commitment to leading discontinuous change. He or she should be philosophically in synch with the idea of emergent organization and cross-boundary collaboration.
      • S/He will coordinate your creation of strategy and programs (part-time). This includes formalizing your Web 2.0 policy, legal and security due diligence.
    3. What: Use an iterative portfolio approach to pursue social media initiatives in several areas of your business, and chunk investments small.
      Why: Both iteration and portfolio approaches help you to manage risk and increase returns.
    • Use the results of the audit and the readiness assessment to help you to select the stakeholders you want to engage.
    • Engage a critical mass of stakeholders about things that inspire or irritate them and that you can help them with.
    • All else equal, pilots should include several types of Web 2.0 venues and modes like blogs, big branded networks (Facebook, MySpace), microblogs (Twitter), video and audio.
    • As a general rule, extensive opportunity exists where you can use social media to cross boundaries, which usually impose high costs and prevent collaboration. One of the most interesting in 2009 will be encouraging alumni, employees and recruits to connect and collaborate according to their specific business interests. This can significantly reduce your organization’s business development, sales and talent acquisition costs. For more insight into this, see Alumni 2.0.
    • Don’t overlook pilots with multiple returns, like profile management programs, which can reduce your talent acquisition and business development costs. Here’s more on profile management.

     

  • What: Create a Web 2.0 community with numerous roles to enable employees flexibility.
    Why: You want to keep investments small and let the most motivated employees step forward.

    • Roles should include volunteers for pilots, mentors (resident bloggers, video producers and others), community builders (rapidly codify the knowledge you are gathering from pilots), some part-time more formal roles. Perhaps a full-time person to coordinate would make sense. Roles can be progressive and intermittent. Think of this as open source.
    • To stimulate involvement, the program must be meaningful, and it must be structured to minimize conflicts with other responsibilities.
  • What: Avoid the proclivity to treat Web 2.0 as a technology initiative. Why: Web 1.0 (the Internet) involved more of IT than does Web 2.0, and many people are conditioned to think that IT drives innovation; they fall in the tech trap, select tools first and impose process. This is old school and unnecessary because the tools are far more flexible than the last generation software with which many are still familiar.
    • People create the value when they get involved, and technology often gets in the way by making investments in tools that impose process on people and turn them off. Web 2.0 tools impose far less process on people.
    • More important than what brand you invest in is your focus on social network processes and how they add value to existing business processes. If you adopt smartly, you will be able to transfer assets and processes elsewhere while minimizing disruption. More likely is that some brands will disappear (Pownce closed its doors 15 December). When you focus your organization on mastering process and you distribute learning, you will be more flexible with the tools.
    • Focus on process and people, and incent people to gather and share knowledge and help each other. This will increase your flexibility with tools.
  • What: Manage consulting, marketing and technology partners with a portfolio strategy. Why: Maximize flexibility and minimize risk.
    • From the technology point of view, there are three main vendor flavors: enterprise bolt-on (i.e. Lotus Connections), pureplay white label vendors (SmallWorldLabs) and open (Facebook, LinkedIn). As a group, pureplays have the most diversity in terms of business models, and the most uncertainty. Enterprise bolt-ons’ biggest risk is that they lag significantly behind. More comparisons here.
    • Fight the urge to go with one. If you’re serious about getting business value, you need to be in the open cross-boundary networks. If you have a Lotus or Microsoft relationship, compare Connections and Sharepoint with some pureplays to address private social network needs. An excellent way to start could be with Yammer.
    • Be careful when working with consulting- and marketing-oriented partners who are accustomed to an outsourced model. Web 2.0 is not marketing; it is communicating to form relationships and collaborate online. It does have extensive marketing applications; make sure partners have demonstrated processes for mentoring because Web 2.0 will be a core capability for knowledge-based organizations, and you need to build your resident knowledge.
    Parting Shots

    I hope you find these thoughts useful, and I encourage you to add your insights and reactions as comments. If you have additional questions about how to use Web 2.0, please feel free to contact me. I wish all the best to you in 2009.


    Evolving Trends

    Wikipedia 3.0: The End of Google?

    In Uncategorized on June 26, 2006 at 5:18 am

    Author: Marc Fawzi

    License: Attribution-NonCommercial-ShareAlike 3.0

    Announcements:

    Semantic Web Developers:

    Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

    1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)

    Click here for more info and a list of related articles…

    Foreword

    Two years after I published this article, it has received over 200,000 hits, and we now have several startups attempting to apply Semantic Web technology to Wikipedia and knowledge wikis in general, including the Wikipedia founder’s own commercial startup as well as a startup that was recently purchased by Microsoft.

    Recently, after seeing how Wikipedia’s governance is so flawed, I decided to write about a way to decentralize and democratize Wikipedia.

    Versión española

    Article

    (Article was last updated at 10:15am EST, July 3, 2006)

    Wikipedia 3.0: The End of Google?

     

    The Semantic Web (or Web 3.0) promises to “organize the world’s information” in a dramatically more logical way than Google can ever achieve with their current engine design. This is especially true from the point of view of machine comprehension as opposed to human comprehension. The Semantic Web requires the use of a declarative ontological language like OWL to produce domain-specific ontologies that machines can use to reason about information and make new conclusions, not simply match keywords.

    However, the Semantic Web, which is still in a development phase where researchers are trying to define the best and most usable design models, would require the participation of thousands of knowledgeable people over time to produce those domain-specific ontologies necessary for its functioning.

    Machines (or machine-based reasoning, aka AI software or ‘info agents’) would then be able to use those laboriously –but not entirely manually– constructed ontologies to build a view (or formal model) of how the individual terms within the information relate to each other. Those relationships can be thought of as the axioms (assumed starting truths), which together with the rules governing the inference process both enable and constrain the interpretation (and well-formed use) of those terms by the info agents to reason new conclusions based on existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent info agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.

    Thus, and as stated, in the Semantic Web individual machine-based agents (or a collaborating group of agents) will be able to understand and use information by translating concepts and deducing new information rather than just matching keywords.

    Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among your virtual AI-enhanced workforce each having access to different domain specific comprehension space and all communicating with each other to build a collective consciousness.

    You’ll be able to ask your info agent or agents to find you the nearest restaurant that serves Italian cuisine, even if the restaurant nearest you advertises itself as a Pizza joint as opposed to an Italian restaurant. But that is just a very simple example of the deductive reasoning machines will be able to perform on information they have.
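The restaurant example can be sketched as a toy deduction in Python. This is a deliberately simplified stand-in for an ontology and an inference engine, not OWL and not any real reasoner: the `IS_A` table plays the role of the axioms, the transitive lookup in `is_kind_of` plays the role of the inference rule, and all names and data are hypothetical.

```python
# Axioms: "X is a kind of Y" relationships (a hypothetical, tiny ontology).
IS_A = {
    "pizza": "italian cuisine",
    "italian cuisine": "european cuisine",
}

# Facts: restaurant name -> (what it advertises, distance in km). Invented data.
RESTAURANTS = {
    "Luigi's Pizza Joint": ("pizza", 0.4),
    "Trattoria Roma": ("italian cuisine", 2.1),
    "Sushi Bar": ("sushi", 0.2),
}


def is_kind_of(term, category):
    """Inference rule: follow is-a links transitively until the category is reached."""
    seen = set()
    while term is not None and term not in seen:
        if term == category:
            return True
        seen.add(term)
        term = IS_A.get(term)
    return False


def nearest_serving(category):
    """Find the closest restaurant whose advertised food is a kind of the category."""
    matches = [
        (distance, name)
        for name, (serves, distance) in RESTAURANTS.items()
        if is_kind_of(serves, category)
    ]
    return min(matches)[1] if matches else None


answer = nearest_serving("italian cuisine")
```

A keyword match on "Italian" would only find the trattoria; the deduction "pizza is a kind of Italian cuisine" is what lets the nearer pizza joint win.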

    Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents who are specialized in different domains of knowledge to produce a collective consciousness (using the Borg metaphor) that encompasses all human knowledge. The collective “mind” of those agents-as-the-Borg will be the Ultimate Answer Machine, easily displacing Google from this position, which it does not truly fulfill.

    The problem with the Semantic Web, besides that researchers are still debating which design and implementation of the ontology language model (and associated technologies) is the best and most usable, is that it would take thousands or tens of thousands of knowledgeable people many years to boil down human knowledge to domain-specific ontologies.

    However, if we were at some point to take the Wikipedia community and give them the right tools and standards to work with (whether existing or to be developed in the future), which would make it possible for reasonably skilled individuals to help reduce human knowledge to domain-specific ontologies, then that time can be shortened to just a few years, and possibly to as little as two years.

    The emergence of a Wikipedia 3.0 (as in Web 3.0, aka Semantic Web) that is built on the Semantic Web model will herald the end of Google as the Ultimate Answer Machine. It will be replaced with “WikiMind” which will not be a mere search engine like Google is but a true Global Brain: a powerful pan-domain inference engine, with a vast set of ontologies (a la Wikipedia 3.0) covering all domains of human knowledge, that can reason and deduce answers instead of just throwing raw information at you using the outdated concept of a search engine.

    Notes

    After writing the original post I found out that a modified version of the Wikipedia application, known as “Semantic” MediaWiki, has already been used to implement ontologies. The name that they’ve chosen is Ontoworld. I think WikiMind would have been a cooler name, but I like ontoworld, too, as in “it descended onto the world,” since that may be seen as a reference to the global mind a Semantic-Web-enabled version of Wikipedia could lead to.

    Google’s search engine technology, which provides almost all of their revenue, could be made obsolete in the near future. That is, unless they have access to Ontoworld or some such pan-domain semantic knowledge repository, so that they can tap into its ontologies and add inference capability to Google search to build formal deductive intelligence into Google.

    But so can Ask.com and MSN and Yahoo…

    I would really love to see more competition in this arena, not to see Google or any one company establish a huge lead over others.

    The question, to rephrase in Churchillian terms, is whether the combination of the Semantic Web and Wikipedia signals the beginning of the end for Google or the end of the beginning. Obviously, with tens of billions of dollars at stake in investors’ money, I would think that it is the latter. No one wants to see Google fail. There’s too much vested interest. However, I do want to see somebody outmaneuver them (which can be done, in my opinion).

    Clarification

    Please note that Ontoworld, which currently implements the ontologies, is based on the “Wikipedia” application (also known as MediaWiki), but it is not the same as Wikipedia.org.

    Likewise, I expect Wikipedia.org will use their volunteer workforce to reduce the sum of human knowledge that has been entered into their database to domain-specific ontologies for the Semantic Web (aka Web 3.0). Hence, “Wikipedia 3.0.”

    Response to Readers’ Comments

    The argument I’ve made here is that Wikipedia has the volunteer resources to produce the needed Semantic Web ontologies for the domains of knowledge that it currently covers, while Google does not have those volunteer resources, which will make it reliant on Wikipedia.

    Those ontologies, together with all the information on the Web, can be accessed by Google and others, but Wikipedia will be in charge of the ontologies for the large set of knowledge domains they currently cover, and that is where I see the power shift.

    Google and other companies do not have the manpower (i.e. the thousands of volunteers Wikipedia has) to help create those ontologies for the large set of knowledge domains that Wikipedia covers. Wikipedia does, and is positioned to do that better and more effectively than anyone else. It’s hard to see how Google would be able to create the ontologies for all domains of human knowledge (which are continuously growing in size and number) given how much work that would require. Wikipedia can cover more ground faster with their massive, dedicated force of knowledgeable volunteers.

    I believe that the party that will control the creation of the ontologies (i.e. Wikipedia) for the largest number of domains of human knowledge, and not the organization that simply accesses those ontologies (i.e. Google), will have a competitive advantage.

    There are many knowledge domains that Wikipedia does not cover. Google will have the edge there, but only if the people and organizations that produce the information also produce the ontologies on their own, so that Google can access them from its future Semantic Web engine. My belief is that this would happen, but very slowly, and that Wikipedia can have the ontologies done for all the domains of knowledge that it currently covers much faster. They would then have leverage from the fact that they would be in charge of those ontologies (aka the basic layer for AI enablement).

    It still remains unclear, of course, whether the combination of Wikipedia and the Semantic Web heralds the beginning of the end for Google or the end of the beginning. As I said in the original part of the post, I believe that it is the latter, and the question I pose in the title of this post, in this context, is no more than rhetorical. However, I could be wrong in my judgment and Google could fall behind Wikipedia as the world’s ultimate answer machine.

    After all, Wikipedia makes “us” count. Google doesn’t. Wikipedia derives its power from “us.” Google derives its power from its technology and inflated stock price. Who would you count on to change the world?

    Response to Basic Questions Raised by the Readers

    Reader divotdave asked a few questions, which I thought to be very basic in nature (i.e. important). I believe more people will be pondering the same issues, so I’m including them here with the replies.

    Question:
    How does it distinguish between good information and bad? How does it determine which parts of the sum of human knowledge to accept and which to reject?

    Reply:
    It wouldn’t have to distinguish between good vs bad information (not to be confused with well-formed vs badly formed) if it were to use a reliable source of information (with associated, reliable ontologies). That is, if the information or knowledge being sought can be derived from Wikipedia 3.0, then it assumes that the information is reliable.

    However, when it comes to connecting the dots, i.e. returning information or deducing answers from the sea of information that lies beyond Wikipedia, your question becomes very relevant: how would it distinguish good information from bad information so that it can produce good knowledge (aka comprehended information, aka new information produced through deductive reasoning based on existing information)?

    Question:
    Who, or what as the case may be, will determine what information is irrelevant to me as the inquiring end user?

    Reply:
    That is a good question and one which would have to be answered by the researchers working on AI engines for Web 3.0.

    There will be assumptions made as to what you are inquiring about. Just as when I saw your question I had to make an assumption about what you really meant to ask me, AI engines would have to make an assumption, pretty much based on the same cognitive process humans use, which is the topic of a separate post, but which has been covered by many AI researchers.

    Question:
    Is this to say that ultimately some over-arching standard will emerge that all humanity will be forced (by lack of alternative information) to conform to?

    Reply:
    There is no need for one standard, except when it comes to the language the ontologies are written in (e.g. OWL, OWL-DL, OWL Full etc.). Semantic Web researchers are trying to determine the best and most usable choice, taking into consideration human and machine performance in constructing those ontologies and, for machines exclusively, in interpreting them.

    Two or more info agents working with the same domain-specific ontology but having different software (different AI engines) can collaborate with each other.

    The only standard required is that of the ontology language and associated production tools.
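
    As a rough illustration of that point, the Python sketch below gives two agents the same toy ontology but different reasoning code: one derives every implied fact up front (forward chaining), the other searches backwards from the question (backward chaining). The ontology, the function names and the choice of these two strategies are assumptions made for the example, not a description of any existing agent software.

```python
# Shared, hypothetical ontology: (child class, parent class) links only.
ONTOLOGY = {
    ("Sparrow", "Bird"),
    ("Bird", "Animal"),
    ("Animal", "LivingThing"),
}


def forward_agent(ontology, sub, sup):
    """Engine A: derive every implied subclass link first, then look up the answer."""
    closure = set(ontology)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return sub == sup or (sub, sup) in closure


def backward_agent(ontology, sub, sup, seen=None):
    """Engine B: start from the question and walk parent links until sup is found."""
    if sub == sup:
        return True
    if seen is None:
        seen = set()
    seen.add(sub)  # guard against cycles in the ontology
    return any(
        backward_agent(ontology, parent, sup, seen)
        for (child, parent) in ontology
        if child == sub and parent not in seen
    )
```

    Both forward_agent(ONTOLOGY, "Sparrow", "LivingThing") and backward_agent(ONTOLOGY, "Sparrow", "LivingThing") return True: the engines differ, but the shared ontology fixes the answer.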

    Addendum

    On AI and Natural Language Processing

    I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.
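
    What “some query language” might look like, in its simplest form, is pattern matching over stored facts. The Python sketch below is a made-up stand-in, not SPARQL or any real query language: the facts, the “?variable” convention and the match function are assumptions for illustration only.

```python
# Hypothetical fact store: (subject, relation, object) triples.
FACTS = [
    ("Hamlet", "written_by", "Shakespeare"),
    ("Macbeth", "written_by", "Shakespeare"),
    ("Emma", "written_by", "Austen"),
]


def match(pattern, facts):
    """Return one dict of variable bindings per fact that matches the pattern.

    Terms starting with "?" are variables; every other term must match exactly.
    A variable that appears twice must bind to the same value both times.
    """
    results = []
    for fact in facts:
        binding = {}
        for p, f in zip(pattern, fact):
            if p.startswith("?"):
                # setdefault binds the variable on first use and checks it afterwards.
                if binding.setdefault(p, f) != f:
                    break
            elif p != f:
                break
        else:
            results.append(binding)  # no break: every term matched
    return results
```

    For example, match(("?book", "written_by", "Shakespeare"), FACTS) returns one binding for Hamlet and one for Macbeth, with no natural language processing involved.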

    On the Debate about the Nature and Definition of AI

    The embedding of AI into cyberspace will be done at first with relatively simple inference engines (that use algorithms and heuristics) that work collaboratively in P2P fashion and use standardized ontologies. The massively parallel interactions between the hundreds of millions of AI Agents that will run within the millions of P2P AI Engines on users’ PCs will give rise to the very complex behavior that is the future global brain.

    Related:

    1. Web 3.0 Update
    2. All About Web 3.0 <– list of all Web 3.0 articles on this site
    3. P2P 3.0: The People’s Google
    4. Reality as a Service (RaaS): The Case for GWorld <– 3D Web + Semantic Web + AI
    5. For Great Justice, Take Off Every Digg
    6. Google vs Web 3.0
    7. People-Hosted “P2P” Version of Wikipedia
    8. Beyond Google: The Road to a P2P Economy




    Web 3D Fans:

    Here is the original Web 3D + Semantic Web + AI article:

    Web 3D + Semantic Web + AI *

    The above-mentioned Web 3D + Semantic Web + AI vision, which preceded the Wikipedia 3.0 vision, received much less attention because it was not presented in a controversial manner. This fact was noted as the biggest flaw of the social bookmarking site Digg, which was used to promote this article.

    Web 3.0 Developers:

    Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

    1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)

    Jan 7, ‘07: The following Evolving Trends post discusses the current state of semantic search engines and ways to improve the paradigm:

    1. Designing a Better Web 3.0 Search Engine

    June 27, ‘06: Semantic MediaWiki project, enabling the insertion of semantic annotations (or metadata) into the content:

    1. http://semantic-mediawiki.org/wiki/Semantic_MediaWiki (see note on Wikia below)

    Wikipedia’s Founder and Web 3.0


    Evolving Trends

    Google Warming Up to the Wikipedia 3.0 vision?

    In Uncategorized on December 14, 2007 at 8:09 pm

    [source: slashdot.org]

    Google’s “Knol” Reinvents Wikipedia

    Posted by CmdrTaco on Friday December 14, @08:31AM
    from the only-a-matter-of-time dept.

     

    teslatug writes “Google appears to be reinventing Wikipedia with their new product that they call knol (not yet publicly available). In an attempt to gather human knowledge, Google will accept articles from users who will be credited with the article by name. If they want, they can allow ads to appear alongside the content and they will be getting a share of the profits if that’s the case. Other users will be allowed to rate, edit or comment on the articles. The content does not have to be exclusive to Google but no mention is made on any license for it. Is this a better model for free information gathering?”

    This article, Wikipedia 3.0: The End of Google?, which gives you an idea why Google would want its own Wikipedia, was on the Google Finance page for at least 3 months whenever anyone looked up the Google stock symbol, so Google employees, investors and executives must have seen it.

    Is it a coincidence that Google is building its own Wikipedia now?

    The only problem is a flaw in Google’s thinking. People who author those articles on Wikipedia actually have brains. People with brains tend to have principles. Getting paid pennies to build the Google empire is rarely one of those principles.
