Feeds:
Posts
Comments

Archive for the ‘Microsoft’ Category

Business & TechAdd Time News
My Yahoo!
My Google
Netvibes
My AOL
RSS Feed
See all feeds SEARCH TIME.COM Full Archive
Covers
INSIDE: Main | Global Business | Small Business | Curious Capitalist | Nerd World | Money & Main Street | Videos
google_04011
As Google’s Growth Falters, Microsoft Could Regain Momentum
By 24/7 Wall St. Wednesday, Apr. 01, 2009People sit under a Google logoitted.
JOHN MACDOUGALL/AFP/Getty
Print
Email Share
Digg
Facebook Yahoo! BuzzTwitter Linkedin Permalink Reprints Related Most of the recent news about Google (GOOG) has been bad. Online advertising posted a slow fourth quarter. That unexpectedly included both display ads and search marketing which has made Google one of the fastest growing large companies in America. Several Wall St. analysts have commented that Google’s search revenue’s rate of increase flattened out in January and February. Since the consensus among experts who cover the company is that revenue will rise 11% in the first quarter, a flat quarter would be devastating.

Related
China’s Dream of Big GDP Growth is Disappearing
Why GE’s new Global “Theme” is an Excuse to Sell Medical Equipment
Internet Advertising: Bad in Q1, Possibly Worse in Q2
More Related
Google: The Economy in a Tea Cup
Catching Up to Stay Ahead
Google Gets Friendly
One of the things that Wall St. hates about Google is that it does one thing better than any other company in the world, but that is all it does. Google Chrome browser, Google Earth, Google Maps, and YouTube have really made much money. Some of the features have not produced any revenue at all. If its search operation falters, Google’s run as the hottest tech company in the world could be over. (See pictures of Google Earth.)

At this point, Google is a $22 billion company. If the search business drops to a growth rate of 10% a year, it will take three years for Google’s sales to get to $30 billion. From the time Microsoft (MSFT) hit $22 billion in sales in 2000, it took the company less than three years to get to the $30 billion plateau. Then from 2002 to 2008, Microsoft’s sales doubled. The software business not only grew. Until recently, it grew quickly. (See pictures of Bill Gates.)

The assumption about Google’s prospects is that the search company is the next Microsoft. Twenty years ago, Microsoft had the hot hand. Sales of Windows and the company’s business and server software were stunning. The margins on some of Microsoft’s software franchises were over 70%. Then the hyper-growth stopped as the company’s market penetration of PCs and servers reached a saturation point. Microsoft’s stock never saw the level it hit in 2000 again. Without lucrative stock options, employees who wanted to make it rich moved to start-ups. The people who had been at the company thirty years were already rich. Many of them retired.

About seven years after Microsoft’s stock hit an all-time high, Google traded at $747, its peak. It now changes hands at $348, and if the company’s sales can only grow at 10% or 15%, the stock is not going back above $700, ever. The myth about companies like Microsoft and Google is that what they do is so important to business and consumers and so pervasive that the growth curve never flattens out. It does flatten at every company. No exceptions.

The press coverage of Google this week included a few pathetic announcements. Disney (DIS) will put some of its premium content on Google’s YouTube. That should be good for $10 million in revenue a year. Google is starting a $100 million venture capital arm which will make it the 1,000th largest venture operation in the world. In other words, it will not be managing enough venture money to matter. Then word came out that Hewlett-Packard (HPQ) might use Google’s operating system in some of its netbooks instead of Microsoft Windows. The important word in that report is “might.” The news that Google is adding thousands of employees a quarter and that the founders have bought a 747 or an aircraft carrier probably hit a high point two years ago.

Saying that Google is doing poorly is not the same as saying that Microsoft is doing well. What matters to Microsoft is that Google becomes less of a threat each day as it fails in its diversification attempts. Google’s cash flow does not continue to give it an almost limitless capital arsenal. Google has to consider cutting people in areas which will never be profitable. The entire ethos at Google is in the process of changing. Microsoft may be in third place in the search business, but it is in first place in software, which is still the larger industry.

Investors still ask Microsoft why it is in the video game business. There is not any reasonable answer. It is an awful business with poor margins. It has nothing to do with selling Windows. There may have been some idea that being in the hardware business would help the software business, but, if so, that idea didn’t work out a long time ago.

With the perceived playing field that Microsoft and Google operate on a bit more level now, they can race after the one market that could be substantial for either one or both of them, which is providing software and search on mobile devices. The smartphone, which is really a PC for the pocket, is part of the one-billion-units-per-year-in-sales handset industry. Providing the operating software and other key components for wireless devices is almost certainly the next big thing for tech companies from Google to Yahoo (YHOO) to Microsoft to Adobe (ADBE). Trying to milk more money out of the PC gets harder and harder. For the largest companies in the industry, it has become a zero sum game. (See pictures of the 50 best websites of 2008.)

For Google and Microsoft, the best days are over, unless one can dominate the handset world the way it did the universe of computers.

— Douglas A. McIntyre
01

Read Full Post »


Report: Semantic Web Companies Are, or Will Soon Begin, Making Money

Written by Marshall Kirkpatrick / October 3, 2008 5:13 PM / 14 Comments


provostpic-1.jpgSemantic Web entrepreneur David Provost has published a report about the state of business in the Semantic Web and it’s a good read for anyone interested in the sector. It’s titled On the Cusp: A Global Review of the Semantic Web Industry. We also mentioned it in our post Where Are All The RDF-based Semantic Web Apps?.

The Semantic Web is a collection of technologies that makes the meaning of content online understandable by machines. After surveying 17 Semantic Web companies, Provost concludes that Semantic science is being productized, differentiated, invested in by mainstream players and increasingly sought after in the business world.

Provost aims to use real-world examples to articulate the value proposition of the Semantic Web in accessible, non-technical language. That there are enough examples available for him to do this is great. His conclusions don’t always seem as well supported by his evidence as he’d like – but the profiles he writes of 17 Semantic Web companies are very interesting to read.

What are these companies doing? Provost writes:

“..some companies are beginning to focus on specific uses of Semantic technology to create solutions in areas like knowledge management, risk management, content management and more. This is a key development in the Semantic Web industry because until fairly recently, most vendors simply sold development tools.”

 

The report surveys companies ranging from the innovative but unlaunched Anzo for Excel from Cambridge Semantics, to well-known big players like Down Jones Client Solutions and RWW sponsor Reuters Calais Initiative, to relatively unknown big players like the already very commercialized Expert System. 10 of the companies were from the US, 6 from Europe and 1 from South Korea.

semwebchart.jpgAbove: Chart from Provost’s report.We’ve been wanting to learn more about “under the radar” but commercialized semantic web companies ever since doing a briefing with Expert System a few months ago. We had never heard of the Italian company before, but they believe they already have they have a richer, deeper semantic index than anyone else online. They told us their database at the time contained 350k English words and 2.8m relationships between them. including geographic representations. They power Microsoft’s spell checker and the Natural Language Processing (NLP) in the Blackberry. They also sell NLP software to the US military and Department of Homeland Security, which didn’t seem like anything to brag about to us but presumably makes up a significant part of the $12 million+ in revenue they told Provost they made last year.

And some people say the Semantic Web only exists inside the laboratories of Web 3.0 eggheads!

Shortcomings of the Report

Provost writes that “the vendors [in] this report have all the appearances of thriving, emerging technology companies and they have shown their readiness to cross borders, continents, and oceans to reach customers.” You’d think they turned water into wine. Those are strong words for a study in which only 4 of 17 companies were willing to report their revenue and several hadn’t launched products yet.

The logic here is sometimes pretty amazing.

The above examples [there were two discussed – RWW] are just a brief sampling of the commercial success that the Semantic Web has been experiencing. In broad terms, it’s easy to point out the longevity of many companies in this industry and use that as a proxy for commercial success [wow – RWW]. With more time (and space in this report), additional examples could be described but the most interesting prospect pertains to what the industry landscape will look like in twelve months. [hmmm…-RWW]

 

In fact, while Provost has glowingly positive things to about all the companies he surveyed, the absence of engagement with any of their shortcomings makes the report read more like marketing material than any objective take on what’s supposed to be world-changing technology.

This is a Fun Read

The fact is, though, that Provost writes a great introduction to many companies working to sell software in a field still too widely believed to be ephemeral. The stories of each of the 17 companies profiled are fun to read and many of Provost’s points of analysis are both intuitive and thought provoking.

He says the sector is “on the cusp” of major penetration into existing markets currently served by non-semantic software. Provost argues that the Semantic Web struggles to explain itself because the World Wide Web is so intensely visual and semantics are not. He says that reselling business partners in specific distribution channels are combining their domain knowledge with the science of the software developers to bring these tools to market. He tells a great, if unattributed, story about what Linked Data could mean to the banking industry.

We hadn’t heard of several of the companies profiled in the report, and a handful of them had never been mentioned by the 34 semantic web specialist blogs we track, either.

There’s something here for everyone. You can read the full report here.

Read Full Post »


The Semantic Desktop? SDS Brings Semantics To Excel

Written by Sarah Perez / August 13, 2008 6:30 AM / 6 Comments


When you hear the word “semantic” you likely think of the semantic web – the supposed next iteration of the World Wide Web that features structured data and specific protocols that aim to bring about an “intelligent” web. But the concept of semantics doesn’t necessarily apply just to the web – it can apply to other things as well, like your desktop…or even your Excel spreadsheets, according to Ian Goldsmid, founder of Semantic Business Intelligence, whose new app, SDS, brings a semantic system to spreadsheets.

Semantic Spreadsheets

The problem with spreadsheets that their system is trying to address has to do with those who need to derive data from multiple spreadsheets (two or more). Although it’s easy enough to perform sorts, build macros, and create formulas within one spreadsheet, when needing to compare values in multiple spreadsheets the process becomes more difficult.

The company’s app, The Semantic Discovery System for Excel, or just SDS for short, will look for similar columns or rows between the sheets and then “semantically” connects them. They don’t appear to just be throwing that term around either – the app uses the same W3C Semantic Web technologies (RDF, OWL, SPARQL) to help you capture “meaning, intelligence, and knowledge” from the data saved in your spreadsheets.

Do We Need Semantic Desktop Apps?

Does SDS solve a business problem that is not yet being addressed through current technologies? In my experience, the short answer to this question is “no.” (But wait, there’s more…)

Typically, when a business has need of comparing and analyzing large amounts of data, the solution is to turn to a database product that can then be queried and from which custom reports can be pulled. And a business doesn’t need to spend a lot of money on a robust solution to do so – even a smaller business can create a database by using inexpensive desktop software.

However, the difference between using a database technology and “semantically connecting” some spreadsheets comes down to for whom this product is being built. In the past, databases and other business intelligence apps were built as if the creators knew that the only person using them would be an I.T. guy or gal. SDS, instead, aims to satisfy the needs of the non-technical end user.

Is this another example of tech populism at work? It certainly looks like it. Yet, in this case their market is small – a non-technical user who’s also a power user with Excel? There’s usually some overlap there. Not to mention, by the time you’ve achieved “power user” status, you’ve often also figured out how to do more complicated things in Excel…like, say, formulas that work across spreadsheets, for example – the very pain points this app is trying to address.

Still, it’s an interesting concept to think of taking the semantic web capabilities and integrating them into everyday programs to add a layer of intelligence to these programs as well. Done correctly, it could improve the capabilities of our favorite software apps without making the programs overly complex, which is what typically happens when you add more features.

What do you think? Is the Semantic Desktop (that is, semantically-enabled desktop apps) right around the corner? Or is this product and those like it too niche to find an audience? Let us know what you think in the comments.

Read Full Post »

Google: “We’re Not Doing a Good Job with Structured Data”

Written by Sarah Perez / February 2, 2009 7:32 AM / 9 Comments


During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google’s Alon Halevy admitted that the search giant has “not been doing a good job” presenting the structured data found on the web to its users. By “structured data,” Halevy was referring to the databases of the “deep web” – those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google’s Deep Web Search

Halevy, who heads the “Deep Web” search initiative at Google, described the “Shallow Web” as containing about 5 million web pages while the “Deep Web” is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google’s automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web – dubbed “vertical searching” – Halevy also referenced two other types of Deep Web Search: semantic search and product search.

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether or not Google’s current search engine technology is going to be adept at doing all the different types of Deep Web indexing or if they will need to come up with something new. As of now, Google uses the Big Table database and MapReduce framework for everything search related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data level synonyms discovery. These challenges are addressed by Infobright’s technology, said Esterkin, but “Google will have to solve these problems the hard way.”

Also mentioned during the speech was how Google plans to organize “aspects” of search queries. The company wants to be able to separate exploratory queries (e.g., “Vietnam travel”) from ones where a user is in search of a particular fact (“Vietnam population”). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. “Kosmix will give you an ‘aspect,’ but it’s attached to an information source. In our case, all the aspects might be just Web search results, but we’d organize them differently.”

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it’s clear that no single company has yet to dominate. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, “Google” has become synonymous with web search, just like “Kleenex” is a tissue, “Band-Aid” is an adhesive bandage, and “Xerox” is a way to make photocopies. Once that psychological mark has been made into our collective psyches and the habit formed, people tend to stick with what they know, regardless of who does it better. That’s something that’s a bit troublesome – if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until such point Google either duplicates or acquires the invention.

Still, it’s far too soon to write Google off yet. They clearly have a lead when it comes to search and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope).

Read Full Post »

Yahoo to Enable Custom Semantic Search Engines

Written by Marshall Kirkpatrick / February 11, 2009 9:14 AM / 2 Comments


Yahoo is bringing together two of its most interesting projects today, Yahoo BOSS (Build Your Own Search Service) and SearchMonkey, its semantic indexing and search result enhancement service. There were a number of different parts of the announcement – but the core of the story is simple.

Developers will now be able to build their own search engines using the Yahoo! index and search processing infrastructure via BOSS and include the semantic markup added to pages in both results parsing and the display of those results. There’s considerable potential here for some really dazzling results.

We wrote about the genesis of Search Monkey here this Spring, it’s an incredibly ambitious project. The end result of it is rich search results, where additional dynamic data from marked up fields can also be displayed on the search results page itself. So searching for a movie will show not just web pages associated with that movie, but additional details from those pages, like movie ratings, stars, etc. There’s all kinds of possibilities for all kinds of data.

Is anyone using Yahoo! BOSS yet? Anyone who will be able to leverage Search Monkey for a better experience right away? Yahoo is encouraging developers to tag their projects bossmashup in Delicious. As you can see for yourself, there are a number of interesting proofs of concept there but not a whole lot of products. Of the products that are there, very few seem terribly compelling to us so far.

We must admit that the most compelling BOSS implementation so far is over at the site of our competitors TechCrunch. Their new blog network search implementation of BOSS is beautiful – you can see easily, for example, that TechCrunch network blogs have used the word ReadWriteWeb 7 times in the last 6 months. (In case you were wondering.)

Speaking of TechCrunch, that site’s Mark Hendrickson covered the Yahoo BOSS/Search Monkey announcement today as well, and having worked closely on the implementation there he’s got an interesting perspective on it. He points out that the new pricing model, free up to 10,000 queries a day, will likely only impact a handful of big sites – not BOSS add-ons like TechCrunch search or smaller projects.

The other interesting part of the announcement is that BOSS developers will now be allowed to use 3rd party ads on their pages leveraging BOSS – not just Yahoo adds. That’s hopeful.

Can Yahoo do it? Can these two projects brought together lead to awesome search mashups all over the web? We’ve had very high hopes in the past. Now the proof will be in the pudding.

Read Full Post »

Semantic Web Patterns: A Guide to Semantic Technologies

Written by Alex Iskold / March 25, 2008 3:20 PM / 32 Comments

 


In this article, we’ll analyze the trends and technologies that power the Semantic Web. We’ll identify patterns that are beginning to emerge, classify the different trends, and peak into what the future holds.

In a recent interview Tim Berners-Lee pointed out that the infrastructure to power the Semantic Web is already here. ReadWriteWeb’s founder, Richard MacManus, even picked it to be the number one trend in 2008. And rightly so. Not only are the bits of infrastructure now in place, but we are also seeing startups and larger corporations working hard to deliver end user value on top of this sophisticated set of technologies.

The Semantic Web means many things to different people, because there are a lot of pieces to it. To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence – computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.

The disagreement is not accidental, because the technology and concepts are broad. Much is possible and much is to be imagined.

1. Bottom-Up and Top-Down

We have written a lot about the different approaches to the Semantic Web – the classic bottom-up approach and the new top-down one. The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically. Both approaches are making good progress.

A big win for the bottom-up approach was recent announcement from Yahoo! that their search engine is going to support RDF and microformats. This is a win-win-win for publishers, for Yahoo!, and for customers – publishers now have an incentive to annotate information because Yahoo! Search will be taking advantage of it, and users will then see better, more precise results.

Another recent win for the bottom-up approach was the announcement of the Semantify web service from Dapper (previous coverage). This offering will enable publishers to add semantic annotations to existing web pages. The more tools like Semantify that pop up, the easier it will be for publishers to annotate pages. Automatic annotation tools combined with the incentive to annotate the pages is going to make the bottom-up approach more compelling.

But even if the tools and incentive exists, to make the bottom-up approach widespread is difficult. Today, the magic of Google is that it can understand information as is, without asking people to fully comply with W3C standards of SEO optimization techniques. Similarly, top-down semantic tools are focused on dealing with imperfections in existing information. Among them are the natural language processing tools that do entity extraction – such as the Calais and TextWise APIs that recognize people, companies, places, etc. in documents; vertical search engines, like ZoomInfo and Spock, which mine the web for people; technologies like Dapper and BlueOrganizer, which recognize objects in web pages; and Yahoo! Shortcuts, Snap and SmartLinks, which recognize objects in text and links.

[Disclosure:] Alex Iskold is founder and CEO of AdaptiveBlue, which makes BlueOrganizer and SmartLinks.

Top-down technologies are racing forward despite imperfect information. And, of course, they benefit from the bottom-up annotations as well. The more annotations there are, the more precise top-down technologies will get – because they will be able to take advantage of structured information as well.

2. Annotation Technologies: RDF, Microformats, and Meta Headers

Within the bottom-up approach to annotation of data, there are several choices for annotation. They are not equally powerful, and in fact each approach is a tradeoff between simplicity and completeness. The most comprehensive approach is RDF – a powerful, graph-based language for declaring things, and attributes and relationships between things. In a simplistic way, one can think of RDF as the language that allows expressing truths like: Alex IS human (type expression), Alex HAS a brain (attribute expression), and Alex IS the father of Alice, Lilly, and Sofia (relationship expression). RDF is powerful, but because it is highly recursive, precise, and mathematically sound, it is also complex.

At present, most use of RDF is for interoperability. For example, the medical community uses RDF to describe genomic databases. Because the information is normalized, the databases that were previously silos can now be queried together and correlated. In general, in addition to semantic soundness, the major benefit of RDF is interoperability and standardization, particularly for enterprises, as we will discuss below.

Microformats offer a simpler approach by adding semantics to existing HTML documents using specific CSS styles. The metadata is compact and is embedded inside the actual HTML. Popular microformats are hCard, which describes personal and company contact information, hReview, which adds meta information to review pages, and hCalendar, which is used to describe events.

Microformats are gaining popularity because of their simplicity, but they are still quite limiting. There is no way to described type hierarchies, which the classic semantic community would say is critical. The other issue is that microformats are somewhat cryptic, because the focus is to keep the annotations to a minimum. This, in turn, brings up another question of whether embedding metadata into the view (HTML) is a good idea. The question is: what happens if the underlying data changes when someone makes a copy of the HTML document? Nevertheless, despite these issues, microformats are gaining popularity because they are simple. Microformats are currently used by Flickr, Eventful, and LinkedIn; and many other companies are looking to adopt microformats, particularly because of the recent Yahoo! announcement.

An even simpler approach is to put meta data into the meta headers. This approach has been around for a while and it is a shame that it has not been widely adopted. As an example, the New York Times recently launched extended annotations for its news pages. The benefit of this approach is that it works great for pages that are focused on a topic or a thing. For example, a news page can be described with a set of keywords, geo location, date, time, people, and categories. Another example would be for book pages. O’Reilly.com has been putting book information into the meta headers, describing the author, ISBN, and category of the book.

Despite the fact that all these approaches are different, they are also somewhat complimentary; and each of them is helpful. The more annotations there are in web pages, the more standards are implemented, and the more discoverable and powerful the information becomes.

3. Consumer and Enterprise

Yet another dimension of the conversation about the Semantic Web is the focus on consumer and enterprise applications. In the consumer arena we have been looking for a Killer App – something that delivers tangible and simple consumer value. People simply do not care that a product is built on the Semantic Web, all they are looking for is utility and usefulness.

Up until recently, the challenge has been that the Semantic Web is focused on rather academic issues – like annotating information to make it machine readable. The promise was that once the information is annotated and the web becomes one big giant RDF database, then exciting consumer applications will come. The skeptics, however, have been pointing out that first there needs to be a compelling use case.

Some consumer applications based on the Semantic Web: generic and vertical search, contextual shortcuts and previews, personal information management systems, semantic browsing tools. All of these applications are in their early days and have a long way to go before being truly compelling for the average web user. Still, even if these applications succeed, consumers will not be interested in knowing about the underlying technology – so there is really no marketing play for the Semantic Web in the consumer space.

Enterprises are a different story for a couple of reasons. First, enterprises are much more used to techno speak. To them utilizing semantic technologies translates into being intelligent and that, in turn, is good marketing. ‘Our products are better and smarter because we use the Semantic Web’ sounds like a good value proposition for the enterprise.

But even above the marketing speak, RDF solves a problem of data interoperability and standards. This “Tower of Babel” situation has been in existence since the early days of software. Forget semantics; just a standard protocol, a standard way to pass around information between two programs, is hugely valuable in the enterprise.

RDF offers a way to communicate using XML-based language, which on top of it has sound mathematical elements to enable semantics. This sounds great, and even the complexity of RDF is not going to stop enterprises from using it. However, there is another problem that might stop it – scalability. Unlike relational databases, which have been around for ages and have been optimized and tuned, XML-based databases are still not widespread. In general, the problem is in the scale and querying capabilities. Like object-oriented database technologies of the late nineties, XML-based databases hold a lot of promise, but we are yet to see them in action in a big way.

4. Semantic APIs

With the rise of Semantic Web applications, we are also seeing the rise of Semantic APIs. In general, these web services take as an input unstructured information and find entities and relationships. One way to think of these services is mini natural language processing tools, which are only concerned with a subset of the language.

The first example is the Open Calais API from Reuters that we have covered in two articles here and here. This service accepts raw text and returns information about people, places, and companies found in the document. The output not only returns the list of found matches, but also specifies places in the document where the information is found. Behind Calais is a powerful natural language processing technology developed by Clear Forest (now owned by Reuters), which relies on algorithms and databases to extract entities out of text. According to Reuters, Calais is extensible, and it is just a matter of time before new entities will be added.

Another example is the SemanticHacker API from TextWise, which is offering a one million dollar prize for the best commercial semantic web application developed on top of it. This API classifies information in documents into categories called semantic signatures. Given a document, it outputs entities or topics that the document is about. It is kind of like Calais, but also delivers a topical hierarchy, where the actual objects are leafs.

Another semantic API is offered by Dapper – a web service which facilitates the extraction of structure from unstructured HTML pages. Dapper works by enabling users to define attributes of an object based on the bits of the page. For example, a book publisher might define where the information about author, isbn and number of pages is on a typical book page and the Dapper application would then create a recognizer for any page on the publisher site and enable access to it via REST API.

While this seems backwards from an engineering point of view, Dapper’s technology is remarkably useful in the real world. In a typical scenario, for web sites that do not have clean APIs to access their information, even non-technical people can build an API in minutes with Dapper. This is a powerful way of quickly turning web sites into web services.

5. Search Technologies

Perhaps the first significant blow to the Semantic Web has been the inability thus far to improve search. The premise that semantical understanding of pages leads to vastly better search has yet to be validated. The two main contenders, Hakia and PowerSet, have made some progress, but not enough. The problem is that Google’s algorithm, which is based on statistical analysis, deals just fine with semantic entities like people, cities, and companies. When asked What is the capital of France? Google returns a good enough answer.

There is a growing realization that marginal improvement in search might not be enough to beat Google, and to declare search the killer app for the Semantic Web. Likely, understanding semantics is helpful but not sufficient to build a better search engine. A combination of semantics, innovative presentation, and memory of who the user is, will be necessary to power the next generation search experience.

Alternative approaches also attempt to overlay semantics on top of the search results. Even Google ventures into verticals by partitioning the results into different categories. The consumer can then decide which type of answer they are interested in.

Yet search is a game that is far from won and a lot of semantic companies are really trying to raise the bar. There may be another twist to the whole search play – contextual technologies, as well as semantic databases, could lead to qualitatively better results. And so we turn to these next.

6. Contextual Technologies

We are seeing an increasing number of contextual tools entering the consumer market. Contextual navigation does not just improve search, but rather shortcuts it. Applications like Snap or Yahoo! Shortcuts or SmartLinks “understand” the objects inside text and links and bring relevant information right into the user’s context. The result is that the user does not need to search at all.

Thinking about this more deeply, one realizes that contextual tools leverage semantics in a much more interesting way. Instead of trying to parse what a user types into the search box, contextual technologies rely on analyzing the content. So the meaning is derived in a much more precise way – or rather, there is less guessing. The contextual tools then offer the users relevant choices, each of which leads to a correct result. This is fundamentally different from trying to pull the right results from a myriad of possible choices resulting from a web search.

We are also seeing an increasing number of contextual technologies make their way into the browser. Top-down semantic technologies need to work without publishers doing anything; and so to infer context, contextual technologies integrate into the browser. Firefox’s recommended extensions page features a number of contextual browsing solutions – Interclue, ThumbStrips, Cooliris, and BlueOrganizer (from my own company).

The common theme among these tools is the recognition of information and the creation of specific micro contexts for the users to interact with that information.

7. Semantic Databases

Semantic databases are another breed of semantic applications focused on annotating web information to be more structured. Twine, a product of Radar Networks and currently in private beta, focuses on building a personal knowledge base. Twine works by absorbing unstructured content in various forms and building a personal database of people, companies, things, locations, etc. The content is sent to Twine via bookmarklet or via email or manually. The technology needs to evolve more, but one can see how such databases can be useful once the kinks are worked out. One of the very powerful applications that could be built on top of Twine, for example, is personalized search – a way to filter the results of any search engine based on a particular individual.

It is worth noting that Radar Networks has spent a lot of time getting the infrastructure right. The underlying representation is RDF and is ready to be consumed by other semantic web services. But a big chunk of the core algorithms, the ones that are dealing with entity extraction, are being commoditized by Semantic Web APIs. Reuters offers this as an API call, for example, and so moving forward, Twine won’t need to be concerned with how to do that.

Another big player in the semantic databases space is a company called Metaweb, which created Freebase. In its present form, Freebase is just a fancier and more structured version of Wikipedia – with RDF inside and less information in total. The overall goal of Freebase, however, is to build a Wikipedia equivalent of the world’s information. Such a database would be enormously powerful because it could be queried exactly – much like relational databases. So once again the promise is to build much better search.

But the problem is, how can Freebase keep up with the world? Google indexes the Internet daily and grows together with the web. Freebase currently allows editing of information by individuals and has bootstrapped by taking in parts of Wikipedia and other databases, but in order to scale this approach, it needs to perfect the art of continuously taking in unstructured information from the world, parsing it, and updating its database.

The problem of keeping up with the world is common to all database approaches, which are effectively silos. In the case of Twine, there needs to be continuous influx of user data, and in the case of Freebase there needs to be influx of data from the web. These problems are far from trivial and need to be solved successfully in order for the databases to be useful.

Conclusion

With any new technology it is important to define and classify things. The Semantic Web is offering an exciting promise: improved information discoverability, automation of complex searches, and innovative web browsing. Yet the Semantic Web means different things to different people. Indeed, its definition in the enterprise and consumer spaces is different, and there are different means to a common end – top-down vs. bottom up and microformats vs. RDF. In addition to these patterns, we are observing the rise of semantic APIs and contextual browsing tools. All of these are in their early days, but hold a big promise to fundamentally change the way we interact with information on the web.

What do you think about Semantic Web Patterns? What trends are you seeing and which applications are you waiting for? And if you work with semantic technologies in the enterprise, please share your experiences with us in the comments below.

Read Full Post »

Older Posts »

%d bloggers like this: