Archive for April 7th, 2009

There is No Web 3.0, There is No Web 2.0 – There is Just the Web

Written by Josh Catone / April 24, 2008 4:57 PM / 36 Comments

Something struck me while listening to Tim O’Reilly’s keynote speech at the Web 2.0 Expo yesterday: glancing at my notes after he walked off stage, I noticed that his current definition of Web 2.0 is a lot like the definition he’s given for Web 3.0. Based on this, plus past comments from O’Reilly that I dug up via a few web searches, I am forced to one conclusion: Tim O’Reilly, the man credited with popularizing the term Web 2.0, doesn’t actually believe it exists. For O’Reilly, there is just the web right now. 1.0, 2.0, 3.0: it’s all the same ever-changing web.

Let’s first take a look at Tim O’Reilly’s widely used and accepted compact definition for Web 2.0 circa 2006 (way, way back in the dark ages of a year and a half ago):

Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them. (This is what I’ve elsewhere called “harnessing collective intelligence.”)

We can perhaps simplify that even further: Web 2.0 is the web as a platform and collective intelligence (or, leveraging of user created data). Now let’s look at Tim’s definition of Web 3.0 (which actually predates his last Web 2.0 definition):

Recently, whenever people ask me “What’s Web 3.0?” I’ve been saying that it’s when we apply all the principles we’re learning about aggregating human-generated data and turning it into collective intelligence, and apply that to sensor-generated (machine-generated) data.

Which we can simplify to mean the leveraging of the things we created in Web 2.0. And here’s the Web 2.0 definition he had up on a slide yesterday during his keynote:


  • The Internet is the platform
  • Harnessing the collective intelligence
  • Data as the “Intel Inside”
  • Software above the level of a single device
  • Software as a service


O’Reilly talked about Web 2.0 in terms of taking user-generated data and turning it into user-facing services. So now we’re starting to see a lot of overlap between the two definitions. He’s also brought in a lot of Web 3.0 definitions that other people have given and used them as part of this broader definition of Web 2.0. For example, Eric Schmidt of Google talked about Web 3.0 in terms of software as a service and cloud computing. Our own Alex Iskold talked about Web 3.0 in terms of web sites being turned into platforms. And so on.

“For ‘Web 3.0’ to be meaningful we’ll need to see a serious discontinuity from the previous generation of technology … I find myself particularly irritated by definitions of ‘Web 3.0’ that are basically descriptions of Web 2.0,” Tim O’Reilly once said, which is mildly ironic given that his current Web 2.0 definition basically eclipses his old Web 3.0 definition. But in reality, I think O’Reilly is saying that the versioning doesn’t really matter — the web is the web.

“The points of contrast [between Web 2.0 and Web 3.0] are actually the same points that I used to distinguish Web 2.0 from Web 1.5. (I’ve always said that Web 2.0 = Web 1.0, with the dot com bust being a side trip that got it wrong.),” wrote O’Reilly last fall. In other words, the versioning of the web is silly. Web 1.0, 2.0, or 3.0 is all really just whatever cool new thing we’re using the web to accomplish right now.

And he has a point. A couple of days ago, we wrote about the history of the term Web 3.0 and noted that the term itself doesn’t really matter, what matters is the discussions we have when trying to define it. “It is the discussion that is helpful rather than coming to any accepted definition. Some might argue that version numbers are silly on the web, that Web 2.0 and Web 3.0 are just marketing ploys, and that we shouldn’t use terms that are so nebulous and difficult to define. Those are all fair points. But at the same time, the discussions we have about defining the next web help to solidify our vision of where we’re going — and you can’t get there until you decide where you want to go,” we wrote.

Web 2.0 and Web 3.0 don’t really exist. They’re just arbitrary numbers assigned to something that doesn’t really have versions. But the discussion that those terms have prompted has been helpful, I think, in figuring out where the web is going and how we’re going to get there; and that’s what is important.

So next time someone asks me what we cover on ReadWriteWeb, maybe I won’t use the term “Web 2.0” in my reply, I’ll just tell them that we write about the web, what you can do with it now, and what you’ll be able to do with it in the future.


The Future of the Desktop

Written by Guest Author / August 18, 2008 3:22 PM / 35 Comments

Everything is moving to the cloud. As we enter the third decade of the Web we are seeing an increasing shift from native desktop applications towards Web-hosted clones that run in browsers. For example, a range of products such as Microsoft Office Live, Google Docs, Zoho, ThinkFree, DabbleDB, Basecamp, and many others now provide Web-based alternatives to the full range of familiar desktop office productivity apps. The same is true for an increasing range of enterprise applications, led by companies such as Salesforce.com, and this process seems to be accelerating. In addition, hosted remote storage for individuals and enterprises of all sizes is now widely available and inexpensive. As these trends continue, what will happen to the desktop and where will it live?

This is a guest post by Nova Spivack, founder and CEO of Twine. This is the final version of an article Spivack has been working on in his public Twine.

Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?

No. There have already been several attempts at copying the old-fashioned “files and folders” desktop interface to the Web, but they have not caught on. Imitation desktops to date have been clunky, slow copies of the real thing at best. Others have been overly slick. But they all have one thing in common: none of them has nailed it. People don’t want to manage all their information on the Web in the same interface they use to manage data and apps on their local PC. The Web is an entirely different medium from the desktop and it requires a new kind of interface. The desktop of the future, what some have called “the Webtop,” has yet to be invented.

The desktop of the future is going to be a hosted web service

Is the desktop even going to exist anymore as the Web becomes increasingly important? Yes, there has to be some kind of place that we consider to be our personal “home” and “workspace” — but it’s not going to live on any one device.

As we move into a world that is increasingly mobile, where users often work across several different devices in the course of their day, we need unified access to our applications and data. This requires that our applications and data do not reside on local devices anymore, but rather that they will live in the cloud and be accessible via Web services.

The painful process of using synchronization utilities to keep data on our different devices in sync will finally be a thing of the past. Similarly, an entire class of applications for remote-PC access will also become extinct. Instead, all devices will sync with the cloud, where your applications, data and desktop workspace state will live as a unified, hosted service. Your desktop will appear on whatever device you log in to, just as you left it wherever you last accessed it. This shift harkens back to previous attempts to revive thin-client computing, such as Sun Microsystems’ Java Desktop, but this time it is actually going to become mainstream.

The Browser is Going to Swallow Up the Desktop

It’s a classic embrace-and-extend story: the Web browser began as just another app on the desktop and has quickly embraced and extended every other application to become the central tool on everyone’s desktop. All that remains is the desktop itself, and the browser is quickly making inroads there as well. In particular Firefox, with its easy extensibility and huge range of add-ons, is rapidly displacing the remaining features of the desktop.

If these trends continue, will the browser eventually swallow up or simply replace the desktop? Yes. In fact, it will probably happen very soon. There just isn’t any reason to have a desktop outside the browser anymore. What we think of as “the desktop” is really just a perspective on our information and applications – it’s really just another “page” or context in our digital lives. This could easily exist within a browser. So instead of launching the browser from the desktop, it makes more sense to launch the desktop from the browser. In this way of thinking, the desktop is really just our home page – the place where we do our work and keep up with our world.

The focus of the desktop will shift from information to attention

As our digital lives evolve out of the old-fashioned desktop into the browser-centric Web environment we will see a shift from organizing information spatially (directories, folders, desktops, etc.) to organizing information temporally (feeds, lifestreams, microblogs, timelines, etc.). The Web is constantly changing and the biggest challenge is not finding information, it is keeping up with it.

The desktop of the future is going to be more concerned with helping users manage information overload, particularly the overload caused by change. In this respect, it is going to feel more like an RSS feed reader or a social news site than a directory. The focus will be on helping the user to manage and keep up with all the stuff flowing in and out of their environment. The interface will be tuned to help the user understand what the trends are, rather than just how things are organized.

Users are going to shift from acting as librarians to acting as daytraders.

As we move into an era where content creation and distribution become almost infinitely cheap, the scarcest resource will no longer be storage or bandwidth; it will be attention. The pace of information creation and distribution continues to accelerate and there is no end in sight, yet the cognitive capabilities of the individual human brain are finite and we are already at our limits.

In order to cope with the overwhelming complexity of our digital lives, we are going to increasingly rely on tools that help us manage our attention more productively — rather than tools that simply help us manage our information.

It is a shift from the mindset of being librarians to that of being daytraders. In the PC era we were all focused on trying to manage the information on our computers — we were acting as librarians. Filing things was a big hassle, and finding them was just as difficult. But today filing information is really not the problem: Google has made search so powerful and ubiquitous that many Web users don’t bother to file anything anymore – instead they just search again when they need it. The librarian problem has been overcome by the brute force of Web-scale search. At least for now.

Instead we are now struggling to cope with a different problem – the problem of filtering for what is really important or relevant now and in the near-future. With limited time and attention, we have to be careful what we look for and what we pay attention to. This is the mindset of the daytrader. Bet wrong and you could end up wasting your precious resources, bet right and you could find the motherlode before the rest of the world and gain valuable advantages by being first. Daytraders are focused on discovering and keeping track of trends. It’s a very different focus and activity from being a librarian, and it’s what we are all moving towards.

The Webtop will be more social and will leverage and integrate collective intelligence

The Webtop is going to be more socially oriented than desktops of today — it will have built-in messaging and social networking, as well as social-media sharing, collaborative filtering, discussions, and other community features.

The social dimension of our lives is becoming perhaps our most important source of information. We get information via email from friends, family and colleagues. We get information via social networks and social media sharing services. We co-create information with others in communities. And we team up with our communities to filter, rate and redistribute content.

The social dimension is also starting to play a more important role in our information management and discovery activities. Instead of remaining solitary, those activities are becoming more communal. For example, many social bookmarking and social news sites use community sentiment and collaborative filtering to highlight what is most interesting, useful or important.

Sites such as Digg, Reddit, Mixx, Slashdot, Delicious, StumbleUpon, Twine, and many others, show that collective intelligence may be the most powerful way to help individuals and groups filter content and manage their attention more productively. The power of many trumps the power of one.

The desktop of the future is going to have powerful semantic search and social search capabilities built-in

Our evolving Webtop is going to have more powerful search built-in. It will of course provide best-of-breed keyword search capabilities, but this is just the beginning.

It will also combine social search and semantic search. On the social search dimension, users will be able to search their information and rank it via attributes of their social graph (for example, “find documents about x and rank them by how many of my friends liked them.”)
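To make that example query concrete, here is a minimal Python sketch of friend-weighted ranking. Everything in it is an assumption made for illustration: the `documents`, `likes` and `friends` structures and the `social_search` function are hypothetical and do not describe any existing Webtop.

```python
def social_search(documents, likes, friends, term):
    """Return titles of documents mentioning `term`, most friend-liked first."""
    term = term.lower()
    matches = [doc for doc in documents if term in doc["text"].lower()]

    def friend_likes(doc):
        # Count only the likes that come from the searcher's own social graph.
        return len(likes.get(doc["id"], set()) & friends)

    # sorted() is stable, so documents with equal counts keep their input order.
    ranked = sorted(matches, key=friend_likes, reverse=True)
    return [doc["title"] for doc in ranked]


# Hypothetical sample data.
documents = [
    {"id": 1, "title": "Intro to RSS", "text": "RSS feeds explained"},
    {"id": 2, "title": "RSS at work", "text": "Using RSS in a team"},
    {"id": 3, "title": "Cooking", "text": "Pasta recipes"},
]
likes = {1: {"ann"}, 2: {"ann", "bob", "zed"}}  # document id -> users who liked it
friends = {"ann", "bob"}                        # the searcher's friends

print(social_search(documents, likes, friends, "rss"))
```

Note that the like from "zed" is ignored because "zed" is not in the searcher’s social graph; that is the difference between social ranking and plain popularity.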

Semantic search, on the other hand, will enable more granular search and navigation of information along a potentially open-ended network of properties and relationships. You will be able to search in a highly structured way: for example, search for products you once bookmarked that have a price of $10.95 and are on sale this week, or search for documents you read in the last month that were authored by Sue and related to project X. The semantics of the future desktop will be open-ended. That is to say that users, as well as other application and information providers, will be able to extend it with custom schemas, new data types, and custom fields on any piece of information.
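One way to picture structured search over open-ended properties is a filter over records that may each carry different fields. The sketch below is only an illustration under assumed names (`structured_search`, the `bookmarks` list and all of its fields are hypothetical); a real semantic store would use a proper schema and query language.

```python
def structured_search(items, **criteria):
    """Return items whose properties match every given criterion.

    A criterion value is either a plain value (exact match) or a callable
    that receives the property value and returns True or False.
    """
    results = []
    for item in items:
        for name, wanted in criteria.items():
            if name not in item:
                break  # the item does not have this property at all
            value = item[name]
            matched = wanted(value) if callable(wanted) else value == wanted
            if not matched:
                break
        else:
            # Reached only when no criterion failed.
            results.append(item)
    return results


# Hypothetical sample data: records do not share a fixed schema.
bookmarks = [
    {"type": "product", "name": "Notebook", "price": 10.95, "on_sale": True},
    {"type": "product", "name": "Pen", "price": 2.50, "on_sale": True},
    {"type": "document", "name": "Spec", "author": "Sue", "project": "X"},
    # A custom field ("warranty_years") is added without changing anything else.
    {"type": "product", "name": "Lamp", "price": 10.95, "on_sale": False,
     "warranty_years": 2},
]

on_sale = structured_search(bookmarks, type="product", price=10.95, on_sale=True)
by_sue = structured_search(bookmarks, author="Sue", project="X")
```

Because each record is just a bag of properties, adding a new field to one record does not affect searches over the others, which is the "open-ended" quality described above.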

Interactive shared spaces will replace folders

Forget about shared folders — that is an outmoded paradigm. Instead, the new metaphor will be interactive shared spaces. These shared spaces will be more like wikis than folders. They will be permission-based environments where one or many contributors can meet, interact synchronously or asynchronously, to work on information and other tasks together.

There are many kinds of shared spaces already in existence, including discussion forums, blogs, social network profiles, community sites, file sharing tools, conferencing tools, version control systems, and groupware. But as we move into Web 3.0 these will begin to converge. We will store information in them, we will work on information there, we will publish and distribute information through them, we will search across them, and we will interact with others around them.

Our next-generation shared spaces will be nestable and linkable like folders, but they will be far more powerful and dynamic, and they will be accessible via HTTP and other APIs such as SPARQL enabling data to be moved in and out of them easily by other applications around the Web.

Any group of two or more individuals will be able to participate in a shared space that will appear on their individual desktops, for a particular purpose. These new shared spaces will not only provide richer semantics in the underlying data, social network, and search, but they will also enable groups to seamlessly and collectively add, organize, track, manage, discuss, distribute, and search for information of mutual interest.

The Portable Desktop

The underlying data in the future desktop, and in all associated services it connects, will be represented using open-standard data formats. Not only will the data be open, but the semantics of the data – the schema that defines it – will also be defined in an open way. The value of open linked-data and open semantics is that data will not be held prisoner anywhere: it will be portable and will be easy to integrate with other data. The emerging Semantic Web and Data Portability initiatives provide a good set of open standards for enabling this to happen.

Due to open-standards and data-portability, your desktop and data will be free from “platform lock-in.” This means that your Webtop might even be portable to a different competing Webtop provider someday. If and when that becomes possible, how will Webtop providers compete to add value?

The Smart Desktop

One of the most important aspects of the coming desktop is that it’s going to be smart. It’s going to have to be. Users simply cannot handle the complexity of their information landscapes anymore – they need help. There are a range of tasks that the desktop should automate for users including: organizing information, reminding users when necessary, resolving data conflicts, managing versioning, maintaining data quality, backing up data, prioritizing information, and gathering relevant information and suggesting it when appropriate.

Most other features of the future desktop will be commodities – but intelligence will still be difficult to provide, and so it will be the last remaining frontier in which competing Webtop providers will be able to differentiate their offerings.

The Webtop is going to learn and help you to be more productive. As you use it, it’s going to adjust to your interests, relationships, current activities, information and preferences. It will adaptively self-organize to help you focus your attention on what is most important to whatever context you are in.

When reading something while you are taking a trip to Milan it may organize itself to be more contextually relevant to that time, place and context. When you later return home to San Francisco it will automatically adapt and shift to your home context. When you do a lot of searches about a certain product it will realize your context and intent has to do with that product and will adapt to help you with that activity for a while, until your behavior changes.

Your desktop will actually be a semantic knowledge base on the back-end. It will encode a rich semantic graph of your information, relationships, interests, behavior and preferences. You will be able to permit other applications to access part or all of your graph to datamine it and provide you with value-added views and even automated intelligent assistance.

For example, you might allow an agent that cross-links things to see all your data: it would go and add cross links to relevant things onto all the things you have created or collected. Another agent that makes personalized buying recommendations might only get to see your shopping history across all shopping sites you use.
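The two agents in that example differ only in how much of the graph they are allowed to see. A minimal sketch of that idea, assuming a hypothetical `PersonalGraph` class and made-up record categories, could look like this:

```python
class PersonalGraph:
    """Toy store that hands each agent only the categories it was granted."""

    def __init__(self, records):
        self._records = list(records)
        # Agent name -> set of permitted categories, or None meaning "everything".
        self._grants = {}

    def grant(self, agent, categories=None):
        """Give `agent` access to the listed categories (all data if omitted)."""
        self._grants[agent] = None if categories is None else set(categories)

    def view_for(self, agent):
        """Return the records this agent may read; refuse unknown agents."""
        if agent not in self._grants:
            raise PermissionError(f"{agent} has no access")
        allowed = self._grants[agent]
        return [r for r in self._records
                if allowed is None or r["category"] in allowed]


# Hypothetical sample data.
graph = PersonalGraph([
    {"category": "shopping", "item": "Bought a lamp"},
    {"category": "documents", "item": "Project X spec"},
    {"category": "contacts", "item": "Sue Smith"},
])
graph.grant("cross_linker")                        # may see all data
graph.grant("shopping_recommender", ["shopping"])  # shopping history only

print(len(graph.view_for("cross_linker")))
print(graph.view_for("shopping_recommender"))
```

The sketch denies access by default: an agent that was never granted anything gets an error rather than an empty or full view.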

Your desktop may also function as a simple personal assistant at times. You will be able to converse with your desktop eventually — through a conversational agent interface. While on the road you will be able to email or SMS in questions to it and get back immediate intelligent answers. You will even be able to do this via a voice interface.

For example, you might ask, “Where is my next meeting?” or “What Japanese restaurants do I like in LA?” or “What is Sue Smith’s phone number?” and you would get back answers. You could also command it to do things for you, like reminding you to do something, helping you keep track of an interest, or monitoring for something and alerting you when it happens.

Because your future desktop will connect all the relationships in your digital life — relationships connecting people, information, behavior, preferences and applications — it will be the ultimate place to learn about your interests and preferences.

Federated, open policies and permissions

This rich graph of meta-data that comprises your future desktop will enable the next-generation of smart services to learn about you and help you in an incredibly personalized manner. It will also of course be rife with potential for abuse and privacy will be a major function and concern.

One of the biggest enabling technologies that will be necessary is a federated model for sharing meta-data about policies and permissions on data. Information that is considered to be personal and private in Web site X should be recognized and treated as such by other applications and websites you choose to share that information with. This will require a way for sharing meta-data about your policies and permissions between different accounts and applications you use.

The semantic web provides a good infrastructure for building and deploying a decentralized framework for policy and privacy integration, but such a framework has yet to be developed, let alone adopted. For the full vision of the future desktop to emerge, a universally accepted standard for exchanging policy and permission data will be a necessary enabling technology.

The personal cloud

One way to think of the emerging Webtop is as your personal cloud. It will not just be a cloud of data, it will be a compute cloud as well. When you need to store or retrieve information it will provide that service. When you need to do computations, it will provide that to you as well. The cost of harnessing the capabilities of your cloud may be based on a monthly subscription or it may be metered, or it may be ad-supported.

Your personal cloud will have a center – provided by your main Webtop provider, where your address will live — but most of its services will be distributed in other places, and even federated among other providers. Yet from an end-user perspective it will function as a seamlessly integrated service. You will be able to see and navigate all your information and applications, as if they were in one connected space, regardless of where they are actually hosted. You will be able to search your personal cloud from any point within it. It will look and feel like a single cohesive service.

The WebOS

No discussion of the future of the desktop would be complete without delving into the topic of the WebOS. The shift from desktop to Webtop, the move from a local desktop to a hosted desktop, is a necessary step towards the entire operating system moving to the Web as well. Many of the services that comprise an operating system are already available as Web services, but they are not yet integrated into a single cohesive WebOS. However, it seems clear that the major players are aware of this opportunity and are positioning their services to capture it. Just as the desktop OS wars were won by capturing the “high ground” of the desktop, I would not be surprised if the same principle holds in the battle to own the WebOS. Whoever wins the Webtop will win the whole stack.

Who is most likely to own the future desktop?

When I think about what the future desktop is going to look like it seems to be a convergence of several different kinds of services that we currently view as separate.

It will be hosted on the cloud and accessible across all devices. It will place more emphasis on social interaction, social filtering, and collective intelligence. It will provide a very powerful and extensible data model with support for both unstructured and arbitrarily structured information. It will enable almost peer-to-peer like search federation, yet still have a unified home page and user-experience. It will be smart and personalized. It will be highly decentralized yet will manage identity, policies and permissions in an integrated cohesive and transparent manner across services.

By cobbling together a number of different services that exist today you could build something like this in a decentralized fashion. As various services integrate with each other it may simply emerge on its own. But is that how the desktop of the future will come about? Or will it be provided as a new application from one player, perhaps one with a lot of centralized market power and the ability to launch something like this on a massive scale? Or, just as with the desktop hits of the past, will it come from a little-known upstart with a disruptive technology? It’s hard to predict, but one thing is certain: it is going to happen relatively soon and will be an interesting process to watch.

Image via Arnaldo Licea


How Loomia Aims to Drive Revenue for Media Websites in 2009

Written by Richard MacManus / March 3, 2009 8:00 AM / 6 Comments

Loomia is a content recommendations service, used on sites such as the Wall Street Journal and PC World. We’ve profiled Loomia’s Facebook app before, which tracks what you and your Facebook friends are reading on Loomia-supported sites and then shows you what content is most popular among your social circle. Loomia has recently started to focus on revenue-driving recommendations for its media clients, as well as getting more active in the video industry. In this post we take a look at what Loomia is focusing on in 2009, which is an indicator of what media websites must do to ramp up this year.

On media websites, Loomia is most commonly seen as a widget that recommends content. For example, in the WSJ screenshot to the right, the contents of this widget are obtained by measuring the popularity of the content, user behavior, and data about the content itself (for example, its topic). For some of the publishers which use Loomia, there is a social element too.
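Popularity, user behavior and content data could be combined in many ways. As a purely illustrative sketch (this is not Loomia’s actual algorithm, and the `recommend` function, field names and weights are all assumptions), a simple weighted score might look like this:

```python
def recommend(articles, current_topic, user_topic_views, top_n=3,
              weights=(0.5, 0.3, 0.2)):
    """Rank articles by a weighted mix of popularity, behavior and topic match."""
    if not articles:
        return []
    w_popularity, w_behavior, w_topic = weights
    max_views = max(a["views"] for a in articles) or 1
    total_user_views = sum(user_topic_views.values()) or 1

    def score(article):
        # Popularity: site-wide views, scaled to the range 0..1.
        popularity = article["views"] / max_views
        # Behavior: share of this user's past reading that was on this topic.
        behavior = user_topic_views.get(article["topic"], 0) / total_user_views
        # Content data: does the article share the topic of the current page?
        same_topic = 1.0 if article["topic"] == current_topic else 0.0
        return (w_popularity * popularity + w_behavior * behavior
                + w_topic * same_topic)

    ranked = sorted(articles, key=score, reverse=True)
    return [a["title"] for a in ranked[:top_n]]


# Hypothetical sample data.
articles = [
    {"title": "Fed holds rates", "topic": "finance", "views": 800},
    {"title": "Election recap", "topic": "politics", "views": 1000},
    {"title": "Gadget review", "topic": "tech", "views": 400},
]
user_topic_views = {"finance": 6, "politics": 2, "tech": 2}

print(recommend(articles, "politics", user_topic_views, top_n=2))
```

Raising the behavior weight pushes the list towards what this particular reader tends to click, which is one way a behavioral approach can surface a more diverse set of topics than a purely topical one.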

Loomia is similar to Sphere and another app we reviewed recently, Apture. These services all aim to serve up more clickable content options on media websites – which means more user engagement and time spent on site for publishers.

We spoke to Loomia CEO David Marks and asked him how Loomia compares to Sphere, which at first glance appears to have much in common with Loomia. Marks said that Sphere is trying to do “semantic classification”, i.e. analyzing the content of an article and recommending further content based on the findings. However Loomia focuses more on the user and so it does behavioral type recommendations. This can result in a more diverse set of topics, because users typically have a range of content preferences. It depends on the article though, said Marks.

Loomia currently has 2 types of deployment:

  • Content (e.g. WSJ)
  • Video (e.g. Brightcove)

Marks told ReadWriteWeb that video advertising is currently selling well for big media publishers. Accordingly these publishers typically now want to drive users to their videos – and Loomia has a widget to do that.

Marks told us that a lot of their publishers are “dollar focused” this year, therefore recommendations have become more than just an interesting feature on a website – they can drive more advertising dollars. As an example, Marks told us that a media website’s Finance section may sell out with ads, but its Politics section may not (fairly common in big media websites). But the Politics section tends to get bigger page views, so to address the imbalance Loomia’s recommendations widgets can drive users from Politics to Finance.

We’ve been looking at how recommendations are being used in the retail sector a lot, and Loomia is a neat example of how the same technology can have real value for the media segment. Let us know in the comments what other recommendation technologies have caught your eye in publishing.


How to Use the New Google Web Search RSS Feeds

Written by Marshall Kirkpatrick / October 30, 2008 11:31 AM / 7 Comments

Google’s been the lone holdout among major search engines on RSS, but the company quietly enabled feeds for web search results this week. The offering is pretty limited and frustrating: you have to go through Google Alerts to get an obscure RSS URL. We offer a tutorial and some strategic advice in this post.

Web search RSS is useful for being alerted whenever search results for your keywords or link have changed; subscribing to at least a few searches will let you know when Google users are seeing something new in the first few pages of search results for your company name, for example.

How to Get the Feeds

All the other major search engines make it really easy to grab a feed for any web search, but Google is probably concerned about spammers finding bizarre and unscrupulous uses for its feeds. We’re all inconvenienced as a result.

To get a feed for a Google search you have to go to the web page for Google Alerts and set up an alert for your search. You can enter most queries here, including site: queries. (site:http://readwriteweb.com semantic for example.) You should select “web” instead of the default “comprehensive” if you’re just interested in tracking web search results.


“Feed” isn’t an option in the initial drop-down menu of delivery options; you’ve got to select email first. After you’ve done that, look at your collection of alerts and click to edit the one you want by RSS. At this point “feed” is an option in the drop-down menu. Select it and you’ll be shown an RSS URL. Throw that puppy in your favorite feed reader and you’re ready to rock and roll.

The feed will deliver any new links that show up in the top 20 search results for your query. That’s pretty limited, but most people don’t look beyond the first 20 results anyway. That means that this is good for high-level reputation tracking but not very good for discovery of new, more obscure pages of interest.
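We don’t know how Google computes this internally, but the behavior described above amounts to comparing two snapshots of the top results. A minimal sketch, with a hypothetical `new_links` function and made-up URLs:

```python
def new_links(previous_results, current_results, depth=20):
    """Return links in the current top `depth` that were not in the previous top `depth`."""
    seen = set(previous_results[:depth])
    return [url for url in current_results[:depth] if url not in seen]


# Hypothetical snapshots of one query's results, best result first.
yesterday = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
today = ["http://example.com/b", "http://example.com/d", "http://example.com/a"]

print(new_links(yesterday, today))
```

The `depth` cutoff is why this kind of feed suits reputation tracking better than discovery: a page only shows up once it climbs into the tracked window, and anything ranked below that window never appears at all.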

The RSS URLs that Google gives you are based on an arbitrary number and don’t contain the text characters of your query. That means you can’t build more feeds by simply editing the URLs; you have to go back in through Alerts and repeat the process for every feed of interest.

Update: One day after we wrote this post, the official Google Blog announced the availability of feed alerts as well.

More Advanced Options

Here’s how we’re using the new Google search feeds. We’ve grabbed feed URLs for searches for A. our names, B. our company name, C. our company URL and (just for fun) one for each of those three items without the other two. For example: “Richard MacManus” -readwriteweb -http://readwriteweb.com.
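If you have more than a handful of terms, writing those “one without the others” queries by hand gets tedious. Here is a small sketch that generates them; the `exclusive_queries` helper is our own invention for illustration, and you would still have to paste each query into Google Alerts by hand.

```python
def exclusive_queries(terms):
    """For each term, build a query that includes it and excludes all the others."""

    def quote(term):
        # Wrap multi-word terms in double quotes so they match as a phrase.
        return f'"{term}"' if " " in term else term

    queries = []
    for term in terms:
        exclusions = [f"-{quote(other)}" for other in terms if other != term]
        queries.append(" ".join([quote(term)] + exclusions))
    return queries


terms = ["Richard MacManus", "readwriteweb", "http://readwriteweb.com"]
for query in exclusive_queries(terms):
    print(query)
```

The first query printed matches the example above; the other two apply the same pattern to the company name and the company URL.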

That gave us a small pile of feeds, which we then ran through our favorite RSS splicing and deduplication service (we used Yahoo Pipes but if you’re not comfortable with Pipes then Feed.informer.com is really easy to use). We spliced all these feeds together, filtered for duplicates and then threw the resulting feed into our highest priority feed reading system.

[Screenshot: editing the “RWW Google Websearch Tracking” pipe in Yahoo Pipes]
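If you would rather script the splicing than use a hosted service, the same splice-and-deduplicate step can be sketched in a few lines of Python using only the standard library. This sketch assumes the feeds are plain RSS 2.0 with `<item>`, `<title>` and `<link>` elements; the feeds you actually get may use a different format such as Atom, in which case the element names would differ. The `splice_feeds` function and the sample feeds are hypothetical.

```python
import xml.etree.ElementTree as ET


def splice_feeds(feed_documents):
    """Merge RSS 2.0 documents into one list of items, dropping duplicate links."""
    seen_links = set()
    merged = []
    for xml_text in feed_documents:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            if not link or link in seen_links:
                continue  # skip items without a link and items already collected
            seen_links.add(link)
            title = (item.findtext("title") or "").strip()
            merged.append({"title": title, "link": link})
    return merged


# Two hypothetical feeds that overlap on one link.
FEED_A = """<rss version="2.0"><channel><title>Names</title>
<item><title>Post one</title><link>http://example.com/1</link></item>
<item><title>Post two</title><link>http://example.com/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Company</title>
<item><title>Post two again</title><link>http://example.com/2</link></item>
<item><title>Post three</title><link>http://example.com/3</link></item>
</channel></rss>"""

for entry in splice_feeds([FEED_A, FEED_B]):
    print(entry["link"], entry["title"])
```

Deduplicating on the link rather than the title means the same page reported by two different searches shows up only once, under the title from whichever feed was read first.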

Now we can track our high level reputations constantly, without being paranoid about it. We might do this for concept searches as well so that if someone new starts ranking really high for topics we specialize in (semantic web, RSS) then we’ll know about them and never look ignorant at parties.

If we were interested in getting an RSS feed for Google web search for discovery, more than just reputation tracking, we might do an “advanced search,” increase the results displayed from 10 to 100 and then use Dapper.net to scrape a feed of results from that page.

All of this is more complicated than it ought to be, but once you set up even the most basic feed options then you don’t have to think about it again. Though it isn’t perfect, we do appreciate Google making these feeds available.


Search War: Yahoo! Opens Its Search Engine to Attack Google With An Army of Verticals

Written by Marshall Kirkpatrick / July 9, 2008 9:00 PM / 15 Comments

Yahoo! is taking a bold step tonight: opening up its index and search engine to any outside developers who want to incorporate Yahoo! Search’s content and functionality into search engines on their own sites. The company that sees just over 20% of the searches performed each day believes that the new program, called BOSS (Build Your Own Search Service), could create a cadre of small search engines that in aggregate will outstrip Yahoo!’s own market share and leave Google with less than 50% of the search market.

It’s an ambitious and exciting idea. It could also become very profitable when Yahoo! later enables the inclusion of Yahoo! search ads on sites using the BOSS APIs. BOSS will include access to Yahoo! web, news and image searches.

Partner Relationships

Websites wishing to leverage the BOSS APIs will be allowed to blend in their own ranking input and change the presentation of results. There are no requirements for attribution to Yahoo! and there’s no limit on the number of queries that can be performed.
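To make "blending in your own ranking input" concrete, here is a small sketch of one way a partner site might do it. Nothing here calls the BOSS APIs: we assume the engine's results have already been fetched as an ordered list of URLs, and the function name, the linear position score and the `weight` parameter are all illustrative choices of ours.

```python
def blend_rankings(engine_results, site_scores, weight=0.5):
    """Re-rank engine results using a site's own per-URL scores.

    engine_results: URLs in the order the search engine returned them.
    site_scores:    dict mapping URL -> local score between 0 and 1.
    weight:         0 keeps the engine order, 1 trusts only the local scores.
    """
    total = len(engine_results)
    scored = []
    for position, url in enumerate(engine_results):
        engine_score = 1.0 - position / total  # 1.0 for the top hit, then falling
        local_score = site_scores.get(url, 0.0)
        combined = (1 - weight) * engine_score + weight * local_score
        scored.append((combined, url))
    # Python's sort is stable, so ties keep the engine's original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]
```

A social search tool could fill `site_scores` from its own usage data, while a semantic engine could fill it from its parsing of each page.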

At launch Yahoo! BOSS will see live integrations with at least three other companies. Hakia will integrate their semantic parsing with the Yahoo! index and search, social browser plug-in Me.dium will use the data it’s collected to offer a social search tied to the Yahoo! index, and real-time sentiment search engine Summize was included in the BOSS demo – augmenting Yahoo! News search results with related Twitter messages.

More extensive customization and integration with large media companies will be performed with assistance from Yahoo! and ad-free access to the APIs will be made available to the Computer Science departments of academic institutions.

Me.dium captures 20m URLs daily and will use BOSS to show social relevance in addition to link-weight in search.

Does Anyone Really Care About Niche Vertical Search Engines?

We asked Yahoo! just that, although we believe that alternative search engines can be pretty exciting. Nonetheless, we think it’s a valid question.

Bill Michels, Senior Director of the Open Search Platform, told us that niche search engines often aren’t very good because they have access to a very limited index of content. It’s expensive to index the whole web. Likewise, Michels said that there are a substantial number of large organizations that have a huge amount of content but don’t have world-class search technology.

In both cases, Yahoo! BOSS is intended to level the playing field and blow the Big 3 wide open. We agree that it’s very exciting to imagine thousands of new Yahoo! powered niche search engines proliferating. Could Yahoo! plus the respective strengths and communities of all these new players challenge Google? We think they could.

Hakia will parse the Yahoo! index for semantic meaning and data type.

What’s Not Included?

The BOSS APIs are in beta, so they may be expanded with time – but for now there are still a few crown jewels in the company’s plans that won’t be opened up. We asked about Yahoo!’s indexing of the semantic web and were told that would not be a part of BOSS. We asked about the Inbox 2.0 strategy and the company’s plans to rewire for social graph and data portability paradigms. We were told that those were “other programs.”

We hope that there’s not a fundamental disconnect there that will lead to lost opportunities and a lack of focus. It is clear, though, that BOSS falls well within the company’s overall technical strategy of openness. When it comes to web standards, openness and support for the ecosystem of innovation – there may be no other major vendor online as strong as Yahoo! is today. These are times of openness, where some believe that no single vendor’s technology and genius alone can match the creativity of an empowered open market of developers. Yahoo! is positioning itself as leader of this movement.

Let’s see what they can do with an army of Yahoo! powered search engines. Let the games begin!


Is Google a Semantic Search Engine?

Written by Guest Author / March 26, 2007 1:00 PM / 35 Comments

Written by Phill Midwinter, a search engineer from the UK. This is a great follow-up to our article last Friday, Hakia Takes On Google With Semantic Technologies.

What is a Semantic Engine?

Semantics are said to be ‘the next big thing’ in search engine technology. We technology bloggers routinely drum up articles about it and sell it to you, the adoring masses, as a product that will change your web experience forever. Problem is, we often forget to tell you exactly what semantics are – we just get so excited. So let’s explore this…

Wikipedia says:

“Semantics (Greek semantikos, giving signs, significant, symptomatic, from sema, sign) refers to the aspects of meaning that are expressed in a language, code, or other form of representation. Semantics is contrasted with two other aspects of meaningful expression, namely, syntax, the construction of complex signs from simpler signs, and pragmatics, the practical use of signs by agents or communities of interpretation in particular circumstances and contexts. By the usual convention that calls a study or a theory by the name of its subject matter, semantics may also denote the theoretical study of meaning in systems of signs.”

…which is absolutely no help.

Semantics as it relates to our topic, search engines, actually covers a few closely related fields. In this instance what we are looking at deciphering (as a basic example) is whether a computer can discern if there is a link between two words, such as cat and dog. You and I both know that cats and dogs are common household pets, and can be categorized as such. The human brain seems to comprehend this easily, but for a computer it is a much more complex task and one I won’t go into here – because it would most likely bore you.

If we take as read then, that the search engine now has semantic functionality, how does that enable it to refine its search capability?

  • It can automatically place pages into dynamic categories, or tag them without human intervention. Knowing what topic a page relates to is invaluable for returning relevant results.
  • It can offer related topics and keywords to help you narrow your search successfully. With a keyword like sport the engine would offer you a list of sports perhaps as well as sports related news and blogs.
  • Instead of offering you the related keywords, the engine can directly incorporate them back into the search with less weight than the user inputted ones. It’s still contested as to whether this will produce better results or just more varied ones.
  • If the engine uses statistical analysis to retrieve its semantic matches to a keyword (as Google is likely to do) then it’s likely that keywords currently associated with hot news topics will bring those in as well. For example, using my engine to search for the keyword police brought up peerages (relating to the UK’s recent cash for honors scandal).
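For the curious, here is a toy version of the statistical idea behind the last two points. It is a sketch under loud assumptions: it is not my engine and certainly not Google's method. It simply counts which words share a document with a keyword, then feeds the best matches back into the query at a lower weight. The function names, the tiny stopword list and the default weight of 0.5 are all invented for the illustration.

```python
import re
from collections import Counter

# Deliberately tiny stopword list, just enough for short examples.
STOPWORDS = {"a", "an", "and", "are", "for", "is", "my", "of", "the", "to"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def related_keywords(documents, keyword, top_n=3):
    """Rank words by the number of documents they share with `keyword`."""
    counts = Counter()
    for doc in documents:
        words = set(tokenize(doc)) - STOPWORDS
        if keyword in words:
            counts.update(words - {keyword})
    # Sort by count, then alphabetically, so ties come out in a fixed order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def expand_query(documents, query_terms, related_weight=0.5, per_term=2):
    """Give the user's terms weight 1.0 and related terms a lower weight."""
    weights = {term: 1.0 for term in query_terms}
    for term in query_terms:
        for related in related_keywords(documents, term, top_n=per_term):
            weights.setdefault(related, related_weight)
    return weights
```

Because the counts come straight from whatever documents are indexed, a keyword that is suddenly in the news will pick up new neighbours on its own, which is the effect described in the police example above.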

So, according to me:

“A semantic search engine is a search engine that takes the sense of a word as a factor in its ranking algorithm or offers the user a choice as to the sense of a word or phrase.”

This is not in line with the purists of what is known as ‘The Semantic Web’, who believe that for some reason we should spend all our time tagging documents, pages and images to make them acceptable for a computer to read. Well, I’m sorry but I’m not going to waste my time tagging when a computer is able to derive context and do it for me. I may have offended Tim Berners-Lee by saying this, but as the creator of the Web he should know better.

How does Google match up?

Until extremely recently, Google’s semantic technology (which they’ve had now for quite a while) was limited to matching those AdSense blocks to your website’s content. This is neat, and a good practical example of the technology – but not relevant to their core search product. However if you make a single keyword search today, chances are you may spot a block like this at the bottom of your results page:

This is more or less exactly what I was just writing about. They’re offering you alternatives based upon your initial search, which in this case was obviously for citizen. Citizen is a bank, a watchmaker and (if I’m not mistaken) it means you’re a member of a country or something. This is the first clear example of Google employing a semantic engine that works by analyzing the context of words in their index and returning likely matches for sense.

Some of you may be wondering why they aren’t doing this for multiple keyword phrases, which I can take a guess at from some of my own work. Analyzing the context of a word statistically is intensive and slow; and if you try and analyze two, you slow the process further and so on. It is likely they have problems doing so for more than one keyword currently, and Google as ever is cautious about changing their interface too radically too quickly. This implementation of semantics gives hope that they haven’t adopted the purist view of ‘The Semantic Web’ where everything is tagged and filed neatly into nice little packages.

Google is all too aware of the following very large problems with that idea:

  • Users are stupid.
  • Users are lazy.
  • Redefining the way they’ve indexed what is assumed to be petabytes of data would require them to effectively start again.
  • It’s not as powerful or dynamic.

How Google can utilize Semantic technologies

It’s my belief that Google will increasingly tie this technology into their core search experience as it improves in speed and reliability. It has some phenomenally powerful uses and I’ve taken the liberty of laying out a few of my suggestions on where they can go with this:

Self aware pages

  • Tagging pages with keywords has always been used on the internet to let search engines know what kind of content the page contains.
  • Using a Google API we can generate the necessary keywords on the fly as the page loads. This cuts out a large amount of work for SEO.
  • A Google API enabled engine wouldn’t even need to look at these keywords, it could generate them itself.
  • It’s not only pages that can be self aware these days; people tag everything – including links. The Google API could conceivably be used to tag every single word on a page, creating a page that covers every single keyword possibility. This is overkill – but a demonstration of the power available.
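No such Google API exists as far as I know, so treat the following as a stand-in that shows the general shape of generating tags on the fly. It picks the most frequent non-trivial words on a page, which is a far cruder technique than a real semantic engine would use. The function name `auto_tags` and the stopword list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {"and", "are", "for", "the", "that", "this", "with", "uses"}


def auto_tags(page_text, top_n=3):
    """Return the most frequent words on a page as candidate tags."""
    words = [
        word
        for word in re.findall(r"[a-z]+", page_text.lower())
        if len(word) > 2 and word not in STOPWORDS
    ]
    # Sort by frequency, then alphabetically, so the output is repeatable.
    ranked = sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]
```

Running this as the page loads would fill in the keyword tags without anyone typing them, which is the labour-saving point being made above.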

Narrow Search

  • When you begin a search, you enter just one or two keywords in the topic you’re interested in.
  • Related keywords appear, which you can then select from to target your search and remove any doubts about dual meanings of a word for example.
  • This step repeats every time you search. Also possible is opinionated search.

Opinionated Search

  • Because of the way Google statistically finds the senses of keywords from the mass of pages in its index, what in fact it finds is the majority opinion from those pages of what the sense of a word is.
  • At the base level, you can select from the average opinion of related keywords and subjects from its entire index.
  • You can find the opinion at other levels as well though, and this is where the power comes in in terms of really targeting what the user is looking for quickly and efficiently. All the following mean that this is the first true example of social search:
    • Find the opinion over a range of dates, good for current events, modern history, changes in trends.
    • Find the opinion over areas of geography, or by domain extension (.co.uk, .com).
    • Find the opinion over a certain group of websites, or just one website in particular – compare that with another site.
    • Find the opinion not only over the above things but also subjects, topics, social and religious groups.
    • At the most ridiculous example level, you could even find what topics 18 year olds on myspace living in Leeds most talk about – but that I could probably guess. The point is that this is targeting demographics on a really unprecedented level.
  • Add the sites or web pages to your personal profile that you think most closely reflect your opinions, this data can then be taken into account in all future searches returning greater personal relevancy.
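The slicing described above can be sketched in a few lines. Assume, purely for illustration, that each indexed page is a dictionary with `domain`, `date` and `text` fields. That layout, the function name `co_occurring` and the sample pages are all made up, and the counting is the same crude co-occurrence idea as before rather than anything Google is known to run.

```python
import re
from collections import Counter
from datetime import date


def co_occurring(pages, keyword, keep):
    """Count words that share a page with `keyword`, on pages where keep(page) is true."""
    counts = Counter()
    for page in pages:
        if not keep(page):
            continue
        words = set(re.findall(r"[a-z]+", page["text"].lower()))
        if keyword in words:
            counts.update(words - {keyword})
    return counts


# Invented sample index.
pages = [
    {"domain": "example.co.uk", "date": date(2007, 3, 1), "text": "police peerages inquiry"},
    {"domain": "example.co.uk", "date": date(2007, 3, 5), "text": "police question peerages donors"},
    {"domain": "example.com", "date": date(2006, 6, 1), "text": "police academy film"},
]

# The "opinion" of .co.uk sites versus the opinion of pages dated before 2007.
uk_only = co_occurring(pages, "police", lambda p: p["domain"].endswith(".co.uk"))
older = co_occurring(pages, "police", lambda p: p["date"] < date(2007, 1, 1))
```

Swapping the `keep` function is all it takes to move from a date range to a domain extension to a hand-picked group of sites, which is why I think this kind of targeting is within reach.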


Google is using semantic technology, but is not yet a fully fledged semantic search engine. It does not use NLP (Natural Language Processing), but this is not a barrier to producing some truly web changing technology with a bit of thought and originality. NLP may well be (I hate myself for writing this) web 4.0 and semantics is web 3.0 – they are in fact different enough to be classified as such in my eyes and the technology Hakia is developing is certainly markedly distinct from Google’s semantic efforts.

There are barriers that Google needs to overcome. Is it capable of becoming fully semantic without modifying its index too drastically? Can Google continue to keep the results simple and navigable for its varied user base? Most importantly, does Google intend to become a fully semantic search engine and to do so within a timescale that won’t damage their position and reputation? I like to think that although the dragon is sleeping, that doesn’t mean it’s not dreaming!


Deconstructing Real Google Searches: Why Powerset Matters

Written by Guest Author / January 9, 2008 1:07 AM / 13 Comments

This is a guest post by Nitin Karandikar, author of the Software Abstractions blog.

Recently I was looking at the log files for my blog, as I regularly do, and I was suddenly struck by the variety of search queries in Google from which users were being referred to my posts. I write often about the different varieties of search – including vertical search, parametric search, semantic search, and so on – so users with queries about search often land on my blog. But do they always find what they’re looking for?

All the major search engines currently rely on the proximity of keywords and search terms to match results. But that approach can be misleading, causing the search engine to systematically produce incorrect results under certain conditions.

To demonstrate, let us take a look at three general use cases.

[Note: The examples given below are all drawn from Google. To be fair, all the major search engines use similar algorithms, and all suffer from similar problems. For its part, Google handles billions of queries every day, usually very competently. As the reigning market leader, though, Google is the obvious target – it goes with the territory!]

1. Difficulty in Finding Long Tail Results

Take Britney Spears. Given the current popularity of articles, news, pictures, and videos of the superstar singer, the results for practically any query with the word “spears” in it will be loaded with matches about her – especially if the search involves television or entertainment in any way.

Let’s say you’re watching the movie Zulu and you start wondering what material the large spears that all the extras are waving about are made of. So, you go to Google and type in “movie spears material” – this is an obviously insufficient description, as the screen shot below shows.

What happens if you expand on the query further – say: “what are movie spears made out of?” – does it help?

The general issue here is that articles about very popular subjects accumulate high levels of PageRank and then totally overwhelm long tail results. This makes it very difficult for a user to find information about unusual topics that happen to lie near these subjects (at least based on keywords).
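A toy calculation shows how this swamping can happen. The scoring rule below, keyword overlap multiplied by a popularity number, is my own simplification and is not Google's actual formula; the page texts and popularity values are invented.

```python
def toy_score(query, page_text, popularity):
    """Score a page as (number of query words it contains) x (its popularity)."""
    query_words = set(query.lower().split())
    page_words = set(page_text.lower().split())
    return len(query_words & page_words) * popularity


query = "movie spears material"

# A hugely popular page that matches only two of the three query words.
celebrity = toy_score(query, "britney spears new video and movie news", popularity=1000)

# An obscure page that matches all three query words.
long_tail = toy_score(query, "what material the spears in the movie were made of", popularity=3)

print(celebrity, long_tail)
```

Even though the obscure page is the better keyword match, the popular page wins by a wide margin, because popularity dominates the product.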

2. Keyword Ordering

Since the major search engines focus only on the proximity of keywords without context, a user search that’s similar to a popular concept gets swamped with those results, even if the order of keywords in the query has been reversed. For example, a tragic occurrence that’s common in modern life is that of a bicycle getting hit by a car. Much less common is the possibility of a car getting hit by a bicycle, although it does happen. How would you search for the latter? Try typing “car hit by bicycle” into Google; here’s a screen shot of what you get. [Note the third result, which is actually relevant to this search!]
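To see why reversing the word order changes nothing for a purely keyword-based matcher, compare two ways of representing a query. This is an illustration only; the function names are mine and real engines are considerably more sophisticated than either representation.

```python
from collections import Counter


def bag_of_words(query):
    """Represent a query as word counts, ignoring word order."""
    return Counter(query.lower().split())


def ordered_pairs(query):
    """Represent a query as its sequence of adjacent word pairs, keeping order."""
    words = query.lower().split()
    return list(zip(words, words[1:]))


common = "bicycle hit by car"
rare = "car hit by bicycle"

# Identical as bags of words, different once order is kept.
print(bag_of_words(common) == bag_of_words(rare))
print(ordered_pairs(common) == ordered_pairs(rare))
```

Any system that only sees the bag of words cannot tell which vehicle did the hitting, so the far more common story wins.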

3. Keyword Relationships

Since the major search engines focus only on the keywords in the search phrase, all sense of the relationship between the search terms is lost. For example, users commonly change the meaning of search terms by using negations and prepositions; it is also fairly common to look for the less common members of a set.

This takes us into the realm of natural language processing (NLP). Without NLP, the nuances of these query modifications are totally invisible to the search algorithms.

For example, a query such as “Famous science fiction writers other than Isaac Asimov” is doomed to failure. A screen shot of this search in Google is presented below. Most of the returned results are about Isaac Asimov, even when the user is explicitly trying to exclude him from the list of authors found.
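Here is a deliberately crude sketch of what honoring the exclusion might look like. It recognizes only the literal phrase "other than" with a regular expression, which is pattern matching rather than real NLP, and the result titles in the usage lines are invented. The function names are hypothetical.

```python
import re

# Matches "<what the user wants> other than <what to exclude>".
EXCLUSION = re.compile(
    r"^(?P<wanted>.+?)\s+other than\s+(?P<excluded>.+)$", re.IGNORECASE
)


def split_exclusion(query):
    """Split a query into (wanted part, excluded phrase or None)."""
    match = EXCLUSION.match(query.strip())
    if match is None:
        return query.strip(), None
    return match.group("wanted"), match.group("excluded")


def filter_results(results, excluded):
    """Drop result titles that mention the excluded phrase."""
    if excluded is None:
        return list(results)
    return [title for title in results if excluded.lower() not in title.lower()]


wanted, excluded = split_exclusion(
    "Famous science fiction writers other than Isaac Asimov"
)
titles = ["Isaac Asimov biography", "A list of science fiction authors"]
print(wanted, "|", excluded, "|", filter_results(titles, excluded))
```

A real solution has to cope with the many other ways people phrase an exclusion, which is exactly where parsing the language itself becomes necessary.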

All of the searches shown above look like gimmicks – queries designed intentionally to mislead Google’s search algorithms. And in a sense, they are; these specific queries can be easily fixed by tweaking the search engine. Nevertheless, they do point to a real need: the value of understanding the meaning behind both the query and the content indexed.

Semantic Search

That’s where the concept of semantic search comes in. I attended a media event earlier this year at stealth search startup Powerset (see: Powerset is Not a Google-killer!), at which they showcased a live demo of their search engine, currently in closed alpha, that highlighted solutions to exactly this type of issue.

For example, type “What was said about Jesus” into a major search engine, and you usually get a whole list of results that consist of the teachings of Jesus; this means that the search engine entirely missed the concepts of passive voice and “about.” The Powerset results, on the other hand, were consistently on target (for the demo, anyway!).

In other words, when you look at just the keywords in the query, you don’t really understand what the user is looking for; by looking at them within context, by taking into account the qualifiers, the prepositions, the negatives, and other such nuances, you can create a semantic graph of the query. The same case can be made for semantic parsing of the content indexed. Put the two together, as Powerset does, and you can get a much better feel for relevance of results.

What about Google? I’m sure the smart folks in Google’s search-quality team are busily working on this problem as well. I look forward to the time when the major search engines handle long tail queries more accurately and make search a better experience for all of us.

Update: for an expanded version of this article with real-life user queries, see my blog.
