Posts Tagged ‘Collective’

10 Semantic Apps to Watch

Written by Richard MacManus / November 29, 2007 12:30 AM / 39 Comments

One of the highlights of October’s Web 2.0 Summit in San Francisco was the emergence of ‘Semantic Apps’ as a force. Note that we’re not necessarily talking about the Semantic Web, the W3C initiative led by Tim Berners-Lee that promotes technologies like RDF, OWL and other standards for metadata. Semantic Apps may use those technologies, but not necessarily. This was a point made by the founder of one of the Semantic Apps listed below, Danny Hillis of Freebase (who is as much a tech legend as Berners-Lee).

The purpose of this post is to highlight 10 Semantic Apps. We’re not touting this as a ‘Top 10’, because there is no way to rank these apps at this point – many are still non-public apps, e.g. in private beta. It reflects the nascent status of this sector, even though people like Hillis and Spivack have been working on their apps for years now.

What is a Semantic App?

Firstly let’s define “Semantic App”. A key element is that the apps below all try to determine the meaning of text and other data, and then create connections for users. Another of the founders mentioned below, Nova Spivack of Twine, noted at the Summit that data portability and connectibility are keys to these new semantic apps – i.e. using the Web as platform.

In September Alex Iskold wrote a great primer on this topic, called Top-Down: A New Approach to the Semantic Web. In that post, Alex Iskold explained that there are two main approaches to Semantic Apps:

1) Bottom Up – involves embedding semantical annotations (meta-data) right into the data.
2) Top down – relies on analyzing existing information; the ultimate top-down solution would be a fully blown natural language processor, which is able to understand text like people do.
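
To make the contrast concrete, here is a minimal Python sketch of the two approaches. It is an illustration only: the `data-type` attribute is an invented annotation scheme (not RDF or any real microformat), and the regular expressions in `top_down` are crude stand-ins for real natural language processing.

```python
import re
from html.parser import HTMLParser


class AnnotationReader(HTMLParser):
    """Bottom-up: read meaning the publisher embedded in the markup."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._pending = None

    def handle_starttag(self, tag, attrs):
        # 'data-type' is a made-up annotation attribute for this sketch.
        self._pending = dict(attrs).get("data-type")

    def handle_data(self, data):
        if self._pending and data.strip():
            self.facts.append((self._pending, data.strip()))
            self._pending = None


def bottom_up(html):
    reader = AnnotationReader()
    reader.feed(html)
    reader.close()
    return reader.facts


def top_down(text):
    """Top-down: guess meaning from plain, unannotated text."""
    facts = [("year", y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    # Two capitalized words in a row are treated as a possible person name.
    for name in re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text):
        facts.append(("person?", name))
    return facts


annotated = ('<span data-type="person">Danny Hillis</span> is the founder of '
             '<span data-type="product">Freebase</span>')
print(bottom_up(annotated))
print(top_down("Danny Hillis spoke at the Summit in 2007."))
```

The bottom-up reader is exact but only works when someone has done the annotating; the top-down guesser works on any text but can only offer hedged labels such as `person?`.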

Now that we know what Semantic Apps are, let’s take a look at some of the current leading (or promising) products…


Freebase aims to “open up the silos of data and the connections between them”, according to founder Danny Hillis at the Web 2.0 Summit. Freebase is a database that has all kinds of data in it and an API. Because it’s an open database, anyone can enter new data in Freebase. An example page in the Freebase db looks pretty similar to a Wikipedia page. When you enter new data, the app can make suggestions about content. The topics in Freebase are organized by type, and you can connect pages with links, semantic tagging. So in summary, Freebase is all about shared data and what you can do with it.


Powerset (see our coverage here and here) is a natural language search engine. The system relies on semantic technologies that have only become available in the last few years. It can make “semantic connections”, which help build its semantic database. The idea is that meaning and knowledge get extracted automatically by Powerset. The product isn’t yet public, but it has been riding a wave of publicity over 2007.


Twine claims to be the first mainstream Semantic Web app, although it is still in private beta. See our in-depth review. Twine automatically learns about you and your interests as you populate it with content – a “Semantic Graph”. When you put in new data, Twine picks out and tags certain content with semantic tags – e.g. the name of a person. An important point is that Twine creates new semantic and rich data. But it’s not all user-generated. They’ve also done machine learning against Wikipedia to ‘learn’ about new concepts. And they will eventually tie into services like Freebase. At the Web 2.0 Summit, founder Nova Spivack compared Twine to Google, saying it is a “bottom-up, user generated crawl of the Web”.


AdaptiveBlue are makers of the Firefox plugin, BlueOrganizer. They also recently launched a new version of their SmartLinks product, which allows web site publishers to add semantically charged links to their site. SmartLinks are browser ‘in-page overlays’ (similar to popups) that add additional contextual information to certain types of links, including links to books, movies, music, stocks, and wine. AdaptiveBlue supports a large list of top web sites, automatically recognizing and augmenting links to those properties.

SmartLinks works by understanding specific types of information (in this case links) and wrapping them with additional data. SmartLinks takes unstructured information and turns it into structured information by understanding a basic item on the web and adding semantics to it.
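
As a rough illustration of "recognize a link, then wrap it with extra data", here is a small Python sketch. It is not AdaptiveBlue's code: the host names, the `SITE_TYPES` table and the shape of the returned record are all assumptions made up for the example.

```python
from urllib.parse import urlparse

# Hypothetical table of recognized sites and the kind of item they link to.
SITE_TYPES = {
    "books.example.com": "book",
    "movies.example.com": "movie",
    "wine.example.com": "wine",
}


def enrich_link(url, link_text):
    """Turn a plain link into a structured record when the site is recognized."""
    host = urlparse(url).hostname or ""
    record = {"url": url, "text": link_text}
    kind = SITE_TYPES.get(host)
    if kind is not None:
        # The added 'semantics': what sort of thing the link points to.
        record["type"] = kind
        record["title"] = link_text.strip()
    return record


print(enrich_link("https://books.example.com/item/42", " Some Novel "))
print(enrich_link("https://unknown.example.org/page", "A page"))
```

Links to unrecognized sites pass through unchanged, which matches the idea of augmenting only the properties the service knows about.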

[Disclosure: AdaptiveBlue founder and CEO Alex Iskold is a regular RWW writer]


Hakia is one of the more promising Alt Search Engines around, with a focus on natural language processing methods to try and deliver ‘meaningful’ search results. Hakia attempts to analyze the concept of a search query, in particular by doing sentence analysis. Most other major search engines, including Google, analyze keywords. The company told us in a March interview that the future of search engines will go beyond keyword analysis – search engines will talk back to you and in effect become your search assistant. One point worth noting here is that, currently, Hakia has limited post-editing/human interaction for the editing of hakia Galleries, but the rest of the engine is 100% computer powered.

Hakia has two main technologies:

1) QDEX Infrastructure (which stands for Query Detection and Extraction) – this does the heavy lifting of analyzing search queries at a sentence level.

2) SemanticRank Algorithm – this is essentially the science they use, made up of ontological semantics that relate concepts to each other.


Talis is a 40-year-old UK software company which has created a semantic web application platform. They are a bit different from the other 9 companies profiled here, as Talis has released a platform and not a single product. The Talis platform is kind of a mix between Web 2.0 and the Semantic Web, in that it enables developers to create apps that allow for sharing, remixing and re-using data. Talis believes that Open Data is a crucial component of the Web, yet there is also a need to license data in order to ensure its openness. Talis has developed its own content license, called the Talis Community License, and recently they funded some legal work around the Open Data Commons License.

According to Dr Paul Miller, Technology Evangelist at Talis, the company’s platform emphasizes “the importance of context, role, intention and attention in meaningfully tracking behaviour across the web.” To find out more about Talis, check out their regular podcasts – the most recent one features Kaila Colbin (an occasional AltSearchEngines correspondent) and Branton Kenton-Dau of VortexDNA.

UPDATE: Marshall Kirkpatrick published an interview with Dr Miller the day after this post. Check it out here.


Venture-funded UK semantic search engine TrueKnowledge unveiled a demo of its private beta earlier this month. It reminded Marshall Kirkpatrick of the still-unlaunched Powerset, but it’s also reminiscent of the very real Ask.com “smart answers”. TrueKnowledge combines natural language analysis, an internal knowledge base and external databases to offer immediate answers to various questions. Instead of just pointing you to web pages where the search engine believes it can find your answer, it will offer you an explicit answer and explain the reasoning path by which that answer was arrived at. There’s also an interesting looking API at the center of the product. “Direct answers to humans and machine questions” is the company’s tagline.
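
The idea of returning an answer together with its reasoning path can be sketched in a few lines of Python. This is a toy under stated assumptions, not TrueKnowledge's system: the `FACTS` table, the `located_in` relation and the `locate` function are invented for illustration.

```python
# A tiny knowledge base of (subject, relation) -> object facts.
FACTS = {
    ("Cambridge", "located_in"): "England",
    ("England", "located_in"): "United Kingdom",
}


def locate(place, target):
    """Return the chain of facts linking place to target, or None."""
    path = [place]
    while path[-1] != target:
        nxt = FACTS.get((path[-1], "located_in"))
        if nxt is None or nxt in path:  # dead end, or a cycle in the data
            return None
        path.append(nxt)
    return path


path = locate("Cambridge", "United Kingdom")
if path:
    print("Yes, because " + " is in ".join(path))
```

The answer ("Yes") and the explanation (the chain of places) come from the same traversal, which is what makes the reasoning inspectable.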

Founder William Tunstall-Pedoe said he’s been working on the software for the past 10 years, really putting time into it since coming into initial funding in early 2005.


TripIt is an app that manages your travel planning. Emre Sokullu reviewed it when it presented at TechCrunch40 in September. With TripIt, you forward incoming bookings to plans@tripit.com and the system manages the rest. Their patent pending “itinerator” technology is a baby step in the semantic web – it extracts useful information from these emails and makes a well structured and organized presentation of your travel plan. It pulls out information from Wikipedia for the places that you visit. It uses microformats – the iCal format, which is well integrated into GCalendar and other calendar software.
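
Here is a minimal Python sketch of that email-to-itinerary step: pull a few fields out of a confirmation message and emit an iCalendar event. It is not TripIt's “itinerator”; the email layout, the field names and the `example.invalid` identifiers are all invented for the example, and real booking emails are far messier.

```python
import re
from datetime import datetime, timezone

# Made-up confirmation format, for illustration only.
EMAIL = """\
Subject: Your flight confirmation
Flight: XY123
From: SFO
To: JFK
Departs: 2007-12-01 09:30
"""

FIELDS = re.compile(r"^(Flight|From|To|Departs):\s*(.+)$", re.MULTILINE)


def parse_booking(body):
    """Extract the labelled fields and parse the departure time."""
    data = {key.lower(): value.strip() for key, value in FIELDS.findall(body)}
    data["departs"] = datetime.strptime(data["departs"], "%Y-%m-%d %H:%M")
    return data


def to_ical(booking):
    """Render one booking as a minimal iCalendar document."""
    start = booking["departs"].strftime("%Y%m%dT%H%M%S")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//itinerary-sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{booking['flight']}-{start}@example.invalid",
        f"DTSTAMP:{stamp}",
        f"DTSTART:{start}",
        f"SUMMARY:Flight {booking['flight']} {booking['from']} to {booking['to']}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"


print(to_ical(parse_booking(EMAIL)))
```

The start time is written without a time zone (a "floating" time) to keep the sketch short; a real itinerary would need the departure airport's zone.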

The company claimed at TC40 that “instead of dealing with 20 pages of planning, you just print out 3 pages and everything is done for you”. Their future plans include a recommendation engine which will tell you where to go and who to meet.

ClearForest

ClearForest is one of the companies in the top-down camp. We profiled the product in December ’06 and at that point ClearForest was applying its core natural language processing technology to facilitate next generation semantic applications. In April 2007 the company was acquired by Reuters. The company has both a Web Service and a Firefox extension that leverages an API to deliver the end-user application.

The Firefox extension is called Gnosis and it enables you to “identify the people, companies, organizations, geographies and products on the page you are viewing.” With one click from the menu, a webpage you view via Gnosis is filled with various types of annotations. For example it recognizes Companies, Countries, Industry Terms, Organizations, People, Products and Technologies. Each word that Gnosis recognizes gets colored according to its category.
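
The annotate-by-category behaviour can be approximated with a simple dictionary lookup, sketched below in Python. This is an assumption-laden toy: the `GAZETTEER` entries are hand-picked, and a bracketed label stands in for the color Gnosis would apply. ClearForest's actual recognition is based on natural language processing, not a fixed word list.

```python
import re

# Hypothetical word list mapping known names to categories.
GAZETTEER = {
    "ClearForest": "Company",
    "Reuters": "Company",
    "Firefox": "Product",
}

# Longest names first so a longer entry wins over a shorter one it contains.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _NAMES)) + r")\b")


def annotate(text):
    """Wrap each recognized name in a [Category: name] marker."""
    return _PATTERN.sub(
        lambda m: f"[{GAZETTEER[m.group(1)]}: {m.group(1)}]", text
    )


print(annotate("ClearForest was acquired by Reuters."))
```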

Also, ClearForest’s Semantic Web Service offers a SOAP interface for analyzing text, documents and web pages.


Spock is a people search engine that got a lot of buzz when it launched. Alex Iskold went so far as to call it “one of the best vertical semantic search engines built so far.” According to Alex there are four things that make their approach special:

  • The person-centric perspective of a query
  • Rich set of attributes that characterize people (geography, birthday, occupation, etc.)
  • Usage of tags as links or relationships between people
  • Self-correcting mechanism via user feedback loop

As a vertical engine, Spock knows important attributes that people have: name, gender, age, occupation and location just to name a few. Perhaps the most interesting aspect of Spock is its usage of tags – all frequent phrases that Spock extracts via its crawler become tags; and also users can add tags. So Spock leverages a combination of automated tags and people power for tagging.
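
A small Python sketch of that combination follows: phrases that recur across crawled snippets become automatic tags, and user-supplied tags are merged in. The two-word phrase length, the `min_count` threshold and the sample snippets are assumptions for illustration; the article does not describe how Spock actually decides that a phrase is frequent.

```python
import re
from collections import Counter


def frequent_phrases(snippets, min_count=2):
    """Return two-word phrases that occur at least min_count times overall."""
    counts = Counter()
    for snippet in snippets:
        words = re.findall(r"[a-z']+", snippet.lower())
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return {phrase for phrase, n in counts.items() if n >= min_count}


def person_tags(snippets, user_tags):
    """Combine automatically extracted tags with tags added by users."""
    return frequent_phrases(snippets) | set(user_tags)


crawled = [
    "jazz pianist from Chicago",
    "award-winning jazz pianist",
    "teaches piano in Chicago",
]
print(person_tags(crawled, ["chicago"]))
```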


What have we missed? 😉 Please use the comments to list other Semantic Apps you know of. It’s an exciting sector right now, because Semantic Web and Web 2.0 technologies alike are being used to create new semantic applications. One gets the feeling we’re only at the beginning of this trend.


Google: “We’re Not Doing a Good Job with Structured Data”

Written by Sarah Perez / February 2, 2009 7:32 AM / 9 Comments

During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google’s Alon Halevy admitted that the search giant has “not been doing a good job” presenting the structured data found on the web to its users. By “structured data,” Halevy was referring to the databases of the “deep web” – those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google’s Deep Web Search

Halevy, who heads the “Deep Web” search initiative at Google, described the “Shallow Web” as containing about 5 million web pages while the “Deep Web” is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google’s automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web – dubbed “vertical searching” – Halevy also referenced two other types of Deep Web Search: semantic search and product search.
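
The form-submission approach can be pictured with a short Python sketch that enumerates the query URLs a crawler might try against a search form. Everything here is hypothetical: the `cars.example.com` form, its field names and the candidate values are invented, and the article does not say how Google chooses which values to submit.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_urls(action, fields):
    """Yield one GET URL per combination of candidate field values."""
    names = sorted(fields)
    for combo in product(*(fields[name] for name in names)):
        yield action + "?" + urlencode(dict(zip(names, combo)))


form_fields = {"make": ["ford", "honda"], "year": ["2008"]}
for url in candidate_urls("https://cars.example.com/search", form_fields):
    print(url)  # each result page could then be fetched and indexed
```

The number of URLs grows multiplicatively with the number of fields and values, which hints at why choosing good candidate values matters for this kind of crawl.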

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether or not Google’s current search engine technology is going to be adept at doing all the different types of Deep Web indexing or if they will need to come up with something new. As of now, Google uses the Bigtable database and MapReduce framework for everything search related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data level synonyms discovery. These challenges are addressed by Infobright’s technology, said Esterkin, but “Google will have to solve these problems the hard way.”

Also mentioned during the speech was how Google plans to organize “aspects” of search queries. The company wants to be able to separate exploratory queries (e.g., “Vietnam travel”) from ones where a user is in search of a particular fact (“Vietnam population”). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. “Kosmix will give you an ‘aspect,’ but it’s attached to an information source. In our case, all the aspects might be just Web search results, but we’d organize them differently.”
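
To illustrate the distinction between exploratory and fact-seeking queries, here is a deliberately naive Python sketch. The `FACT_ATTRIBUTES` word list and the `ASPECTS` table are assumptions (the aspects for "travel" are the ones named above), and a keyword check like this is only a stand-in for whatever Google actually does.

```python
# Hypothetical list of words that signal a request for one specific fact.
FACT_ATTRIBUTES = {"population", "capital", "currency", "area"}

# Hypothetical aspects to organize results under for exploratory topics.
ASPECTS = {"travel": ["visa requirements", "weather", "tour packages"]}


def classify_query(query):
    """Return ('fact', []) or ('exploratory', [aspects...]) for a query."""
    words = query.lower().split()
    if any(word in FACT_ATTRIBUTES for word in words):
        return "fact", []
    aspects = [a for word in words for a in ASPECTS.get(word, [])]
    return "exploratory", aspects


print(classify_query("Vietnam travel"))
print(classify_query("Vietnam population"))
```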

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it’s clear that no single company has yet come to dominate. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, “Google” has become synonymous with web search, just like “Kleenex” is a tissue, “Band-Aid” is an adhesive bandage, and “Xerox” is a way to make photocopies. Once that psychological mark has been made on our collective psyche and the habit formed, people tend to stick with what they know, regardless of who does it better. That’s a bit troublesome: if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until Google either duplicates or acquires the invention.

Still, it’s far too soon to write Google off yet. They clearly have a lead when it comes to search and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope).


2009 Predictions and Recommendations for Web 2.0 and Social Networks

Christopher Rollyson

Volatility, Uncertainty and Opportunity: Move Crisply while Competitors Are in Disarray

Now that the Year in Review 2008 has summarized key trends, we are in excellent position for 2009 prognostications, so welcome to Part II. As all experienced executives know, risk and reward are inseparable twins, and periods of disruption elevate both, so you will have much more opportunity to produce uncommon value than normal.

This is a high-stakes year in which we can expect surprises. Web 2.0 and social networks can help because they increase flexibility and adaptiveness. Alas, those who succeed will have to challenge conventional thinking considerably, which is not a trivial exercise in normal times. The volatility that many businesses face will make it more difficult because many of their clients and/or employees will be distracted. It will also make it easier because some of them will perceive that extensive change is afoot, and Web 2.0 will blend in with the cacophony. Disruption produces unusual changes in markets, and the people that perceive the new patterns and react appropriately emerge as new leaders.

2009 Predictions

These are too diverse to be ranked in any particular order. Please share your reactions and contribute those that I have missed.

  1. The global financial crisis will continue to add significant uncertainty in the global economy in 2009 and probably beyond. I have no scientific basis for this, but there are excellent experts of every flavor on the subject, so take your pick. I believe that we are off the map, and anyone who says that he’s sure of a certain outcome should be considered with a healthy skepticism.
    • All I can say is that my friends, clients and sources in investment and commercial banking tell me it’s not over yet, and uncertainty is the only certainty until further notice. This has not yet been fully leached out.
    • Western governments, led by the U.S., are probably prolonging the pain because governments usually get bailouts wrong. However, voters don’t have the stomachs for hardship, so we are probably trading short-term “feel good” efforts for a prolonged adjustment period.
  2. Widespread social media success stories in 2009 in the most easily measurable areas such as talent management, business development, R&D and marketing.
    • 2008 saw a significant increase in enterprise executives’ experimentation with LinkedIn, Facebook, YouTube and enterprise (internal) social networks. These will begin to bear fruit in 2009, after which a “mad rush of adoption” will ensue.
    • People who delay adoption will pay dearly in terms of consulting fees, delayed staff training and retarded results.
  3. Internal social networks will largely disappoint. Similar to intranets, they will produce value, but few enterprises are viable long-term without seamlessly engaging the burgeoning external world of experts. In general, the larger and more disparate an organization’s audience is, the more value it can create, but culture must encourage emergent, cross-boundary connections, which is where many organizations fall down.
    • If you’re a CIO who’s banking heavily on your behind-the-firewall implementation, just be aware that you need to engage externally as well.
    • Do it fast because education takes longer than you think.
    • There are always more smart people outside than inside any organization.
  • Significant consolidation among white label social network vendors, so use your customary caution when signing up partners.
    • Due diligence and skill portability will help you to mitigate risks. Any vendor worth their salt will use standardized SOA-friendly architecture and feature sets. As I wrote last year, Web 2.0 is not your father’s software, so focus on people and process more than technology.
    • If your vendor hopeful imposes process on your people, run.
  • No extensive M&A among big branded sites like Facebook, LinkedIn and Twitter, although there will probably be some. The concept of the social ecosystem holds that nodes on pervasive networks can add value individually. LinkedIn and Facebook have completely different social contexts. “Traditional” executives tend to view disruptions as “the new thing” that they want to put into a bucket (“let them all buy each other, so I only have to learn one!”). Wrong. This is the new human nervous system, and online social venues, like their offline counterparts, want specificity because they add more value that way. People hack together the networks to which they belong based on their goals and interests.
    • LinkedIn is very focused on the executive environment, and they will not buy Facebook or Twitter. They might buy a smaller company. They are focused on building an executive collaboration platform, and a large acquisition would threaten their focus. LinkedIn is in the initial part of its value curve, they have significant cash, and they’re profitable. Their VCs can smell big money down the road, so they won’t sell this year.
    • Twitter already turned down Facebook, and my conversations with them lead me to believe that they love their company; and its value is largely undiscovered as of yet. They will hold out as long as they can.
    • Facebook has staying power past 2009. They don’t need to buy anyone of import; they are gaining global market share at a fast clip. They already enable customers to build a large part of the Facebook experience, and they have significant room to innovate. Yes, there is a backlash in some quarters against their size. I don’t know Mark Zuckerberg personally, and I don’t have a feeling for his personal goals.
    • I was sad to see that Dow Jones sold out to NewsCorp and, as a long-time Wall Street Journal subscriber, I am even more dismayed now. This will prove a quintessential example of value destruction. The Financial Times currently fields a much better offering. The WSJ is beginning to look like MySpace! As for MySpace itself, I don’t have a firm bead on it but surmise that it has a higher probability of major M&A than the aforementioned: its growth has stalled, Facebook continues to gain, and Facebook uses more Web 2.0 processes, so I believe it will surpass MySpace in terms of global audience.
    • In being completely dominant, Google is the Wal-Mart of Web 2.0, and I don’t have much visibility into their plans, but I think they could make significant waves in 2009. They are very focused on applying search innovation to video, which is still in the initial stages of adoption, so YouTube is not going anywhere.
    • I am less familiar with Digg, Xing, Bebo, Cyworld. Of course, Orkut is part of the Googleverse.
  • Significant social media use by the Obama Administration. It has the knowledge, experience and support base to pursue fairly radical change. Moreover, the degree of change will be in synch with the economy: if there is a significant worsening, expect the government to engage people to do uncommon things.
    • Change.gov is the first phase, in which supporters and any other interested people are invited to contribute thoughts, stories and documents to the transition team. It aims to keep people engaged and to serve the government on a volunteer basis.
    • The old way of doing things was to hand out form letters that you would mail to your representative. Using Web 2.0, people can organize almost instantly, and results are visible in real-time. Since people are increasingly online somewhere, the Administration will invite them from within their favorite venue (MySpace, Facebook…).
    • Obama has learned that volunteering provides people with a sense of meaning and importance. Many volunteers become evangelists.
  • Increasing citizen activism against companies and agencies, a disquieting prospect but one that I would not omit from your scenario planning (ask yourself, “How could people come together and magnify some of our blemishes?” more here). To wit:
    • In 2007, an electronic petition opposing pay-per-use road tolls in the UK reached 1.8 million signatories, stalling a major government initiative. Although this did not primarily employ social media, it is indicative of the phenomenon.
    • In Q4 2008, numerous citizen groups organized Facebook groups (25,000 signatures in a very short time) to oppose television and radio taxes, alarming the Swiss government. Citizens are organizing to stop paying obligatory taxes—and to abolish the agency that administers the tax system. Another citizen initiative recently launched on the Internet collected 60,000 signatures to oppose biometric passports. German links. French links.
    • In the most audacious case, Ahmed Maher is using Facebook to try to topple the government of Egypt. According to Wired’s Cairo Activists Use Facebook to Rattle Regime, activists have organized several large demonstrations and have a Facebook group of 70,000 that’s growing fast.
  • Executive employment will continue to feel pressure, and job searches will get increasingly difficult for many, especially those with “traditional” jobs that depend on Industrial Economy organization.
    • In tandem with this, there will be more opportunities for people who can “free-agent” themselves in some form.
    • In 2009, an increasing portion of executives will have success at using social networks to diminish their business development costs, and their lead will subsequently accelerate the leaching of enterprises’ best and brightest, many of whom could have more flexibility and better pay as independents. This is already manifest as displaced executives choose never to go back.
    • The enterprise will continue to unbundle. I have covered this extensively on the Transourcing website.
  • Enterprise clients will start asking for “strategy” to synchronize social media initiatives. Web 2.0 is following the classic adoption pattern: thus far, most enterprises have been using a skunk works approach to their social media initiatives, or they’ve been paying their agencies to learn while delivering services.
    • In the next phase, beginning in 2009, CMOs, CTOs and CIOs will sponsor enterprise level initiatives, which will kick off executive learning and begin enterprise development of social media native skills. After 1-2 years of this, social media will be spearheaded by VPs and directors.
    • Professional services firms (PwC, KPMG, Deloitte…) will begin scrambling to pull together advisory practices after several of their clients ask for strategy help. These firms’ high costs do not permit them to build significantly ahead of demand.
    • Marketing and ad agencies (Leo Burnett, Digitas…) will also be asked for strategy help, but they will be hampered by their desires to maintain the outsourced model; social media is not marketing, even though it will displace certain types of marketing.
    • Strategy houses (McKinsey, BCG, Booz Allen…) will also be confronted by clients asking for social media strategy; their issue will be that it is difficult to quantify, and the implementation piece is not in their comfort zone, reducing revenue per client.
    • Boutiques will emerge to develop seamless strategy and implementation for social networks. This is needed because Web 2.0 and social networks programs involve strategy, but implementation involves little technology when compared to Web 1.0. As I’ll discuss in an imminent article, it will involve much more interpersonal mentoring and program development.
  • Corporate spending on Enterprise 2.0 will be very conservative, and pureplay and white label vendors (and consultants) will need to have strong business cases.
    • CIOs have better things to spend money on, and they are usually reacting to business unit executives who are still getting their arms around the value of Web 2.0, social networks and social media.
    • Enterprise software vendors will release significant Web 2.0 bolt-on improvements to their platforms in 2009. IBM is arguably out in front with Lotus Connections, with Microsoft Sharepoint fielding a solid solution. SAP and Oracle will field more robust solutions this year.
  • The financial crunch will accelerate social network adoption among those focused on substance rather than flash. This is akin to the dotbomb of 2001-2004, when no one wanted to do the Web as an end in itself anymore; it flushed out the fluffy offers (as well as some really good ones).
    • Social media can save money. How much did it cost the Obama campaign in time and money to raise $500 million? Extremely little.
    • People like to get involved and contribute, when you can frame the activity as important and you provide the tools to facilitate meaningful action. Engagement raises profits and can decrease costs. Engaged customers, for example, tend to leave less often than apathetic customers.
    • Social media is usually about engaging volunteer contributors; if you get it right, you will get a lot of help for little cash outlay.
    • Social media presents many new possibilities for revenue, but to see them, look outside existing product silos. Focus on customer experience by engaging customers, not with your organization, but with each other. Customer-customer communication is excellent for learning about experience.
  • Microblogging will completely mainstream even though Twitter is still quite emergent and few solid business cases exist.
    • Twitter and its peers (Plurk, Jaiku, Pownce [just bought by Six Apart and closed], Kwippy, Tumblr) are unique for two reasons: they incorporate mobility seamlessly, and they chunk communications small. This leads to a great diversity of “usage context.”
    • Note that Dell sold $1 million on Twitter in 2008, using it as a channel for existing business.
    • In many businesses, customers will begin expecting your organization to be on Twitter; this year it will rapidly cease to be a novelty.

2009 Recommendations

Web 2.0 will affect business and culture far more than Web 1.0 (the internet), which was about real-time information access and transactions via a standards-based network and interface. Web 2.0 enables real-time knowledge and relationships, so it will profoundly affect most organizations’ stakeholders (clients, customers, regulators, employees, directors, investors, the public…). It will change how all types of buying decisions are made.

As an individual and/or an organization leader, you have the opportunity to adopt more quickly than your peers and increase your relevance to stakeholders as their Web 2.0 expectations of you increase. 2009 will be a year of significant adoption, and I have kept this list short, general and actionable. I have assumed that your organization has been experimenting with various aspects of Web 2.0, that some people have moderate experience. Please feel free to contact me if you would like more specific or advanced information or suggestions. Recommendations are ranked in importance, the most critical at the top.

    1. What: Audit your organization’s Web 2.0 ecosystem, and conduct your readiness assessment. Why: Do this to act with purpose, mature your efforts past experimentation and increase your returns on investment.
      • The ecosystem audit will tell you what stakeholders are doing, and in what venues. Moreover, a good one will tell you trends, not just numbers. In times of rapid adoption, knowing trends is critical, so you can predict the future. Here’s more about audits.
      • The readiness assessment will help you to understand how your value proposition and resources align with creating and maintaining online relationships. The audit has told you what stakeholders are doing, now you need to assess what you can do to engage them on an ongoing basis. Here’s more about readiness assessments.
    2. What: Select a top executive to lead your organization’s adoption of Web 2.0 and social networks. Why: Web 2.0 is changing how people interact, and your organizational competence will be affected considerably, so applying it to your career and business is very important.
      • This CxO should be someone with a track record for innovation and a commitment to leading discontinuous change. He or she should be philosophically in synch with the idea of emergent organization and cross-boundary collaboration.
      • S/He will coordinate your creation of strategy and programs (part-time). This includes formalizing your Web 2.0 policy, legal and security due diligence.
    3. What: Use an iterative portfolio approach to pursue social media initiatives in several areas of your business, and chunk investments small. Why: Both iteration and portfolio approaches help you to manage risk and increase returns.
      • Use the results of the audit and the readiness assessment to help you to select the stakeholders you want to engage.
      • Engage a critical mass of stakeholders about things that inspire or irritate them and that you can help them with.
      • All else equal, pilots should include several types of Web 2.0 venues and modes like blogs, big branded networks (Facebook, MySpace), microblogs (Twitter), video and audio.
      • As a general rule, extensive opportunity exists where you can use social media to cross boundaries, which usually impose high costs and prevent collaboration. One of the most interesting in 2009 will be encouraging alumni, employees and recruits to connect and collaborate according to their specific business interests. This can significantly reduce your organization’s business development, sales and talent acquisition costs. For more insight into this, see Alumni 2.0.
      • Don’t overlook pilots with multiple returns, like profile management programs, which can reduce your talent acquisition and business development costs. Here’s more on profile management.


    4. What: Create a Web 2.0 community with numerous roles to give employees flexibility. Why: You want to keep investments small and let the most motivated employees step forward.
      • Roles should include volunteers for pilots, mentors (resident bloggers, video producers and others), community builders (who rapidly codify the knowledge you are gathering from pilots), and some more formal part-time roles. Perhaps a full-time person to coordinate would make sense. Roles can be progressive and intermittent. Think of this as open source.
      • To stimulate involvement, the program must be meaningful, and it must be structured to minimize conflicts with other responsibilities.
    5. What: Avoid the proclivity to treat Web 2.0 as a technology initiative. Why: Web 1.0 (the Internet) involved more of IT than does Web 2.0, and many people are conditioned to think that IT drives innovation; they fall into the tech trap, selecting tools first and imposing process. This is old school and unnecessary because the tools are far more flexible than the last-generation software with which many are still familiar.
      • People create the value when they get involved, and technology often gets in the way when organizations invest in tools that impose process on people and turn them off. Web 2.0 tools impose far less process on people.
      • More important than what brand you invest in is your focus on social network processes and how they add value to existing business processes. If you adopt smartly, you will be able to transfer assets and processes elsewhere while minimizing disruption. More likely is that some brands will disappear (Pownce closed its doors 15 December). When you focus your organization on mastering process and you distribute learning, you will be more flexible with the tools.
      • Focus on process and people, and incent people to gather and share knowledge and help each other. This will increase your flexibility with tools.
    6. What: Manage consulting, marketing and technology partners with a portfolio strategy. Why: Maximize flexibility and minimize risk.
      • From the technology point of view, there are three main vendor flavors: enterprise bolt-on (e.g. Lotus Connections), pureplay white label vendors (SmallWorldLabs) and open (Facebook, LinkedIn). As a group, pureplays have the most diversity in terms of business models, and the most uncertainty. Enterprise bolt-ons’ biggest risk is that they lag significantly behind. More comparisons here.
      • Fight the urge to go with one. If you’re serious about getting business value, you need to be in the open cross-boundary networks. If you have a Lotus or Microsoft relationship, compare Connections and Sharepoint with some pureplays to address private social network needs. An excellent way to start could be with Yammer.
      • Be careful when working with consulting- and marketing-oriented partners who are accustomed to an outsourced model. Web 2.0 is not marketing; it is communicating to form relationships and collaborate online. It does have extensive marketing applications; make sure partners have demonstrated processes for mentoring because Web 2.0 will be a core capability for knowledge-based organizations, and you need to build your resident knowledge.

    Parting Shots

    I hope you find these thoughts useful, and I encourage you to add your insights and reactions as comments. If you have additional questions about how to use Web 2.0, please feel free to contact me. I wish all the best to you in 2009.


    Evolving Trends

    Wikipedia 3.0: The End of Google?

    In Uncategorized on June 26, 2006 at 5:18 am

    Author: Marc Fawzi

    License: Attribution-NonCommercial-ShareAlike 3.0


    Semantic Web Developers:

    Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

    1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)



    Two years after I published this article, it has received over 200,000 hits, and we now have several startups attempting to apply Semantic Web technology to Wikipedia and knowledge wikis in general, including the Wikipedia founder’s own commercial startup as well as a startup that was recently purchased by Microsoft.

    Recently, after seeing how Wikipedia’s governance is so flawed, I decided to write about a way to decentralize and democratize Wikipedia.



    (Article was last updated at 10:15am EST, July 3, 2006)



    The Semantic Web (or Web 3.0) promises to “organize the world’s information” in a dramatically more logical way than Google can ever achieve with their current engine design. This is especially true from the point of view of machine comprehension as opposed to human comprehension. The Semantic Web requires the use of a declarative ontological language like OWL to produce domain-specific ontologies that machines can use to reason about information and make new conclusions, not simply match keywords.

    However, the Semantic Web, which is still in a development phase where researchers are trying to define the best and most usable design models, would require the participation of thousands of knowledgeable people over time to produce those domain-specific ontologies necessary for its functioning.

    Machines (or machine-based reasoning, aka AI software or ‘info agents’) would then be able to use those laboriously –but not entirely manually– constructed ontologies to build a view (or formal model) of how the individual terms within the information relate to each other. Those relationships can be thought of as the axioms (assumed starting truths), which, together with the rules governing the inference process, both enable and constrain the interpretation (and well-formed use) of those terms by the info agents, letting them reason their way to new conclusions based on existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of a logical theory, two or more independent info agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.

    Thus, and as stated, in the Semantic Web individual machine-based agents (or a collaborating group of agents) will be able to understand and use information by translating concepts and deducing new information rather than just matching keywords.

    Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among your virtual AI-enhanced workforce, each having access to a different domain-specific comprehension space and all communicating with each other to build a collective consciousness.

    You’ll be able to ask your info agent or agents to find you the nearest restaurant that serves Italian cuisine, even if the restaurant nearest you advertises itself as a Pizza joint as opposed to an Italian restaurant. But that is just a very simple example of the deductive reasoning machines will be able to perform on information they have.
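
The restaurant example can be made concrete with a minimal sketch. Everything in it is a hypothetical illustration: the restaurant names, the `serves` and `is_a` relations, and the single inference rule are assumptions made for this example, not part of OWL or of any real ontology. A real system would express the facts in an ontology language and hand them to a proper reasoner.

```python
# Hypothetical mini-ontology: (subject, relation, object) triples.
FACTS = {
    ("pizza", "is_a", "italian_cuisine"),
    ("pasta", "is_a", "italian_cuisine"),
    ("Luigi's Pizza Joint", "serves", "pizza"),
    ("Sushi Bar", "serves", "sushi"),
}


def infer(facts):
    """Forward-chain one rule until nothing new can be derived:
    if X serves Y and Y is_a Z, then X serves Z."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new_facts = set()
        for (x, rel1, y) in derived:
            if rel1 != "serves":
                continue
            for (y2, rel2, z) in derived:
                if rel2 == "is_a" and y2 == y:
                    candidate = (x, "serves", z)
                    if candidate not in derived:
                        new_facts.add(candidate)
        if new_facts:
            derived |= new_facts
            changed = True
    return derived


def restaurants_serving(cuisine, facts):
    """Answer a query against the facts plus everything deducible from them."""
    return sorted(
        x for (x, rel, y) in infer(facts) if rel == "serves" and y == cuisine
    )


if __name__ == "__main__":
    # The pizza joint never advertises "Italian", yet it is found by deduction.
    print(restaurants_serving("italian_cuisine", FACTS))
```

A keyword match for "Italian" against the stored facts would find nothing here; the answer only appears because the derived fact is added by the rule.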

    Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents who are specialized in different domains of knowledge to produce a collective consciousness (using the Borg metaphor) that encompasses all human knowledge. The collective “mind” of those agents-as-the-Borg will be the Ultimate Answer Machine, easily displacing Google from this position, which it does not truly fulfill.

    The problem with the Semantic Web, besides that researchers are still debating which design and implementation of the ontology language model (and associated technologies) is the best and most usable, is that it would take thousands or tens of thousands of knowledgeable people many years to boil down human knowledge to domain specific ontologies.

    However, if we were at some point to take the Wikipedia community and give them the right tools and standards to work with (whether existing or to be developed in the future), which would make it possible for reasonably skilled individuals to help reduce human knowledge to domain-specific ontologies, then that time could be shortened to just a few years, and possibly to as little as two years.

    The emergence of a Wikipedia 3.0 (as in Web 3.0, aka Semantic Web) that is built on the Semantic Web model will herald the end of Google as the Ultimate Answer Machine. It will be replaced with “WikiMind” which will not be a mere search engine like Google is but a true Global Brain: a powerful pan-domain inference engine, with a vast set of ontologies (a la Wikipedia 3.0) covering all domains of human knowledge, that can reason and deduce answers instead of just throwing raw information at you using the outdated concept of a search engine.


    After writing the original post I found out that a modified version of the Wikipedia application, known as “Semantic” MediaWiki has already been used to implement ontologies. The name that they’ve chosen is Ontoworld. I think WikiMind would have been a cooler name, but I like ontoworld, too, as in “it descended onto the world,” since that may be seen as a reference to the global mind a Semantic-Web-enabled version of Wikipedia could lead to.

    Google’s search engine technology, which provides almost all of their revenue, could be made obsolete in the near future. That is, unless they gain access to Ontoworld or a similar pan-domain semantic knowledge repository, tap into its ontologies, and add inference capability to Google search to build formal deductive intelligence into Google.

    But so can Ask.com and MSN and Yahoo…

    I would really love to see more competition in this arena, not to see Google or any one company establish a huge lead over others.

    The question, to rephrase in Churchillian terms, is whether the combination of the Semantic Web and Wikipedia signals the beginning of the end for Google or the end of the beginning. Obviously, with tens of billions of dollars of investors’ money at stake, I would think that it is the latter. No one wants to see Google fail. There’s too much vested interest. However, I do want to see somebody outmaneuver them (which can be done, in my opinion).


    Please note that Ontoworld, which currently implements the ontologies, is based on the “Wikipedia” application (also known as MediaWiki), but it is not the same as Wikipedia.org.

    Likewise, I expect Wikipedia.org will use their volunteer workforce to reduce the sum of human knowledge that has been entered into their database to domain-specific ontologies for the Semantic Web (aka Web 3.0). Hence, “Wikipedia 3.0.”

    Response to Readers’ Comments

    The argument I’ve made here is that Wikipedia has the volunteer resources to produce the needed Semantic Web ontologies for the domains of knowledge that it currently covers, while Google does not have those volunteer resources, which will make it reliant on Wikipedia.

    Those ontologies together with all the information on the Web, can be accessed by Google and others but Wikipedia will be in charge of the ontologies for the large set of knowledge domains they currently cover, and that is where I see the power shift.

    Google and other companies do not have the manpower (i.e. the thousands of volunteers Wikipedia has) to help create those ontologies for the large set of knowledge domains that Wikipedia covers. Wikipedia does, and is positioned to do that better and more effectively than anyone else. It’s hard to see how Google would be able to create the ontologies for all domains of human knowledge (which are continuously growing in size and number) given how much work that would require. Wikipedia can cover more ground faster with its massive, dedicated force of knowledgeable volunteers.

    I believe that the party that will control the creation of the ontologies (i.e. Wikipedia) for the largest number of domains of human knowledge, and not the organization that simply accesses those ontologies (i.e. Google), will have a competitive advantage.

    There are many knowledge domains that Wikipedia does not cover. Google will have the edge there, but only if the people and organizations that produce the information also produce the ontologies on their own, so that Google can access them from its future Semantic Web engine. My belief is that this would happen, but very slowly, and that Wikipedia can have the ontologies done for all the domains of knowledge that it currently covers much faster. It would then have leverage from being in charge of those ontologies (aka the basic layer for AI enablement).

    It still remains unclear, of course, whether the combination of Wikipedia and the Semantic Web heralds the beginning of the end for Google or the end of the beginning. As I said in the original part of the post, I believe that it is the latter, and the question I pose in the title of this post, in this context, is no more than rhetorical. However, I could be wrong in my judgment and Google could fall behind Wikipedia as the world’s ultimate answer machine.

    After all, Wikipedia makes “us” count. Google doesn’t. Wikipedia derives its power from “us.” Google derives its power from its technology and inflated stock price. Who would you count on to change the world?

    Response to Basic Questions Raised by the Readers

    Reader divotdave asked a few questions, which I thought to be very basic in nature (i.e. important). I believe more people will be pondering the same issues, so I’m including them here with the replies.

    How does it distinguish between good information and bad? How does it determine which parts of the sum of human knowledge to accept and which to reject?

    It wouldn’t have to distinguish between good and bad information (not to be confused with well-formed vs badly formed) if it were to use a reliable source of information (with associated, reliable ontologies). That is, if the information or knowledge being sought can be derived from Wikipedia 3.0, then it is assumed to be reliable.

    However, when it comes to connecting the dots, that is, returning information or deducing answers from the sea of information that lies beyond Wikipedia, your question becomes very relevant. How would it distinguish good information from bad information so that it can produce good knowledge (aka comprehended information, aka new information produced through deductive reasoning based on existing information)?

    Who, or what as the case may be, will determine what information is irrelevant to me as the inquiring end user?

    That is a good question, and one which would have to be answered by the researchers working on AI engines for Web 3.0.

    There will be assumptions made as to what you are inquiring about. Just as when I saw your question I had to make an assumption about what you really meant to ask me, AI engines would have to make an assumption, pretty much based on the same cognitive process humans use, which is the topic of a separate post, but which has been covered by many AI researchers.

    Is this to say that ultimately some over-arching standard will emerge that all humanity will be forced (by lack of alternative information) to conform to?

    There is no need for one standard, except when it comes to the language the ontologies are written in (e.g. OWL, OWL DL, OWL Full, etc.). Semantic Web researchers are trying to determine the best and most usable choice, taking into consideration human and machine performance in constructing those ontologies and, exclusively in the machine case, interpreting them.

    Two or more info agents working with the same domain-specific ontology but having different software (different AI engines) can collaborate with each other.

    The only standard required is that of the ontology language and associated production tools.
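
To illustrate the point about differing software, here is a minimal sketch in which two agents with different inference strategies share one ontology and reach the same answers. The ontology contents and the function names `forward_agent` and `backward_agent` are hypothetical, chosen only for this example; the shared Python dictionary stands in for a standard ontology language.

```python
# Shared, hypothetical ontology: each term maps to its direct parent classes.
ONTOLOGY = {
    "margherita": ["pizza"],
    "pizza": ["italian_cuisine"],
    "italian_cuisine": ["cuisine"],
    "sushi": ["japanese_cuisine"],
    "japanese_cuisine": ["cuisine"],
}


def forward_agent(ontology, term, target):
    """Agent A (forward chaining): starting from the term, collect every
    class it belongs to, then check whether the target is among them."""
    known = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for parent in ontology.get(current, []):
            if parent not in known:
                known.add(parent)
                frontier.append(parent)
    return target in known


def backward_agent(ontology, term, target, seen=None):
    """Agent B (backward chaining): starting from the goal class, look for
    direct subclasses and recurse until the term is reached or no candidates remain."""
    seen = set() if seen is None else seen
    if term == target:
        return True
    seen.add(target)
    for child, parents in ontology.items():
        if target in parents and child not in seen:
            if backward_agent(ontology, term, child, seen):
                return True
    return False


if __name__ == "__main__":
    for query in [("margherita", "italian_cuisine"), ("sushi", "italian_cuisine")]:
        # Different engines, same ontology, same conclusions.
        print(query, forward_agent(ONTOLOGY, *query), backward_agent(ONTOLOGY, *query))
```

The two functions share no code and search in opposite directions, so the only thing they must agree on is the format and meaning of the ontology itself.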


    On AI and Natural Language Processing

    I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.

    On the Debate about the Nature and Definition of AI

    The embedding of AI into cyberspace will be done at first with relatively simple inference engines (that use algorithms and heuristics) that work collaboratively in P2P fashion and use standardized ontologies. The massively parallel interactions between the hundreds of millions of AI Agents that will run within the millions of P2P AI Engines on users’ PCs will give rise to the very complex behavior that is the future global brain.


    1. Web 3.0 Update
    2. All About Web 3.0 <– list of all Web 3.0 articles on this site
    3. P2P 3.0: The People’s Google
    4. Reality as a Service (RaaS): The Case for GWorld <– 3D Web + Semantic Web + AI
    5. For Great Justice, Take Off Every Digg
    6. Google vs Web 3.0
    7. People-Hosted “P2P” Version of Wikipedia
    8. Beyond Google: The Road to a P2P Economy

    Update on how the Wikipedia 3.0 vision is spreading:

    Update on how Google is co-opting the Wikipedia 3.0 vision:

    Web 3D Fans:

    Here is the original Web 3D + Semantic Web + AI article:

    Web 3D + Semantic Web + AI *

    The above mentioned Web 3D + Semantic Web + AI vision which preceded the Wikipedia 3.0 vision received much less attention because it was not presented in a controversial manner. This fact was noted as the biggest flaw of social bookmarking site digg which was used to promote this article.

    Web 3.0 Developers:

    Jan 7, ‘07: The following Evolving Trends post discusses the current state of semantic search engines and ways to improve the paradigm:

    1. Designing a Better Web 3.0 Search Engine

    June 27, ‘06: Semantic MediaWiki project, enabling the insertion of semantic annotations (or metadata) into the content:

    1. http://semantic-mediawiki.org/wiki/Semantic_MediaWiki (see note on Wikia below)

    Wikipedia’s Founder and Web 3.0



    Evolving Trends

    Google Warming Up to the Wikipedia 3.0 vision?

    In Uncategorized on December 14, 2007 at 8:09 pm

    [source: slashdot.org]

    Google’s “Knol” Reinvents Wikipedia

    Posted by CmdrTaco on Friday December 14, @08:31AM
    from the only-a-matter-of-time dept.


    teslatug writes “Google appears to be reinventing Wikipedia with their new product that they call knol (not yet publicly available). In an attempt to gather human knowledge, Google will accept articles from users who will be credited with the article by name. If they want, they can allow ads to appear alongside the content and they will be getting a share of the profits if that’s the case. Other users will be allowed to rate, edit or comment on the articles. The content does not have to be exclusive to Google but no mention is made on any license for it. Is this a better model for free information gathering?”

    The article Wikipedia 3.0: The End of Google?, which gives you an idea of why Google would want its own Wikipedia, was on the Google Finance page for at least 3 months whenever anyone looked up the Google stock symbol, so Google employees, investors and executives must have seen it.

    Is it a coincidence that Google is building its own Wikipedia now?

    The only problem is a flaw in Google’s thinking. People who author those articles on Wikipedia actually have brains. People with brains tend to have principles. Getting paid pennies to build the Google empire is rarely one of those principles.




    Murdoch Calls Google, Yahoo Copyright Thieves — Is He Right?

    By David Kravets | April 03, 2009 | 5:00:18 PM | Categories: Intellectual Property

    Rupert Murdoch, the owner of News Corp. and The Wall Street Journal, says Google and Yahoo are giant copyright scofflaws that steal the news.

    “The question is, should we be allowing Google to steal all our copyright … not steal, but take,” Murdoch says. “Not just them, but Yahoo.”

    But whether search-engine news aggregation is theft or a protected fair use under copyright law is unclear, even as Google and Yahoo profit tremendously from linking to news. So maybe Murdoch is right.

    Murdoch made his comments late Thursday during an address at the Cable Show, an industry event held in Washington. He seemingly was blaming the web, and search engines, for the news media’s ills.

    “People reading news for free on the web, that’s got to change,” he said.

    Real estate magnate Sam Zell made similar comments in 2007 when he took over the Tribune Company and ran it into bankruptcy.

    We suspect Zell and Murdoch are just blowing smoke. If they were not, perhaps they could demand Google and Yahoo remove their news content. The search engines would kindly oblige.

    Better yet, if Murdoch and Zell are so set on monetizing their web content, they should sue the search engines and claim copyright violations in a bid to get the engines to pay for the content.

    The outcome of such a lawsuit is far from clear.

    It’s unsettled whether search engines have a valid fair use claim under the Copyright Act. The news headlines are copied verbatim, as are some of the snippets that go along.

    Fred von Lohmann of the Electronic Frontier Foundation points out that “There’s not a rock-solid ruling on the question.”

    Should the search engines pay up for the content? Tell us what you think.


    Hakia – First Meaning-based Search Engine

    Written by Alex Iskold / December 7, 2006 12:08 PM / 43 Comments

    Written by Alex Iskold and edited by Richard MacManus. There has been a lot of talk lately about 2007 being the year when we will see companies roll out Semantic Web technologies. The wave started with John Markoff’s article in the NY Times and got picked up by Dan Farber of ZDNet and in other media. For background on the Semantic Web in this era, check out our post entitled The Road to the Semantic Web. Also, for a lengthy but very insightful primer on the Semantic Web, see Nova Spivack’s recent article.

    The media attention is not accidental. Because the Semantic Web promises to help solve information overload problems and deliver major productivity gains, a huge amount of resources, engineering and creativity is being thrown at it.

    What is also interesting is that there are different problems that need to be solved in order for things to fall into place. There needs to be a way to turn data into metadata, either at time of creation or via natural language processing. Then there needs to be a layer of intelligence, particularly inside the browser, to take advantage of the generated metadata. There are many other interesting nuances and sub-problems that need to be solved, so the Semantic Web marketplace is going to have a rich variety of companies going after different pieces of the puzzle. We are planning to cover some of these companies working in the Semantic Web space, so watch out for more coverage here on Read/WriteWeb.

    Hakia: how is it different from Google?

    The first company we’ll cover is Hakia, which is a “meaning-based” search engine startup getting a bit of buzz. It is a venture-backed company with a multinational team, headquartered in New York – and curiously has former US senator Bill Bradley as a board member. It launched its beta in early November this year, but already ranks around 33K on Alexa – which is impressive. They are scheduled to go live in 2007.

    The user interface is similar to Google, but the engine prompts you to enter not just keywords – but a question, a phrase, or a sentence. My first question was: What is the population of China?

    As you can see the results were spot on. I ran the same query on Google and got very similar results, but sans flag. Looking carefully over the results in Hakia, I noticed the message:

    “Your query produced the Hakia gallery for China. What else do you want to know about China?”

    At first this seems like a value add. However, after some thinking about it – I am not sure. What seems to have happened is that instead of performing the search, Hakia classified my question and pulled the results out of a particular cluster – i.e. China. To verify this hypothesis, I ran another query: What is the capital of china?. The results again suggested a gallery for China, but did not produce the right answer. Now to Hakia’s credit, it recovered nicely when I typed in:

    Hakia experiments

    Next I decided to try out some of the examples that the Hakia team suggests on its homepage, along with some of my own. The first one was Why did the chicken cross the road?, which is a Hakia example. The answers were fine, focusing on the ironic nature of the question. Particularly funny was Hakia’s pick:

    My next query was more pragmatic: Where is the Apple store in Soho? (another example from Hakia). The answer was perfect. I then performed the same search on Google and got a perfect result there too. 

    Then I searched for Why did Enron collapse?. Again Hakia did well, but not noticeably better than Google. However, I did see one very impressive thing in Hakia. In its results was this statement: Enron’s collapse was not caused by overstated resource reserves, but by another kind of overstatement. This is pretty witty…. but I am still not convinced that it is doing semantic analysis. Here is why: that reply is not constructed out of words because Hakia understands the semantics of the question. Instead, it pulled this sentence out of one of the documents which had a high rank, that matches the Why did Enron collapse? query.

    In my final experiment, Hakia beat Google hands down. I asked Why did Martha Stewart go to jail? – which is not one of Hakia’s homebrewed examples, but it is fairly similar to their Enron example. Hakia produced perfect results for the Martha question:

    Hakia is impressive, but does it really understand meaning?

    I have to say that Hakia leaves me intrigued, despite the fact that it could not answer What does Hakia mean? and despite the fact that there isn’t sufficient evidence yet that it really understands meaning.

    It’s intriguing to think about the old idea of being able to type a question into a computer and always getting a meaningful answer (a la the Turing test). But right now I am mainly interested in Hakia’s method for picking the top answer. That seems to be Hakia’s secret sauce at this point, which is unique and works quite well for them. Whatever heuristic they are using, it gives back meaningful results based on analysis of strings – and it is impressive, at least at first.

    Hakia and Google

    Perhaps the more important question is: Will Hakia beat Google? Hakia itself has no answer, but my answer at this point is no. This current version is not exciting enough and the resulting search set is not obviously better. So it’s a long shot that they’ll beat Google in search. I think if Hakia presented one single answer for each query, with the ability to drill down, it might catch more attention. But again, this is a long shot.

    The final question is: Is semantical search fundamentally better than text search? This is a complex question, and answering it definitively requires deep theoretical expertise. Here are a few hints….

    Google’s string algorithm is very powerful – this is an undeniable fact. A narrowly focused vertical search engine that makes a lot of assumptions about the underlying search domain (e.g. Retrevo) does a great job of finding relevant stuff. So the difficulty that Hakia has to overcome is to quickly determine the domain and then to do a great job searching inside the domain. This is an old and difficult problem related to the understanding of natural language and AI. We know it’s hard, but we also know that it is possible.
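
The two-step idea described above (first determine the domain, then search inside it) can be sketched roughly as follows. This is not Hakia’s method, which is not public; the domain names, keyword sets, sample documents, and the word-overlap scoring are all assumptions made up for illustration.

```python
import re

# Hypothetical domains, each with trigger keywords and a tiny document set.
DOMAINS = {
    "geography": {
        "keywords": {"population", "capital", "country", "china"},
        "documents": [
            "Beijing is the capital of China.",
            "China has a population of over 1.3 billion people.",
        ],
    },
    "business": {
        "keywords": {"enron", "collapse", "company", "stock"},
        "documents": [
            "Enron filed for bankruptcy in December 2001.",
        ],
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def detect_domain(query):
    """Step 1: pick the domain whose keywords overlap the query the most."""
    words = tokenize(query)
    best = max(DOMAINS, key=lambda name: len(words & DOMAINS[name]["keywords"]))
    if not words & DOMAINS[best]["keywords"]:
        return None  # no domain matched at all
    return best


def search(query):
    """Step 2: rank only the documents inside the detected domain."""
    domain = detect_domain(query)
    if domain is None:
        return None, []
    words = tokenize(query)
    ranked = sorted(
        DOMAINS[domain]["documents"],
        key=lambda doc: len(words & tokenize(doc)),
        reverse=True,
    )
    return domain, ranked


if __name__ == "__main__":
    print(search("What is the capital of China?"))
```

The sketch also shows why the approach is fragile: a wrong guess in the first step means the right document is never considered in the second, and plain word overlap is a crude stand-in for real language understanding.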

    While we are waiting for all the answers, please give Hakia a try and let us know what you think.

    Leave a comment or trackback on ReadWriteWeb and be in to win a $30 Amazon voucher – courtesy of our competition sponsors AdaptiveBlue and their Netflix Queue Widget.

    • Good analysis, I wanted to write one but now there’s no need (:

      Anyway, I fail to see the difference between a ‘semantic’ search engine and a regular search engine. All search engines are ‘semantic’ in a way. If you type something like ‘How do you make a hot-dog’ in Google, it will give you the right answers. It won’t just search for “how”, then “do”, etc. and compile the results. It also has algorithms which know how to decipher the order of words in a sentence and other patterns that make our writing meaningful.

      So, Hakia should do something really spectacular to beat Google with the semantic approach. It should actually be able to understand complex sentences better than Google, and as such be a search engine for more complex tasks, for example for questions like ‘I need drivers for Geforce 8800, but not the latest version’. Currently, compared to Google, it doesn’t deliver.

      Posted by: franticindustries | December 7, 2006 12:36 PM

    • What’s interesting is that Ask started out by trying to create just this type of search engine years ago. They abandoned that approach in favor of a more traditional Google competitor. So can we interpret from that that Ask learned that people would rather use a traditional search engine, or was there another reason for the switch?

      This type of semantical search technology seems especially well suited to encyclopedia sites like Wikipedia or Britannica. I.e., being able to type in “What is the capital of China?” at Wikipedia and get not only relevant topic articles about China, but also the specific answer, would be great. I would love to see a semantic search engine built into MediaWiki. But web search engines should, in my opinion, direct you to a variety of relevant sources.

      I don’t think I’d feel comfortable asking “What were the causes of the American Civil War?” and have the search engine only spit back one result answer (or, one viewpoint).

      Posted by: Josh | December 7, 2006 12:58 PM

    • Josh,
      Excellent points. I really like the Wiki idea.
      In terms of a single answer, I think if you are looking for a quick answer – possibly, but otherwise you would definitely want more results.

      The other thought that occurs to me is that we might not need a new way of inputting the question so much as new ways of getting the answer. So in a way, I view vertical search engines, like Retrevo, as approaching the same problem but from a more pragmatic and better angle.


      Posted by: Alex Iskold | December 7, 2006 1:02 PM

    • Greetings from hakia! Thanks for the review and comments. We appreciate feedback :-)

      We are still developing, it will CONTINUE TO IMPROVE as many of the meaning associations will form in time, like connecting the neurons inside the human brain during childhood. hakia is like a TWO-year old child on the cognitive scale. But it grows EXPONENTIALLY — much faster than a human.



      Posted by: melek pulatkonak | December 7, 2006 2:05 PM

    • Melek, that’s great! Please make sure it does not become self-aware. I would hate for it to experience the kind of pain we do 🙂


      Posted by: Alex Iskold | December 7, 2006 2:19 PM

    • Noted :-)
      Melek

      Posted by: melek pulatkonak | December 7, 2006 2:25 PM

    • Hakia is promising, good to see this early review, but we’ll be able to judge them only after the official debut. Bad comments > /dev/null

      Posted by: Emre Sokullu | December 7, 2006 2:55 PM

    • Hakia sounds quite Finnish – hakea means to fetch, for instance. Reminds a little of Ms Dewey actually, but not as, errm, Flash. 🙂

      Posted by: Juha | December 7, 2006 3:58 PM

    • So, do they intend to read RDF? That is, the data about the data. I’d like to talk to them, as it is simple to read Content Labels. They can then provide users with more information about sites *before* having to enter them… And that is based on Semantic capabilities 😉

      Posted by: Paul Walsh | December 7, 2006 4:31 PM

    • @Juha: yes, Hakia’s name comes from that Finnish word. See the About Us section of their site.

      Posted by: Emre Sokullu | December 7, 2006 5:03 PM

    • Paul, it seems to me that their claim to fame is that they do not need RDF because they mastered NLP (natural language processing).


      Posted by: Alex Iskold | December 7, 2006 5:15 PM

    • That’s a great question you bring up though Paul. Semantic Web is really associated with RDF, thanks largely to Tim Berners-Lee’s relentless promotion of RDF as ‘HTML 2.0’ (to coin a very awkward phrase!). So how many of these new meaning-based search engines coming on the market will utilize RDF?

      Alex is much more of an expert in these things than me, but still NLP seems to me the harder route to take – given all the difficulties AI has had in the past.

      Posted by: Richard MacManus | December 7, 2006 6:34 PM

    • I think search engines need to focus on the social aspect. Tracking what users search for and allowing them to vote on sites. This allows them to make good decisions – to immediately understand the domain a housewife is referring to when she says soap and when a developer says the same.

      Posted by: David Mackey | December 7, 2006 7:59 PM

    • Hmmm, doesn’t like “Where can I find a good globe?” much (a recent search that hadn’t worked too well for me on Google or Froogle). First link is good practice guidelines and legislation reform, which appear to use the word “GLOBE” for some reason (I can’t torture it enough to make it an acronym). Granted, the second link was to an eBay auction for a globe. Third was an auction for a Lionel station light “with globe”. The first and third results suggest to me that the meaning of the question hadn’t been understood. Still, we’re talking beta here, and it’s a very difficult problem. It’ll be interesting to see how they progress.

      Posted by: T.J. Crowder | December 8, 2006 1:06 AM

    • Hello Melek,
      Hakia rocks, it’s a really good search experience! Cheers.

      Posted by: Abhishek Sharma | December 8, 2006 2:33 AM

    • A semantic search is quite different from a text search like Google, which is not primarily based on context and the relationship between words and resources, but on the occurrence and position of words.

      If Hakia really does semantic searches it could easily distinguish itself from Google by generating new content (e.g. answers) that combines relevant unique snippets of information into a semantic result/answer to a query, as opposed to just a list of resources like the other search engines do and Hakia currently does. In that case you don’t have to visit the resources to get the answer.

      The query “What is the capital of Finland?”, could show Helsinki as an answer and provide related answers regarding history, population, etymology, other capitals etc.

      For this capability Hakia should not only be able to do semantic searches, but entity extraction as well, since RDF and XML schemas are not that widespread at the moment.

      If they can manage to do this, people won’t hesitate to abandon Google, especially because the Google brand is losing its value rapidly because of SEO, spamming and privacy intrusions…

      Posted by: Gert-Jan van Engelen | December 8, 2006 4:04 AM

    • I think Hakia is bluffing if it claims to be ‘semantic’. I find it as semantic as Google :-)

      I tried questions like
      Why did the US attack Iraq?
      Why did Israel attack Lebanon?

      It gave absolutely unrelated results, which confirms that it is only as good as a text search. However, when I tried the question “Who is Mahatama Gandhi?”, it immediately responded with the remark “See below the Mahatma Gandhi resume by hakia. What else do you want to know about Mahatma Gandhi?”

      My hunch is that the Hakia guys have set up a word filter before the search query gets executed on its DB (call it a ‘semantic filter’ if you like). If the query contains words like ‘Who’ or ‘What’, it is set to return the ‘resumes’ and ‘galleries’ for the rest of the search terms. But that isn’t what semantics is about – the engine still does not ‘understand’ my question – that’s just a slightly ‘domain restricted’ search being performed.

      I could as well have a dropdown for domain (who, what etc) before the search box and restrict the search queries myself!

      While Hakia is not bad – I won’t give up my Google for it!

      Posted by: Nikhil Kulkarni | December 8, 2006 8:25 AM

    • really? no one but me remembers askjeeves? i’m all about semantic web, but i’m also skeptical of the recycling of web 1.0 into web 2.0. gigaom & techcrunch have already covered a few companies who have tried this, and while i’m sure hakia is great, let’s not pretend they reinvented the wheel. the concept isn’t new.

      Posted by: geektastik | December 8, 2006 9:08 AM

    • “but already ranks around 33K on Alexa – which is impressive.”

      Impressive? Give it a break.

      Posted by: michal frackowiak | December 8, 2006 2:05 PM

    • As pointed out in #16, a Semantic Web search is radically different from a regular search. I see no reason to believe that Hakia has anything to do with the “Semantic Web” proper, as the underlying technologies – RDF, OWL, and so forth – simply are not in widespread use.

      If the people publishing data on the web are not publishing it in a format which is intended for consumption by the Semantic Web – and most people aren’t – then either Hakia has next to nothing to do with the Semantic Web, or they’ve made an earth-shattering breakthrough in Natural Language Processing.

      Posted by: Phillip Rhodes | December 8, 2006 2:07 PM

    • michal, a 33K rank is impressive given that the service just launched in beta.


      Posted by: Alex Iskold | December 8, 2006 2:26 PM

    • It’s my opinion that for a semantic search engine to *really* work properly, it will have to
      a. have demographic-based parsing logic, not just language-based.
      b. know the demographics of the user submitting the query.

      Posted by: Ernesto | December 8, 2006 2:31 PM

    • Ernesto, add other factors like the stuff you like, etc. That would be more of a personalized search. I think the way to go is:

      Personalize( Semantic Search ) ==> Really cool stuff.


      Posted by: Alex Iskold | December 8, 2006 2:36 PM

    • Remember that Google’s growth was spread basically by word of mouth not SUV megalith marketing.
      If Google, an upstart, can do it to Yahoo, it can happen again.

      Posted by: Shinderpal jandu | December 8, 2006 2:49 PM

    • This concept didn’t work with ask.com, it ain’t gonna work again now. It simply isn’t how people search for information on the web.
      There are many ways to work search engines but I’m quite surprised we keep seeing the same thing over and over again. What we are missing are real innovations, not a second runner-up in the same clothes with a different name.

      Posted by: Sal | December 8, 2006 2:55 PM

    • Ask both of them (and Ask.com) this question:
      what is 5 plus 5?

      enough said.

      Posted by: Dave | December 8, 2006 3:01 PM

    • @Dave – duh. Things like calculating 5 plus 5 are a VERY simple matter of doing word associations with relevant mathematical operators. Something which I’m sure Hakia can achieve shortly.

      The more interesting phrases here are – as Melek mentioned above – “connections being formed cognitively” and “intelligent as a 2 year old”. Is the engine behind it aware of the data it parses and spits out? What is the level of awareness then – word associations, lexical analysis, categorization and meaning vs actual causal factors?

      Posted by: Viksit | December 8, 2006 3:53 PM

    • Nice work, going to check out how this handles.

      Posted by: Tele Man | December 8, 2006 4:25 PM

    • Very interesting, and props to the developers. I know it’s not a new concept (as pointed out earlier, ASK did try to do it), but then again, neither was a GUI when Apple took over… these things take development – do you know how long the concept of the Macintosh was alive at Xerox PARC before Jobs discovered it and furthered the development into a now-common operating system? Give Hakia (and semantic search) a chance to develop. Recycled ideas usually have merit. That’s why they’re recycled. They just didn’t get developed 100% the first time around.

      I do, however, see Hakia as far away from success at semantics. To get the semantics perfectly, and accomplish its goal here, it really has to conquer Bloom’s Taxonomy of learning and apply it to each query; especially if it is to return one (or few) valued and cross-compiled results from different sources.

      Currently, it wouldn’t pass a TRUE Turing Test — just mimics the foreign language copied from book to carry on conversation argument proposed by (insert name here, I forget it at the moment…)
      ^Wow… I just referred to like 5 things I learned last quarter in my freshman computer science classes… that felt good. Hope my thoughts make sense. Keep up the work Hakia, I really would be impressed to see success here, I just think it would have to incorporate some AI which is not looking good (from my eyes, anyway).

      Posted by: Augie | December 8, 2006 9:08 PM

    • I think Hakia weighted W5 (Who, What, Where, When and Why) heavily in the search queries. I think Hakia is decent but I am still not too sure of the difference between using semantical search and text search (if the text search query is specific enough).

      Posted by: andy kong | December 8, 2006 9:34 PM

    • While there is some growing interest in semantics and meaning, partly due to work in the semantic web and upstarts like Hakia, the first copy of the first semantic search engine was delivered to the Congressional Research Service in 1988. I know because I was there and I installed it for the research staff there.

      In your analysis you asked: Does Hakia really understand meaning? I think the question that has to be answered first is: What does it mean to understand meaning? Long before you come to the Turing test, you have to come to understand what the term “semantics” means and how it is used and understood by those in and outside the domain of software and computational technology practice.

      The answer to the last question you offered: Is semantical search fundamentally better than text search? depends greatly upon what you think semantical means in a search and retrieval context.

      In a word though, the answer is a resounding Yes.

      I think, in its most common and general usage (among peoples) semantics refers to the interpretation of the significance of the relationships and interactions of subjects and objects in a situational context.

      For example, the semantics of the state of affairs in modern day Iraq range over a state of civil war to extreme cases of outside insurgencies intended to deceive and delude. When the semantics are cloudy and unclear, judgments and decisions about what and how to name particular aspects of the state of affairs can also be murky. Thereby interdependent judgments or decisions become delayed or the subject of further debate. Ideally you want to present a situation such that a uniform perception emerges, with semantics (significance) that drives or guides interpretations such that those that are relevant and those with the same validity or authority prevail.

      As the Bush administration has demonstrated, the process, the presentation, the semantics– can become political and highly charged. When questions of significance persist, that is, questions ranging over the signifier and signified in a given situation, uncertainty, lack of clarity and disarray blur and obscure any significance and generally erode confidence and delay action.

      This is not the kind of semantics the Semantic Web and AI technologies proclaim. In their quest to share and exchange information, they want just enough semantics to normalize data labels between systems so that they are able to exchange information and be sure they are referring to the same items in the data exchange. They want to use named references, with authority of course. In fact, they strive for clear and unambiguous semantics – a foreign concept to the Bush administration.

      But semantics has to do with the significance of interpretation. What is significant in our experience of the search and retrieval application. What is of significance in the results of the search engine? Relevance. The benefit of semantic search is greater relevance. For Hakia to be relevant, it has to offer more relevance than Google. A semantic search engine should also offer more– in my opinion.

      A modern language semantic search engine should offer more than relevance. It should offer insight. Rather than fixing semantics to simple categories for easy exchange, a truly semantical search engine should aid and assist one while exploring topics. It should help to relate language to abstract ideas instead of just connecting the keywords, names and nouns.

      Posted by: Ken Ewell | December 8, 2006 11:32 PM

    • No, it is not better than Google. Type the same questions in Google and you will get better answers.

      Posted by: jyotheendra | December 8, 2006 11:37 PM

    • Gee golly, as far ahead of me Ken Ewell is in every sense of technological knowledge and understanding, I have to say… You went way off topic just to make a point about the Bush administration… I get so sick of that.

      Of course semantic search is better than connecting language parts. People may not think it’s better, but I argue that they only feel that way because they are used to searching with boolean operators and combinations of keywords. Everyone knows WHAT SPECIFICALLY they want to find, but some people have trouble putting their question into acceptable and successful search terms… Imagine never having to phrase a question specially for a search engine: just type what you’re wondering, and have an instant answer.

      Much easier than combining keywords with booleans to try to simplify natural language to “search engine” language!

      PS — No offense to you, Mr Ewell — I really do respect that your technological insights and opinions are worth 10 times my own because of the knowledge gap; I guess I just got really sick of seeing more politically charged comments in non-related areas… I’m just sick of politics all-together right now, I think. Not trying to start a flame-war or anything! 🙂

      Posted by: Auggie | December 9, 2006 1:36 AM

    • Great job done by hakia.
      I got the perfect answers to my questions in the top 3-5 links and this saved a lot of time.

      I am impressed

      Posted by: priya | December 9, 2006 11:42 AM

    • What about Chacha.com? They actually have guides who help you with your search.

      Posted by: Tori | December 9, 2006 3:26 PM

    • Unfortunately, Tori, I was unable to ever get a guide connected to use, but I do remember trying that out a few days ago and thinking it was a pretty cool concept… as long as they don’t charge you for it ever! Could you connect to guides?

      Posted by: Auggie | December 10, 2006 1:33 AM

    • Guides worked for me.
      Alex.

      Posted by: Alex Iskold | December 10, 2006 6:15 AM

    • Looks like there’s a /very/ long way to go yet. Given that “what is the capital of china” is semantically ambiguous, I tried to be helpful:

      what is the administrative capital of China
      what is the administrative capital of the United States of America
      what is the administrative capital of the USA
      what is the administrative capital of the US

      Unfortunately, Hakia provided irrelevant answers to all four questions. Google got 4/4.

      Given the apparently overwhelming power of Google’s indexing algorithm and the extent of their dataset, a semantic-based search facility such as Hakia may have to seek a qualitatively different area of search in which to make a contribution.

      Posted by: Graham Higgins | December 10, 2006 7:33 AM

    • Ref: #35
      Tried the so-called ChaCha.com; forget about getting any good result, it felt like I was doing a chat!!! Users around the world have a limited attention span. Getting the best (not precise) results with minimum effort – that’s the key. Advanced search and personalized search have been there for a long time with no good impact on users.
      Hakia – doing good work, but it’s too early to say something concrete. In addition, I would not like to accept that Google doesn’t have semantic features in their search algorithm. I’m sure they are working on it or looking out for something good (startup kid).

      Posted by: Dhruba Baishya | December 16, 2006 7:24 PM

    • props to geektastik for doing what the author failed to do. Mention askjeeves.

      Posted by: Bog | December 19, 2006 9:41 AM

    • I mention Ask Jeeves in the second comment. ;)

      Posted by: Josh | December 23, 2006 5:10 PM

    • This is good example of success of hakia
      why dont people tell their salaries?

      Posted by: Anonymous | January 3, 2007 2:14 AM

    • The main problem for Hakia is that Google is not standing still. G has a secret project which I feel must be to do with semantics.

      BTW – Google does not use any knowledge of semantics for translation. We have from Google:

      El barco attravesta una cerradua – un vuelo de cerraduras – La estacion de ressorte – jogar de puente

      The last is particularly annoying. My daughter plays for England and when I try to search for “Bridge” I am overwhelmed with sites on civil engineering.

      I specifically tested these.
      with Hakia

      The locks on the Grand Union Canal
      Spring flowers (primavera) Springs in Gloucestershire (mamanthal)
      Bridge tournaments

      The results on the whole were satisfactory – much better than Google. Understand is a difficult word to define. My definition (bueno espagnol) is the difference between Primavera, Ressorte, Mamanthal. In other words, can we use our “understanding” in an operational way? My view is that precise definition + a large enough database = Turing. To some extent Hakia appears to do this. It must be the future. The fly in the ointment is what Google is doing.

      Posted by: Ian Parker | January 6, 2007 5:27 AM

    Spock – Vertical Search Done Right

    Written by Alex Iskold / June 26, 2007 6:10 AM / 11 Comments

    There has been quite a lot of buzz lately around a vertical search engine for people, called Spock. While still in private beta, the engine has already impressed users with its rich feature set and social aspects. Yet, there is something that has gone almost unnoticed – Spock is one of the best vertical semantic search engines built so far. There are four things that make their approach special:

    • The person-centric perspective of a query
    • Rich set of attributes that characterize people (geography, birthday, occupation, etc.)
    • Usage of tags as links or relationships between people
    • Self-correcting mechanism via user feedback loop

    Spock’s focus on people

    The only kind of search result that you get from Spock is a list of people; and it interprets any query as if it is about people. So whether you search for democrats or ruby on rails or new york, the results will be lists of people associated with the query. In that sense, the algorithm is probably a flavor of the PageRank or frequency analysis algorithm used by Google – but tailored to people.

    Rich semantics, tags and relationships

    As a vertical engine, Spock knows important attributes that people have. Even in the beta stage, the set is quite rich: name, gender, age, occupation and location just to name a few. Perhaps the most interesting aspect of Spock is its usage of tags. Firstly, all frequent phrases that Spock extracts via its crawler become tags. In addition, users can also add tags. So Spock leverages a combination of automated tags and people power for tagging.

    A special kind of tag in Spock is called ‘relationships’ – and it’s the secret sauce that glues people together. For example, Chelsea is related to Clinton because she is his daughter, but Bush is related to Clinton because he is the successor to the title of President. The key thing here is that relationships are explicit in Spock. These relationships taken together weave a complex web of connections between people that is completely realistic. Spock gives us a glimpse of how semantics emerge out of the simple mechanism of tagging.
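
    To make the idea of explicit relationships concrete, here is a minimal sketch in Python. It is not Spock's implementation, which has not been published; the `PeopleGraph` class, its method names and the label strings are all hypothetical, chosen only to mirror the Chelsea/Clinton/Bush example above.

```python
from collections import defaultdict


class PeopleGraph:
    """Toy store of people linked by explicit, labeled relationship tags.

    Hypothetical illustration only; not Spock's actual data model.
    """

    def __init__(self):
        # person -> list of (relationship label, other person)
        self._edges = defaultdict(list)

    def relate(self, person, label, other):
        """Record that `person` is `label` `other`."""
        self._edges[person].append((label, other))

    def related_to(self, target):
        """Return sorted (person, label) pairs for everyone linked to `target`."""
        return sorted(
            (person, label)
            for person, edges in self._edges.items()
            for label, other in edges
            if other == target
        )


graph = PeopleGraph()
graph.relate("Chelsea Clinton", "daughter of", "Bill Clinton")
graph.relate("George W. Bush", "successor as President to", "Bill Clinton")

for person, label in graph.related_to("Bill Clinton"):
    print(f"{person} is {label} Bill Clinton")
```

    Because each link carries its own label, the same two people can be connected in several different ways, and a query can follow only the kind of relationship it cares about.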

    Feedback loops

    The voting aspect of Spock also harnesses the power of automation and people. It is a simple, yet very interesting way to get feedback into the system. Spock is experimenting with letting people vote on the existing “facts” (tags/relationships) and it re-arranges information to reflect the votes. To be fair, the system is not yet tuned to do this correctly all the time – it’s hard to know right from wrong. However, it is clear that a flavor of this approach in the near future will ‘teach’ computers what the right answer is.
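
    A rough sketch of such a feedback loop, again in Python and again only an assumption about how it might work: each tag keeps a running score, an up-vote adds one, a down-vote subtracts one, and tags are shown in score order. The `VotedTags` class and the sample tags are hypothetical.

```python
from collections import Counter


class VotedTags:
    """Toy feedback loop: user votes re-order the tags shown for a person.

    Hypothetical illustration only; Spock's real scoring is not public.
    """

    def __init__(self):
        self._scores = Counter()

    def vote(self, tag, up=True):
        """Register one up-vote (+1) or down-vote (-1) for `tag`."""
        self._scores[tag] += 1 if up else -1

    def ranked(self):
        """Tags from highest to lowest score; ties are broken alphabetically."""
        return sorted(self._scores, key=lambda tag: (-self._scores[tag], tag))


tags = VotedTags()
votes = [
    ("politician", True),
    ("politician", True),
    ("astronaut", False),
    ("author", True),
]
for tag, up in votes:
    tags.vote(tag, up)

print(tags.ranked())
```

    A plain vote count like this is easy to game, which is one reason a real system would be hard to tune to "know right from wrong".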

    Limitations of Spock’s approach

    The techniques that we’ve discussed are very impressive, but they have limitations. The main problem is that Spock is likely to have much more complete information about celebrities and well known people than about ordinary people. The reason is the amount of data. More people are going to be tagging and voting on the president of the United States than on ordinary people. Unless of course, Spock breaks out and becomes so viral that a lot of local communities form – much like on Facebook. While it’s possible, at this point it does not seem too likely. But even if Spock just becomes a search engine that works best for famous people, it is still very useful and powerful.


    Spock is fascinating because of its focus and leverage of semantics. Using tags as relationships and the feedback loop strike me as having great potential to grow a learning system organically, in the manner that learning systems evolve in nature. Most importantly, it is pragmatic and instantly useful.

    • Spock will also create a huge database of “ordinary” people, too.

      They’re aggregating Facebook, MySpace and LinkedIn. They have less known people, too. I was known to the system – there was not much detail, but it included my name, age, country and MySpace-profile.

      If they start to index more resources, like domains (who owns which domains), blogs (there are millions of them…), more social networks or best: the web in general, they’re on the best way to actually become a search engine for _everybody_.

      Also, don’t underestimate the fact that everybody will at least tag himself. That’s our ego! 🙂

      Posted by: Sebastian | June 26, 2007 6:46 AM

    • I agree that there’s huge potential for Spock, and that it is very well done.

      Potential downside? If Spock does hit I can envision employers and recruiters making extensive use of it to check up on/get background on employees/prospects – which might not be such a good thing for some.

      Posted by: Chris | June 26, 2007 7:26 AM

    • spock is gonna take a lot of money to market that domain. the name is terrible. spook is better. spoke is better. you would think they would at least pick a common sense vertical web address like mylocator.com or something. the world does not need another website that you have to explain what it does. vertical done right needs no explanation to location. change the name. I like spoke better.

      Posted by: steven emery | June 26, 2007 9:11 AM

    • What the fuk is this?!?
      semenatic who? Dont they make antivirus ?
      Why would they want to do search engine.they cant tell me who stole my screwdriver but I know it was claxton before he left that POS.

      Posted by: Mike Hulubinka | June 26, 2007 10:50 AM

    • Pretty interesting technology. One of the default queries behind the log-in is “people killed by handguns.” I think the feedback loop feature is a great quality control mechanism, assuming it’s not terribly prone to abuse; it’s also a lot of fun to play with!

      I think I still have a couple invitations if anyone is interested in trying it out.

      Posted by: Cortland Coleman | June 26, 2007 8:23 PM

    • I am not excited by spock because its business objective is meaningless. it is a good tool to kill time. however, google is a great tool to save time.

      Posted by: keanu | June 26, 2007 8:59 PM

    • Well, I would like to make an interesting comment, but when I went to their site it was down for maintenance.

      A portent?

      Posted by: Alan Marks | June 27, 2007 6:15 AM

    • I had the same experience as Alan but now Spock’s back up it appears that it’s invitation only. As current users are able to invite others, it would be great if some generous person could send me an invitation! jason (at) talktoshiba.com

      Posted by: Jason | June 28, 2007 2:20 AM

    • hai all spocker

      Posted by: rmpal | July 3, 2007 5:22 AM

    • If you want free spock invites go to http://www.swapinvites.com/

      Posted by: Nathan | July 11, 2007 10:55 AM

    • Crawling the web does not always lead to good results… search on spock.com for “Christian” and just wonder about the results…

      Posted by: wayne | August 14, 2007 3:19 AM

    Top-Down: A New Approach to the Semantic Web

    Written by Alex Iskold / September 20, 2007 4:22 PM / 17 Comments

    Earlier this week we wrote about the classic approach to the semantic web and the difficulties with that approach. While the original vision of a layer on top of the current web, which annotates information in a way that is “understandable” by computers, is compelling, there are technical, scientific and business issues that have been difficult to address.

    One of the technical difficulties that we outlined was the bottom-up nature of the classic semantic web approach. Specifically, each web site needs to annotate information in RDF, OWL, etc. in order for computers to be able to “understand” it.

    As things stand today, there is little reason for web site owners to do that. The tools that would leverage the annotated information do not exist and there has not been any clearly articulated business and consumer value. Which means that there is no incentive for the sites to invest money into being compatible with the semantic web of the future.

    But there are alternative approaches. We will argue that a more pragmatic, top-down approach to the semantic web not only makes sense, but is already well on the way toward becoming a reality. Many companies have been leveraging existing, unstructured information to build vertical, semantic services. Unlike the original vision, which is rather academic, these emergent solutions are driven by business and market potential.

    In this post, we will look at the solution that we call the top-down approach to the semantic web, because instead of requiring developers to change or augment the web, this approach leverages and builds on top of the current web as is.

    Why Do We Need The Semantic Web?

    The complexity of the original vision of the semantic web and the lack of clear consumer benefits make the whole project unrealistic. The simple question, “Why do we need computers to understand semantics?”, remains largely unanswered.

    While some of us think that building AI is cool, the majority of people think that AI is a little bit silly, or perhaps even unsettling. And they are right. AI for the sake of AI does not make any sense. If we are talking about building intelligent machines, and if we need to spend money and energy annotating all the information in the world for them, then there needs to be a very clear benefit.

    Stated the way it is, the semantic web becomes a vision in search of a reason. What if the problem was restated from the consumer point of view? Here is what we are really looking forward to with the semantic web:

    • Spend less time searching
    • Spend less time looking at things that do not matter
    • Spend less time explaining what we want to computers

    A consumer focus and clear benefit for businesses needs to be there in order for the semantic web vision to be embraced by the marketplace.

    What If The Problem Is Not That Hard?

    If all we are trying to do is to help people improve their online experiences, perhaps the full “understanding” of semantics by computers is not even necessary. The best online search tool today is Google, which is an algorithm based, essentially, on statistical frequency analysis and not semantics. Solutions that attempt to improve on Google by focusing on generalized semantics have so far not found it easy to do so.

    The truth is that the understanding of natural language by computers is a really hard problem. We have the language ingrained in our genes. We learn language as we grow up. We learn things iteratively. We have the chance to clarify things when we do not understand them. None of this is easily replicated with computers.

    But what if that understanding is not even necessary to build the first generation of semantic tools? What if, instead of trying to teach computers natural language, we hard-wired into computers the concepts of everyday things like books, music, movies, restaurants, stocks and even people? Would that help us be more productive and find things faster?

    Simple Semantics: Nouns And Verbs

    When we think about a book we think about a handful of things – title and author, maybe genre and the year it was published. Typically, though, we couldn’t care less about the publisher, edition and number of pages. Similarly, recipes provoke thoughts about cuisine and ingredients, while movies make us think about the plot, director, and stars.

    When we think of people, we also think about a handful of things: birthday, where they live, how we’re related to them, etc. The profiles found on popular social networks are great examples of simple semantics based around people.

    Books, people, recipes, movies are all examples of nouns. The things that we do on the web around these nouns, such as looking up similar books, finding more people who work for the same company, getting more recipes from the same chef and looking up pictures of movie stars, are similar to verbs in everyday language. These are contextual actions that are based on the understanding of the noun.

    What if semantic applications hard-wired understanding and recognition of the nouns and then also hard-wired the verbs that make sense? We are actually well on our way to doing just that. Vertical search engines like Spock, Retrevo, ZoomInfo, the page annotating technology from ClearForest, Dapper, and the Map+ extension for Firefox are just a few examples of top-down semantic web services.
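
    The idea of hard-wiring nouns and verbs can be pictured as a small lookup table. What follows is a minimal sketch under stated assumptions, not a description of how any of the services named above is built: the `Noun` class, the `NOUNS` registry and the `actions_for` helper are hypothetical names, and the attributes and actions are simply the examples used in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    """A hard-wired everyday concept (hypothetical structure for illustration)."""
    name: str
    attributes: tuple  # the handful of properties people care about
    verbs: tuple       # the contextual actions that make sense for this noun


# Hypothetical registry: the entries mirror the examples given in the text.
NOUNS = {
    "book": Noun("book", ("title", "author", "genre", "year"),
                 ("look up similar books",)),
    "person": Noun("person", ("birthday", "location", "relationship"),
                   ("find people at the same company",)),
    "recipe": Noun("recipe", ("cuisine", "ingredients"),
                   ("get more recipes from the same chef",)),
    "movie": Noun("movie", ("plot", "director", "stars"),
                  ("look up pictures of the stars",)),
}


def actions_for(kind):
    """Return the hard-wired verbs for a noun, or an empty list if unknown."""
    noun = NOUNS.get(kind)
    return list(noun.verbs) if noun else []
```

    Calling `actions_for("book")` returns the actions wired in for books, while an unknown noun such as `"ice cream"` returns an empty list. That is the trade-off of the approach: the application only "understands" what was wired in ahead of time.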

    The Top-Down Semantic Web Service

    The essence of a top-down semantic web service is simple – leverage existing web information, apply specific, vertical semantic knowledge and then redeliver the results via a consumer-centric application. Consider the vertical search engine Spock, which scans the web for information about people. It knows how to recognize names in HTML pages and it also looks for common information about people that all people have – birthdays, locations, marital status, etc. In addition, Spock “understands” that people relate to each other. If you look up Bush, then Clinton will show up as a predecessor. If you look up Steve Jobs, then Bill Gates will come up as a rival.

    In other words, Spock takes simple, everyday semantics about people and applies it to the information that already exists online. The result? A unique and useful vertical search engine for people. Further, note that Spock does not require the information to be re-annotated in RDF and OWL. Instead, the company builds adapters that use heuristics to get the data. The engine does not actually have full understanding of semantics about people, however. For example, it does not know that people like different kinds of ice cream, but it doesn’t need to. The point is that by focusing on simple semantics, Spock is able to deliver a useful end-user service.

    Another, much simpler, example is the Map+ add-on for Firefox. This application recognizes addresses and provides a map popup using Yahoo! Maps. It is the simplicity of this application that precisely conveys the power of simple semantics. The add-on “knows” what addresses look like. Sure, sometimes it makes mistakes, but most of the time it tags addresses in online documents properly. So it leverages existing information and then provides direct end user utility by mashing it up with Yahoo! Maps.
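
    To make the "knows what addresses look like" idea concrete, here is a minimal sketch of heuristic address recognition. It is not the Map+ implementation, which is not described here; the pattern, the `STREET_TYPES` vocabulary and the `find_addresses` helper are assumptions made for illustration, and they only cover a narrow US-style format.

```python
import re

# Hypothetical street-type vocabulary; a real tool would need a far longer list.
STREET_TYPES = (
    r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Parkway|Pkwy|Drive|Dr)"
)

# House number, then one to three capitalized words, then a street type.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}" + STREET_TYPES + r"\b"
)


def find_addresses(text):
    """Return every substring of `text` that looks like a street address."""
    return [match.group(0) for match in ADDRESS_RE.finditer(text)]
```

    On the sentence "Our office is at 701 First Avenue in Sunnyvale." the helper picks out "701 First Avenue". On "I own 3 Great Dr Seuss books." it wrongly reports "3 Great Dr", because "Dr" is also a street abbreviation. That is exactly the kind of mistake a purely pattern-based recognizer makes, and why such tools work most of the time rather than all of the time.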

    The Challenges Facing The Top-Down Approach

    Despite being effective, the somewhat simplistic top-down approach has several problems. First, it is not really the semantic web as it is defined; instead, it’s a group of semantic web services and applications that create utility by leveraging simple semantics. So the proponents of the classic approach would protest, and they would be right. Another issue is that these services do not always get semantics right because of ambiguities. Because the recognition is algorithmic and not based on an underlying RDF representation, it is not perfect.

    It seems to me that it is better to have simpler solutions that work 90% of the time than complex ones that never arrive. The key questions here are: How exactly are mistakes handled? And, is there a way for the user to correct the problem? The answers will be left up to the individual application. In life we are used to other people being unpredictable, but with computers, at least in theory, we expect things to work the same every time.

    Yet another issue is that these simple solutions may not scale well. If the underlying unstructured data changes, can the algorithms be changed quickly enough? This is always an issue with things that sit on top of other things without an API. Of course, if more web sites had APIs, as we have previously suggested, the top-down semantic web would be much easier and more certain.


    While the original vision of the semantic web is grandiose and inspiring, in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of a specific and simple consumer focus makes it mostly an academic exercise. In the meantime, existing data is being leveraged by applying simple heuristics and making assumptions about particular verticals. What we have dubbed top-down semantic web applications have been appearing online and improving end user experiences by leveraging semantics to deliver real, tangible services.

    Will the bottom-up semantic web ever happen? Possibly. But at the moment the precise path to get there is not quite clear. In the meantime, we can all enjoy a better online experience and get to where we need to go faster thanks to simple top-down semantic web services.


    • Hi Alex. The top-down approach alone is not enough to reach the Semantic Web. It’s not even enough to reach the half-hearted attempt at the Semantic Web that you describe. I believe both the bottom-up and top-down approaches will be needed to reach the goal. At this time we’re faced with too few people attempting either approach. Top-down isn’t even fully feasible yet, whereas a bottom-up approach can at least be done with currently available technology.

      I fully disagree with your statement that the complexity of original vision of the semantic web and lack of clear consumer benefits makes the whole project unrealistic.

      Posted by: James | September 20, 2007 4:58 PM

    • I think the use of Microformats does provide some actual practical usage of a form of machine-readable semantic formatting for content. Ok, it’s maybe not quite “the semantic web” people envision but it does have some usefulness.

      I also read a blog post yesterday by Peter Krantz entitled “RDFa – Implications for Accessibility” which talks about the W3C’s RDFa HTML extensions as opposed to Microformats as a means to include machine readable data.

      I wasn’t sure if I should be writing ‘semantic web’ with a capital S as some people seem to use that as suggesting something more than just the concept of ‘semantic standards compliant HTML / XHTML’!

      Posted by: Rick Curran | September 20, 2007 5:10 PM

    • If you’re talking about the Semantic Web (the W3C attempt, and the actual Semantic Web) then you use caps for the S and the W. If you’re talking about types of webs (reactive web, proactive web, semantic web) you would not use caps (proper noun vs noun).

      Posted by: James | September 20, 2007 5:29 PM

    • My bullshit meter went off after the first paragraph. You are so far off dude. You should take some time to “understand” it before you try to write about it. Otherwise you are just making noise.

      Your article is just noise.

      Where do I check that this article is not useful?

      Posted by: Ken Ewell | September 20, 2007 6:07 PM

    • Good article, especially about the top down vs. bottom up. I am working on a very specific problem – make it easy for teachers to create lessons – and search is not an answer! We are working on an overlay – a top down semantic web, which not only includes normal metadata but also more domain specific, contextual information. Some thoughts at http://doubleclix.wordpress.com/

      Posted by: Krishna Sankar | September 20, 2007 6:08 PM

    • The bottom-up and top-down approaches are not mutually exclusive, so there is no point in trying to pit one against the other. And indeed, why wait until the perfect vision is implemented? If some value can be delivered now by cutting corners, and more value later by investing in a more formal approach in parallel, then surely everyone wins. If a top-down service is able to cheaply extract facts from the Web now, then surely, it should be able to easily translate these facts into predicates (and map them to ontologies) so as to plug into bottom-up machinery (rules, proofs, etc.) as it becomes available. It’s all good.

      Posted by: Jean-Michel Decombe | September 20, 2007 6:21 PM

    • I must admit I was very disappointed to see an article on this topic with such a wide audience not take the opportunity to increase the visibility of microformats.

      In the perfect world, publishing platforms (wordpress, CMSs in general, etc.) would allow the publisher to easily mark certain parts of their content with semantic value, using microformats.

      Then, all modern browsers should be able to recognize them and provide the users with some useful actions. Add hCards to your address book, events to your calendar of choice, etc. The list goes on and on.

      From where I stand, we’re not that far. Firefox 3, MS IE8 and Apple have all shown interest in this matter. Let’s all hold hands and see what they have in store for us.

      Sir Tim Berners-Lee is much more than a dreamer. He is, as we all know, a visionary. Thank you, Sir.

      I hate spamming, but if you’re interested in these matters visit microformats dot org for more info and/or click my name for a fresh screencast showing how this works for the users.


      Posted by: André Luís | September 20, 2007 6:24 PM

    • How many stacked straw men does it take to reach the Moon? Apparently, both not too many, and quite a few.

      This week’s dust up about “what is the semantic Web?” is but a mote in the eye of history, and even within very recent history (say, 1-2 years) at that.

      The real story behind everyone desiring to state the obvious about easy things and hard things relating to information federation is that commercial prospects must be near at hand. I take this as good news.

      It will be interesting to see whose silks get dirtied as this jockeying continues out of the gate.

      Posted by: Mike Bergman | September 20, 2007 7:17 PM

    • I get the same sense of things Mike. The heated debating back and forth shows not only that the Semantic Web is taking in larger numbers of followers, but that we’re nearing a time when we can put what we’ve researched to practical use.

      The interesting thing to me will be the kind of products and services that will emerge. To me it’s not entirely clear yet what markets will be most profitable for Semantic Web technology (and semantic technologies in general).

      I hear that there is a lot of money in the market for a system that radically simplifies data exchange in the enterprise, but consumer products and services I’m not sure about. I don’t think there will be a market for “Semantic Web browsers.” I’m sure Firefox 4 will accommodate any such needs, and I would hope that becomes the case.

      I need to do my research on what the current “Semantic Web companies” are up to.

      Posted by: James | September 20, 2007 8:53 PM

    • I lost my comment on your last post somewhere, so I blogged it.

      All I’d add here is that most of the systems you describe are effectively domain-specific data silos. Unless there’s Web-based interop, these things are merely on the Web, not of the Web. Semantic Web technologies are designed for truly Web-based data integration, they are essentially an evolution of the link.

      Mike’s comment above creased me up – especially since you only have to look at his collection of Sweet Tools to see the “bottom-up Semantic Web” is coming along just fine, thank you 🙂 He does have a point – all this is really about is moving from a Web of Documents to a more general Web of Data.

      For a continual update, subscribe to Planet RDF, or even This Week’s Semantic Web. Coders might also be interested in the Developers Guide to Semantic Web Toolkits for different Programming Languages. As well as tools and applications, there’s more and more linked data appearing on the Web all the time…

      Posted by: Danny | September 21, 2007 3:38 AM

    • some good points though I think you might use some less words. e.g. the goals

      Spend less time searching
      Spend less time looking at things that do not matter
      Spend less time explaining what we want to computers

      are in short: “spend less time on things you do not like”.

      still, excellent post. as always 😉

      Posted by: Peter P | September 21, 2007 5:02 AM

    • Alex,
      You make good points. We need both top-down and bottom-up approaches.

      Isn’t GRDDL (http://www.w3.org/2004/01/rdxh/spec) a generic approach to gather information from documents?

      Simile projects and RDFizers are worth a look (http://simile.mit.edu/wiki/RDFizers)

      I think semantic web components – a way to describe the components that make up web applications, may be another approach to build bottom-up web.

      We do need a general framework of resource description as a common vocabulary whether our approach is top-down or bottom-up.

      We do need more dialog and I am glad that you started it with this post.


      Posted by: Dorai Thodla | September 21, 2007 5:23 AM

    • Peter P’s last comment pretty much sums up what everyone wants from new web technology (regardless of whether or not it falls under the semantic web umbrella), doesn’t it?

      And I quote:

      “Spend less time searching
      Spend less time looking at things that do not matter
      Spend less time explaining what we want to computers”

      In general, people want to spend less time doing the boring stuff and get right to good/relevant/interesting stuff (if I could add a picture, I’d totally post the “This is relevant to my interests” lolcat right, because really, what topic couldn’t benefit from a little lolcat levity?)

      *by the way, since I know that many RWW readers are of the entrepreneurial type, if anyone is working on a project or has an idea that accomplishes the above missions, check out the Knight News Challenge – http://newschallenge.org.

      Posted by: Jackie | September 21, 2007 11:04 AM

    • I have just retired after spending years trying to play a small part in controlling a corporate intranet with rules as basic as “use HTML”. It degenerated into a collection of thousands of PDFs (with internal links) and even Word documents posted straight to the Intranet. In spite of supplied templates and document management tools the information suppliers saw the Intranet as if it were a paper filing cabinet. HTML combined with proper use of CSS goes a long way towards basic structure but even when given the tools information suppliers will not see the reason to use them.

      Posted by: Albert Mispel | September 21, 2007 1:23 PM

    • Good effort, but there is very little new here. Lots of work has been done in the area of semantic integration which understands that an inference architecture will always result in false associations that typically require lots of manual refinements (customizations) of ontologies.

      Semantic integration (where mission critical systems are involved) is a case of the good being the enemy of the perfect. If only we could return lots of choices and let the user pick. Google has it easy.

      Posted by: Pano | September 21, 2007 6:57 PM

    • yea google… they will get this …

      Posted by: Nature Wallpaper | September 21, 2007 11:34 PM

    • Thanks a lot for this post and the previous one on semantic web. Really interesting. I was wondering whether you will address what you said about computers not being able to understand human language, later on. I think this is one of the fundamental problems with semantic web. Although I do agree with you that we should do what we can, even if that means we cannot get any further than the “simple semantic web”. More comments on my blog.

      Posted by: Samuel Driessen | December 18, 2007 12:36 PM


    Free! Why $0.00 Is the Future of Business

    By Chris Anderson 02.25.08 | 12:00 AM

    At the age of 40, King Gillette was a frustrated inventor, a bitter anticapitalist, and a salesman of cork-lined bottle caps. It was 1895, and despite ideas, energy, and wealthy parents, he had little to show for his work. He blamed the evils of market competition. Indeed, the previous year he had published a book, The Human Drift, which argued that all industry should be taken over by a single corporation owned by the public and that millions of Americans should live in a giant city called Metropolis powered by Niagara Falls. His boss at the bottle cap company, meanwhile, had just one piece of advice: Invent something people use and throw away.

    One day, while he was shaving with a straight razor that was so worn it could no longer be sharpened, the idea came to him. What if the blade could be made of a thin metal strip? Rather than spending time maintaining the blades, men could simply discard them when they became dull. A few years of metallurgy experimentation later, the disposable-blade safety razor was born. But it didn’t take off immediately. In its first year, 1903, Gillette sold a total of 51 razors and 168 blades. Over the next two decades, he tried every marketing gimmick he could think of. He put his own face on the package, making him both legendary and, some people believed, fictional. He sold millions of razors to the Army at a steep discount, hoping the habits soldiers developed at war would carry over to peacetime. He sold razors in bulk to banks so they could give them away with new deposits (“shave and save” campaigns). Razors were bundled with everything from Wrigley’s gum to packets of coffee, tea, spices, and marshmallows. The freebies helped to sell those products, but the tactic helped Gillette even more. By giving away the razors, which were useless by themselves, he was creating demand for disposable blades.

    A few billion blades later, this business model is now the foundation of entire industries: Give away the cell phone, sell the monthly plan; make the videogame console cheap and sell expensive games; install fancy coffeemakers in offices at no charge so you can sell managers expensive coffee sachets.

    Chris Anderson discusses “Free.”

    Video produced by Annaliza Savage and edited by Michael Lennon.

    Thanks to Gillette, the idea that you can make money by giving something away is no longer radical. But until recently, practically everything “free” was really just the result of what economists would call a cross-subsidy: You’d get one thing free if you bought another, or you’d get a product free only if you paid for a service.

    Over the past decade, however, a different sort of free has emerged. The new model is based not on cross-subsidies — the shifting of costs from one product to another — but on the fact that the cost of products themselves is falling fast. It’s as if the price of steel had dropped so close to zero that King Gillette could give away both razor and blade, and make his money on something else entirely. (Shaving cream?)

    You know this freaky land of free as the Web. A decade and a half into the great online experiment, the last debates over free versus pay online are ending. In 2007 The New York Times went free; this year, so will much of The Wall Street Journal. (The remaining fee-based parts, new owner Rupert Murdoch announced, will be “really special … and, sorry to tell you, probably more expensive.” This calls to mind one version of Stewart Brand’s original aphorism from 1984: “Information wants to be free. Information also wants to be expensive … That tension will not go away.”)

    Once a marketing gimmick, free has emerged as a full-fledged economy. Offering free music proved successful for Radiohead, Trent Reznor of Nine Inch Nails, and a swarm of other bands on MySpace that grasped the audience-building merits of zero. The fastest-growing parts of the gaming industry are ad-supported casual games online and free-to-try massively multiplayer online games. Virtually everything Google does is free to consumers, from Gmail to Picasa to GOOG-411.

    The rise of “freeconomics” is being driven by the underlying technologies that power the Web. Just as Moore’s law dictates that a unit of processing power halves in price every 18 months, the price of bandwidth and storage is dropping even faster. Which is to say, the trend lines that determine the cost of doing business online all point the same way: to zero.

    But tell that to the poor CIO who just shelled out six figures to buy another rack of servers. Technology sure doesn’t feel free when you’re buying it by the gross. Yet if you look at it from the other side of the fat pipe, the economics change. That expensive bank of hard drives (fixed costs) can serve tens of thousands of users (marginal costs). The Web is all about scale, finding ways to attract the most users for centralized resources, spreading those costs over larger and larger audiences as the technology gets more and more capable. It’s not about the cost of the equipment in the racks at the data center; it’s about what that equipment can do. And every year, like some sort of magic clockwork, it does more and more for less and less, bringing the marginal costs of technology in the units that we individuals consume closer to zero.

    Photo Illustration: Jeff Mermelstein

    As much as we complain about how expensive things are getting, we’re surrounded by forces that are making them cheaper. Forty years ago, the principal nutritional problem in America was hunger; now it’s obesity, for which we have the Green Revolution to thank. Forty years ago, charity was dominated by clothing drives for the poor. Now you can get a T-shirt for less than the price of a cup of coffee, thanks to China and global sourcing. So too for toys, gadgets, and commodities of every sort. Even cocaine has pretty much never been cheaper (globalization works in mysterious ways).

    Digital technology benefits from these dynamics and from something else even more powerful: the 20th-century shift from Newtonian to quantum machines. We’re still just beginning to exploit atomic-scale effects in revolutionary new materials — semiconductors (processing power), ferromagnetic compounds (storage), and fiber optics (bandwidth). In the arc of history, all three substances are still new, and we have a lot to learn about them. We are just a few decades into the discovery of a new world.

    What does this mean for the notion of free? Well, just take one example. Last year, Yahoo announced that Yahoo Mail, its free webmail service, would provide unlimited storage. Just in case that wasn’t totally clear, that’s “unlimited” as in “infinite.” So the market price of online storage, at least for email, has now fallen to zero (see “Webmail Windfall”). And the stunning thing is that nobody was surprised; many had assumed infinite free storage was already the case.

    For good reason: It’s now clear that practically everything Web technology touches starts down the path to gratis, at least as far as we consumers are concerned. Storage now joins bandwidth (YouTube: free) and processing power (Google: free) in the race to the bottom. Basic economics tells us that in a competitive market, price falls to the marginal cost. There’s never been a more competitive market than the Internet, and every day the marginal cost of digital information comes closer to nothing.

    One of the old jokes from the late-’90s bubble was that there are only two numbers on the Internet: infinity and zero. The first, at least as it applied to stock market valuations, proved false. But the second is alive and well. The Web has become the land of the free.

    The result is that we now have not one but two trends driving the spread of free business models across the economy. The first is the extension of King Gillette’s cross-subsidy to more and more industries. Technology is giving companies greater flexibility in how broadly they can define their markets, allowing them more freedom to give away products or services to one set of customers while selling to another set. Ryanair, for instance, has disrupted its industry by defining itself more as a full-service travel agency than a seller of airline seats (see “How Can Air Travel Be Free?”).

    The second trend is simply that anything that touches digital networks quickly feels the effect of falling costs. There’s nothing new about technology’s deflationary force, but what is new is the speed at which industries of all sorts are becoming digital businesses and thus able to exploit those economics. When Google turned advertising into a software application, a classic services business formerly based on human economics (things get more expensive each year) switched to software economics (things get cheaper). So, too, for everything from banking to gambling. The moment a company’s primary expenses become things based in silicon, free becomes not just an option but the inevitable destination.

    Forty years ago, Caltech professor Carver Mead identified the corollary to Moore’s law of ever-increasing computing power. Every 18 months, Mead observed, the price of a transistor would halve. And so it did, going from tens of dollars in the 1960s to approximately 0.000001 cent today for each of the transistors in Intel’s latest quad-core. This, Mead realized, meant that we should start to “waste” transistors.

    Waste is a dirty word, and that was especially true in the IT world of the 1970s. An entire generation of computer professionals had been taught that their job was to dole out expensive computer resources sparingly. In the glass-walled facilities of the mainframe era, these systems operators exercised their power by choosing whose programs should be allowed to run on the costly computing machines. Their role was to conserve transistors, and they not only decided what was worthy but also encouraged programmers to make the most economical use of their computer time. As a result, early developers devoted as much code as possible to running their core algorithms efficiently and gave little thought to user interface. This was the era of the command line, and the only conceivable reason someone might have wanted to use a computer at home was to organize recipe files. In fact, the world’s first personal computer, a stylish kitchen appliance offered by Honeywell in 1969, came with integrated counter space.

    And here was Mead, telling programmers to embrace waste. They scratched their heads — how do you waste computer power? It took Alan Kay, an engineer working at Xerox’s Palo Alto Research Center, to show them. Rather than conserve transistors for core processing functions, he developed a computer concept — the Dynabook — that would frivolously deploy silicon to do silly things: draw icons, windows, pointers, and even animations on the screen. The purpose of this profligate eye candy? Ease of use for regular folks, including children. Kay’s work on the graphical user interface became the inspiration for the Xerox Alto, and then the Apple Macintosh, which changed the world by opening computing to the rest of us. (We, in turn, found no shortage of things to do with it; tellingly, organizing recipes was not high on the list.)

    Of course, computers were not free then, and they are not free today. But what Mead and Kay understood was that the transistors in them — the atomic units of computation — would become so numerous that on an individual basis, they’d be close enough to costless that they might as well be free. That meant software writers, liberated from worrying about scarce computational resources like memory and CPU cycles, could become more and more ambitious, focusing on higher-order functions such as user interfaces and new markets such as entertainment. And that meant software of broader appeal, which brought in more users, who in turn found even more uses for computers. Thanks to that wasteful throwing of transistors against the wall, the world was changed.

    What’s interesting is that transistors (or storage, or bandwidth) don’t have to be completely free to invoke this effect. At a certain point, they’re cheap enough to be safely disregarded. The Greek philosopher Zeno wrestled with this concept in a slightly different context. In Zeno’s dichotomy paradox, you run toward a wall. As you run, you halve the distance to the wall, then halve it again, and so on. But if you continue to subdivide space forever, how can you ever actually reach the wall? (The answer is that you can’t: Once you’re within a few nanometers, atomic repulsion forces become too strong for you to get any closer.)

    In economics, the parallel is this: If the unitary cost of technology (“per megabyte” or “per megabit per second” or “per thousand floating-point operations per second”) is halving every 18 months, when does it come close enough to zero to say that you’ve arrived and can safely round down to nothing? The answer: almost always sooner than you think.
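
    To put rough numbers on that question, here is a minimal Python sketch. It assumes a clean halving every 18 months, and the $10 starting price and the hundredth-of-a-cent threshold are hypothetical figures chosen for illustration, not data from Mead.

```python
import math


def price_after(start_price: float, years: float, halving_years: float = 1.5) -> float:
    """Unit price after `years`, if the price halves every `halving_years`."""
    return start_price * 0.5 ** (years / halving_years)


def years_until_below(start_price: float, threshold: float, halving_years: float = 1.5) -> float:
    """Years of steady halving needed for the price to fall to `threshold`."""
    if start_price <= threshold:
        return 0.0
    return halving_years * math.log2(start_price / threshold)


# Hypothetical inputs: a $10 unit cost, and a "round it down to nothing"
# threshold of one-hundredth of a cent ($0.0001).
start = 10.0
threshold = 0.0001

print(f"years until negligible: {years_until_below(start, threshold):.1f}")
print(f"price after 40 years: ${price_after(start, 40):.10f}")
```

    Under those assumptions the price crosses the threshold in roughly 25 years, and each further drop by a factor of 1,000 takes only about 15 more.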

    What Mead understood is that a psychological switch should flip as things head toward zero. Even though they may never become entirely free, as the price drops there is great advantage to be had in treating them as if they were free. Not too cheap to meter, as Atomic Energy Commission chief Lewis Strauss said in a different context, but too cheap to matter. Indeed, the history of technological innovation has been marked by people spotting such price and performance trends and getting ahead of them.

    From the consumer’s perspective, though, there is a huge difference between cheap and free. Give a product away and it can go viral. Charge a single cent for it and you’re in an entirely different business, one of clawing and scratching for every customer. The psychology of “free” is powerful indeed, as any marketer will tell you.

    This difference between cheap and free is what venture capitalist Josh Kopelman calls the “penny gap.” People think demand is elastic and that volume falls in a straight line as price rises, but the truth is that zero is one market and any other price is another. In many cases, that’s the difference between a great market and none at all.

    The huge psychological gap between “almost zero” and “zero” is why micropayments failed. It’s why Google doesn’t show up on your credit card. It’s why modern Web companies don’t charge their users anything. And it’s why Yahoo gives away disk drive space. The question of infinite storage was not if but when. The winners made their stuff free first.

    Traditionalists wring their hands about the “vaporization of value” and “demonetization” of entire industries. The success of craigslist’s free listings, for instance, has hurt the newspaper classified ad business. But that lost newspaper revenue is certainly not ending up in the craigslist coffers. In 2006, the site earned an estimated $40 million from the few things it charges for. That’s about 12 percent of the $326 million by which classified ad revenue declined that year.

    But free is not quite as simple — or as stupid — as it sounds. Just because products are free doesn’t mean that someone, somewhere, isn’t making huge gobs of money. Google is the prime example of this. The monetary benefits of craigslist are enormous as well, but they’re distributed among its tens of thousands of users rather than funneled straight to Craig Newmark Inc. To follow the money, you have to shift from a basic view of a market as a matching of two parties — buyers and sellers — to a broader sense of an ecosystem with many parties, only some of which exchange cash.

    The most common of the economies built around free is the three-party system. Here a third party pays to participate in a market created by a free exchange between the first two parties. Sound complicated? You’re probably experiencing it right now. It’s the basis of virtually all media.

    In the traditional media model, a publisher provides a product free (or nearly free) to consumers, and advertisers pay to ride along. Radio is “free to air,” and so is much of television. Likewise, newspaper and magazine publishers don’t charge readers anything close to the actual cost of creating, printing, and distributing their products. They’re not selling papers and magazines to readers, they’re selling readers to advertisers. It’s a three-way market.

    In a sense, what the Web represents is the extension of the media business model to industries of all sorts. This is not simply the notion that advertising will pay for everything. There are dozens of ways that media companies make money around free content, from selling information about consumers to brand licensing, “value-added” subscriptions, and direct ecommerce (see How-To Wiki for a complete list). Now an entire ecosystem of Web companies is growing up around the same set of models.

    Between new ways companies have found to subsidize products and the falling cost of doing business in a digital age, the opportunities to adopt a free business model of some sort have never been greater. But which one? And how many are there? Probably hundreds, but the priceless economy can be broken down into six broad categories:

    · “Freemium”
    What’s free: Web software and services, some content. Free to whom: users of the basic version.

    This term, coined by venture capitalist Fred Wilson, is the basis of the subscription model of media and is one of the most common Web business models. It can take a range of forms: varying tiers of content, from free to expensive, or a premium “pro” version of some site or software with more features than the free version (think Flickr and the $25-a-year Flickr Pro).

    Again, this sounds familiar. Isn’t it just the free sample model found everywhere from perfume counters to street corners? Yes, but with a pretty significant twist. The traditional free sample is the promotional candy bar handout or the diapers mailed to a new mother. Since these samples have real costs, the manufacturer gives away only a tiny quantity — hoping to hook consumers and stimulate demand for many more.

    But for digital products, this ratio of free to paid is reversed. A typical online site follows the 1 Percent Rule — 1 percent of users support all the rest. In the freemium model, that means for every user who pays for the premium version of the site, 99 others get the basic free version. The reason this works is that the cost of serving the 99 percent is close enough to zero to call it nothing.
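
    A back-of-the-envelope sketch in Python shows why the ratio can flip. The $25 price is the Flickr Pro figure mentioned above; the user count and the per-user serving costs are hypothetical, picked only to contrast a digital service with a physical sample.

```python
def revenue_per_user(premium_price: float, paying_fraction: float) -> float:
    """Average yearly revenue per user, counting free and paying users alike."""
    return premium_price * paying_fraction


def freemium_profit(users: int, premium_price: float,
                    paying_fraction: float, cost_per_user: float) -> float:
    """Yearly profit when every user costs `cost_per_user` to serve
    and only `paying_fraction` of them buy the premium version."""
    revenue = users * revenue_per_user(premium_price, paying_fraction)
    return revenue - users * cost_per_user


# 1 Percent Rule with a $25-a-year premium tier: 25 cents of revenue per user.
break_even_cost = revenue_per_user(25.0, 0.01)

# Hypothetical costs: 5 cents a year to serve a digital user,
# versus $1 for something physical like a mailed sample.
digital = freemium_profit(1_000_000, 25.0, 0.01, 0.05)
physical = freemium_profit(1_000_000, 25.0, 0.01, 1.00)

print(f"break-even cost per user: ${break_even_cost:.2f}")
print(f"digital profit:  ${digital:,.0f}")
print(f"physical profit: ${physical:,.0f}")
```

    With these made-up inputs the digital service clears a profit while the physical giveaway loses money, which is the reversal the 1 Percent Rule depends on.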

    · Advertising
    What’s free: content, services, software, and more. Free to whom: everyone.

    Broadcast commercials and print display ads have given way to a blizzard of new Web-based ad formats: Yahoo’s pay-per-pageview banners, Google’s pay-per-click text ads, Amazon’s pay-per-transaction “affiliate ads,” and site sponsorships were just the start. Then came the next wave: paid inclusion in search results, paid listing in information services, and lead generation, where a third party pays for the names of people interested in a certain subject. Now companies are trying everything from product placement (PayPerPost) to pay-per-connection on social networks like Facebook. All of these approaches are based on the principle that free offerings build audiences with distinct interests and expressed needs that advertisers will pay to reach.

    · Cross-subsidies
    What’s free: any product that entices you to pay for something else. Free to whom: everyone willing to pay eventually, one way or another.

    When Wal-Mart charges $15 for a new hit DVD, it’s a loss leader. The company is offering the DVD below cost to lure you into the store, where it hopes to sell you a washing machine at a profit. Expensive wine subsidizes food in a restaurant, and the original “free lunch” was a gratis meal for anyone who ordered at least one beer in San Francisco saloons in the late 1800s. In any package of products and services, from banking to mobile calling plans, the price of each individual component is often determined by psychology, not cost. Your cell phone company may not make money on your monthly minutes — it keeps that fee low because it knows that’s the first thing you look at when picking a carrier — but your monthly voicemail fee is pure profit.

    On a busy corner in São Paulo, Brazil, street vendors pitch the latest “tecnobrega” CDs, including one by a hot band called Banda Calypso. Like CDs from most street vendors, these did not come from a record label. But neither are they illicit. They came directly from the band. Calypso distributes masters of its CDs and CD liner art to street vendor networks in towns it plans to tour, with full agreement that the vendors will copy the CDs, sell them, and keep all the money. That’s OK, because selling discs isn’t Calypso’s main source of income. The band is really in the performance business — and business is good. Traveling from town to town this way, preceded by a wave of supercheap CDs, Calypso has filled its shows and paid for a private jet.

    The vendors generate literal street cred in each town Calypso visits, and its omnipresence in the urban soundscape means that it gets huge crowds to its rave/DJ/concert events. Free music is just publicity for a far more lucrative tour business. Nobody thinks of this as piracy.

    · Zero marginal cost
    What’s free: things that can be distributed without an appreciable cost to anyone. Free to whom: everyone.

    This describes nothing so well as online music. Between digital reproduction and peer-to-peer distribution, the real cost of distributing music has truly hit bottom. This is a case where the product has become free because of sheer economic gravity, with or without a business model. That force is so powerful that laws, guilt trips, DRM, and every other barrier to piracy the labels can think of have failed. Some artists give away their music online as a way of marketing concerts, merchandise, licensing, and other paid fare. But others have simply accepted that, for them, music is not a moneymaking business. It’s something they do for other reasons, from fun to creative expression. Which, of course, has always been true for most musicians anyway.

    · Labor exchange
    What’s free: Web sites and services. Free to whom: all users, since the act of using these sites and services actually creates something of value.

    You can get free porn if you solve a few captchas, those scrambled text boxes used to block bots. What you’re actually doing is giving answers to a bot used by spammers to gain access to other sites — which is worth more to them than the bandwidth you’ll consume browsing images. Likewise for rating stories on Digg, voting on Yahoo Answers, or using Google’s 411 service (see “How Can Directory Assistance Be Free?”). In each case, the act of using the service creates something of value, either improving the service itself or creating information that can be useful somewhere else.

    · Gift economy
    What’s free: the whole enchilada, be it open source software or user-generated content. Free to whom: everyone.

    From Freecycle (free secondhand goods for anyone who will take them away) to Wikipedia, we are discovering that money isn’t the only motivator. Altruism has always existed, but the Web gives it a platform where the actions of individuals can have global impact. In a sense, zero-cost distribution has turned sharing into an industry. In the monetary economy it all looks free — indeed, in the monetary economy it looks like unfair competition — but that says more about our shortsighted ways of measuring value than it does about the worth of what’s created.

    Enabled by the miracle of abundance, digital economics has turned traditional economics upside down. Read your college textbook and it’s likely to define economics as “the social science of choice under scarcity.” The entire field is built on studying trade-offs and how they’re made. Milton Friedman himself reminded us time and time again that “there’s no such thing as a free lunch.”

    But Friedman was wrong in two ways. First, a free lunch doesn’t necessarily mean the food is being given away or that you’ll pay for it later — it could just mean someone else is picking up the tab. Second, in the digital realm, as we’ve seen, the main feedstocks of the information economy — storage, processing power, and bandwidth — are getting cheaper by the day. Two of the main scarcity functions of traditional economics — the marginal costs of manufacturing and distribution — are rushing headlong to zip. It’s as if the restaurant suddenly didn’t have to pay any food or labor costs for that lunch.

    Surely economics has something to say about that?

    It does. The word is externalities, a concept that holds that money is not the only scarcity in the world. Chief among the others are your time and respect, two factors that we’ve always known about but have only recently been able to measure properly. The “attention economy” and “reputation economy” are too fuzzy to merit an academic department, but there’s something real at the heart of both. Thanks to Google, we now have a handy way to convert from reputation (PageRank) to attention (traffic) to money (ads). Anything you can consistently convert to cash is a form of currency itself, and Google plays the role of central banker for these new economies.

    There is, presumably, a limited supply of reputation and attention in the world at any point in time. These are the new scarcities — and the world of free exists mostly to acquire these valuable assets for the sake of a business model to be identified later. Free shifts the economy from a focus on only that which can be quantified in dollars and cents to a more realistic accounting of all the things we truly value today.

    Between digital economics and the wholesale embrace of King Gillette’s experiment in price shifting, we are entering an era when free will be seen as the norm, not an anomaly. How big a deal is that? Well, consider this analogy: In 1954, at the dawn of nuclear power, Lewis Strauss, head of the Atomic Energy Commission, promised that we were entering an age when electricity would be “too cheap to meter.” Needless to say, that didn’t happen, mostly because the risks of nuclear energy hugely increased its costs. But what if he’d been right? What if electricity had in fact become virtually free?

    The answer is that everything electricity touched — which is to say just about everything — would have been transformed. Rather than balance electricity against other energy sources, we’d use electricity for as many things as we could — we’d waste it, in fact, because it would be too cheap to worry about.

    All buildings would be electrically heated, never mind the thermal conversion rate. We’d all be driving electric cars (free electricity would be incentive enough to develop the efficient battery technology to store it). Massive desalination plants would turn seawater into all the freshwater anyone could want, irrigating vast inland swaths and turning deserts into fertile acres, many of them making biofuels as a cheaper store of energy than batteries. Relative to free electrons, fossil fuels would be seen as ludicrously expensive and dirty, and so carbon emissions would plummet. The phrase “global warming” would have never entered the language.

    Today it’s digital technologies, not electricity, that have become too cheap to meter. It took decades to shake off the assumption that computing was supposed to be rationed for the few, and we’re only now starting to liberate bandwidth and storage from the same poverty of imagination. But a generation raised on the free Web is coming of age, and they will find entirely new ways to embrace waste, transforming the world in the process. Because free is what you want — and free, increasingly, is what you’re going to get.

    Chris Anderson (canderson@wired.com) is the editor in chief of Wired and author of The Long Tail. His next book, FREE, will be published in 2009 by Hyperion.

    Comments (63)

    Posted by: danielu23 hours ago1 Point
    Your “Scenario 1” implies you know absolutely NOTHING about the movie business. Distributors and Studios make the money on ticket sales based on a percentage split with the projection houses. The bulk of ticket sales money goes to the Distributors a…
    Posted by: tom2032 days ago1 Point
    The information is not free, it is being paid for (in cash) mostly by advertisers trying to gain the attention of the website visitors. It is also paid for (in time wasted) by the people who are constantly distracted by the ads. Micro-payments were…
    Posted by: mfouts2 days ago1 Point
    That article is absolutely amazing!!! I am currently into buying real estate and I am slowly transitioning into the great world wide web. 2 of my partners and I are trying to take advantage of the www world via http://www.choiceisfreedom.com still under…
    Posted by: foofah2 days ago1 Point
    Great article…but give poor Zeno a break. “The answer is that you can’t [reach the wall]: Once you’re within a few nanometers, atomic repulsion forces become too strong for you to get any closer.” You’ve either missed Zeno’s point entirely, or you’…
    Posted by: RainerGamer2 days ago1 Point
    Sign me up.
    Posted by: Lord_Jim2 days ago1 Point
    Is something really free only because you don’t pay in dollars? What about being bombarded with advertising? What about giving away personal data to dubious parties? What about costly ‘upgrade options’ hidden behind every second button of allegedly …
    Posted by: gdavis951293 days ago1 Point
    Please Mr. Anderson, buy yourself a dictionary. You write: …Yahoo announced that Yahoo Mail… would provide unlimited storage. Just in case that wasn’t totally clear, that’s “unlimited” as in “infinite”. ‘Unlimited’ means that Yahoo will not cap t…
    Posted by: MikeG3 days ago1 Point
    A few months ago I began researching free training & education. To be honest, I didn’t expect to find many good, free items, since I know that it takes time and effort (and time is money) to develop training. But I hoped my efforts would unearth …
    Posted by: RAGZ3 days ago1 Point
    You know, I subscribe to Wired, and I like the content, but please answer this question; why am I paying Wired’s comparatively high subscription cost if you’re going to stuff it so full of little ad inserts, that when I open it during my bathroom rit…
    Posted by: jdwright103 days ago1 Point
    This is definitely true. It’s a pretty good strategy if you think about it. I just bought a $7 Gillette razor and the refill blades cost me $15!