Posts Tagged ‘Startup’

Google: “We’re Not Doing a Good Job with Structured Data”

Written by Sarah Perez / February 2, 2009 7:32 AM / 9 Comments

During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google’s Alon Halevy admitted that the search giant has “not been doing a good job” presenting the structured data found on the web to its users. By “structured data,” Halevy was referring to the databases of the “deep web” – those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google’s Deep Web Search

Halevy, who heads the “Deep Web” search initiative at Google, described the “Shallow Web” as containing about 5 million web pages while the “Deep Web” is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google’s automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web – dubbed “vertical searching” – Halevy also referenced two other types of Deep Web Search: semantic search and product search.
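The query-submission idea described above can be sketched roughly as follows. This is a minimal illustration, not Google's actual system: the form URL, the field names and the candidate values are all hypothetical, and the code only builds the URLs a crawler might fetch rather than fetching them.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_query_urls(form_action, field_values):
    """Build one GET URL per combination of candidate form inputs."""
    names = sorted(field_values)  # fixed field order keeps the output deterministic
    urls = []
    for combo in product(*(field_values[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{form_action}?{query}")
    return urls


# Hypothetical search form with two fields (assumed for illustration only).
urls = candidate_query_urls(
    "https://example.org/search",
    {"make": ["ford", "honda"], "year": ["2008", "2009"]},
)
for url in urls:
    print(url)
```

Each generated URL stands for one result page that would otherwise sit behind the form; a real system would also have to choose which input values are worth trying, which this sketch does not attempt.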

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether Google’s current search engine technology can handle all the different types of Deep Web indexing, or whether the company will need to come up with something new. As of now, Google uses the Bigtable database and MapReduce framework for everything search-related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data-level synonym discovery. These challenges are addressed by Infobright’s technology, said Esterkin, but “Google will have to solve these problems the hard way.”

Also mentioned during the speech was how Google plans to organize “aspects” of search queries. The company wants to be able to separate exploratory queries (e.g., “Vietnam travel”) from ones where a user is in search of a particular fact (“Vietnam population”). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. “Kosmix will give you an ‘aspect,’ but it’s attached to an information source. In our case, all the aspects might be just Web search results, but we’d organize them differently.”

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it’s clear that no single company has yet come to dominate. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, “Google” has become synonymous with web search, just like “Kleenex” is a tissue, “Band-Aid” is an adhesive bandage, and “Xerox” is a way to make photocopies. Once that psychological mark has been made on our collective psyche and the habit formed, people tend to stick with what they know, regardless of who does it better. That’s something that’s a bit troublesome – if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until such point as Google either duplicates or acquires the invention.

Still, it’s far too soon to write Google off. They clearly have a lead when it comes to search and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope.)

2009 Predictions and Recommendations for Web 2.0 and Social Networks

Christopher Rollyson

Volatility, Uncertainty and Opportunity: Move Crisply while Competitors Are in Disarray

Now that the Year in Review 2008 has summarized key trends, we are in excellent position for 2009 prognostications, so welcome to Part II. As all experienced executives know, risk and reward are inseparable twins, and periods of disruption elevate both, so you will have much more opportunity to produce uncommon value than normal.

This is a high-stakes year in which we can expect surprises. Web 2.0 and social networks can help because they increase flexibility and adaptiveness. Alas, those who succeed will have to challenge conventional thinking considerably, which is not a trivial exercise in normal times. The volatility that many businesses face will make it more difficult because many of their clients and/or employees will be distracted. It will also make it easier because some of them will perceive that extensive change is afoot, and Web 2.0 will blend in with the cacophony. Disruption produces unusual changes in markets, and the people who perceive the new patterns and react appropriately emerge as new leaders.

2009 Predictions

These are too diverse to be ranked in any particular order. Please share your reactions and contribute those that I have missed.

  1. The global financial crisis will continue to add significant uncertainty in the global economy in 2009 and probably beyond. I have no scientific basis for this, but there are excellent experts of every flavor on the subject, so take your pick. I believe that we are off the map, and anyone who says that he’s sure of a certain outcome should be considered with a healthy skepticism.
    • All I can say is my friends, clients and sources in investment and commercial banking tell me it’s not over yet, and uncertainty is the only certainty until further notice. This has not yet been fully leached.
    • Western governments, led by the U.S., are probably prolonging the pain because governments usually get bailouts wrong. However, voters don’t have the stomachs for hardship, so we are probably trading short-term “feel good” efforts for a prolonged adjustment period.
  2. Widespread social media success stories in 2009 in the most easily measurable areas such as talent management, business development, R&D and marketing.
    • 2008 saw a significant increase in enterprise executives’ experimentation with LinkedIn, Facebook, YouTube and enterprise (internal) social networks. These will begin to bear fruit in 2009, after which a “mad rush of adoption” will ensue.
    • People who delay adoption will pay dearly in terms of consulting fees, delayed staff training and retarded results.
  3. Internal social networks will largely disappoint. Similar to intranets, they will produce value, but few enterprises are viable long-term without seamlessly engaging the burgeoning external world of experts.
    In general, the larger and more disparate an organization’s audience is, the more value it can create, but culture must encourage emergent, cross-boundary connections, which is where many organizations fall down.
    • If you’re a CIO who’s banking heavily on your behind-the-firewall implementation, just be aware that you need to engage externally as well.
    • Do it fast because education takes longer than you think.
    • There are always more smart people outside than inside any organization.
  • Significant consolidation among white label social network vendors, so use your customary caution when signing up partners.
    • Due diligence and skill portability will help you to mitigate risks. Any vendor worth their salt will use standardized SOA-friendly architecture and feature sets. As I wrote last year, Web 2.0 is not your father’s software, so focus on people and process more than technology.
    • If your vendor hopeful imposes process on your people, run.
  • No extensive M&A among big branded sites like Facebook, LinkedIn and Twitter although there will probably be some. The concept of the social ecosystem holds that nodes on pervasive networks can add value individually. LinkedIn and Facebook have completely different social contexts. “Traditional” executives tend to view disruptions as “the new thing” that they want to put into a bucket (“let them all buy each other, so I only have to learn one!”). Wrong. This is the new human nervous system, and online social venues, like their offline counterparts, want specificity because they add more value that way. People hack together the networks to which they belong based on their goals and interests.
    • LinkedIn is very focused on the executive environment, and they will not buy Facebook or Twitter. They might buy a smaller company. They are focused on building an executive collaboration platform, and a large acquisition would threaten their focus. LinkedIn is in the initial part of its value curve, they have significant cash, and they’re profitable. Their VCs can smell big money down the road, so they won’t sell this year.
    • Twitter already turned down Facebook, and my conversations with them lead me to believe that they love their company and that its value is as yet largely undiscovered. They will hold out as long as they can.
    • Facebook has staying power past 2009. They don’t need to buy anyone of import; they are gaining global market share at a fast clip. They already enable customers to build a large part of the Facebook experience, and they have significant room to innovate. Yes, there is a backlash in some quarters against their size. I don’t know Mark Zuckerberg personally, and I don’t have a feeling for his personal goals.
    • I was sad to see that Dow Jones sold out to NewsCorp and, as a long-time Wall Street Journal subscriber, I am even more dismayed now. This will prove a quintessential example of value destruction. The Financial Times currently fields a much better offering. The WSJ is beginning to look like MySpace! As for MySpace itself, I don’t have a firm bead on it but surmise that it has a higher probability of major M&A than the aforementioned: its growth has stalled, Facebook continues to gain, and Facebook uses more Web 2.0 processes, so I believe it will surpass MySpace in terms of global audience.
    • In being completely dominant, Google is the Wal-Mart of Web 2.0, and I don’t have much visibility into their plans, but I think they could make significant waves in 2009. They are very focused on applying search innovation to video, which is still in the initial stages of adoption, so YouTube is not going anywhere.
    • I am less familiar with Digg, Xing, Bebo, Cyworld. Of course, Orkut is part of the Googleverse.
  • Significant social media use by the Obama Administration. It has the knowledge, experience and support base to pursue fairly radical change. Moreover, the degree of change will be in synch with the economy: if there is a significant worsening, expect the government to engage people to do uncommon things.
    • Change.gov is the first phase, in which supporters and any other interested people are invited to contribute thoughts, stories and documents to the transition team. It aims to keep people engaged and to serve the government on a volunteer basis.
    • The old way of doing things was to hand out form letters that you would mail to your representative. Using Web 2.0, people can organize almost instantly, and results are visible in real-time. Since people are increasingly online somewhere, the Administration will invite them from within their favorite venue (MySpace, Facebook…).
    • Obama has learned that volunteering provides people with a sense of meaning and importance. Many volunteers become evangelists.
  • Increasing citizen activism against companies and agencies, a disquieting prospect but one that I would not omit from your scenario planning (ask yourself, “How could people come together and magnify some of our blemishes?”). To wit:
    • In 2007, an electronic petition opposing pay-per-use road tolls in the UK reached 1.8 million signatories, stalling a major government initiative. Although this did not primarily employ social media, it is indicative of the phenomenon.
    • In Q4 2008, numerous citizen groups organized Facebook groups (25,000 signatures in a very short time) to oppose television and radio taxes, alarming the Swiss government. Citizens are organizing to stop paying obligatory taxes, and to abolish the agency that administers the tax system. Another citizen initiative recently launched on the Internet collected 60,000 signatures to oppose biometric passports.
    • In the most audacious case, Ahmed Maher is using Facebook to try to topple the government of Egypt. According to Wired’s Cairo Activists Use Facebook to Rattle Regime, activists have organized several large demonstrations and have a Facebook group of 70,000 that’s growing fast.
  • Executive employment will continue to feel pressure, and job searches will get increasingly difficult for many, especially those with “traditional” jobs that depend on Industrial Economy organization.
    • In tandem with this, there will be more opportunities for people who can “free-agent” themselves in some form.
    • In 2009, an increasing portion of executives will have success at using social networks to diminish their business development costs, and their lead will subsequently accelerate the leeching of enterprises’ best and brightest, many of whom could have more flexibility and better pay as independents. This is already manifest as displaced executives choose never to go back.
    • The enterprise will continue to unbundle. I have covered this extensively on the Transourcing website.
  • Enterprise clients will start asking for “strategy” to synchronize social media initiatives. Web 2.0 is following the classic adoption pattern: thus far, most enterprises have been using a skunk works approach to their social media initiatives, or they’ve been paying their agencies to learn while delivering services.
    • In the next phase, beginning in 2009, CMOs, CTOs and CIOs will sponsor enterprise level initiatives, which will kick off executive learning and begin enterprise development of social media native skills. After 1-2 years of this, social media will be spearheaded by VPs and directors.
    • Professional services firms (PwC, KPMG, Deloitte…) will begin scrambling to pull together advisory practices after several of their clients ask for strategy help. These firms’ high costs do not permit them to build significantly ahead of demand.
    • Marketing and ad agencies (Leo Burnett, Digitas…) will also be asked for strategy help, but they will be hampered by their desires to maintain the outsourced model; social media is not marketing, even though it will displace certain types of marketing.
    • Strategy houses (McKinsey, BCG, Booz Allen…) will also be confronted by clients asking for social media strategy; their issue will be that it is difficult to quantify, and the implementation piece is not in their comfort zone, reducing revenue per client.
    • Boutiques will emerge to develop seamless strategy and implementation for social networks. This is needed because Web 2.0 and social networks programs involve strategy, but implementation involves little technology when compared to Web 1.0. As I’ll discuss in an imminent article, it will involve much more interpersonal mentoring and program development.
  • Corporate spending on Enterprise 2.0 will be very conservative, and pureplay and white label vendors (and consultants) will need to have strong business cases.
    • CIOs have better things to spend money on, and they are usually reacting to business unit executives who are still getting their arms around the value of Web 2.0, social networks and social media.
    • Enterprise software vendors will release significant Web 2.0 bolt-on improvements to their platforms in 2009. IBM is arguably out in front with Lotus Connections, with Microsoft Sharepoint fielding a solid solution. SAP and Oracle will field more robust solutions this year.
  • The financial crunch will accelerate social network adoption among those focused on substance rather than flash. This is akin to the dotbomb of 2001-2004, when no one wanted to do the Web as an end in itself anymore; it flushed out the fluffy offers (as well as some really good ones).
    • Social media can save money: how much did it cost the Obama campaign in time and money to raise $500 million? Extremely little.
    • People like to get involved and contribute, when you can frame the activity as important and you provide the tools to facilitate meaningful action. Engagement raises profits and can decrease costs. Engaged customers, for example, tend to leave less often than apathetic customers.
    • Social media is usually about engaging volunteer contributors; if you get it right, you will get a lot of help for little cash outlay.
    • Social media presents many new possibilities for revenue, but to see them, look outside existing product silos. Focus on customer experience by engaging customers, not with your organization, but with each other. Customer-customer communication is excellent for learning about experience.
  • Microblogging will completely mainstream even though Twitter is still quite emergent and few solid business cases exist.
    • Twitter and its peers (Plurk, Jaiku, Pownce {just bought by Six Apart and closed}, Kwippy, Tumblr) are unique for two reasons: they incorporate mobility seamlessly, and they chunk communications small; this leads to a great diversity of “usage context.”
    • Note that Dell sold $1 million on Twitter in 2008, using it as a channel for existing business.
    • In many businesses, customers will begin expecting your organization to be on Twitter; this year it will rapidly cease to be a novelty.

    2009 Recommendations

    Web 2.0 will affect business and culture far more than Web 1.0 (the internet), which was about real-time information access and transactions via a standards-based network and interface. Web 2.0 enables real-time knowledge and relationships, so it will profoundly affect most organizations’ stakeholders (clients, customers, regulators, employees, directors, investors, the public…). It will change how all types of buying decisions are made.

    As an individual and/or an organization leader, you have the opportunity to adopt more quickly than your peers and increase your relevance to stakeholders as their Web 2.0 expectations of you increase. 2009 will be a year of significant adoption, and I have kept this list short, general and actionable. I have assumed that your organization has been experimenting with various aspects of Web 2.0 and that some people have moderate experience. Please feel free to contact me if you would like more specific or advanced information or suggestions. Recommendations are ranked in importance, the most critical at the top.

    1. What: Audit your organization’s Web 2.0 ecosystem, and conduct your readiness assessment. Why: Do this to act with purpose, mature your efforts past experimentation and increase your returns on investment.
      • The ecosystem audit will tell you what stakeholders are doing, and in what venues. Moreover, a good one will tell you trends, not just numbers. In times of rapid adoption, knowing trends is critical, so you can predict the future.
      • The readiness assessment will help you to understand how your value proposition and resources align with creating and maintaining online relationships. The audit has told you what stakeholders are doing; now you need to assess what you can do to engage them on an ongoing basis.
    2. What: Select a top executive to lead your organization’s adoption of Web 2.0 and social networks. Why: Web 2.0 is changing how people interact, and your organizational competence will be affected considerably, so applying it to your career and business is very important.
      • This CxO should be someone with a track record for innovation and a commitment to leading discontinuous change. He or she should be philosophically in synch with the idea of emergent organization and cross-boundary collaboration.
      • S/He will coordinate your creation of strategy and programs (part-time). This includes formalizing your Web 2.0 policy, legal and security due diligence.
    3. What: Use an iterative portfolio approach to pursue social media initiatives in several areas of your business, and chunk investments small.
      Why: Both iteration and portfolio approaches help you to manage risk and increase returns.
    • Use the results of the audit and the readiness assessment to help you to select the stakeholders you want to engage.
    • Engage a critical mass of stakeholders about things that inspire or irritate them and that you can help them with.
    • All else equal, pilots should include several types of Web 2.0 venues and modes like blogs, big branded networks (Facebook, MySpace), microblogs (Twitter), video and audio.
    • As a general rule, extensive opportunity exists where you can use social media to cross boundaries, which usually impose high costs and prevent collaboration. One of the most interesting in 2009 will be encouraging alumni, employees and recruits to connect and collaborate according to their specific business interests. This can significantly reduce your organization’s business development, sales and talent acquisition costs. For more insight into this, see Alumni 2.0.
    • Don’t overlook pilots with multiple returns, like profile management programs, which can reduce your talent acquisition and business development costs.


  • What: Create a Web 2.0 community with numerous roles to enable employees flexibility.
    Why: You want to keep investments small and let the most motivated employees step forward.

    • Roles should include volunteers for pilots, mentors (resident bloggers, video producers and others), community builders (rapidly codify the knowledge you are gathering from pilots), some part-time more formal roles. Perhaps a full-time person to coordinate would make sense. Roles can be progressive and intermittent. Think of this as open source.
    • To stimulate involvement, the program must be meaningful, and it must be structured to minimize conflicts with other responsibilities.
  • What: Avoid the proclivity to treat Web 2.0 as a technology initiative. Why: Web 1.0 (the Internet) involved more of IT than does Web 2.0, and many people are conditioned to think that IT drives innovation; they fall into the tech trap, select tools first and impose process. This is old school and unnecessary because the tools are far more flexible than the last-generation software with which many are still familiar.
    • People create the value when they get involved, and technology often gets in the way when organizations invest in tools that impose process on people and turn them off. Web 2.0 tools impose far less process on people.
    • More important than what brand you invest in is your focus on social network processes and how they add value to existing business processes. If you adopt smartly, you will be able to transfer assets and processes elsewhere while minimizing disruption. More likely is that some brands will disappear (Pownce closed its doors 15 December). When you focus your organization on mastering process and you distribute learning, you will be more flexible with the tools.
    • Focus on process and people, and incent people to gather and share knowledge and help each other. This will increase your flexibility with tools.
  • What: Manage consulting, marketing and technology partners with a portfolio strategy. Why: Maximize flexibility and minimize risk.
    • From the technology point of view, there are three main vendor flavors: enterprise bolt-on (e.g. Lotus Connections), pureplay white label vendors (SmallWorldLabs) and open (Facebook, LinkedIn). As a group, pureplays have the most diversity in terms of business models, and the most uncertainty. Enterprise bolt-ons’ biggest risk is that they lag significantly behind.
    • Fight the urge to go with one. If you’re serious about getting business value, you need to be in the open cross-boundary networks. If you have a Lotus or Microsoft relationship, compare Connections and Sharepoint with some pureplays to address private social network needs. An excellent way to start could be with Yammer.
    • Be careful when working with consulting- and marketing-oriented partners who are accustomed to an outsourced model. Web 2.0 is not marketing; it is communicating to form relationships and collaborate online. It does have extensive marketing applications; make sure partners have demonstrated processes for mentoring because Web 2.0 will be a core capability for knowledge-based organizations, and you need to build your resident knowledge.
    Parting Shots

    I hope you find these thoughts useful, and I encourage you to add your insights and reactions as comments. If you have additional questions about how to use Web 2.0, please feel free to contact me. I wish all the best to you in 2009.

    Evolving Trends

    Wikipedia 3.0: The End of Google?

    In Uncategorized on June 26, 2006 at 5:18 am

    Author: Marc Fawzi

    License: Attribution-NonCommercial-ShareAlike 3.0


    Semantic Web Developers:

    Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

    1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)


    Two years after I published this article it has received over 200,000 hits and we now have several startups attempting to apply Semantic Web technology to Wikipedia and knowledge wikis in general, including the Wikipedia founder’s own commercial startup as well as a startup that was recently purchased by Microsoft.

    Recently, after seeing how Wikipedia’s governance is so flawed, I decided to write about a way to decentralize and democratize Wikipedia.


    (Article was last updated at 10:15am EST, July 3, 2006)


    The Semantic Web (or Web 3.0) promises to “organize the world’s information” in a dramatically more logical way than Google can ever achieve with their current engine design. This is especially true from the point of view of machine comprehension as opposed to human comprehension. The Semantic Web requires the use of a declarative ontological language like OWL to produce domain-specific ontologies that machines can use to reason about information and make new conclusions, not simply match keywords.

    However, the Semantic Web, which is still in a development phase where researchers are trying to define the best and most usable design models, would require the participation of thousands of knowledgeable people over time to produce those domain-specific ontologies necessary for its functioning.

    Machines (or machine-based reasoning, aka AI software or ‘info agents’) would then be able to use those laboriously –but not entirely manually– constructed ontologies to build a view (or formal model) of how the individual terms within the information relate to each other. Those relationships can be thought of as the axioms (assumed starting truths), which together with the rules governing the inference process both enable as well as constrain the interpretation (and well-formed use) of those terms by the info agents to reason new conclusions based on existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent info agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.
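The step from axioms and inference rules to machine-generated theorems can be illustrated with a small forward-chaining loop. This is a minimal sketch under stated assumptions, not how OWL reasoners are actually implemented: the triple format, the two rules and the example facts are all hypothetical and chosen only to show the mechanism.

```python
def forward_chain(axioms, rules):
    """Apply every rule repeatedly until no new facts (theorems) appear."""
    facts = set(axioms)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        new = derived - facts
        if not new:
            return facts
        facts |= new


def subclass_transitivity(facts):
    # (A subclass_of B) and (B subclass_of C)  =>  (A subclass_of C)
    return {
        (a, "subclass_of", c)
        for (a, p1, b) in facts if p1 == "subclass_of"
        for (b2, p2, c) in facts if p2 == "subclass_of" and b2 == b
    }


def type_inheritance(facts):
    # (x is_a A) and (A subclass_of B)  =>  (x is_a B)
    return {
        (x, "is_a", b)
        for (x, p1, a) in facts if p1 == "is_a"
        for (a2, p2, b) in facts if p2 == "subclass_of" and a2 == a
    }


# Hypothetical axioms: the assumed starting truths of a tiny ontology.
axioms = {
    ("Poodle", "subclass_of", "Dog"),
    ("Dog", "subclass_of", "Mammal"),
    ("Rex", "is_a", "Poodle"),
}
rules = [subclass_transitivity, type_inheritance]

# Theorems are the facts that were derived rather than given.
theorems = forward_chain(axioms, rules) - axioms
print(sorted(theorems))
```

Because the derived set depends only on the axioms and the rules, two independently written agents that run rules like these to completion over the same ontology should reach the same conclusions, which is the property the paragraph above relies on.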

    Thus, and as stated, in the Semantic Web individual machine-based agents (or a collaborating group of agents) will be able to understand and use information by translating concepts and deducing new information rather than just matching keywords.

    Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among your virtual AI-enhanced workforce each having access to different domain specific comprehension space and all communicating with each other to build a collective consciousness.

    You’ll be able to ask your info agent or agents to find you the nearest restaurant that serves Italian cuisine, even if the restaurant nearest you advertises itself as a Pizza joint as opposed to an Italian restaurant. But that is just a very simple example of the deductive reasoning machines will be able to perform on information they have.
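The restaurant example can be sketched as a lookup over a class hierarchy. The categories, listings and distances below are hypothetical, and treating a pizza joint as a kind of Italian restaurant is an assumption written into this toy ontology rather than a fact the code discovers.

```python
# Hypothetical mini-ontology: each category points to its parent category.
PARENT = {
    "PizzaJoint": "ItalianRestaurant",
    "Trattoria": "ItalianRestaurant",
    "ItalianRestaurant": "Restaurant",
    "SushiBar": "JapaneseRestaurant",
    "JapaneseRestaurant": "Restaurant",
}


def is_a(category, target):
    """True if category equals target or is a descendant of it."""
    while category is not None:
        if category == target:
            return True
        category = PARENT.get(category)
    return False


# Hypothetical listings: (name, self-advertised category, distance in km).
LISTINGS = [
    ("Sakura", "SushiBar", 0.2),
    ("Tony's Pizza", "PizzaJoint", 0.4),
    ("Trattoria Roma", "Trattoria", 1.5),
]


def nearest(target):
    """Return the closest listing whose category falls under target, or None."""
    matches = [listing for listing in LISTINGS if is_a(listing[1], target)]
    return min(matches, key=lambda listing: listing[2], default=None)


print(nearest("ItalianRestaurant"))
```

A plain keyword match on "Italian" would skip the pizza joint, since the word appears nowhere in its name or advertised category; the hierarchy walk is what lets the query find it.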

    Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents who are specialized in different domains of knowledge to produce a collective consciousness (using the Borg metaphor) that encompasses all human knowledge. The collective “mind” of those agents-as-the-Borg will be the Ultimate Answer Machine, easily displacing Google from this position, which it does not truly fulfill.

    The problem with the Semantic Web, besides that researchers are still debating which design and implementation of the ontology language model (and associated technologies) is the best and most usable, is that it would take thousands or tens of thousands of knowledgeable people many years to boil down human knowledge to domain specific ontologies.

    However, if we were at some point to take the Wikipedia community and give them the right tools and standards to work with (whether existing or to be developed in the future), which would make it possible for reasonably skilled individuals to help reduce human knowledge to domain-specific ontologies, then that time can be shortened to just a few years, and possibly to as little as two years.

    The emergence of a Wikipedia 3.0 (as in Web 3.0, aka Semantic Web) that is built on the Semantic Web model will herald the end of Google as the Ultimate Answer Machine. It will be replaced with “WikiMind” which will not be a mere search engine like Google is but a true Global Brain: a powerful pan-domain inference engine, with a vast set of ontologies (a la Wikipedia 3.0) covering all domains of human knowledge, that can reason and deduce answers instead of just throwing raw information at you using the outdated concept of a search engine.


    After writing the original post I found out that a modified version of the Wikipedia application, known as “Semantic” MediaWiki, has already been used to implement ontologies. The name that they’ve chosen is Ontoworld. I think WikiMind would have been a cooler name, but I like Ontoworld, too, as in “it descended onto the world,” since that may be seen as a reference to the global mind a Semantic-Web-enabled version of Wikipedia could lead to.

    Google’s search engine technology, which provides almost all of their revenue, could be made obsolete in the near future. That is, unless they have access to Ontoworld or some such pan-domain semantic knowledge repository, so that they can tap into its ontologies and add inference capability to Google search to build formal deductive intelligence into Google.

    But so can Ask.com and MSN and Yahoo…

    I would really love to see more competition in this arena, not to see Google or any one company establish a huge lead over others.

    The question, to rephrase in Churchillian terms, is whether the combination of the Semantic Web and Wikipedia signals the beginning of the end for Google or the end of the beginning. Obviously, with tens of billions of dollars at stake in investors’ money, I would think that it is the latter. No one wants to see Google fail. There’s too much vested interest. However, I do want to see somebody outmaneuver them (which can be done, in my opinion).


    Please note that Ontoworld, which currently implements the ontologies, is based on the “Wikipedia” application (also known as MediaWiki), but it is not the same as Wikipedia.org.

    Likewise, I expect Wikipedia.org will use their volunteer workforce to reduce the sum of human knowledge that has been entered into their database to domain-specific ontologies for the Semantic Web (aka Web 3.0). Hence, “Wikipedia 3.0.”

    Response to Readers’ Comments

    The argument I’ve made here is that Wikipedia has the volunteer resources to produce the needed Semantic Web ontologies for the domains of knowledge that it currently covers, while Google does not have those volunteer resources, which will make it reliant on Wikipedia.

    Those ontologies, together with all the information on the Web, can be accessed by Google and others, but Wikipedia will be in charge of the ontologies for the large set of knowledge domains they currently cover, and that is where I see the power shift.

    Google and other companies do not have the resources in manpower (i.e. the thousands of volunteers Wikipedia has) to help create those ontologies for the large set of knowledge domains that Wikipedia covers. Wikipedia does, and is positioned to do that better and more effectively than anyone else. It’s hard to see how Google would be able to create the ontologies for all domains of human knowledge (which are continuously growing in size and number) given how much work that would require. Wikipedia can cover more ground faster with their massive, dedicated force of knowledgeable volunteers.

    I believe that the party that will control the creation of the ontologies (i.e. Wikipedia) for the largest number of domains of human knowledge, and not the organization that simply accesses those ontologies (i.e. Google), will have a competitive advantage.

    There are many knowledge domains that Wikipedia does not cover. Google will have the edge there but only if people and organizations that produce the information also produce the ontologies on their own, so that Google can access them from its future Semantic Web engine. My belief is that it would happen but very slowly, and that Wikipedia can have the ontologies done for all the domains of knowledge that it currently covers much faster, and then they would have leverage by the fact that they would be in charge of those ontologies (aka the basic layer for AI enablement.)

    It still remains unclear, of course, whether the combination of Wikipedia and the Semantic Web heralds the beginning of the end for Google or the end of the beginning. As I said in the original part of the post, I believe that it is the latter, and the question I pose in the title of this post, in this context, is not more than rhetorical. However, I could be wrong in my judgment and Google could fall behind Wikipedia as the world’s ultimate answer machine.

    After all, Wikipedia makes “us” count. Google doesn’t. Wikipedia derives its power from “us.” Google derives its power from its technology and inflated stock price. Who would you count on to change the world?

    Response to Basic Questions Raised by the Readers

    Reader divotdave asked a few questions, which I thought to be very basic in nature (i.e. important.) I believe more people will be pondering the same issues, so I’m including them here with the replies.

    How does it distinguish between good information and bad? How does it determine which parts of the sum of human knowledge to accept and which to reject?

    It wouldn’t have to distinguish between good vs bad information (not to be confused with well-formed vs badly formed) if it was to use a reliable source of information (with associated, reliable ontologies.) That is if the information or knowledge to be sought can be derived from Wikipedia 3.0 then it assumes that the information is reliable.

    However, when it comes to connecting the dots, that is, returning information or deducing answers from the sea of information that lies beyond Wikipedia, your question becomes very relevant. How would it distinguish good information from bad information so that it can produce good knowledge (aka comprehended information, aka new information produced through deductive reasoning based on existing information)?

    Who, or what as the case may be, will determine what information is irrelevant to me as the inquiring end user?

    That is a good question and one which would have to be answered by the researchers working on AI engines for Web 3.0.

    There will be assumptions made as to what you are inquiring about. Just as when I saw your question I had to make assumptions about what you really meant to ask me, AI engines would have to make an assumption, pretty much based on the same cognitive process humans use, which is the topic of a separate post, but which has been covered by many AI researchers.

    Is this to say that ultimately some over-arching standard will emerge that all humanity will be forced (by lack of alternative information) to conform to?

    There is no need for one standard, except when it comes to the language the ontologies are written in (e.g. OWL, OWL-DL, OWL Full, etc.). Semantic Web researchers are trying to determine the best and most usable choice, taking into consideration human and machine performance in constructing those ontologies and, exclusively in the case of machines, interpreting them.

    Two or more info agents working with the same domain-specific ontology but having different software (different AI engines) can collaborate with each other.

    The only standard required is that of the ontology language and associated production tools.


    On AI and Natural Language Processing

    I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.
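    To make the idea of a "relatively simple inference engine" concrete, here is a minimal sketch of formal deduction over a tiny ontology. Everything in it is an assumption made for illustration: the triple representation, the predicate names `subclass_of` and `is_a`, the two rules, and the `ask` helper that stands in for a query language. It is not how Semantic MediaWiki, OWL reasoners, or any particular Web 3.0 engine works.

```python
"""Toy forward-chaining inference over (subject, predicate, object) triples."""


def infer(facts):
    """Return the given facts plus everything derivable from two rules.

    Rule 1: (A subclass_of B), (B subclass_of C) -> (A subclass_of C)
    Rule 2: (x is_a A),        (A subclass_of B) -> (x is_a B)
    """
    known = set(facts)
    while True:
        derived = set()
        for (s1, p1, o1) in known:
            for (s2, p2, o2) in known:
                # Both rules need the second fact to be a subclass link
                # whose subject matches the first fact's object.
                if o1 != s2 or p2 != "subclass_of":
                    continue
                if p1 == "subclass_of":
                    derived.add((s1, "subclass_of", o2))
                elif p1 == "is_a":
                    derived.add((s1, "is_a", o2))
        if derived <= known:  # nothing new: fixed point reached
            return known
        known |= derived


def ask(facts, subject, predicate):
    """Pattern query: every object o such that (subject, predicate, o) holds."""
    return sorted(o for (s, p, o) in infer(facts) if s == subject and p == predicate)


# A hypothetical three-fact ontology.
ONTOLOGY = [
    ("Cat", "subclass_of", "Mammal"),
    ("Mammal", "subclass_of", "Animal"),
    ("Felix", "is_a", "Cat"),
]

print(ask(ONTOLOGY, "Felix", "is_a"))  # ['Animal', 'Cat', 'Mammal']
```

    The answer "Felix is an Animal" is never stated in the ontology; it is deduced. The user asks through a structured query rather than natural language, which is the kind of interaction described above.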

    On the Debate about the Nature and Definition of AI

    The embedding of AI into cyberspace will be done at first with relatively simple inference engines (that use algorithms and heuristics) that work collaboratively in P2P fashion and use standardized ontologies. The massively parallel interactions between the hundreds of millions of AI Agents that will run within the millions of P2P AI Engines on users’ PCs will give rise to the very complex behavior that is the future global brain.


    1. Web 3.0 Update
    2. All About Web 3.0 <– list of all Web 3.0 articles on this site
    3. P2P 3.0: The People’s Google
    4. Reality as a Service (RaaS): The Case for GWorld <– 3D Web + Semantic Web + AI
    5. For Great Justice, Take Off Every Digg
    6. Google vs Web 3.0
    7. People-Hosted “P2P” Version of Wikipedia
    8. Beyond Google: The Road to a P2P Economy

    Update on how the Wikipedia 3.0 vision is spreading:

    Update on how Google is co-opting the Wikipedia 3.0 vision:

    Web 3D Fans:

    Here is the original Web 3D + Semantic Web + AI article:

    Web 3D + Semantic Web + AI *

    The above mentioned Web 3D + Semantic Web + AI vision which preceded the Wikipedia 3.0 vision received much less attention because it was not presented in a controversial manner. This fact was noted as the biggest flaw of social bookmarking site digg which was used to promote this article.

    Web 3.0 Developers:

    Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

    1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)

    Jan 7, ‘07: The following Evolving Trends post discusses the current state of semantic search engines and ways to improve the paradigm:

    1. Designing a Better Web 3.0 Search Engine

    June 27, ‘06: Semantic MediaWiki project, enabling the insertion of semantic annotations (or metadata) into the content:

    1. http://semantic-mediawiki.org/wiki/Semantic_MediaWiki (see note on Wikia below)

    Wikipedia’s Founder and Web 3.0



    Murdoch Calls Google, Yahoo Copyright Thieves — Is He Right?

    By David Kravets | April 03, 2009 | 5:00:18 PM | Categories: Intellectual Property

    Rupert Murdoch, the owner of News Corp. and The Wall Street Journal, says Google and Yahoo are giant copyright scofflaws that steal the news.

    “The question is, should we be allowing Google to steal all our copyright … not steal, but take,” Murdoch says. “Not just them, but Yahoo.”

    But whether search-engine news aggregation is theft or a protected fair use under copyright law is unclear, even as Google and Yahoo profit tremendously from linking to news. So maybe Murdoch is right.

    Murdoch made his comments late Thursday during an address at the Cable Show, an industry event held in Washington. He seemingly was blaming the web, and search engines, for the news media’s ills.

    “People reading news for free on the web, that’s got to change,” he said.

    Real estate magnate Sam Zell made similar comments in 2007 when he took over the Tribune Company and ran it into bankruptcy.

    We suspect Zell and Murdoch are just blowing smoke. If they were not, perhaps they could demand Google and Yahoo remove their news content. The search engines would kindly oblige.

    Better yet, if Murdoch and Zell are so set on monetizing their web content, they should sue the search engines and claim copyright violations in a bid to get the engines to pay for the content.

    The outcome of such a lawsuit is far from clear.

    It’s unsettled whether search engines have a valid fair use claim under the Digital Millennium Copyright Act. The news headlines are copied verbatim, as are some of the snippets that go along.

    Fred von Lohmann of the Electronic Frontier Foundation points out that “There’s not a rock-solid ruling on the question.”

    Should the search engines pay up for the content? Tell us what you think.


    Evolving Trends

    July 2, 2006

    Digg This! 55,500 hits in ~4 Days

    (this post was last updated at 10:30am EST, July 3, ‘06, GMT +5)

    This post is a follow up to the previous post For Great Justice, Take Off Every Digg

    According to Alexa.com, the total penetration of the Wikipedia 3.0 article was ~2 million readers (who must have read it on other websites that copied the article)


    EDIT: I looked at the graph and did the math again, and as far as I can tell it’s “55,500 in ~4 days” not “55,000 in 5 days.” So that’s 13,875 page views per day.

    Stats (approx.) for the “Wikipedia 3.0: The End of Google?” and “For Great Justice, Take Off Every Digg” articles:

    These are to the best of my memory from each of the first ~4 days as verified by the graph.

    33,000 page views in day 1 (the first wave)

    * day 1 is almost one and a half columns on the graph not one because I posted it at ~5:00am and the day (in WordPress time zone) ends at 8pm, so the first column is only ~ 15 hours.

    9,500 page views in day 2

    5,000 page views in day 3

    8,000 page views in day 4 (the second wave)

    Total: 55,500 in ~4 days which is 13,875 page views per day (not server hits) for ~4 days. Now on the 7th day the traffic is expected to be ~1000 page views, unless I get another small spike. That’s a pretty good double-dipping long tail. If you’ve done better with digg let me know how you did it! 🙂
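    The arithmetic behind those totals can be checked with a few lines of Python. This is only a sketch that re-adds the approximate daily figures quoted above; the variable names are illustrative.

```python
# Approximate page views for each of the first ~4 days, as listed above.
daily_page_views = {1: 33_000, 2: 9_500, 3: 5_000, 4: 8_000}

total = sum(daily_page_views.values())
average_per_day = total / len(daily_page_views)
first_day_share = daily_page_views[1] / total

print(total)                     # 55500
print(average_per_day)           # 13875.0
print(f"{first_day_share:.1%}")  # 59.5%
```

    Roughly three fifths of the traffic arrived in the first wave, which is why the second wave on day 4 stands out on the graph.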


    This post is a follow-up to my previous article on digg, where I explained how I had experimented and succeeded in generating 45,000 visits to an article I wrote in the first 3 days of its release (40,000 of which came directly from digg.)

    I had posted an article on digg about a bold but well-thought out vision of the future, involving Google and Wikipedia, with the sensational title of “Wikipedia 3.0: The End of Google?” (which may turn out after all to be a realistic proposition.)

    Since my previous article on digg I’ve found out that digg did not ban my IP address. They had deleted my account due to multiple submissions. So I was able to get back with a new user account and try another experiment: I submitted “AI Matrix vs Google” and “Web 3.0 vs Google” as two separate links for one article (which has since been given the final title of “Web 3.0.”) [July 12, ‘06, update: see P2P 3.0: The People’s Google]


    Neither ‘sensational’ title worked.


    I tried to rationalize what happened …

    I figured that the crowd wanted a showdown between two major cults (e.g. the Google fans and the Wikipedia fans) and not between Google and some hypothetical entity (e.g. AI Matrix or Web 3.0).

    But then I thought about how Valleywag was able to cleverly piggyback on my “Wikipedia 3.0: The End of Google?” article (which had generated all the hype) with an article having the dual title of “Five Reasons Google Will Invent Real AI” on digg and “Five Reasons No One Will Replace Google” on Valleywag.

    They used AI in the title and I did the same in the new experiment, so we should both get lots of diggs. They got about 1300 diggs. I got about 3. Why didn’t it work in my case?

    The answer is that the crowd is not a logical animal. It’s a psychological animal. It does not make mental connections as we do as individuals (because a crowd is a randomized population that is made up of different people at different times) so it can’t react logically.

    Analyzing it from the psychological frame, I concluded that it must have been the Wikipedia fans who “dugg” my original article. The Google fans did “digg” it but not in the same large percentage as the Wikipedia fans.

    Valleywag gave the Google fans the relief they needed after my article with its own article in defense of Google. However, when I went at it again with “AI Matrix vs Google” and “Web 3.0 vs Google” the error I made was in not knowing that the part of the crowd that “dugg” my original article were the Wikipedia fans, not the Google haters. In fact, Google haters are not very well represented on digg. In other words, I found out that “XYZ vs Google” will not work on digg unless XYZ has a large base of fans on digg.

    Escape Velocity

    The critical threshold in the digg traffic generation process is to get enough diggs quickly enough, after submitting the post, to get the post on digg’s popular page. Once the post is on digg’s popular page both sides (those who like what your post is about and those who will hate you and want to kill you for writing it) will be affected by the psychological manipulation you design (aka the ‘wave.’) However, the majority of those who will “digg” it will be from the group that likes it. A lesser number of people will “digg” it from the group that hates it.

    Double Dipping

    I did have a strong second wave when I went out and explained how ridiculous the whole digg process is.

    This is how the second wave was created:

    I got lots of “diggs” from Wikipedia fans and traffic from both Google and Wikipedia fans for the original article.

    Then I wrote a follow up on why “digg sucks” but only got 100 “diggs” for it (because all the digg fans on digg kept ‘burying’ it!) so I did not get much traffic to it from digg fans or digg haters (not that many of the latter on digg.)

    The biggest traffic to it came from the bloggers and others who came to see what all the fuss was about as far as the original article. I had linked to the follow up article (on why I thought digg sucked) from the original article (i.e. like chaining magnets) so when people came to see what the fuss was all about with respect to the original article they were also told to check out the “digg sucks” article for context.

    That worked! The original and second waves, which both had a long tail (see below) generated a total of 55,500 hits in ~4 days. That’s 13,875 page views a day for the first ~4 days.

    Long Tail vs Sting

    I know that some very observant bloggers have said that digg can only produce a sharp, short lived pulse of traffic (or a sting), as opposed to a long tail or a double-dipping long tail, as in my case, but those observations are for posts that are not themselves memes. When you have a meme you get the long tail (or an exponential decay) and when you chain memes as I did (which I guess I could have done faster as the second wave would have been much bigger) then you get a double-dipping long tail as I’m having now.

    Today (which is 7 days after the original experiment) the traffic is over 800 hits so far, still on the strength of the original wave and the second wave (note that the flat line I had before the spike represents levels of traffic between ~100 and ~800, so don’t be fooled by the flatness; it’s relative to the scale of the graph.)

    In other words, traffic is still going strong from the strength of the long-tail waves generated from the original post and the follow up one.



    1. Wikipedia 3.0: The End of Google?
    2. For Great Justice, Take Off Every Digg
    3. Unwisdom of Crowds
    4. Self-Aware e-Society

    Posted by Marc Fawzi

    Semantic Web, Web standards, Trends, wisdom of crowds, tagging, Startup, mass psychology, Google, cult psychology, inference, inference engine, AI, ontology, Semanticweb, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, collective consciousness, digg, censorship


    1. Update this in two weeks, after a Friday, Saturday, and Sunday, and a holiday in the middle of the week in the United States which means a lot of people are on vacation, and another weekend, and see what happens with traffic trends, including Digg related traffic. And check out my unscientific research on when the best time and day to post is on your blog, and compare what you find over the course of time, not just a couple days. I’m curious how days of the week and the informal research I did might reflect within your information. That will REALLY help us see the reality of your success.

      Still, you’ve gathered a ton of fabulous information. I found it interesting that the post title on your Digg sucks article kept changing every hour or so on the WordPress.com top lists. I think it was “Power of the Schwartz” that really caught my eye. 😉

      I wish you could check out how much traffic came from WordPress.com dashboards and top blog listing comparatively to Digg traffic results, as well as all the other social bookmarking sources which pick up Digg posts, and compare that information as to how directly your traffic was related solely to Digg. It was in the first place, but “then” what happened.

      There are a lot of whack things that go into driving traffic, and I also know that WordPress.com’s built in traffic charts don’t match up exactly and consistently with some of the external traffic reports I’ve checked for my WordPress.com blog, so only time will tell, and this will get more and more interesting as time goes on.

      Good work!

      Comment by Lorelle VanFossen — July 2, 2006 @ 11:19 am

    2. Yeah I caught myself saying “Merchandising Merchandising Merchandising” the other day! :)

      Well I noticed about 1000, 800, 600, 500 hits (in this order) from WordPress for those 4 days …

      Valleywag sent me about 12,000 (in total)


      Comment by evolvingtrends — July 2, 2006 @ 11:26 am

    3. Great analysis on digg. It looks like digg or the memes can be somewhat influenced and analyzed. It’s almost like psychoanalyzing a strange new brain.

      I find it very interesting how this all happened. Even if digg gave you a short pulse for a few days, it generated augmented daily traffic until now. I wouldn’t be surprised that new readers discovered you this way. The whole applications of traffic and readers are very fluid in nature. I wonder if they could be mapped in some way or form through fluid dynamics.


      Comment by range — July 3, 2006 @ 1:39 am

    4. It’s highly multi-disciplinary. It can be conquered but not as fast as you or I would like.

      This is like analyzing a strange new brain … a brain that is influenced greatly by everything except logic.

      I plan on analyzing it in the open for a long time to come, so stick around and add your thoughts to it. 🙂
      They say ‘observing something changes its outcome’ .. So we’ll see how it goes.



      Comment by evolvingtrends — July 3, 2006 @ 2:36 am

    5. […] 1. Digg This! 55,500 Hits in ~4 Days […]

      Pingback by Evolving Trends » Global Brain vs Google — July 3, 2006 @ 10:37 am

    6. […] This article has a follow-up part: Digg This! 55,500 Hits in ~4 Days […]

      Pingback by Evolving Trends » For Great Justice, Take Off Every Digg — July 3, 2006 @ 10:57 am

    7. Marc,

      I don’t know if this information helps or skews your research, but a post I wrote in January, titled to get Digg and other traffic attention, Horse Sex and What is Dictating Your Blog’s Content, did not do well at all. That is until the past three days.

      It’s really started piling up a lot of hits, sitting in the top 10 of my top posts, outreaching the other posts that get consistently high traffic by a huge margin. Until Saturday, that post was not even in the top 50 or 75. I can’t tell where the traffic is suddenly coming from, as WordPress.com doesn’t offer that kind of specific information, and I’m not getting any outstanding traffic from any single source. Nothing from Digg, but something is suddenly driving that post through the roof. Even during a holiday week in the US! Very strange.

      Maybe there’s a new fad in horse sex lately – who knows? 😉

      Still, the point is that this was written in January, and now it is getting attention in July. I’ll be checking to find out what is causing the sudden flush of traffic, but never doubt that your posts are ageless in many respects. So the long term study of Digg patterns and traffic will help all of us over the “long haul”. That’s why I’m really curious about the long term effects of your posts.

      Sometimes you just can’t predict the crowds. 😉 Or what they will suddenly be interested in. I’ve written so many posts and titles that I was sure would skyrocket traffic, only to lay there like empty beer bottles in the playground. Totally useless. And others with sloppy titles and written quickly with little attention to detail skyrocketing like 1000 bottles of coke filled with Mentos. 😉 It’s an interesting process, isn’t it?

      Comment by Lorelle VanFossen — July 3, 2006 @ 9:37 pm

    8. Predicting the weather for the long term is not currently feasible. However, predicting the weather for the short term is (1-2 days in advance.)

      But it’s not all about ‘predicting’ … It’s about studying the phenomenon so that we can make better choices to reduce the effect of uncertainty and not try to eliminate uncertainty.


      Comment by evolvingtrends — July 4, 2006 @ 12:02 am

    9. I think then that the obvious question is why you’ve done nothing to monetize those hits, however fickle they might be! ;)

      Comment by Sam Jackson — July 4, 2006 @ 4:42 pm

    10. Monetize, Monetize, Monetize!

      Fortunately, that won’t happen 🙂


      Comment by evolvingtrends — July 4, 2006 @ 8:28 pm

    11. […] 4 – Digg This! 55,500 hits in ~4 Days A blogger explains how he ‘milked’ Digg for a major spike in traffic. Meme engineering in action; fascinating stuff. (tags: Wikipedia Google visits article post tail long spike scam traffic blogging blog meme Digg) […]

      Pingback by Velcro City Tourist Board » Blog Archive » Links for 05-07-2006 — July 4, 2006 @ 10:20 pm

    12. Since web traffic is dictated by humans and engines and not by some exterior force like the weather, I think that there are a lot of possible avenues of analysis of it. The only thing is that the flow and traffic need to be documented. In most cases, the traffic might be, but information on past flow is lacking. The internet is concentrated on the now and less on what happened ten days ago on this site and such.

      Mathematical fluid dynamics are probably the way to go, though even if I am a mathematician, I’d have to research it a bit before pronouncing myself completely. These types of analysis can get quite complicated because of the implications of partial differential equations of an order higher than 2, which cannot be solved, only approximated numerically.

      I’m sure I’m not the only one to say this, but I like the types of discussions and content that you put forward, it gets the mind thinking on certain subjects that most of the time users tend to accept without question.

      Comment by range — July 4, 2006 @ 10:54 pm

    13. “the implications of partial differential equations of an order higher than 2, which can not be solved only approximated numerically.”

      Have you looked into Meyer’s methods of “invariant embedding” …? to convert PDEs to a set of ordinary differential equations then solve?

      I believe the investigation of hype management is extremely multi-disciplinary and very much like the weather. That means that while it’s deterministic (as everything is in essence with the exception of non-causal quantum theory) it’s still highly unstable and ultimately hard [in computational terms] to predict.

      In general, uncertainty exists in every system, including maths itself (because of lack of absolute consistency and incompleteness), so while you can’t eliminate it you can hope to reduce it.

      But in practical terms, what I’m looking to do is to simply gain a sufficient minimum in insight to allow me to improve my chances at generating and surfing hype waves… I believe I will end up applying a non-formal theory such as framing theory to transform the problem from the computational domain to the cognitive domain (so I may use that 90% of the brain that we supposedly don’t use to carry out the computation with my own internal computational model.)

      Clarity, in simple terms, is what it’s all about.

      However, to reach clarity’s peak you have to climb a mountain of complexity 🙂


      Comment by evolvingtrends — July 4, 2006 @ 11:10 pm

    14. Hey Marc!

      I now know what it feels like to be caught in a digg like wave. Right now, I have had over 141000 page views because of a post that I did this morning, explaining HDR photography.

      Since digg banned my url for some reason (I don’t know why, I haven’t posted anything to digg in the last 2 months), this was all done through del.icio.us, Reddit and Popurls. It’s like one thing leads to another. I have added an url giving a surface analysis of this situation.


      Naturally, I find myself compelled to continue writing on the subject. I have already posted a follow-up article and I am working on another one right now. I knew I had a spike on weekends, nothing like this however.

      Comment by range — July 15, 2006 @ 7:29 pm

    15. Hey Marc.

      I think the main reason why I didn’t get any higher was because of the stat problem that WP has been having over the last few days.

      I hope they save this traffic so that I have some nice graphs to show you. They probably do. It felt like the counter was accurate, I checked out that I did indeed make onto a few memediggers, still am right now.

      And also the stat page was just so slow to catch up with the amount of traffic that was generated. WP couldn’t keep up.

      Hopefully, they will sort it out over the next few days. I think it was most surprising in the afternoon. I kept refreshing the counter, and oops, a few thousand here, ten thousand there. I was really surprised. And I have also started getting some haters, as you must know, with the good comes the bad.

      Comment by range — July 15, 2006 @ 8:49 pm


    Evolving Trends

    July 10, 2006

    Is Google a Monopoly?

    (this post was last updated on Jan 11, ‘07)

    Given the growing feeling that Google holds too much power over our future without any proof that they can handle such responsibility wisely, and with plenty of proof to the opposite [1], it is clear why people find themselves breathing a sigh of relief at the prospect of a new Web order, where Google will not be as powerful and dominant.

    In the software industry, economies of scale do not derive from production capacity but rather from the size of the installed user base, as software is made of electrical pulses that can be downloaded by the users, at a relatively small cost to the producer (or virtually no cost if using the P2P model of the Web.) This means that the size of the installed user base replaces production capacity in classical economic terms.

    Just as Microsoft used its economies of scale (i.e. its installed user base) as part of a copy-and-co-opt strategy to dominate the desktop, Google has shifted from a strategy of genuine innovation, which is expensive and risky, to a lower-risk copy-and-co-opt strategy in which it uses its economies of scale (i.e. its installed user base) to eliminate competition and dominate the Web.

    The combination of the ability to copy and co-opt innovations across broad segments of the market together with existing and growing economies of scale is what makes Google a monopoly.

    Consider the following example: DabbleDB (among other companies) beat Google to market with their online, collaborative spreadsheet application, but Google acquired their competitor and produced a similar (yet inferior) product that is now threatening to kill DabbleDB’s chances for growth.

    One way to think of what’s happening is in terms of the first law of thermodynamics (aka conservation of energy): if Google grows then many smaller companies will die. And as Google grows, many smaller companies are dying.

    It is not any better or worse than it used to be under the Microsoft monopoly for companies that have to compete with Google. But it’s much worse for us, the people, because what is at stake now is much bigger. It’s no longer about our PCs and LANs, it’s about the future of the entire Web.

    You could argue that the patent system protects smaller companies from having their products and innovations copied and co-opted by bigger competitors like Google. However, during the Microsoft-dominated era, very few companies succeeded in suing them for patent infringement. I happen to know of one former PC software company and their ex-CEO who succeeded in suing Microsoft for $120M. But that’s a rare exception to a common rule: the one with the deeper pockets always has the advantage in court (they can drag the lawsuit out for years and make it too costly for others to sue them.)

    Therefore, given that Google is perceived as a growing monopoly that many see as having acquired too much power, too fast, without the wisdom to use that power responsibly, I’m not too surprised that many people have welcomed the Wikipedia 3.0 vision.

    1. What leaps to mind as far as Google’s lack of wisdom is how they had sold the world on their “Do No Evil” mantra only to “Do Evil” when it came to oppressing the already-oppressed (see: Google Chinese censorship.)


    1. Wikipedia 3.0: The End of Google?
    2. P2P 3.0: The People’s Google
    3. Google 2.71828: Analysis of Google’s Web 2.0 Strategy

    Posted by Marc Fawzi


    Web 2.0, Google, Adam Smith, Monopoly, Trends, imperialism, Anti-Americanism, economies of scale, innovation, Startup, Google Writely, Google spreadsheets, DabbleDB, Google Base, Web 3.0


    1. Google is large and influential. That doesn’t make it a monopoly. They have 39% market share in Search in the US – http://searchenginewatch.com/reports/article.php/3099931 – a lot more than their closest competitor, but it’s wrong to describe them as a monopoly. A monopoly has a legal entitlement to be the only provider of a product or service. More loosely, it can be used to describe a company with such dominance in the market that it makes no sense to try to compete with them. Neither applies to Google. I think your correspondents are simply reacting against the biggest player because they are the biggest, the same way people knock Microsoft, Symantec, Adobe, etc.

      Certainly, Yahoo!, MSN and Ask Jeeves, etc. aren’t ready to throw in the towel yet. Arguably, if they were struggling, and I don’t know if they are, DabbleDB would need to differentiate a little more against Google to make their model work as a business. I am not sure they need to.

      One last point, there isn’t a finite number of people looking for spreadsheets, etc., online. It’s a growing market with enormous untapped potential. The winners will be those best able to overcome the serious objections people have towards online apps – security & stability. Spreadsheets and databases are business apps – it will not be good enough to throw up something that is marked beta and sometimes works and might be secure. I think people dealing with business data *want* to pay for such products, because it guarantees them levels of service and the likelihood that the company will still be around in a year.

      Comment by Ian — July 11, 2006 @ 4:52 am

    2. I don’t think any company can compete against Google, especially not small companies. If MS and Interactive Corp. are having to struggle against Google then how can any small company compete against them? They have economies of scale that cannot be undone so easily, except through P2P subversion of the central search model (See my Web 3.0 article), which is going to happen on its own (I don’t need to advocate it.)

      Having said that, I did specify ways to compete with Google in SaaS in the post titled Google 2.71828: Analysis of Google’s Web 2.0 Strategy

      But in general, it’s getting tough out there because of Google’s economies of scale and their ability/willingness to copy-and-co-opt innovations across a broad segment of the market.

      Ian wrote:

      “A monopoly has a legal entitlement to be the only provider of a product or service.”

      The definition of Monopoly in the US does not equate to state run companies or any such concept from the EU domain. It simply equates to economies of scale and ability to copy and co-opt innovations in a broad sector of the market. Monopolies that exist in market niches are a natural result of free markets but ones that exist in broad segments are problematic to free markets.


      Comment by evolvingtrends — July 11, 2006 @ 5:21 am

    3. Don’t forget that Google prevents AdSense publishers from using other context-based advertising services on the same pages that have AdSense ads.

      Comment by drew — July 12, 2006 @ 12:23 am

    4. That’s a sure sign that they’re a monopoly. Just like MS used to force PC makers to do the same.

      Marc

      Comment by evolvingtrends — July 12, 2006 @ 5:05 am

    5. […] impact it has over a worldwide, super-connected tool like the Internet. An article by Marc Fawzi on Evolving Trends expressed this effectively […]

      Pingback by What Evil Lurks in the Heart of Google | Phil Butler Unplugged — November 6, 2007 @ 11:46 pm

    6. […] with existing and growing economies of scale is what makes Google a monopoly,” states Evolving Trends. As Google grows, many smaller companies will die. In order to set up its monopoly, Google is used […]

      Pingback by Google: pro’s and con’s « E-culture & communication: open your mind — November 14, 2007 @ 6:42 am

    7. Contrary to what the Google fan club and the Google propaganda machine would have you believe, here are some real facts:

      – People do have a choice with operating systems. They can buy a Mac or use Linux.

      – Google has a terrible track record of abusing its power:
      – click fraud lawsuit where they used a grubby lawyer and tricks to pay almost nothing.
      – they pass on a very small share to adsense publishers and make them sign a confidentiality agreement.
      – They tried to prevent publishers from showing other ads.

      – Google Adsense is responsible for the majority of spam on the Internet.

      – Google has a PR machine which includes Matt Cutts and others, who suppress criticism and even make personal attacks on people who are critical of them. They are also constantly releasing a barrage of press releases with gimmicks to improve their image with the public.

      Wake up people. Excessive power leads to abuse.

      Comment by Pete — January 26, 2008 @ 1:25 pm

    8. I think this is a really important discussion which has been started here, thank you Marc.
      I got suspicious today when I heard about MS wanting to take over Yahoo! – or will it be Google… Anyway, this is really crucial stuff here. It’s the much-praised freedom of the information age, and hence the real hope for a truly open world, which is at stake here. I hope there’s some degree of acknowledgment of this.

      So to feed the discussion more,

      – what can we do as users?
      – are there alternative independent search engines out there?
      – should we think of starting new strategies in information retrieval?
      – what ideas are around?

      Is there a good active community somewhere discussing these issues? would be interested in participating…

      thank you

      Comment by Fabio — February 4, 2008 @ 11:42 am


    Evolving Trends

    July 11, 2006

    P2P 3.0: The People’s Google


    This is a more extensive version of the Web 3.0 article with extra sections about the implications of Web 3.0 to Google.

    See this follow-up article for the more disruptive ‘decentralized knowledgebase’ version of the model discussed in this article.

    Also see this non-Web 3.0 version: P2P to Destroy Google, Yahoo, eBay et al

    Web 3.0 Developers:

    Feb 5, ‘07: The following reference should provide some context regarding the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0) but there are better, simpler ways of doing it. 

    1. Description Logic Programs: Combining Logic Programs with Description Logic


    In Web 3.0 (aka Semantic Web), P2P Inference Engines running on millions of users’ PCs and working with standardized domain-specific ontologies (created by Wikipedia, Ontoworld, other organizations or individuals) using Semantic Web tools, including Semantic MediaWiki, will produce an information infrastructure far more powerful than Google (or any current search engine.)

    The availability of standardized ontologies that are being created by people, organizations, swarms, smart mobs, e-societies, etc, and the near-future availability of P2P Semantic Web Inference Engines that work with those ontologies means that we will be able to build an intelligent, decentralized, “P2P” version of Google.

    Thus, the emergence of P2P Inference Engines and domain-specific ontologies in Web 3.0 (aka Semantic Web) will present a major threat to the central “search” engine model.

    Basic Web 3.0 Concepts

    Knowledge domains

    A knowledge domain is something like Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology, History, etc. There can be many sub-domains under each domain each having their own sub-domains and so on.

    Information vs Knowledge

    To a machine, knowledge is comprehended information (aka new information produced through the application of deductive reasoning to existing information). To a machine, information is only data, until it is processed and comprehended.


    For each domain of human knowledge, an ontology must be constructed, partly by hand [or rather by brain] and partly with the aid of automation tools.

    Ontologies are not knowledge nor are they information. They are meta-information. In other words, ontologies are information about information. In the context of the Semantic Web, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable as well as constrain the interpretation (and well-formed use) of those terms by the Info Agents to reason new conclusions based on existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.
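
    The relationship between axioms, inference rules, and generated theorems can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the triple layout, the relation names `subclass_of` and `serves`, and the single inheritance rule are hypothetical and are not taken from any real ontology language or Semantic Web tool.

```python
# Hypothetical toy ontology: (subject, relation, object) triples act as axioms.
AXIOMS = {
    ("PizzaRestaurant", "subclass_of", "ItalianRestaurant"),
    ("ItalianRestaurant", "subclass_of", "Restaurant"),
    ("ItalianRestaurant", "serves", "ItalianCuisine"),
}


def infer(axioms):
    """Apply one inference rule repeatedly until no new facts appear.

    Rule: if (a subclass_of b) and (b r d) hold, then (a r d) holds.
    With r == "subclass_of" this gives transitivity; with any other
    relation it lets a subclass inherit its parent's properties.
    """
    facts = set(axioms)
    while True:
        derived = set()
        for (a, rel, b) in facts:
            if rel != "subclass_of":
                continue
            for (c, r, d) in facts:
                if c == b:
                    derived.add((a, r, d))
        if derived <= facts:  # nothing new: a fixed point has been reached
            return facts
        facts |= derived


theorems = infer(AXIOMS) - AXIOMS
for triple in sorted(theorems):
    print(triple)
```

    The facts printed at the end are the "theorems" in the sense used above: statements that were not written into the ontology but follow from its axioms and the rule of inference.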

    Inference Engines

    In the context of Web 3.0, Inference engines will be combining the latest innovations from the artificial intelligence (AI) field together with domain-specific ontologies (created as formal or informal ontologies by, say, Wikipedia, as well as others), domain inference rules, and query structures to enable deductive reasoning on the machine level.

    Info Agents

    Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions. Such collaborating agents may be based on differently designed Inference Engines and they would still be able to collaborate.
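
    As a rough illustration of how differently designed engines could still agree, the sketch below gives two hypothetical agents the same toy ontology. One derives every fact up front (forward chaining) and the other works backward from the question (backward chaining). The class names, the triple format, and the relation names are assumptions made for this example only.

```python
# Hypothetical shared ontology, expressed as (subject, relation, object) triples.
ONTOLOGY = frozenset({
    ("PizzaRestaurant", "subclass_of", "ItalianRestaurant"),
    ("ItalianRestaurant", "subclass_of", "Restaurant"),
    ("ItalianRestaurant", "serves", "ItalianCuisine"),
})


class ForwardAgent:
    """Derives every reachable fact first, then answers by lookup."""

    def __init__(self, ontology):
        self.facts = set(ontology)
        changed = True
        while changed:
            derived = {
                (a, r, d)
                for (a, rel, b) in self.facts if rel == "subclass_of"
                for (c, r, d) in self.facts if c == b
            }
            changed = not derived <= self.facts
            self.facts |= derived

    def ask(self, triple):
        return triple in self.facts


class BackwardAgent:
    """Starts from the question and climbs the subclass chain on demand."""

    def __init__(self, ontology):
        self.ontology = ontology

    def ask(self, triple, seen=frozenset()):
        subject, relation, obj = triple
        if triple in self.ontology:
            return True
        for (a, rel, parent) in self.ontology:
            # Only follow superclass links, and never revisit a class.
            if a == subject and rel == "subclass_of" and parent not in seen:
                if self.ask((parent, relation, obj), seen | {subject}):
                    return True
        return False


question = ("PizzaRestaurant", "serves", "ItalianCuisine")
agents = [ForwardAgent(ONTOLOGY), BackwardAgent(ONTOLOGY)]
print([agent.ask(question) for agent in agents])
```

    Both agents reach the same answer because the shared ontology, not the engine design, fixes what can be deduced.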

    Proofs and Answers

    The interesting thing about Info Agents that I did not clarify in the original post is that they will be capable of not only deducing answers from existing information (i.e. generating new information [and gaining knowledge in the process, for those agents with a learning function]) but they will also be able to formally test propositions (represented in some query logic) that are made directly or implied by the user. For example, instead of the example I gave previously (in the Wikipedia 3.0 article) where the user asks “Where is the nearest restaurant that serves Italian cuisine” and the machine deduces that a pizza restaurant serves Italian cuisine, the user may ask “Is the moon blue?” or say that the “moon is blue” to get a true or false answer from the machine. In this case, a simple Info Agent may answer with “No” but a more sophisticated one may say “the moon is not blue but some humans are fond of saying ‘once in a blue moon’ which seems illogical to me.”

    This test-of-truth feature assumes the use of an ontology language (as a formal logic system) and an ontology where all propositions (or formal statements) that can be made can be computed (i.e. proved true or false) and where all such computations are decidable in finite time. The language may be OWL-DL or any language that, together with the ontology in question, satisfies the completeness and decidability conditions.
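
    A test of truth in this sense can be sketched under strong simplifying assumptions. In the toy below, color is treated as a property with exactly one recorded value per subject, so any stated proposition about a covered subject is decided in a single lookup. The dictionary, the recorded colors, and the function name are hypothetical and are not drawn from OWL-DL or any real reasoner.

```python
# Hypothetical toy ontology: each covered subject has exactly one recorded color.
HAS_COLOR = {"Moon": "gray", "Sky": "blue"}


def is_true(subject: str, color: str) -> bool:
    """Decide the proposition '<subject> is <color>' in one lookup."""
    if subject not in HAS_COLOR:
        # Outside the toy ontology the proposition cannot be decided,
        # so the agent refuses rather than guessing.
        raise ValueError(f"{subject!r} is not covered by the ontology")
    return HAS_COLOR[subject] == color


print(is_true("Moon", "blue"))
print(is_true("Sky", "blue"))
```

    The refusal for uncovered subjects mirrors the completeness condition above: the agent only gives a true or false answer where the ontology makes the proposition decidable.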

    P2P 3.0 vs Google

    If you think of how many processes currently run on all the computers and devices connected to the Internet then that should give you an idea of how many Info Agents can be running at once (as of today), all reasoning collaboratively across the different domains of human knowledge, processing and reasoning about heaps of information, deducing answers and deciding truthfulness or falsehood of user-stated or system-generated propositions.

    Web 3.0 will bring with it a shift from centralized search engines to P2P Semantic Web Inference Engines, which will collectively have vastly more deductive power, in both quality and quantity, than Google can ever have (included in this exclusion is any future AI-enabled version of Google, as it will not be able to keep up with the distributed P2P AI matrix that will be enabled by millions of users running free P2P Semantic Web Inference Engine software on their home PCs.)

    Thus, P2P Semantic Web Inference Engines will pose a huge and escalating threat to Google and other search engines and will expectedly do to them what P2P file sharing and BitTorrent did to FTP (central-server file transfer) and centralized file hosting in general (e.g. Amazon’s S3 use of BitTorrent.)

    In other words, the coming of P2P Semantic Web Inference Engines, as an integral part of the still-emerging Web 3.0, will threaten to wipe out Google and other existing search engines. It’s hard to imagine how any one company could compete with 2 billion Web users (and counting), all of whom are potential users of the disruptive P2P model described here.

    “The Future Has Arrived But It’s Not Evenly Distributed”

    Currently, Semantic Web (aka Web 3.0) researchers are working out the technology and human resource issues and folks like Tim Berners-Lee, the father of the Web, are battling critics and enlightening minds about the coming human-machine revolution.

    The Semantic Web (aka Web 3.0) has already arrived, and Inference Engines are working with prototypical ontologies, but this effort is a massive one, which is why I was suggesting that its most likely enabler will be a social, collaborative movement such as Wikipedia, which has the human resources (in the form of the thousands of knowledgeable volunteers) to help create the ontologies (most likely as informal ontologies based on semantic annotations) that, when combined with inference rules for each domain of knowledge and the query structures for the particular schema, enable deductive reasoning at the machine level.


    On AI and Natural Language Processing

    I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines (employing both algorithmic and heuristic approaches) that will not attempt to perform natural language processing. However, they will still have the formal deductive reasoning capabilities described earlier in this article.


    1. Wikipedia 3.0: The End of Google?
    2. Intelligence (Not Content) is King in Web 3.0
    3. Get Your DBin
    4. All About Web 3.0

    Posted by Marc Fawzi
