Archive for the ‘Wikipedia’ Category

Fightin’ Words

Murdoch Wants A Google Rebellion

Dirk Smillie, 04.03.09, 05:40 PM EDT

The media mogul says Google is stealing from publishers. It could be the call to arms that newsrooms need.

Rupert Murdoch threw down the gauntlet to Google Thursday, accusing the search giant of poaching content it doesn’t own and urging media outlets to fight back. “Should we be allowing Google to steal all our copyrights?” asked the News Corp. chief at a cable industry confab in Washington, D.C., Thursday. The answer, said Murdoch, should be: “Thanks, but no thanks.”

Google sees it differently. The company sends more than 300 million clicks a month to newspaper Web sites, says a Google spokesperson, and is in “full compliance” with copyright laws. “We show just enough information to make the user want to read a full story: the headlines, a line or two of text and links to the story’s Web site. That’s it.” For most links, if readers want to peruse an entire article, they have to click through to the newspaper’s Web site.


Maybe so. But Murdoch’s anger is understandable. Like the music industry, newspapers have watched new distribution channels change the economics of their business. Sites like Google, which don’t produce any journalism of their own, have made themselves into destinations for readers by successfully organizing the work of others and selling advertising against it. Meanwhile, the authors wither. In a recent interview with Charlie Rose, Wall Street Journal Managing Editor Robert Thomson drew a bead on Murdoch’s beef: “Google devalues everything it touches,” he said. “It divides content quantitatively rather than qualitatively.”

Yet the relationship is more complex than that. Sites like WSJ.com rely on Google to send them readers, working hard to game how they appear on Google through the dark arts of search engine optimization. Newspapers use Google in other ways too. Users streaming to The Los Angeles Times Web site last year followed the path of Southern California wildfires using Google maps at the site. The maps were displayed alongside links to updated stories about fires.

“Google is not at fault,” says Gregory Rutchik, chairman of Beverly Hills-based The Arts and Technology Law Group. Rutchik says Murdoch’s comments may be a “first shot across Google’s bow.” Says Rutchik: “Murdoch wants to be paid for his newspaper assets. His statements may be a precursor to a lawsuit that would bring Google to the bargaining table to figure out just how to do that.”

Still, the episode is reminiscent of how publishers started talking about the Associated Press at this time last year, a squall that later grew into a storm (see “Down On The Wire”). Since then, a growing number of papers have done the once unthinkable: scaled back their AP subscriptions or ditched them altogether. The print death of major newspapers like the Rocky Mountain News, Christian Science Monitor and Seattle Post-Intelligencer may further radicalize cornered editors and publishers struggling to save their newsrooms.

For now, newspapers’ attempts at gaming Google remain “rogue efforts,” says Anthony Moor, deputy managing editor of the Dallas Morning News Online and a director of the Online News Association. “I wish newspapers could act together to negotiate better terms with companies like Google. Better yet, what would happen if we all turned our sites off to search engines for a week? By creating scarcity, we might finally get fair value for the work we do.” Sounds like an idea Murdoch would endorse.


Evolving Trends

The People’s Google

In Uncategorized on July 11, 2006 at 10:16 am

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0


This is a follow-up to the Wikipedia 3.0 article.

See this article for a more disruptive ‘decentralized knowledgebase’ version of the model discussed here.

Also see this non-Web3.0 version: P2P to Destroy Google, Yahoo, eBay et al

Web 3.0 Developers:

Feb 5, ‘07: The following reference should provide some context regarding the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0), but there are better, simpler ways of doing it.

  1. Description Logic Programs: Combining Logic Programs with Description Logic


In Web 3.0 (aka the Semantic Web), P2P Inference Engines running on millions of users’ PCs, working with standardized domain-specific ontologies (that may be created by entities like Wikipedia and other organizations) using Semantic Web tools, will produce an information infrastructure far more powerful than the current infrastructure that Google (or any Web 1.0/2.0 search engine, for that matter) uses.

Having the standardized ontologies and the P2P Semantic Web Inference Engines that work with those ontologies will lead to a more intelligent, “Massively P2P” version of Google.

Therefore, the emergence in Web 3.0 of said P2P Inference Engines combined with standardized domain-specific ontologies will present a major threat to the central “search” engine model.

Basic Web 3.0 Concepts

Knowledge domains

A knowledge domain is something like Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology, History, etc. There can be many sub-domains under each domain, each with its own sub-domains, and so on.

Information vs Knowledge

To a machine, knowledge is comprehended information (aka new information that is produced via the application of deductive reasoning to existing information). To a machine, information is only data until it is reasoned about.
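A toy sketch of that distinction (all names here are invented for illustration): the machine stores a bare fact plus one deductive rule, and only the act of applying the rule produces something it never stored.

```python
# Information: raw, stored data.
facts = {("Socrates", "is_a", "man")}

# A deductive rule: if X is_a man, then X is_a mortal.
# (Premise subjects are assumed to always be the variable "?x" in this toy.)
rules = [(("?x", "is_a", "man"), ("?x", "is_a", "mortal"))]

def deduce(facts, rules):
    """Apply every rule to every matching fact; return only the *new* facts."""
    new = set()
    for (ps, pp, po), (cs, cp, co) in rules:
        for (fs, fp, fo) in facts:
            if fp == pp and fo == po:          # the fact matches the premise
                binding = {"?x": fs}
                new.add((binding.get(cs, cs), cp, binding.get(co, co)))
    return new - facts

# Knowledge: comprehended information, produced by reasoning over the data.
knowledge = deduce(facts, rules)
# knowledge == {("Socrates", "is_a", "mortal")}
```

The derived triple was never entered as data; it exists only because the rule was applied, which is the sense in which the machine "comprehends" rather than merely stores.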


Ontologies

For each domain of human knowledge, an ontology must be constructed, partly by hand and partly with the aid of dialog-driven ontology-construction tools.

Ontologies are neither knowledge nor information. They are meta-information: information about information. In the context of the Semantic Web, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable and constrain the interpretation (and well-formed use) of those terms by the Info Agents to reason new conclusions based on existing information, i.e. to think.

In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.
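To make the axioms-plus-rules idea concrete, here is a minimal, hypothetical sketch: a two-axiom ontology and a single inference rule (transitivity of subclass_of), forward-chained to a fixpoint so that the software generates theorems that were never written down. The domain terms are illustrative.

```python
# Axioms: basic assumptions relating terms of a (toy) knowledge-domain ontology.
axioms = {
    ("QuantumMechanics", "subclass_of", "Physics"),
    ("Physics", "subclass_of", "Science"),
}

def entailed(axioms):
    """Forward-chain one inference rule (subclass_of is transitive) to a
    fixpoint, and return the generated theorems (provable, never stated)."""
    theorems = set(axioms)
    while True:
        fresh = {
            (a, "subclass_of", c)
            for (a, p1, b1) in theorems if p1 == "subclass_of"
            for (b2, p2, c) in theorems if p2 == "subclass_of" and b1 == b2
        }
        if fresh <= theorems:                 # nothing new: fixpoint reached
            return theorems - axioms
        theorems |= fresh

# The software "thinks" its way to a statement no one wrote down:
# entailed(axioms) == {("QuantumMechanics", "subclass_of", "Science")}
```

Because the result depends only on the axioms and the rule, any independently written engine applying the same rule to the same axioms must reach the same theorems, which is the basis for the collaboration claim above.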

Inference Engines

In the context of Web 3.0, inference engines will combine the latest innovations from the artificial intelligence (AI) field with domain-specific ontologies (created as formal or informal ontologies by, say, Wikipedia, as well as others), domain inference rules, and query structures to enable deductive reasoning at the machine level.

Info Agents

Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions. Such collaborating agents may be based on differently designed Inference Engines and they would still be able to collaborate.
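A hedged illustration of that claim: the two agent classes below use deliberately different engine designs (one recursive, one iterative) over the same shared ontology convention, and can pool their partial facts to answer a question neither can answer alone. Class and predicate names are mine, not any real protocol.

```python
# Two Info Agents built on *differently designed* engines but sharing the
# same ontology convention: "subclass_of" chains may be followed transitively.

class RecursiveEngineAgent:
    def __init__(self, facts):
        self.facts = set(facts)            # (subject, "subclass_of", object)

    def is_subclass(self, a, c):
        if (a, "subclass_of", c) in self.facts:
            return True
        return any(self.is_subclass(b, c)
                   for (x, _, b) in self.facts if x == a)

class IterativeEngineAgent:
    def __init__(self, facts):
        self.facts = set(facts)

    def is_subclass(self, a, c):
        frontier, seen = [a], set()
        while frontier:
            node = frontier.pop()
            if node == c:
                return True
            seen.add(node)
            frontier += [b for (x, _, b) in self.facts
                         if x == node and b not in seen]
        return False

def collaborate(agents, a, c):
    """Pool the agents' facts; any engine design can then derive the answer."""
    pooled = set().union(*(agent.facts for agent in agents))
    return IterativeEngineAgent(pooled).is_subclass(a, c)

# Each agent holds only a fragment of the domain...
agent1 = RecursiveEngineAgent({("Electron", "subclass_of", "Lepton")})
agent2 = IterativeEngineAgent({("Lepton", "subclass_of", "Particle")})
# ...so neither can answer alone, but together they can deduce the chain.
```

The interoperability comes from the shared ontology convention, not from shared code: the two implementations never need to know how the other works.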

Proofs and Answers

The interesting thing about Info Agents that I did not clarify in the original post is that they will be capable not only of deducing answers from existing information (i.e. generating new information [and gaining knowledge in the process, for those agents with a learning function]) but also of formally testing propositions (represented in some query logic) that are made directly, or implied, by the user.
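One way to picture proposition testing, as a rough sketch (the facts and the `prove` function are illustrative, not a real query-logic implementation): the agent reports whether a user-stated proposition is an axiom, or derivable from the axioms, or neither.

```python
FACTS = {
    ("Paris", "located_in", "France"),
    ("France", "located_in", "Europe"),
}

def prove(prop, facts, depth=5):
    """Formally test a proposition: True if it is an axiom, or if it follows
    by chaining the (assumed transitive) predicate; False otherwise."""
    s, p, o = prop
    if prop in facts:
        return True                      # directly asserted
    if depth == 0:
        return False                     # give up on very long chains
    return any(prove((mid, p, o), facts, depth - 1)
               for (s2, p2, mid) in facts if s2 == s and p2 == p)

# A user-stated proposition is tested rather than merely looked up:
# ("Paris", "located_in", "Europe") is provable even though it is not an
# axiom, while ("Paris", "located_in", "Asia") is rejected.
```

This is the "deciding truthfulness or falsehood of user-stated propositions" capability in miniature: the answer is a verdict derived from the axioms, not a keyword match.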

P2P 3.0 vs Google

If you think of how many processes currently run on all the computers and devices connected to the Internet, that should give you an idea of how many Info Agents could be running at once (as of today), all reasoning collaboratively across the different domains of human knowledge, processing and reasoning about heaps of information, deducing answers and deciding the truthfulness or falsehood of user-stated or system-generated propositions.

Web 3.0 will bring with it a shift from centralized search engines to P2P Semantic Web Inference Engines, which will collectively have vastly more deductive power, in both quality and quantity, than Google can ever have (included in this assumption is any future AI-enabled version of Google, as it will not be able to keep up with the power of the P2P AI matrix that will be enabled by millions of users running free P2P Semantic Web Inference Engine software on their home PCs).

Thus, P2P Semantic Web Inference Engines will pose a huge and escalating threat to Google and other search engines, and can be expected to do to them what P2P file sharing and BitTorrent did to FTP (central-server file transfer) and centralized file hosting in general (e.g. Amazon’s S3 use of BitTorrent).

In other words, the coming of P2P Semantic Web Inference Engines, as an integral part of the still-emerging Web 3.0, will threaten to wipe out Google and other existing search engines. It’s hard to imagine how any one company could compete with 2 billion Web users (and counting), all of whom are potential users of the disruptive P2P model described here.

The Future

Currently, Semantic Web (aka Web 3.0) researchers are working out the technology and human resource issues, and folks like Tim Berners-Lee, the father of the Web, are battling critics and enlightening minds about the coming Semantic Web revolution.

In fact, the Semantic Web (aka Web 3.0) has already arrived, and Inference Engines are working with prototypical ontologies. But this effort is a massive one, which is why I suggested that its most likely enabler will be a social, collaborative movement such as Wikipedia, which has the human resources (in the form of thousands of knowledgeable volunteers) to help create the ontologies (most likely as informal ontologies based on semantic annotations) that, when combined with inference rules for each domain of knowledge and the query structures for the particular schema, enable deductive reasoning at the machine level.


On AI and Natural Language Processing

I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.
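A sketch of what "interacting through some query language" might look like, sidestepping natural language entirely. The pattern syntax with `?`-prefixed variables is invented here for illustration; real systems would use something like SPARQL.

```python
FACTS = {
    ("Hamlet", "written_by", "Shakespeare"),
    ("Macbeth", "written_by", "Shakespeare"),
    ("Ulysses", "written_by", "Joyce"),
}

def query(pattern, facts):
    """Match one (subject, predicate, object) pattern against every fact.
    Terms beginning with '?' are variables; each hit returns their bindings.
    (Repeated variables are not checked for consistency in this sketch.)"""
    hits = []
    for fact in facts:
        if all(t.startswith("?") or t == f for t, f in zip(pattern, fact)):
            hits.append({t: f for t, f in zip(pattern, fact)
                         if t.startswith("?")})
    return hits

# "What did Shakespeare write?" -- as a structured query, no NLP needed.
hits = query(("?work", "written_by", "Shakespeare"), FACTS)
# two matches: {"?work": "Hamlet"} and {"?work": "Macbeth"} (in either order)
```

The user's intent is carried entirely by the pattern structure, which is why such systems can reason formally without first solving natural language understanding.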


  1. Wikipedia 3.0: The End of Google?
  2. Intelligence (Not Content) is King in Web 3.0
  3. Get Your DBin
  4. All About Web 3.0


Tags: Semantic Web, Web standards, Trends, OWL, Google, inference engine, AI, ontology, Web 2.0, Web 3.0, Wikipedia, Wikipedia 3.0, collective consciousness, Ontoworld, AI Engine, OWL-DL, Semantic MediaWiki, P2P 3.0


Evolving Trends

Wikipedia 3.0: The End of Google?

In Uncategorized on June 26, 2006 at 5:18 am

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0


Semantic Web Developers:

Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

  1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)



Two years after I published this article, it has received over 200,000 hits, and there are now several startups attempting to apply Semantic Web technology to Wikipedia and knowledge wikis in general, including the Wikipedia founder’s own commercial startup as well as a startup that was recently purchased by Microsoft.

Recently, after seeing how flawed Wikipedia’s governance is, I decided to write about a way to decentralize and democratize Wikipedia.

Spanish version


(Article was last updated at 10:15am EST, July 3, 2006)

Wikipedia 3.0: The End of Google?


The Semantic Web (or Web 3.0) promises to “organize the world’s information” in a dramatically more logical way than Google can ever achieve with their current engine design. This is especially true from the point of view of machine comprehension as opposed to human comprehension.

The Semantic Web requires the use of a declarative ontological language like OWL to produce domain-specific ontologies that machines can use to reason about information and make new conclusions, not simply match keywords.

However, the Semantic Web, which is still in a development phase where researchers are trying to define the best and most usable design models, would require the participation of thousands of knowledgeable people over time to produce those domain-specific ontologies necessary for its functioning.

Machines (or machine-based reasoning, aka AI software or ‘info agents’) would then be able to use those laboriously (but not entirely manually) constructed ontologies to build a view (or formal model) of how the individual terms within the information relate to each other. Those relationships can be thought of as the axioms (assumed starting truths), which together with the rules governing the inference process both enable and constrain the interpretation (and well-formed use) of those terms by the info agents to reason new conclusions based on existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent info agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.

Thus, as stated, in the Semantic Web individual machine-based agents (or collaborating groups of agents) will be able to understand and use information by translating concepts and deducing new information, rather than just matching keywords.

Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among your virtual AI-enhanced workforce, each with access to a different domain-specific comprehension space, all communicating with each other to build a collective consciousness.

You’ll be able to ask your info agent or agents to find you the nearest restaurant that serves Italian cuisine, even if the restaurant nearest you advertises itself as a Pizza joint as opposed to an Italian restaurant. But that is just a very simple example of the deductive reasoning machines will be able to perform on information they have.
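The restaurant example can be sketched directly. Assuming a hypothetical ontology in which a pizza joint is declared a subclass of Italian restaurant, a deductive lookup finds the pizza place even though its listing never mentions the word “Italian”, while a literal keyword match would not. All names below are invented.

```python
# The ontology declares the subclass axiom; the listing never says "Italian".
ONTOLOGY = {("PizzaJoint", "subclass_of", "ItalianRestaurant")}
LISTINGS = {("Luigis", "advertised_as", "PizzaJoint")}

def find(category, ontology, listings):
    """All places advertised under `category` or any direct subclass of it.
    (One level of subclass lookup only, for brevity.)"""
    subclasses = {category} | {a for (a, _, b) in ontology if b == category}
    return {place for (place, _, cat) in listings if cat in subclasses}

# A keyword engine matching the literal string "ItalianRestaurant" finds
# nothing; the deductive lookup finds Luigi's.
# find("ItalianRestaurant", ONTOLOGY, LISTINGS) == {"Luigis"}
```

The single subclass axiom does the work a keyword index cannot: it tells the agent that the user's category and the listing's category are related.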

Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents that are specialized in different domains of knowledge to produce a collective consciousness (to use the Borg metaphor) that encompasses all human knowledge. The collective “mind” of those agents-as-the-Borg will be the Ultimate Answer Machine, easily displacing Google from a position it never truly filled.

The problem with the Semantic Web, besides the fact that researchers are still debating which design and implementation of the ontology language model (and associated technologies) is best and most usable, is that it would take thousands, or tens of thousands, of knowledgeable people many years to boil down human knowledge to domain-specific ontologies.

However, if we were at some point to take the Wikipedia community and give them the right tools and standards to work with (whether existing or to be developed in the future), which would make it possible for reasonably skilled individuals to help reduce human knowledge to domain-specific ontologies, then that time can be shortened to just a few years, and possibly to as little as two years.

The emergence of a Wikipedia 3.0 (as in Web 3.0, aka Semantic Web) that is built on the Semantic Web model will herald the end of Google as the Ultimate Answer Machine. It will be replaced with “WikiMind” which will not be a mere search engine like Google is but a true Global Brain: a powerful pan-domain inference engine, with a vast set of ontologies (a la Wikipedia 3.0) covering all domains of human knowledge, that can reason and deduce answers instead of just throwing raw information at you using the outdated concept of a search engine.


After writing the original post I found out that a modified version of the Wikipedia application, known as “Semantic” MediaWiki has already been used to implement ontologies. The name that they’ve chosen is Ontoworld. I think WikiMind would have been a cooler name, but I like ontoworld, too, as in “it descended onto the world,” since that may be seen as a reference to the global mind a Semantic-Web-enabled version of Wikipedia could lead to.

Google’s search engine technology, which provides almost all of their revenue, could be made obsolete in the near future. That is, unless they gain access to Ontoworld or some similar pan-domain semantic knowledge repository, tap into its ontologies, and add inference capability to Google search, building formal deductive intelligence into Google.

But so can Ask.com and MSN and Yahoo…

I would really love to see more competition in this arena, not to see Google or any one company establish a huge lead over others.

The question, to rephrase in Churchillian terms, is whether the combination of the Semantic Web and Wikipedia signals the beginning of the end for Google or the end of the beginning. Obviously, with tens of billions of dollars at stake in investors’ money, I would think that it is the latter. No one wants to see Google fail. There’s too much vested interest. However, I do want to see somebody outmaneuver them (which, in my opinion, can be done).


Please note that Ontoworld, which currently implements the ontologies, is based on the “Wikipedia” application (also known as MediaWiki), but it is not the same as Wikipedia.org.

Likewise, I expect Wikipedia.org will use its volunteer workforce to reduce the sum of human knowledge that has been entered into its database to domain-specific ontologies for the Semantic Web (aka Web 3.0). Hence, “Wikipedia 3.0.”

Response to Readers’ Comments

The argument I’ve made here is that Wikipedia has the volunteer resources to produce the needed Semantic Web ontologies for the domains of knowledge that it currently covers, while Google does not have those volunteer resources, which will make it reliant on Wikipedia.

Those ontologies, together with all the information on the Web, can be accessed by Google and others, but Wikipedia will be in charge of the ontologies for the large set of knowledge domains it currently covers, and that is where I see the power shift.

Google and other companies do not have the manpower (i.e. the thousands of volunteers Wikipedia has) to help create those ontologies for the large set of knowledge domains that Wikipedia covers. Wikipedia does, and is positioned to do it better and more effectively than anyone else. It’s hard to see how Google would be able to create the ontologies for all domains of human knowledge (which are continuously growing in size and number), given how much work that would require. Wikipedia can cover more ground faster with its massive, dedicated force of knowledgeable volunteers.

I believe that the party that will control the creation of the ontologies (i.e. Wikipedia) for the largest number of domains of human knowledge, and not the organization that simply accesses those ontologies (i.e. Google), will have a competitive advantage.

There are many knowledge domains that Wikipedia does not cover. Google will have the edge there, but only if the people and organizations that produce the information also produce the ontologies on their own, so that Google can access them from its future Semantic Web engine. My belief is that this will happen, but very slowly, and that Wikipedia can have the ontologies done for all the domains of knowledge it currently covers much faster. Wikipedia would then have leverage by virtue of being in charge of those ontologies (aka the basic layer for AI enablement).

It still remains unclear, of course, whether the combination of Wikipedia and the Semantic Web heralds the beginning of the end for Google or the end of the beginning. As I said in the original part of the post, I believe that it is the latter, and the question I pose in the title of this post, in this context, is no more than rhetorical. However, I could be wrong in my judgment, and Google could fall behind Wikipedia as the world’s ultimate answer machine.

After all, Wikipedia makes “us” count. Google doesn’t. Wikipedia derives its power from “us.” Google derives its power from its technology and inflated stock price. Who would you count on to change the world?

Response to Basic Questions Raised by the Readers

Reader divotdave asked a few questions, which I thought were very basic in nature (i.e. important). I believe more people will be pondering the same issues, so I’m including them here with my replies.

How does it distinguish between good information and bad? How does it determine which parts of the sum of human knowledge to accept and which to reject?

It wouldn’t have to distinguish between good and bad information (not to be confused with well-formed vs badly formed) if it used a reliable source of information (with associated, reliable ontologies). That is, if the information or knowledge being sought can be derived from Wikipedia 3.0, then the information is assumed to be reliable.

However, with respect to connecting the dots when returning information or deducing answers from the sea of information that lies beyond Wikipedia, your question becomes very relevant. How would it distinguish good information from bad so that it can produce good knowledge (aka comprehended information, aka new information produced through deductive reasoning based on existing information)?

Who, or what as the case may be, will determine what information is irrelevant to me as the inquiring end user?

That is a good question, and one which would have to be answered by the researchers working on AI engines for Web 3.0.

There will be assumptions made as to what you are inquiring about. Just as I had to make an assumption about what you really meant to ask me when I saw your question, AI engines would have to make an assumption, based on pretty much the same cognitive process humans use, which is the topic of a separate post but has been covered by many AI researchers.

Is this to say that ultimately some over-arching standard will emerge that all humanity will be forced (by lack of alternative information) to conform to?

There is no need for one standard, except when it comes to the language the ontologies are written in (e.g. OWL, OWL-DL, OWL Full, etc.). Semantic Web researchers are trying to determine the best and most usable choice, taking into consideration human and machine performance in constructing (and, exclusively in the latter case, interpreting) those ontologies.

Two or more info agents working with the same domain-specific ontology but having different software (different AI engines) can collaborate with each other.

The only standard required is that of the ontology language and associated production tools.


On AI and Natural Language Processing

I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.

On the Debate about the Nature and Definition of AI

The embedding of AI into cyberspace will be done at first with relatively simple inference engines (that use algorithms and heuristics) that work collaboratively in P2P fashion and use standardized ontologies. The massively parallel interactions between the hundreds of millions of AI Agents that will run within the millions of P2P AI Engines on users’ PCs will give rise to the very complex behavior that is the future global brain.


  1. Web 3.0 Update
  2. All About Web 3.0 <– list of all Web 3.0 articles on this site
  3. P2P 3.0: The People’s Google
  4. Reality as a Service (RaaS): The Case for GWorld <– 3D Web + Semantic Web + AI
  5. For Great Justice, Take Off Every Digg
  6. Google vs Web 3.0
  7. People-Hosted “P2P” Version of Wikipedia
  8. Beyond Google: The Road to a P2P Economy


Web 3D Fans:

Here is the original Web 3D + Semantic Web + AI article:

Web 3D + Semantic Web + AI *

The above-mentioned Web 3D + Semantic Web + AI vision, which preceded the Wikipedia 3.0 vision, received much less attention because it was not presented in a controversial manner. This was noted as the biggest flaw of the social bookmarking site digg, which was used to promote this article.

Web 3.0 Developers:

Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

  1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)

Jan 7, ‘07: The following Evolving Trends post discusses the current state of semantic search engines and ways to improve the paradigm:

  1. Designing a Better Web 3.0 Search Engine

June 27, ‘06: Semantic MediaWiki project, enabling the insertion of semantic annotations (or metadata) into the content:

  1. http://semantic-mediawiki.org/wiki/Semantic_MediaWiki (see note on Wikia below)

Wikipedia’s Founder and Web 3.0



Evolving Trends

Google Warming Up to the Wikipedia 3.0 vision?

In Uncategorized on December 14, 2007 at 8:09 pm

[source: slashdot.org]

Google’s “Knol” Reinvents Wikipedia

Posted by CmdrTaco on Friday December 14, @08:31AM
from the only-a-matter-of-time dept.


teslatug writes “Google appears to be reinventing Wikipedia with their new product that they call knol (not yet publicly available). In an attempt to gather human knowledge, Google will accept articles from users who will be credited with the article by name. If they want, they can allow ads to appear alongside the content and they will be getting a share of the profits if that’s the case. Other users will be allowed to rate, edit or comment on the articles. The content does not have to be exclusive to Google but no mention is made on any license for it. Is this a better model for free information gathering?”

This article, Wikipedia 3.0: The End of Google?, which gives you an idea why Google would want its own Wikipedia, was on the Google Finance page for at least 3 months whenever anyone looked up the Google stock symbol, so Google employees, investors and executives must have seen it.

Is it a coincidence that Google is building its own Wikipedia now?

The only problem is a flaw in Google’s thinking. People who author those articles on Wikipedia actually have brains. People with brains tend to have principles. Getting paid pennies to build the Google empire is rarely one of those principles.




Giving Search a Human Touch

Michael Calore  12.29.06

The idea of building a better search engine sounds almost laughable on the surface.

After all, isn’t there already a massively successful internet search player with a seemingly insurmountable market share? But to hear Jimmy Wales, co-founder of Wikipedia and chairman of the for-profit wiki site Wikia, describe his vision of a totally transparent social search engine — one built with open-source software and inspired by the collaborative spirit of wikis — you realize that his plan just might work.

Wales’ plan for the Search Wikia project is to put ordinary users in charge of ranking search results. Heavy lifting such as indexing and raw ranking will still be done by machines, but the more nuanced work of deciding how search results are displayed will be completed by humans.

Google, the current King of Search, ranks search results based on the perceived trust of the web community at large — the more links a page receives, the more it’s trusted as an authoritative source of information, and the higher the rank. However, this method is open to tinkering, trickery and hacks, all of which damage the relevancy of results.
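The link-counting idea described here can be sketched as a toy power iteration in the style of PageRank (the 0.85 damping value is the conventional one; the three-page graph is invented for illustration):

```python
links = {            # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def rank(links, damping=0.85, iterations=50):
    """Each page repeatedly shares its score with the pages it links to;
    heavily linked-to pages accumulate the most 'trust'."""
    pages = list(links)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        nxt = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                nxt[target] += damping * score[page] / len(outlinks)
        score = nxt
    return score

scores = rank(links)
# "c" ends up ranked highest: every other page links to it.
```

Because the score flows mechanically along links, anyone who can manufacture links can manufacture trust, which is exactly the gaming problem the article goes on to discuss.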

If successful, Wales’ project, which launches in early 2007, will be able to filter out such irrelevant results. Much as with Wales’ Wikipedia, both the software algorithms powering Search Wikia and the changes applied by the community will be made transparent on the project’s website.

Wired News spoke to Jimmy Wales about Search Wikia. We discussed the ins and outs of how the model will likely work, what it will take to build it, and what sorts of criticisms it will face.

Wired News: Can you describe the new search engine in your own words?

Jimmy Wales: The core of the concept is the open-source nature of everything we’re intending to do — making all of the algorithms public, making all of the data public and trying to achieve the maximum possible transparency. Developers, users, or anyone who wants to can come and see how we’re doing things and give us advice and information about how to make things better.

Additionally, we want to bring in some of the concepts of the wiki model — building a genuine community for discussion and debate to add that human element to the project.

I mention “community” to distinguish us as something different. A lot of times, when people talk about these kinds of (projects), they’re not thinking about communities. They’re thinking about users randomly voting, and that action turning into something larger. I really don’t like the term “crowdsourcing.” We’re really more about getting lots of people engaged in conversations about how things should be done.

WN: How are the communities going to be managed?

Wales: I don’t know! (laughter) If you asked me how the Wikipedia community is managed, I wouldn’t know the answer to that, either. I don’t think it makes sense to manage a community.

It’s about building a space where good people can come in and manage themselves and manage each other. They can have a distinct and clear purpose — a moral purpose — that unites people and brings them together to do something useful.

WN: How will the human-powered ranking element work?

Wales: We don’t know. That’s something that’s really very open-ended at this moment. It’s really up to the community, and I suspect that there won’t be a one-size-fits-all answer. It will depend on the topic and the type of search being conducted.

One of the things that made Wikipedia successful was a really strong avoidance of a priori thinking about exactly “how.” We all have a pretty good intuitive sense of what a good search result is. A variety of different factors make a search result “good,” qualitatively speaking. How we get those kinds of results for the most possible searches depends on a lot of factors.

A lot of the earlier social search projects fell apart because they were committed a priori to some very specific concept of how it should work. When that worked in some cases but not others, they were too stuck in one mold rather than seeing that a variety of approaches depending on the particular topic is really the way to do it.

WN: I’m envisioning that Wikia Search will incorporate some sort of voting system, and that users will be able to adjust and rank lists of results. Is this the case?

Wales: Yes, but how exactly and under what circumstances that would work is really an empirical question that we’ll experiment with. At Wikipedia and in the wiki world, one of the things we’ve always pushed hard against is voting. Voting is usually not the best way to get a correct answer by consensus. Voting can be gamed, it can be played with. It’s a crutch of a tool that you can use when you don’t have anything better to use. Sometimes, there is no better way. You have to say, “We’ve tried to get a consensus and we couldn’t, so we took a vote.”

In general, envisioning some sort of pre-built algorithm for counting people’s votes is just not a good idea.

WN: Speaking of gaming, what methodologies do you think Search Wikia will employ to fight gaming?

Wales: I think the most important thing to use to fight against gaming is genuine human community. Those kinds of gaming behaviors pop up when there is an algorithm that works in some mechanical way, and then people find a way to exploit it. It’s pretty hard to do that within a community of people who know each other. Basically, if you’re being a jerk, they’ll tell you to knock it off and you’ll be blocked from the site. It’s pretty simple for humans to see through that sort of thing. The real way to fight it is to have a group of people who trust each other, with that trust having been built over a period of time.

WN: Will there be some sort of validation that happens when results are ranked by users? Will knowledgeable contributors get the chance to vet changes?

Wales: Yes. The key to good design here is transparency: everybody can see what everyone else has done. The communities will have the ability to make changes and modify them as they see fit.

WN: What forms of open-source software are you applying to this search project, and why do you think those would be more successful than proprietary search software?

Wales: Here’s the main thing. If we publish all the software — and we’ll be starting with Lucene and Nutch, which are these open source projects that are out there and already quite good — and do all of our modifications transparently in public, then other programmers can come and check the code. If you see things that aren’t working well, you can contribute. People who are coders can contribute in one way, and ordinary people using the site can also contribute in other ways.

It’s mostly about the trust that you get from that transparency. You can see for yourself, if you choose to dig into it, how things are ranked and why certain results are ranked the way they are. You can also choose to download the whole thing and do tests or tweak it to make it better in certain areas. That kind of transparency helps if you see a problem with search in some area that you care about, like some technical field for example. There’s no good way for you to go and tell Google that their search is broken in this area, or that they need to disambiguate these terms — or whatever.

By having an up-front commitment to transparency, I think you can do that.
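Wales’s transparency argument can be illustrated with a toy sketch (purely hypothetical; none of these names or numbers come from Wikia’s actual code): if the ranking formula and its weights are published data, anyone can audit why a result ranked where it did, or tweak the weights for a topic area they care about.

```python
# Hypothetical sketch of a "transparent" ranker. The scoring formula and its
# weights are public data, so anyone can inspect a ranking or adjust it.

WEIGHTS = {"term_match": 2.0, "community_score": 1.0}  # published, editable

def score(doc, query_terms, weights=WEIGHTS):
    """Score a document: keyword matches plus community votes, all visible."""
    term_match = sum(doc["text"].lower().count(t) for t in query_terms)
    return weights["term_match"] * term_match + weights["community_score"] * doc["votes"]

docs = [
    {"id": "a", "text": "open source search engine", "votes": 5},
    {"id": "b", "text": "search search search", "votes": 0},
]
# Community votes let document "a" outrank the keyword-stuffed document "b".
ranked = sorted(docs, key=lambda d: score(d, ["search"]), reverse=True)
print([d["id"] for d in ranked])  # → ['a', 'b']
```

Note how the human signal (votes) counteracts simple keyword stuffing, which is exactly the kind of mechanical exploit Wales describes below; anyone who disagrees with the outcome can see precisely which term contributed what.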

WN: One of the key arguments in favor of a new search model is that traditional search engines like Google are increasingly targeted by spam. How can a wiki-powered search engine better fight search spam?

Wales: Again, I think it’s that human element. Humans can recognize that a domain is not returning good results, and if you have a good community of people to discuss it, you can just kick them out of the search engine. It seems pretty simple to me — it’s an editorial judgment. You just have to have a broad base of people who can do that.
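As a toy illustration of that editorial judgment (a hypothetical sketch, not anything Wikia has published), a community-maintained and publicly visible blocklist can simply be applied as a filter over raw results:

```python
# Illustrative sketch: a community-curated, publicly auditable blocklist
# applied as an editorial filter over raw search results. All names here
# are hypothetical.

def filter_results(results, blocked_domains):
    """Drop results whose domain the community has voted off the index."""
    return [r for r in results if r["domain"] not in blocked_domains]

blocked = {"spam-farm.example"}  # public record of editorial decisions
raw = [
    {"url": "https://news.example/story", "domain": "news.example"},
    {"url": "https://spam-farm.example/p1", "domain": "spam-farm.example"},
]
print([r["url"] for r in filter_results(raw, blocked)])
# → ['https://news.example/story']
```

Because the blocklist itself is public data, "kicking a domain out" is a transparent, reviewable act rather than an opaque algorithmic penalty.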

WN: How are you going to build this broad base? Will there be an outreach, or are you expecting people to just come to you?

Wales: I think people will come. If we’re doing interesting work and people find it fun, then people will come.

WN: When do you expect to see Search Wikia up and running?

Wales: The project to build the community to build the search engine is launching in the first quarter of 2007, not the search engine itself. We may have something up pretty quickly, maybe some sort of demo or test for people to start playing with. But we don’t want to build up expectations that people can come in three months and check out this Google-killing search engine that we’ve written from scratch. It’s not going to happen that fast.

What we want to do now is get the community going and get the transparent algorithms going so we can start the real work. It’s going to be a couple of years before this really turns into something interesting.



Murdoch Calls Google, Yahoo Copyright Thieves — Is He Right?

By David Kravets | April 03, 2009, 5:00:18 PM | Categories: Intellectual Property

Rupert Murdoch, the owner of News Corp. and The Wall Street Journal, says Google and Yahoo are giant copyright scofflaws that steal the news.

“The question is, should we be allowing Google to steal all our copyright … not steal, but take,” Murdoch says. “Not just them, but Yahoo.”

But whether search-engine news aggregation is theft or a protected fair use under copyright law is unclear, even as Google and Yahoo profit tremendously from linking to news. So maybe Murdoch is right.

Murdoch made his comments late Thursday during an address at the Cable Show, an industry event held in Washington. He seemed to be blaming the web, and search engines, for the news media’s ills.

“People reading news for free on the web, that’s got to change,” he said.

Real estate magnate Sam Zell made similar comments in 2007 when he took over the Tribune Company and ran it into bankruptcy.

We suspect Zell and Murdoch are just blowing smoke. If they were not, perhaps they could demand Google and Yahoo remove their news content. The search engines would kindly oblige.

Better yet, if Murdoch and Zell are so set on monetizing their web content, they should sue the search engines and claim copyright violations in a bid to get the engines to pay for the content.

The outcome of such a lawsuit is far from clear.

It’s unsettled whether search engines have a valid fair use claim under copyright law. The news headlines are copied verbatim, as are some of the snippets that accompany them.

Fred von Lohmann of the Electronic Frontier Foundation points out that “There’s not a rock-solid ruling on the question.”

Should the search engines pay up for the content? Tell us what you think.


August 02, 2008

The Marginal Utility of Internet Companies

While I have started work at a new company (it is still in its setting-up phase), a lot of interesting news has emerged. From the plethora of stories, I have decided to look at the following: (i) Facebook suing studiVZ, a German clone of the social network; (ii) the public relations disaster of Cuil, the new search engine that claimed to rival Google in indexing and search; and (iii) Google’s launch of several new apps and initiatives, Knol and Lively (and I have a gripe with Google over Picasa as well: they don’t have a Mac OS X version). Together, these suggest that Internet companies are reaching the point of marginal utility, a phase where everyone is seeking the next big thing. It reminded me of a question posed to Bill Gates in a BBC interview before his departure from an active role at Microsoft: he suggested that two people working on new internet innovation in a Starbucks cafe have now taken over from two people innovating in their parents’ garage. Here are some thoughts on why I believe everyone is in the marginal-utility phase of the web’s evolution.

  • XiaoNei legal action as a gesture to stop regional Facebook clones from arising: Why did Facebook choose to sue studiVZ and not XiaoNei, the Chinese Facebook clone? Looking at the interface of studiVZ, I realized that they have at least changed parts of it; at least they bothered to paint it red instead of blue. XiaoNei, by contrast, is a blatant word-for-word, box-for-box imitation of Facebook. XiaoNei has even cloned Facebook’s approach of opening up the platform to developers, and, not to mention, they have raised US$430M in funding, making them better funded than Facebook. I am pretty sure that the executives at Facebook realized that it is harder to deal with the perfect clone, and easier to take on a German clone in a jurisdiction with clear legislation that favours intellectual property. One trend is emerging, if you have been studying online social networks: different countries prefer different social networks. In India and Brazil, people like Google’s Orkut, and people in Thailand like Hi5. In fact, there is now an added dimension: the generic social network can be broken into niche groups. For example, MyYearBook.com, a social network for teenagers (which means its users might grow up and move to other social networks), just raised US$13M. Of course, Facebook has started to internationalize its portal into other languages as a response to these regional issues, but the generational turnover of social network users will continue to be a challenge for them.
  • The Clone Wars return, and now the Europeans and Americans are cloning each other: We used to talk about how Chinese clones of US web 2.0 start-ups make a killing. Seriously, the Europeans are no better; think of studiVZ, a Facebook clone in Germany, and the Europeans are beginning to realize that they can do it just as well with their natural-language advantage. In fact, the Americans are doing it among themselves. Google has not been especially innovative recently: Knol directly competes with Wikipedia except that it does not harness the wisdom of crowds, and Lively may be their first foray into virtual worlds, along the lines of what I have said before. Once bloggers get tools to propagate themselves into the virtual world as avatars, the virtual-world market may trigger another web evolution. Even Google is facing a clone in her own backyard with Cuil, but Cuil did not make sure its product was ready and ended up in a PR disaster. Without delay, Google has started to talk about search quality on their blog again.
  • Marginal utility, everyone?: My guess is that every company is reaching marginal utility, including Google, which has started a venture capital arm to find new ideas it would not otherwise have thought of. Of course, unlike the last web bubble, we no longer see astronomical valuations, and given that we are moving through a credit crisis in the US, people will be conservative, which means investment will flow slowly.

Of course, while web services compete with one another, it might be better to look for a blue ocean where mobile meets the web, or to explore new emerging markets where the clones might succeed.




