
Archive for March, 2008




Wikipedia Questions Paths to More Money

 
Jimmy Wales, founder of Wikipedia, answers a question during an interview with the Associated Press in St. Petersburg, Fla. in this June 29, 2007 file photo. With 2 million articles in English alone, Wikipedia, the Internet encyclopedia “anyone can edit,” stormed the Web’s top ranks through the work of unpaid volunteers and the assistance of donors. But that means Wikipedia has far less financial clout than its Web peers, and doing almost anything to improve that situation invites scrutiny from the same community that proudly generates the content. (AP Photo/Chris O’Meara, file)

 

By BRIAN BERGSTEIN

The Associated Press
Friday, March 21, 2008; 5:22 AM

Scroll the list of the 10 most popular Web sites in the U.S., and you’ll encounter the Internet’s richest corporate players: names like Yahoo, Amazon.com, News Corp., Microsoft and Google.
Except for No. 7: Wikipedia. And there lies a delicate situation.

With 2 million articles in English alone, the Internet encyclopedia “anyone can edit” stormed the Web’s top ranks through the work of unpaid volunteers and the assistance of donors. But that gives Wikipedia far less financial clout than its Web peers, and doing almost anything to improve that situation invites scrutiny from the same community that proudly generates the content.

And so, much as its base of editors and bureaucrats endlessly debates touchy articles and other changes to the site, Wikipedia’s community churns with questions over how the nonprofit Wikimedia Foundation, which oversees the project, should get and spend its money.

Should it proceed on its present course, soliciting donations largely to keep its servers running? Or should it expand other sources of revenue (with ads, perhaps, or something like a Wikipedia game show) to fulfill grand visions of sending DVDs or printed books to people who lack computers? Is it helpful, or counter to the project’s charitable, free-information mission, to have the Wikimedia Foundation tight with a prominent venture capital firm?

These would be tough questions for any organization, let alone one in which hundreds of participants can expect to have a say.

The system “has strengths and weaknesses,” says Jimmy Wales, Wikipedia’s co-founder and “chairman emeritus.” “The strength is, we don’t do anything randomly, without lots and lots and lots of discussion. The downside is we don’t get anything done unless we actually come to a conclusion.”

Even the foundation’s leaders aren’t unified. Florence Devouard, a French plant scientist who chairs the board, said she and other Europeans involved with the project are more skeptical than Americans such as Wales about moneymaking side projects with for-profit entities.

The project’s financial situation is not exactly dire. Although the group does not have an endowment fund with interest fueling operations, cash contributions jumped to $2.2 million last year, from $1.3 million in the prior year. With big gifts recently, the foundation’s budget is $4.6 million this year.

In the past year, the foundation has tried to become less of an ad hoc outfit, expanding staff from fewer than 10 people to roughly 15 and moving to San Francisco from St. Petersburg, Fla. It has a new executive director, Sue Gardner, formerly head of the Canadian Broadcasting Corp.’s Web operations, who expects to add professional fund-raisers and improve ties with Wikimedia patrons.

“Two years ago, if you donated $10,000, you might not even get a phone call or a thank-you letter,” Wales said. “That’s just not acceptable.”

Gardner appears to favor an incremental strategy, stretching the staff to 25 people by 2010, with the budget increasing toward $6 million. Even such relatively simple changes, she said, would keep the foundation from missing out on business partnerships and other opportunities.

For example, project leaders would like to hold “Wikipedia Academies” in developing countries, to encourage new cadres of contributors in other languages. Wales also wants to implement software that makes it less technically daunting for newcomers to edit Wikipedia articles, an idea that has been discussed for at least two years.

It might seem surprising that such a low-key agenda could prove contentious, given that Wikimedia and Wales have also encountered complaints of being incautious with donors’ money. But some Wikipedians want the foundation to be spending more.

“Why should they have to be wise spending such a little amount of money when they could have so much more?” said Nathan Awrich, a Wikipedia contributor from Vermont who advocates limited ads on the site to help pay for technical improvements, better outreach and even a legal-defense fund. “This is not a foundation that needs to last one more year. This is a foundation that needs to be planning for a longer term, and it doesn’t seem like they’re doing it.”

Gardner said she opposes advertising unless it came down to a choice between “shutting down the servers and putting ads on the site. I don’t think we’re ever going to get to that point, so I don’t see advertising as an issue.”

Wales sounds political on the matter. On one hand, he said he believes “advertising is really a nonstarter” because of the potential harm to Wikipedia’s noncommercial image. However, he also said the subject requires more research, so Wikipedians truly understand how much money the project is leaving on the table by rejecting ads.

“I think it’s a fallacy to say learning about something implies you want to do it,” he said. “I would like to learn about it because I suspect it’s not worth it.”

Another subject getting carefully parsed is the foundation’s relationship with Elevation Partners, the venture firm co-founded by Roger McNamee and U2’s Bono. Elevation owns stakes in Forbes magazine and Palm Inc., among other companies.

McNamee has donated at least $300,000 to the Foundation, according to Danny Wool, a former Wikimedia employee who processed the transactions. More recently, the foundation said, McNamee introduced the group to people who made separate $500,000 gifts. Their identities have not been disclosed.

Officially, Gardner and McNamee say he is merely a fan of Wikimedia’s free-information project, separate from Elevation’s profit-making interests. “He has been clear: when he talks to me, he’s talking as a private individual,” Gardner said.

Yet the relationship runs deeper than that would suggest.

Another Elevation partner, Marc Bodnick, has met with Wales multiple times and went to a 2007 Wikimedia board meeting in the Netherlands. (Wales described that as a “get to know you session” and said Elevation, among many other venture firms, quickly learned that the foundation was not interested in changing its core, nonprofit mission.)

Bodnick and Bono had also been with Wales in 2006 in Mexico City, where U2 was touring. On a hotel rooftop, Bono suggested that Wikipedia use its volunteer-written articles as a starting point, then augment that with professionals who would polish and publish the content, according to two people who were present. Bono compared it to Bob Dylan going electric, a jarring move that people came to love.

McNamee and Bodnick declined to comment.

Although Wales says no business with Elevation is planned, that hasn’t quelled that element ever-present in Wikipedia: questions.

In a recent interview, Devouard, the board chair, said she believed Elevation was interested in being more than just friends, though she wasn’t sure just what the firm hoped to get out of the nonprofit project.

“It is easy to see which interest WE have in getting their interest,” she wrote to Wales that day on an internal board mailing list, in an exchange obtained by The Associated Press. “The contrary is not obvious at all: Can you explain to me why EP (Elevation Partners) are interested in us?”

___

On the Net:

http://www.wikimedia.org


 




FBI Opens Probe of China-Based Hackers

Washington Post Staff Writers
Friday, March 21, 2008; Page A02

The FBI has opened a preliminary investigation of a report that China-based hackers have penetrated the e-mail accounts of leaders and members of the Save Darfur Coalition, a national advocacy group pushing to end the six-year-old conflict in Sudan.

The accounts of 10 members were hacked into between early February and last week, and the intruders also gained access to the group’s Web server and viewed pages from the inside, the group said yesterday.

The intruders, said coalition spokesman M. Allyn Brooks-LaSure, “seemed intent on subversively monitoring, probing and disrupting coalition activities.” He said Web site logs and e-mails showed Internet protocol addresses that were traced to China.

The allegation fits a near decade-old pattern of cyber-espionage and cyber-intimidation by the Chinese government against critics of its human rights practices, experts said. It comes as calls for a boycott of the 2008 Beijing Olympics have been mounting since China’s crackdown on Tibetan protesters last week.

The coalition, headquartered in Washington, has been a vocal critic of China’s support for the Sudanese government and its refusal to allow anyone to pressure Khartoum to end the conflict. The group has urged China — Sudan’s chief diplomatic sponsor, major weapons provider and largest foreign investor and trade partner — to use its position as a member of the U.N. Security Council to bring peace to the region.

“Someone in Beijing is clearly trying to send us a message,” coalition President Jerry Fowler said. “But they’re mistaken if they think these attacks will end efforts to bring peace to Darfur.”

A senior Chinese official, who spoke on the condition of anonymity, said the allegation is false.


Tuesday, March 18, 2008

Is venture capital’s love affair with Web 2.0 over? | Tech news blog – CNET News.com

“Silicon Valley remains the hotbed of Web 2.0 activity, but the hipness of start-ups with goofy names is starting to cool in the face of economic reality.
Dow Jones VentureSource on Tuesday released numbers of venture capital activity in Web 2.0 companies and declared that the ‘investment boom may be peaking.’
Venture capitalists put $1.34 billion into 178 deals in 2007, an 88 percent jump over 2006. But once you strip out the $300 million that Facebook raised from Microsoft and others, the numbers don’t look as bullish.
The pace of deal flow, or the number of fundings, has slowed, particularly in the San Francisco Bay Area. Deal flow in 2007 went up 25 percent to 178 deals, but nearly all of that growth occurred outside the Bay Area, where the number of deals slipped.
‘Web 2.0 deals in the Bay Area actually dropped from 74 deals in 2006 to 69 last year and investments were down 3 percent from the $431 million invested in 2006. It’s clear that the real growth in the Web 2.0 sector is happening outside of the Bay Area,’ Jessica Canning, director of global research at Dow Jones VentureSource, said in a statement.”
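As a quick sanity check on why stripping out Facebook makes the numbers "look less bullish": the 2006 total implied by the stated 88 percent jump is about $713 million, so 2007 growth excluding Facebook's round works out to roughly 46 percent. The arithmetic, with the 2006 total inferred rather than quoted:

```python
# Back-of-the-envelope check of the VentureSource figures quoted above.
# The 2006 total is inferred from the stated 88 percent jump; the other
# inputs come straight from the quote.
total_2007 = 1.34e9                     # VC dollars into Web 2.0 deals, 2007
total_2006 = total_2007 / 1.88          # implied 2006 total, about $713M
facebook_round = 0.30e9                 # Facebook's raise, stripped out below

ex_facebook_growth = (total_2007 - facebook_round) / total_2006 - 1
print(f"Implied 2006 total: ${total_2006 / 1e6:.0f}M")
print(f"2007 growth excluding Facebook: {ex_facebook_growth:.0%}")  # ~46%
```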


How Can Air Travel Be Free?


By Chris Anderson | 02.25.08 | 12:00 AM

Chart: Steven Leckart; Chart design: Nicholas Felton; Sources: Inviseomedia, Ryanair

Every year, about 1.3 million passengers fly from London to Barcelona. A ticket on Dublin-based low-cost airline Ryanair is just $20 (10 pounds). Other routes are similarly cheap, and Ryanair’s CEO has said he hopes to one day offer all seats on his flights for free (perhaps offset by in-air gambling, turning his planes into flying casinos). How can a flight across the English Channel be cheaper than the cab ride to your hotel?

A) Cut costs: Ryanair boards and disembarks passengers from the tarmac to trim gate fees. The airline also negotiates lower access fees from less-popular airports eager for traffic.

B) Ramp up the ancillary fees: Ryanair charges for in-flight food and beverages; assesses extra fees for preboarding, checked baggage, and flying with an infant; collects a share of car rentals and hotel reservations booked through the Web site; charges marketers for in-flight advertising; and levies a credit-card handling fee for all ticket purchases.

C) Offset losses with higher fares: On popular travel days, the same flight can cost more than $100.
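To see how a below-cost base fare can still pencil out under this A/B/C playbook, here is a minimal sketch of the unit economics. Every number in it (seat cost, ancillary take, peak fare) is invented for illustration and is not a Ryanair figure.

```python
# Hypothetical unit economics: a below-cost base fare (A keeps cost_per_seat
# low) is covered by ancillary fees (B) and peak-day pricing (C).
def net_per_passenger(base_fare, cost_per_seat, ancillary_revenue):
    """Margin on one seat once the extra fees are counted."""
    return base_fare + ancillary_revenue - cost_per_seat

# A $20 ticket on a seat that costs $45 to fly, plus $30 in bag/food/card fees
cheap_seat = net_per_passenger(base_fare=20, cost_per_seat=45,
                               ancillary_revenue=30)

# The same seat sold at $110 on a popular travel day
peak_seat = net_per_passenger(base_fare=110, cost_per_seat=45,
                              ancillary_revenue=15)

print(f"Cheap seat margin: ${cheap_seat}")  # $5
print(f"Peak seat margin:  ${peak_seat}")   # $80
```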


Science 2.0: Great New Tool, or Great Risk?

Wikis, blogs and other collaborative web technologies could usher in a new era of science. Or not.

By M. Mitchell Waldrop




Welcome to a Scientific American experiment in “networked journalism,” in which readers—you—get to collaborate with the author to give a story its final form.

The article, below, is a particularly apt candidate for such an experiment: it’s my feature story on “Science 2.0,” which describes how researchers are beginning to harness wikis, blogs and other Web 2.0 technologies as a potentially transformative way of doing science. The draft article appears here, several months in advance of its print publication, and we are inviting you to comment on it. Your inputs will influence the article’s content, reporting, perhaps even its point of view.

So consider yourself invited. Please share your thoughts about the promise and peril of Science 2.0; just post your inputs in the Comment section below. To help get you started, here are some questions to mull over:

  • What do you think of the article itself? Are there errors? Oversimplifications? Gaps?
  • What do you think of the notion of “Science 2.0?” Will Web 2.0 tools really make science much more productive? Will wikis, blogs and the like be transformative, or will they be just a minor convenience?
  • Science 2.0 is one aspect of a broader Open Science movement, which also includes Open-Access scientific publishing and Open Data practices. How do you think this bigger movement will evolve?
  • Looking at your own scientific field, how real is the suspicion and mistrust mentioned in the article? How much do you and your colleagues worry about getting “scooped”? Do you have first-hand knowledge of a case in which that has actually happened?
  • When young scientists speak out on an open blog or wiki, do they risk hurting their careers?
  • Is “open notebook” science always a good idea? Are there certain aspects of a project that researchers should keep quiet, at least until the paper is published?

–M. Mitchell Waldrop

The explosively growing World Wide Web has rapidly transformed retailing, publishing, personal communication and much more. Innovations such as e-commerce, blogging, downloading and open-source software have forced old-line institutions to adopt whole new ways of thinking, working and doing business.

Science could be next. A small but growing number of researchers–and not just the younger ones–have begun to carry out their work via the wide-open blogs, wikis and social networks of Web 2.0. And although their efforts are still too scattered to be called a movement–yet–their experiences to date suggest that this kind of Web-based “Science 2.0” is not only more collegial than the traditional variety, but considerably more productive.

“Science happens not just because of people doing experiments, but because they’re discussing those experiments,” explains Christopher Surridge, editor of the Web-based journal, Public Library of Science On-Line Edition (PLoS ONE). Critiquing, suggesting, sharing ideas and data–communication is the heart of science, the most powerful tool ever invented for correcting mistakes, building on colleagues’ work and creating new knowledge. And not just communication in peer-reviewed papers; as important as those papers are, says Surridge, who publishes a lot of them, “they’re effectively just snapshots of what the authors have done and thought at this moment in time. They are not collaborative beyond that, except for rudimentary mechanisms such as citations and letters to the editor.”

The technologies of Web 2.0 open up a much richer dialog, says Bill Hooker, a postdoctoral cancer researcher at the Shriners Hospital for Children in Portland, Ore., and the author of a three-part survey of open-science efforts in the group blog, 3 Quarks Daily. “To me, opening up my lab notebook means giving people a window into what I’m doing every day. That’s an immense leap forward in clarity. In a paper, I can see what you’ve done. But I don’t know how many things you tried that didn’t work. It’s those little details that become clear with open notebook, but are obscured by every other communication mechanism we have. It makes science more efficient.” That jump in efficiency, in turn, could have huge payoffs for society, in everything from faster drug development to greater national competitiveness.

Of course, many scientists remain highly skeptical of such openness–especially in the hyper-competitive biomedical fields, where patents, promotion and tenure can hinge on being the first to publish a new discovery. From that perspective, Science 2.0 seems dangerous: using blogs and social networks for your serious work feels like an open invitation to have your online lab notebooks vandalized–or worse, have your best ideas stolen and published by a rival.

To Science 2.0 advocates, however, that atmosphere of suspicion and mistrust is misplaced. “When you do your work online, out in the open,” Hooker says, “you quickly find that you’re not competing with other scientists anymore, but cooperating with them.”

Rousing Success
In principle, says PLoS ONE’s Surridge, scientists should find the transition to Web 2.0 perfectly natural. After all, since the time of Galileo and Newton, scientists have built up their knowledge about the world by “crowd-sourcing” the contributions of many researchers and then refining that knowledge through open debate. “Web 2.0 fits so perfectly with the way science works, it’s not whether the transition will happen but how fast,” he says.

The OpenWetWare project at MIT is an early success. Launched in the spring of 2005 by graduate students working for MIT biological engineers Drew Endy and Thomas Knight, who collaborate on synthetic biology, the project was originally seen as just a better way to keep the two labs’ Web sites up to date. OpenWetWare is a wiki–a collaborative Web site that can be edited by anyone who has access to it; it even uses the same software that underlies the online encyclopedia Wikipedia. Students happily started posting pages introducing themselves and their research, without having to wait for a Webmaster to do it for them.

But then, users discovered that the wiki was also a convenient place to post what they were learning about lab techniques: manipulating and analyzing DNA, getting cell cultures to grow. “A lot of the ‘how-to’ gets passed around as lore in biology labs, and never makes it into the protocol manuals,” says Jason Kelly, a graduate student of Endy’s who now sits on the OpenWetWare steering committee. “But we didn’t have that.” Most of the students came from a background in engineering; theirs was a young lab with almost no mentors. So whenever a student or postdoc managed to stumble through a new protocol, he or she would write it all down on a wiki page before the lessons were forgotten. Others would then add whatever new tricks they had learned. This was not altruism, notes steering-committee member Reshma Shetty. “The information was actually useful to me.” But by helping herself, she adds, “that information also became available around the world.”

Indeed, Kelly points out, “Most of our new users came to us because they’d been searching Google for information on a protocol, found it posted on our site, and said ‘Hey!’ As more and more labs got on, it became pretty apparent that there were lots of other interesting things they could do.”

Classes, for example. Instead of making do with a static Web page posted by a professor, users began to create dynamically evolving class sites where they could post lab results, ask questions, discuss the answers and even write collaborative essays. “And it all stayed on the site, where it made the class better for next year,” says Shetty, who has created an OpenWetWare template for creating such class sites.

Laboratory management benefited too. “I didn’t even know what a wiki was,” recalls Maureen Hoatlin of the Oregon Health & Science University in Portland, where she runs a lab studying the genetic disorder Fanconi anemia. But she did know that the frenetic pace of research in her field was making it harder to keep up with what her own team members were doing, much less Fanconi researchers elsewhere. “I was looking for a tool that would help me organize all that information,” Hoatlin says. “I wanted it to be Web-based, because I travel a lot and needed to access it from wherever I was. And I wanted something my collaborators and group members could add to dynamically, so that whatever I saw on that Web page would be the most recently updated version.”

OpenWetWare, which Hoatlin saw in the spring of 2006, fit the bill perfectly. “The transparency turned out to be very powerful,” she says. “I came to love the interaction, the fact that people in other labs could comment on what we do and vice versa. When I see how fast that is, and its power to move science forward–there is nothing like it.”

Numerous others now work through OpenWetWare to coordinate research. SyntheticBiology.org, one of the site’s most active interest groups, currently comprises six laboratories in three states, and includes postings about jobs, meetings, discussions of ethics, and much more.

In short, OpenWetWare has quickly grown into a social network catering to a wide cross-section of biologists and biological engineers. It currently encompasses laboratories on five continents, dozens of courses and interest groups, and hundreds of protocol discussions – more than 6,100 Web pages edited by 3,000 registered users. A May 2007 grant from the National Science Foundation launched the OpenWetWare team on a five-year effort to transform OpenWetWare into a self-sustaining community independent of its current base at MIT. The grant will also support development of many new practical tools, such as ways to interface biological databases with the wiki, as well as creation of a generic version of OpenWetWare that can be used by other research communities such as neuroscience, and by individual investigators.

Skepticism Persists
For all the participants’ enthusiasm, however, this wide-open approach to science still faces intense skepticism. Even Hoatlin found the openness unnerving at first. “Now I’m converted to open wikis for everything possible,” she says. “But when I originally joined I wanted to keep everything private”–not least to keep her lab pages from getting trashed by some random hacker. She did not relax until she began to understand the system’s built-in safeguards.

First and foremost, says MIT’s Kelly, “you can’t hide behind anonymity.” By default, OpenWetWare pages are visible to anyone (although researchers have the option to make pages private). But unlike the oft-defaced Wikipedia, the system will let users make changes only after they have registered and established that they belong to a legitimate research organization. “We’ve never yet had a case of vandalism,” Kelly says. Even if they did, the wiki automatically maintains a copy of every version of every page posted: “You could always just roll back the damage with a click of your mouse.”

Unfortunately, this kind of technical safeguard does little to address a second concern: Getting scooped and losing the credit. “That’s the first argument people bring to the table,” says Drexel University chemist Jean-Claude Bradley, who created his independent laboratory wiki, UsefulChem, in December 2005. Even if incidents are rare in reality, Bradley says, everyone has heard a story, which is enough to keep most scientists from even discussing their unpublished work too freely, much less posting it on the Internet.

However, the Web provides better protection than the traditional journal system, Bradley maintains. Every change on a wiki gets a time-stamp, he notes, “so if someone actually did try to scoop you, it would be very easy to prove your priority–and to embarrass them. I think that’s really what is going to drive open science: the fear factor. If you wait for the journals, your work won’t appear for another six to nine months. But with open science, your claim to priority is out there right away.”

Under Bradley’s radically transparent “open notebook” approach, as he calls it, everything goes online: experimental protocols, successful outcomes, failed attempts, even discussions of papers being prepared for publication. “A simple wiki makes an almost perfect lab notebook,” he declares. The time-stamps on every entry not only establish priority, but allow anyone to track the contributions of every person, even in a large collaboration.
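Both safeguards described here, the permanent time-stamped revision log and the one-click rollback, are easy to picture in code. Below is a minimal sketch of how any wiki engine might implement them; this is illustrative Python only, not OpenWetWare's or UsefulChem's actual code.

```python
# Every edit becomes a time-stamped revision attributed to a user, so
# vandalism can be rolled back and priority can be read off the log.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Revision:
    author: str
    text: str
    stamp: datetime

@dataclass
class NotebookPage:
    revisions: list = field(default_factory=list)

    def edit(self, author: str, text: str) -> None:
        # Nothing is ever overwritten; every version is kept.
        self.revisions.append(Revision(author, text,
                                       datetime.now(timezone.utc)))

    def current(self) -> str:
        return self.revisions[-1].text if self.revisions else ""

    def roll_back(self) -> None:
        # "Roll back the damage with a click": re-post the prior revision.
        good = self.revisions[-2]
        self.edit(good.author, good.text)

page = NotebookPage()
page.edit("alice", "Protocol v1: incubate 30 min at 37 C")
page.edit("vandal", "lol deleted")
page.roll_back()                 # restores Alice's text as a new revision
print(page.current())
for r in page.revisions:         # the time-stamped log that proves priority
    print(r.stamp.isoformat(), r.author)
```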

Bradley concedes that there are sometimes legitimate reasons for researchers to think twice about being so open. If work involves patients or other human subjects, for example, privacy is obviously a concern. And if you think your work might lead to a patent, it is still not clear that the patent office will accept a wiki posting as proof of your priority. Until that is sorted out, he says, “the typical legal advice is: do not disclose your ideas before you file.”

Still, Bradley says the more open scientists are, the better. When he started UsefulChem, for example, his lab was investigating the synthesis of drugs to fight diseases such as malaria. But because search engines could index what his team was doing without needing a bunch of passwords, “we suddenly found people discovering us on Google and wanting to work together. The National Cancer Institute contacted me wanting to test our compounds as anti-tumor agents. Rajarshi Guha at Indiana University offered to help us do calculations about docking–figuring out which molecules will be reactive. And there were others. So now we’re not just one lab doing research, but a network of labs collaborating.”

Blogophobia
Although wikis are gaining, scientists have been strikingly slow to embrace one of the most popular Web 2.0 applications: Web logging, or blogging.

“It’s so antithetical to the way scientists are trained,” Duke University geneticist Huntington F. Willard said at the April 2007 North Carolina Science Blogging Conference, one of the first national gatherings devoted to this topic. The whole point of blogging is spontaneity–getting your ideas out there quickly, even at the risk of being wrong or incomplete. “But to a scientist, that’s a tough jump to make,” says Willard, head of Duke’s Institute for Genome Sciences & Policy. “When we publish things, by and large, we’ve gone through a very long process of drafting a paper and getting it peer reviewed. Every word is carefully chosen, because it’s going to stay there for all time. No one wants to read, ‘Contrary to the result of Willard and his colleagues…’.”

Still, Willard favors blogging. As a frequent author of newspaper op-ed pieces, he feels that scientists should make their voices heard in every responsible way possible. Blogging is slowly beginning to catch on; because most blogs allow outsiders to comment on the individual posts, they have proved to be a good medium for brainstorming and discussions of all kinds. Bradley’s UsefulChem blog is an example. Paul Bracher’s Chembark is another. “Chembark has morphed into the water cooler of chemistry,” says Bracher, who is pursuing his Ph.D. in that field at Harvard University. “The conversations are: What should the research agencies be funding? What is the proper way to manage a lab? What types of behavior do you admire in a boss? But instead of having five people around a single water cooler you have hundreds of people around the world.”

Of course, for many members of Bracher’s primary audience–young scientists still struggling to get tenure–those discussions can look like a minefield. A fair number of the participants use pseudonyms, out of fear that a comment might offend some professor’s sensibilities, hurting a student’s chances of getting a job later. Other potential participants never get involved because they feel that time spent with the online community is time not spent on cranking out that next publication. “The peer-reviewed paper is the cornerstone of jobs and promotion,” says PLoS ONE’s Surridge. “Scientists don’t blog because they get no credit.”

The credit-assignment problem is one of the biggest barriers to the widespread adoption of blogging or any other aspect of Science 2.0, agrees Timo Hannay, head of Web publishing at the Nature Publishing Group in London. (That group’s parent company, Macmillan, also owns Scientific American.) Once again, however, the technology itself may help. “Nobody believes that a scientist’s only contribution is from the papers he or she publishes,” Hannay says. “People understand that a good scientist also gives talks at conferences, shares ideas, takes a leadership role in the community. It’s just that publications were always the one thing you could measure. Now, however, as more of this informal communication goes on line, that will get easier to measure too.”

Collaboration the Payoff
The acceptance of any such measure would require a big change in the culture of academic science. But for Science 2.0 advocates, the real significance of Web technologies is their potential to move researchers away from an obsessive focus on priority and publication, toward the kind of openness and community that were supposed to be the hallmark of science in the first place. “I don’t see the disappearance of the formal research paper anytime soon,” Surridge says. “But I do see the growth of lots more collaborative activity building up to publication.” And afterwards as well: PLoS ONE not only allows users to annotate and comment on the papers it publishes online, but to rate the papers’ quality on a scale of 1 to 5.

Meanwhile, Hannay has been taking the Nature group into the Web 2.0 world aggressively. “Our real mission isn’t to publish journals, but to facilitate scientific communication,” he says. “We’ve recognized that the Web can completely change the way that communication happens.” Among the efforts are Nature Network, a social network designed for scientists; Connotea, a social bookmarking site patterned on the popular site del.icio.us, but optimized for the management of research references; and even an experiment in open peer review, with pre-publication manuscripts made available for public comment.

Indeed, says Bora Zivkovic, a circadian rhythm expert who writes A Blog Around the Clock and who is the Online Community Manager for PLoS ONE, the various experiments in Science 2.0 are now proliferating so rapidly that it is almost impossible to keep track of them. “It’s a Darwinian process,” he says. “About 99 percent of these ideas are going to die. But some will emerge and spread.”

“I wouldn’t like to predict where all this is going to go,” Hooker adds. “But I’d be happy to bet that we’re going to like it when we get there.”


Yahoo!/Microsoft Execs Meet For Round Two

Posted by Zonk on Sunday March 16, @03:21PM
from the ready-steady-fight dept.
psychosmyth writes “Microsoft’s deal for Yahoo! is apparently back on the table. Yahoo execs met again with Microsoft early this past week to re-discuss the deal that fell through earlier. ‘The gathering, first reported by The Wall Street Journal, gave Microsoft its first chance to sell Yahoo on the rationale for the proposed marriage since the software maker unveiled its plans six weeks ago. Since then, Yang has been exploring different ways to ward off Microsoft. The alternatives have included possible alliances with Internet search and advertising leader Google Inc., News Corp.’s MySpace.com and Time Warner Inc.’s AOL.’ Microsoft is apparently still keeping all of its options open; a hostile take-over is not out of the question.”


Yahoo Sets Bullish Financial Targets

By Shira Ovide
Yahoo Inc. sought to paint a rosy picture of its financial future as the Internet company pitches its rationale for turning down a takeover offer from Microsoft Corp.

Microsoft in February lobbed a hostile bid to buy Yahoo, a deal now worth about $42 billion. Yahoo turned away the offer as too low. In materials made public Tuesday, Yahoo offered its clearest outline yet of how the company expects to grow on its own. (Read Yahoo’s presentation.)

Yahoo plans to use the presentation as it begins about a week of meetings with its largest shareholders. The discussions essentially act as …


Hakia – First Meaning-based Search Engine

Written by Alex Iskold / December 7, 2006 12:08 PM / 43 Comments


Written by Alex Iskold and edited by Richard MacManus. There has been a lot of talk lately about 2007 being the year when we will see companies roll out Semantic Web technologies. The wave started with John Markoff’s article in the NY Times and got picked up by Dan Farber of ZDNet and in other media. For background on the Semantic Web, check out our post entitled The Road to the Semantic Web. Also, for a lengthy but very insightful primer on the Semantic Web, see Nova Spivack’s recent article.

The media attention is not accidental. Because the Semantic Web promises to help solve information overload problems and deliver major productivity gains, a huge amount of resources, engineering and creativity is being thrown at it.

What is also interesting is that there are different problems that need to be solved in order for things to fall into place. There needs to be a way to turn data into metadata, either at the time of creation or via natural language processing. Then there needs to be intelligence, particularly inside the browser, to take advantage of the generated metadata. There are many other interesting nuances and sub-problems that need to be solved, so the Semantic Web marketplace is going to have a rich variety of companies going after different pieces of the puzzle. We are planning to cover some of these companies working in the Semantic Web space, so watch out for more coverage here on Read/WriteWeb.
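As a toy illustration of that first step, turning raw text into metadata, here is a sketch that scrapes simple “X is the Y of Z” statements into subject-predicate-object triples, the shape that RDF metadata takes. Real systems use RDF/OWL toolchains or full natural language processing; the pattern and sentences below are invented for the example.

```python
import re

# Toy "data to metadata" step: pull "X is the Y of Z" statements out of
# prose and store them as (subject, predicate, object) triples.
PATTERN = re.compile(r"(\w[\w ]*?) is the (\w[\w ]*?) of (\w[\w ]*)")

def extract_triples(text):
    triples = []
    for sentence in text.split("."):
        m = PATTERN.search(sentence.strip())
        if m:
            x, y, z = (g.strip() for g in m.groups())
            triples.append((z, y, x))  # ("Finland", "capital", "Helsinki")
    return triples

doc = "Helsinki is the capital of Finland. Bono is the singer of U2."
print(extract_triples(doc))
# [('Finland', 'capital', 'Helsinki'), ('U2', 'singer', 'Bono')]
```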

Hakia: how is it different from Google?

The first company we’ll cover is Hakia, which is a “meaning-based” search engine startup getting a bit of buzz. It is a venture-backed company with a multinational team, headquartered in New York – and curiously has former US senator Bill Bradley as a board member. It launched its beta in early November this year, but already ranks around 33K on Alexa – which is impressive. They are scheduled to go live in 2007.

The user interface is similar to Google, but the engine prompts you to enter not just keywords – but a question, a phrase, or a sentence. My first question was: What is the population of China?

As you can see, the results were spot on. I ran the same query on Google and got very similar results, but sans flag. Looking carefully over the results in Hakia, I noticed the message:

“Your query produced the Hakia gallery for China. What else do you want to know about China?”

At first this seems like a value add. However, after some thinking about it, I am not sure. What seems to have happened is that instead of performing the search, Hakia classified my question and pulled the results out of a particular cluster – i.e. China. To verify this hypothesis, I ran another query: What is the capital of china? The results again suggested a gallery for China, but did not produce the right answer. Now, to Hakia’s credit, it recovered nicely when I typed in:

Hakia experiments

Next I decided to try out some of the examples that the Hakia team suggests on its homepage, along with some of my own. The first one was Why did the chicken cross the road?, which is a Hakia example. The answers were fine, focusing on the ironic nature of the question. Particularly funny was Hakia’s pick:

My next query was more pragmatic: Where is the Apple store in Soho? (another example from Hakia). The answer was perfect. I then performed the same search on Google and got a perfect result there too. 

Then I searched for Why did Enron collapse?. Again Hakia did well, but not noticeably better than Google. However, I did see one very impressive thing in Hakia. In its results was this statement: Enron’s collapse was not caused by overstated resource reserves, but by another kind of overstatement. This is pretty witty, but I am still not convinced that it is doing semantic analysis. Here is why: that reply was not constructed because Hakia understood the semantics of the question. Instead, it was pulled out of a high-ranking document that matched the Why did Enron collapse? query.
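For what it's worth, the behavior hypothesized here, lifting the best-matching sentence from a high-ranking document rather than understanding the question, is easy to mimic. A sketch follows; the documents, stopword list and scoring are invented, and this is in no way a claim about Hakia's actual algorithm.

```python
import re

# No understanding of the question: just return whichever sentence from
# the document set shares the most terms with the query.
STOPWORDS = {"why", "did", "the", "a", "of", "is", "what"}

def best_snippet(query, documents):
    terms = set(re.findall(r"\w+", query.lower())) - STOPWORDS
    best, best_score = "", 0
    for doc in documents:
        for sentence in doc.split("."):
            overlap = len(terms & set(re.findall(r"\w+", sentence.lower())))
            if overlap > best_score:
                best, best_score = sentence.strip(), overlap
    return best

docs = [
    "Enron collapse was driven by hidden debt. Enron was an energy firm.",
    "Many firms overstate reserves. Reserves were not the real story.",
]
print(best_snippet("Why did Enron collapse?", docs))
# -> "Enron collapse was driven by hidden debt"
```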

In my final experiment, Hakia beat Google hands down. I asked Why did Martha Stewart go to jail? – which is not one of Hakia’s homebrewed examples, but it is fairly similar to their Enron example. Hakia produced perfect results for the Martha question:

Hakia is impressive, but does it really understand meaning?

I have to say that Hakia leaves me intrigued, despite the fact that it could not answer What does Hakia mean? and despite the fact that there isn’t sufficient evidence yet that it really understands meaning.

It’s intriguing to think about the old idea of being able to type a question into a computer and always getting a meaningful answer (a la the Turing test). But right now I am mainly interested in Hakia’s method for picking the top answer. That seems to be Hakia’s secret sauce at this point, which is unique and works quite well for them. Whatever heuristic they are using, it gives back meaningful results based on analysis of strings – and it is impressive, at least at first.

Hakia and Google

Perhaps the more important question is: Will Hakia beat Google? Hakia itself has no answer, but my answer at this point is no. This current version is not exciting enough and the resulting search set is not obviously better. So it’s a long shot that they’ll beat Google in search. I think if Hakia presented one single answer for each query, with the ability to drill down, it might catch more attention. But again, this is a long shot.

The final question is: Is semantical search fundamentally better than text search? This is a complex question and requires deep theoretical expertise to answer definitively. Here are a few hints…

Google’s string algorithm is very powerful – this is an undeniable fact. A narrowly focused vertical search engine that makes a lot of assumptions about the underlying search domain (e.g. Retrevo) does a great job of finding relevant stuff. So the difficulty that Hakia has to overcome is to quickly determine the domain and then to do a great job searching inside the domain. This is an old and difficult problem related to the understanding of natural language and AI. We know it’s hard, but we also know that it is possible.
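A crude sketch of that two-stage approach, guess the domain first and then search only within that domain's index, might look like this. The domains, keywords and documents are all invented for illustration.

```python
# Stage 1: classify the query into a domain. Stage 2: search that
# domain's index only.
DOMAIN_KEYWORDS = {
    "geography": {"capital", "population", "country", "city"},
    "electronics": {"camera", "driver", "geforce", "laptop"},
}

INDEXES = {
    "geography": ["Beijing is the capital of China",
                  "China population: 1.3B"],
    "electronics": ["GeForce 8800 driver downloads",
                    "Digital camera reviews"],
}

def classify_domain(query):
    words = set(query.lower().split())
    # Pick the domain whose keyword set overlaps the query the most.
    return max(DOMAIN_KEYWORDS, key=lambda d: len(DOMAIN_KEYWORDS[d] & words))

def vertical_search(query):
    domain = classify_domain(query)
    words = set(query.lower().split())
    return [doc for doc in INDEXES[domain]
            if words & set(doc.lower().split())]

print(vertical_search("what is the capital of china"))
# -> ['Beijing is the capital of China', 'China population: 1.3B']
```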

While we are waiting for all the answers, please give Hakia a try and let us know what you think.

Leave a comment or trackback on ReadWriteWeb and be in to win a $30 Amazon voucher – courtesy of our competition sponsors AdaptiveBlue and their Netflix Queue Widget.


Comments


  • Good analysis, I wanted to write one but now there’s no need (: Anyway, I fail to see the difference between a ‘semantic’ search engine and a regular search engine. All search engines are ‘semantic’ in a way. If you type something like ‘How do you make a hot-dog’ in Google, it will give you the right answers. It won’t just search for “how”, then “do”, etc. and compile the results. It also has algorithms which know how to decipher the order of words in a sentence and other patterns that make our writing meaningful.

    So, Hakia should do something really spectacular to beat Google with the semantic approach. It should actually be able to understand complex sentences better than Google, and as such be a search engine for more complex tasks, for example for questions like ‘I need drivers for Geforce 8800, but not the latest version’. Currently, compared to Google, it doesn’t deliver.

    Posted by: franticindustries | December 7, 2006 12:36 PM

  • What’s interesting is that Ask started out by trying to create just this type of search engine years ago. They abandoned that approach in favor of a more traditional Google competitor. So can we interpret from that that Ask learned that people would rather use a traditional search engine, or was there another reason for the switch?

    This type of semantical search technology seems especially well suited to encyclopedia sites like Wikipedia or Britannica. I.e., being able to type in “What is the capital of China?” at Wikipedia and get not only relevant topic articles about China, but also the specific answer, would be great. I would love to see a semantic search engine built into MediaWiki. But web search engines should, in my opinion, direct you to a variety of relevant sources.

    I don’t think I’d feel comfortable asking “What were the causes of the American Civil War?” and have the search engine only spit back one result answer (or, one viewpoint).

    Posted by: Josh | December 7, 2006 12:58 PM

  • Josh, excellent points. I really like the Wiki idea.
    In terms of a single answer, I think if you are looking for a quick answer – possibly, but otherwise you would definitely want more results.

    The other thought that occurs to me is that we might not necessarily need a new way of inputting the question as much as we need new ways of getting the answer. So in a way, I view vertical search engines like Retrevo as approaching the same problem, but from a more pragmatic and better angle.

    Alex

    Posted by: Alex Iskold | December 7, 2006 1:02 PM

  • Greetings from hakia! Thanks for the review and comments. We appreciate feedback :-)

    We are still developing; it will CONTINUE TO IMPROVE as many of the meaning associations form in time, like connecting the neurons inside the human brain during childhood. hakia is like a TWO-year-old child on the cognitive scale. But it grows EXPONENTIALLY — much faster than a human.

    Cheers,

    Melek

    Posted by: melek pulatkonak | December 7, 2006 2:05 PM

  • Melek, That’s great! Please make sure it does not become self-aware. I would hate for it to experience the kind of pain we do 🙂

    Alex

    Posted by: Alex Iskold | December 7, 2006 2:19 PM

  • Noted :-)

    Melek

    Posted by: melek pulatkonak | December 7, 2006 2:25 PM

  • Hakia is promising, good to see this early review, but we’ll be able to judge them only after the official debut. Bad comments > /dev/null

    Posted by: Emre Sokullu | December 7, 2006 2:55 PM

  • Hakia sounds quite Finnish – hakea means “to fetch”, for instance. Reminds me a little of Ms Dewey actually, but not as, errm, Flash. 🙂

    Posted by: Juha | December 7, 2006 3:58 PM

  • So, do they intend to read RDF? That is, the data about the data. I’d like to talk to them, as it’s simple to read Content Labels. They can then provide users with more information about a site *before* having to enter it… And that is based on Semantic capabilities 😉

    Posted by: Paul Walsh | December 7, 2006 4:31 PM

  • @Juha: yes, Hakia’s name comes from that Finnish word. See the About Us section of their site.

    Posted by: Emre Sokullu | December 7, 2006 5:03 PM

  • Paul, It seems to me that their claim to fame is that they do not need RDF because they have mastered NLP (natural language processing).

    Alex

    Posted by: Alex Iskold | December 7, 2006 5:15 PM

  • That’s a great question you bring up though, Paul. The Semantic Web is really associated with RDF, thanks largely to Tim Berners-Lee’s relentless promotion of RDF as ‘HTML 2.0’ (to coin a very awkward phrase!). So how many of these new meaning-based search engines coming on the market will utilize RDF?

    Alex is much more of an expert in these things than me, but still NLP seems to me the harder route to take – given all the difficulties AI has had in the past.

    Posted by: Richard MacManus | December 7, 2006 6:34 PM

  • I think search engines need to focus on the social aspect: tracking what users search for and allowing them to vote on sites. This allows them to make good decisions – to immediately understand the domain a housewife is referring to when she says “soap” and when a developer says the same.

    Posted by: David Mackey | December 7, 2006 7:59 PM

  • Hmmm, doesn’t like “Where can I find a good globe?” much (a recent search that hadn’t worked too well for me on Google or Froogle). The first link is good practice guidelines and legislation reform, which appear to use the word “GLOBE” for some reason (I can’t torture it enough to make it an acronym). Granted, the second link was to an eBay auction for a globe. The third was an auction for a Lionel station light “with globe”. The first and third results suggest to me that the meaning of the question hadn’t been understood. Still, we’re talking beta here, and it’s a very difficult problem. It’ll be interesting to see how they progress.

    Posted by: T.J. Crowder | December 8, 2006 1:06 AM

  • Hello Melek,
    Hakia rocks, it’s a really good search experience! Cheers.

    Posted by: Abhishek Sharma | December 8, 2006 2:33 AM

  • A semantic search is quite different from a text search like Google’s, which is not primarily based on context and the relationship between words and resources, but on the occurrence and position of words.

    If Hakia really does semantic searches, it could easily distinguish itself from Google by generating new content (e.g. answers) that combines relevant unique snippets of information into a semantic result/answer to a query, as opposed to just a list of resources, like the other search engines do and Hakia currently does. In that case you don’t have to visit the resources to get the answer.

    The query “What is the capital of Finland?” could show Helsinki as an answer and provide related answers regarding history, population, etymology, other capitals etc.

    For this capability Hakia should not only be able to do semantic searches, but entity extraction as well, since RDF and XML schemas are not that widespread at the moment.

    If they can manage to do this, people won’t hesitate to abandon Google, especially because the Google brand is losing its value rapidly because of SEO, spamming and privacy intrusions…

    Posted by: Gert-Jan van Engelen | December 8, 2006 4:04 AM

  • I think Hakia is bluffing if it claims to be ‘semantic’. I find it as semantic as Google :-) I tried questions like
    Why did the US attack Iraq?
    and
    Why did Israel attack Lebanon?

    It gave absolutely unrelated results, which confirms that it is only as good as a text search. However, when I tried the Q – “Who is Mahatma Gandhi?” – it immediately responded with a remark “See below the Mahatma Gandhi resume by hakia. What else do you want to know about Mahatma Gandhi?”

    My hunch is that the Hakia guys have set up a word filter before the search query gets executed on its DB (call it a ‘semantic filter’ if you’d like). If it contains words like ‘Who’ or ‘What’, it is set to return the ‘resumes’ and ‘galleries’ for the rest of the search terms. But that isn’t what semantics is about – the engine still does not ‘understand’ my question – that’s just a slightly ‘domain restricted’ search being performed.

    I could as well have a dropdown for domain (who, what etc.) before the search box and restrict the search queries myself!

    While Hakia is not bad – I won’t give up my Google for it!

    Posted by: Nikhil Kulkarni | December 8, 2006 8:25 AM

  • really? no one but me remembers askjeeves? i’m all about semantic web, but i’m also skeptical of the recycling of web 1.0 into web 2.0. gigaom & techcrunch have already covered a few companies who have tried this, and while i’m sure hakia is great, let’s not pretend they reinvented the wheel. the concept isn’t new.

    Posted by: geektastik | December 8, 2006 9:08 AM

  • “but already ranks around 33K on Alexa – which is impressive.”Impressive? Give it a break.

    Posted by: michal frackowiak | December 8, 2006 2:05 PM

  • As pointed out in #16, a Semantic Web search is radically different from a regular search. I see no reason to believe that Hakia has anything to do with the “Semantic Web” proper, as the underlying technologies – RDF, OWL, and so forth – simply are not in widespread use.

    If the people publishing data on the web are not publishing it in a format which is intended for consumption by the Semantic Web – and most people aren’t – then either Hakia has next to nothing to do with the Semantic Web, or they’ve made an earth-shattering breakthrough in Natural Language Processing.

    Posted by: Phillip Rhodes | December 8, 2006 2:07 PM

  • michal, the 33K rank is impressive given that the service just launched its beta.

    Alex

    Posted by: Alex Iskold | December 8, 2006 2:26 PM

  • It’s my opinion that for a semantic search engine to *really* work properly, it will have to:
    a. have demographic-based parsing logic, not just language-based, and
    b. know the demographics of the user submitting the query.

    Posted by: Ernesto | December 8, 2006 2:31 PM

  • Ernesto, Add other factors like the stuff you like, etc. That would be more of a personalized search. I think the way to go is:

    Personalize( Semantic Search ) ==> Really cool stuff.

    Alex

    Posted by: Alex Iskold | December 8, 2006 2:36 PM

  • Remember that Google’s growth was spread basically by word of mouth, not SUV megalith marketing. If Google, an upstart, can do it to Yahoo, it can happen again.

    Posted by: Shinderpal jandu | December 8, 2006 2:49 PM

  • This concept didn’t work with ask.com, and it ain’t gonna work again now. It simply isn’t how people search for information on the web.
    There are many ways to work search engines, but I’m quite surprised we keep seeing the same thing over and over again. What we are missing are real innovations, not a second runner-up in the same clothes with a different name.

    Posted by: Sal | December 8, 2006 2:55 PM

  • Ask both of them (and Ask.com) this question:
    what is 5 plus 5? Enough said.

    Posted by: Dave | December 8, 2006 3:01 PM

  • @Dave – duh. Things like calculating 5 plus 5 are a VERY simple matter of doing word associations with relevant mathematical operators, something which I’m sure Hakia can achieve shortly.

    The more interesting phrases here are – as Melek mentioned above – “connections being formed cognitively” and “intelligent as a 2 year old”. Is the engine behind it aware of the data it parses and spits out? What is the level of awareness then – word associations, lexical analysis, categorization and meaning vs. actual causal factors?

    Posted by: Viksit | December 8, 2006 3:53 PM

  • Nice work, going to check out how this handles.

    Posted by: Tele Man | December 8, 2006 4:25 PM

  • Very interesting, and props to the developers. I know it’s not a new concept (as pointed out earlier, ASK did try to do it), but then again, neither was a GUI when Apple took over… these things take development — do you know how long the concept of the Macintosh was alive at Xerox PARC before Jobs discovered it and furthered the development into a now-common operating system? Give Hakia (and semantic search) a chance to develop. Recycled ideas usually have merit. That’s why they’re recycled. They just didn’t get developed 100% the first time around.

    I do, however, see Hakia as far away from success with semantics. To get the semantics perfectly, and accomplish its goal here, it really has to conquer Bloom’s Taxonomy of learning and apply it to each query; especially if it is to return one (or a few) valued and cross-compiled results from different sources.

    Currently, it wouldn’t pass a TRUE Turing Test — it just mimics the “foreign language copied from a book to carry on a conversation” argument proposed by (insert name here, I forget it at the moment…)

    ^Wow… I just referred to like 5 things I learned last quarter in my freshman computer science classes… that felt good. Hope my thoughts make sense. Keep up the work Hakia, I really would be impressed to see success here; I just think it would have to incorporate some AI, which is not looking good (from my eyes, anyway).

    Posted by: Augie | December 8, 2006 9:08 PM

  • I think Hakia weighted the W5 (Who, What, Where, When and Why) heavily in the search queries. I think Hakia is decent, but I am still not too sure of the difference between using semantic search and text search (if the text search query is specific enough).

    Posted by: andy kong | December 8, 2006 9:34 PM

  • While there is some growing interest in semantics and meaning, partly due to work in the semantic web and upstarts like Hakia, the first copy of the first semantic search engine was delivered to the Congressional Research Service in 1988. I know because I was there and I installed it for the research staff.

    In your analysis you asked: Does Hakia really understand meaning? I think the question that has to be answered first is: What does it mean to understand meaning? Long before you come to the Turing test, you have to come to understand what the term “semantics” means and how it is used and understood by those in and outside the domain of software and computational technology practice.

    The answer to the last question you offered: Is semantical search fundamentally better than text search? depends greatly upon what you think semantical means in a search and retrieval context.

    In a word though, the answer is a resounding Yes.

    I think, in its most common and general usage (among peoples) semantics refers to the interpretation of the significance of the relationships and interactions of subjects and objects in a situational context.

    For example, the semantics of the state of affairs in modern day Iraq range over a state of civil war to extreme cases of outside insurgencies intended to deceive and delude. When the semantics are cloudy and unclear, judgments and decisions about what and how to name particular aspects of the state of affairs can also be murky. Thereby interdependent judgments or decisions become delayed or the subject of further debate. Ideally you want to present a situation such that a uniform perception emerges, with semantics (significance) that drives or guides interpretations such that those that are relevant and those with the same validity or authority prevail.

    As the Bush administration has demonstrated, the process, the presentation, the semantics can all become political and highly charged. When questions of significance persist, that is, questions about the signifier and the signified in a given situation, then uncertainty, lack of clarity and disarray obscure any significance, erode confidence and delay action.

    This is not the kind of semantics the Semantic Web and AI technologies proclaim. In their quest to share and exchange information, they want just enough semantics to normalize data labels between systems, so that systems can exchange information and be sure they are referring to the same items. They want to use named references, with authority of course. In fact, they strive for clear and unambiguous semantics – a foreign concept to the Bush administration.

    But semantics has to do with the significance of interpretation. What is significant in our experience of a search and retrieval application? What is of significance in the results of a search engine? Relevance. The benefit of semantic search is greater relevance. For Hakia to be relevant, it has to offer more relevance than Google. A semantic search engine should also offer more, in my opinion.

    A modern-language semantic search engine should offer more than relevance. It should offer insight. Rather than fixing semantics to simple categories for easy exchange, a truly semantic search engine should aid and assist one in exploring topics. It should help relate language to abstract ideas instead of just connecting keywords, names and nouns.

    Posted by: Ken Ewell | December 8, 2006 11:32 PM

  • No, it is not better than Google. Type the same questions into Google and you will get better answers.

    Posted by: jyotheendra | December 8, 2006 11:37 PM

  • Gee golly, as far ahead of me as Ken Ewell is in every sense of technological knowledge and understanding, I have to say: you went way off topic just to make a point about the Bush administration… I get so sick of that.

    Of course semantic search is better than connecting language parts. People may not think it’s better, but I argue that they only feel that way because they are used to searching with boolean operators and combinations of keywords. Everyone knows WHAT SPECIFICALLY they want to find, but some people have trouble putting their question into acceptable and successful search terms… Imagine never having to phrase a question specially for a search engine: just type what you’re wondering, and get an instant answer.

    Much easier than combining keywords with booleans to try to simplify natural language to “search engine” language!

    PS — No offense to you, Mr Ewell — I really do respect that your technological insights and opinions are worth 10 times my own because of the knowledge gap; I guess I just got really sick of seeing politically charged comments in unrelated areas… I’m just sick of politics altogether right now, I think. Not trying to start a flame war or anything! 🙂

    Posted by: Auggie | December 9, 2006 1:36 AM
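
Auggie’s wish above, typing a plain question instead of crafting search-engine syntax, amounts to automating the translation practiced searchers do in their heads. A minimal sketch of that translation, with a small stopword list invented for the example:

```python
# Illustrative sketch: reduce a natural-language question to a boolean
# keyword query. The stopword list is a tiny sample made up for this demo.

STOPWORDS = {
    "what", "who", "where", "when", "why", "how",
    "is", "are", "the", "a", "an", "of", "in", "to", "do",
}

def to_boolean_query(question: str) -> str:
    """Drop question scaffolding and AND together the content words."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    content = [w for w in words if w and w not in STOPWORDS]
    return " AND ".join(content)

print(to_boolean_query("What is the capital of China?"))
# -> capital AND china
```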

  • Great job done by Hakia. I got perfect answers to my questions in the top 3-5 links, and this saved a lot of time.

    I am impressed

    Posted by: priya | December 9, 2006 11:42 AM

  • What about ChaCha.com? They actually have guides who help you with your search.

    Posted by: Tori | December 9, 2006 3:26 PM

  • Unfortunately, Tori, I was never able to get a guide connected, but I do remember trying it out a few days ago and thinking it was a pretty cool concept… as long as they don’t ever charge you for it! Could you connect to guides?

    Posted by: Auggie | December 10, 2006 1:33 AM

  • Guides worked for me.

    Alex.

    Posted by: Alex Iskold | December 10, 2006 6:15 AM

  • Looks like there’s a /very/ long way to go yet. Given that “what is the capital of china” is semantically ambiguous, I tried to be helpful:

    what is the administrative capital of China
    what is the administrative capital of the United States of America
    what is the administrative capital of the USA
    what is the administrative capital of the US

    Unfortunately, Hakia provided irrelevant answers to all four questions. Google got 4/4.

    Given the apparently overwhelming power of Google’s indexing algorithm and the extent of their dataset, a semantic-based search facility such as Hakia may have to seek a qualitatively different area of search in which to make a contribution.

    Posted by: Graham Higgins | December 10, 2006 7:33 AM
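
Graham Higgins’s 4-out-of-4 scoring generalizes into a tiny evaluation harness. In the sketch below, `search` is a hypothetical stand-in for an engine’s results API (neither Hakia nor Google exposed such a call), and the test data is illustrative:

```python
# Illustrative sketch: score an engine on question/answer pairs, the way
# Graham scored Hakia 0/4 and Google 4/4 by hand. `search` is hypothetical.

from typing import Callable, List, Tuple

def score_engine(search: Callable[[str], List[str]],
                 tests: List[Tuple[str, str]], top_n: int = 5) -> int:
    """Count the queries whose expected answer shows up in the top N results."""
    hits = 0
    for query, expected in tests:
        if any(expected.lower() in r.lower() for r in search(query)[:top_n]):
            hits += 1
    return hits

tests = [
    ("what is the administrative capital of China", "Beijing"),
    ("what is the administrative capital of the USA", "Washington"),
]

def toy_engine(query: str) -> List[str]:
    return ["Beijing is the capital of China"]  # toy stand-in engine

print(score_engine(toy_engine, tests), "of", len(tests))  # -> 1 of 2
```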

  • Ref: #35. Tried the so-called ChaCha.com; forget about getting any good result, it felt like I was doing a chat!!! Users around the world have limited attention spans. Getting the best (not necessarily precise) results with minimum effort – that’s the key. Advanced search and personalized search have been around for a long time with no great impact on users.
    Hakia is doing good work, but it’s too early to say anything concrete. In addition, I am not ready to accept that Google has no semantic features in its search algorithm. I’m sure they are working on it or looking out for something good (startup kid).

    Posted by: Dhruba Baishya | December 16, 2006 7:24 PM

  • Props to geektastik for doing what the author failed to do: mention Ask Jeeves.

    Posted by: Bog | December 19, 2006 9:41 AM

  • I mention Ask Jeeves in the second comment. ;)

    Posted by: Josh | December 23, 2006 5:10 PM

  • This is a good example of Hakia’s success:
    why don’t people tell their salaries?

    Posted by: Anonymous | January 3, 2007 2:14 AM

  • The main problem for Hakia is that Google is not standing still. Google has a secret project which I feel must have to do with semantics. BTW – Google does not use any knowledge of semantics for translation. We get the following from Google:

    El barco attravesta una cerradua – un vuelo de cerraduras – La estacion de ressorte – jogar de puente

    The last is particularly annoying. My daughter plays bridge for England, and when I try to search for “Bridge” I am overwhelmed with sites on civil engineering.

    I specifically tested these with Hakia:

    The locks on the Grand Union Canal
    Spring flowers (primavera); springs in Gloucestershire (manantial)
    Bridge tournaments

    The results on the whole were satisfactory – much better than Google’s. “Understand” is a difficult word to define. My definition (good Spanish, in this case) is knowing the difference between primavera, resorte and manantial. In other words, can we use our “understanding” in an operational way? My view is that precise definitions + a large enough database = Turing. To some extent Hakia appears to do this. It must be the future. The fly in the ointment is what Google is doing.

    Posted by: Ian Parker | January 6, 2007 5:27 AM
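
Ian Parker’s examples all come down to word-sense disambiguation: “spring” should become primavera, resorte or manantial depending on the words around it. A crude sketch of that context-voting idea; the cue-word lists are invented for illustration:

```python
# Illustrative sketch: choose a Spanish translation of "spring" by letting
# context words vote. Cue lists are tiny samples invented for the example.

SENSES = {
    "primavera": {"flower", "flowers", "season", "bloom", "april"},  # season
    "resorte":   {"coil", "metal", "mattress", "bounce"},            # coil
    "manantial": {"water", "source", "stream", "hill"},              # water
}

def translate_spring(context: str) -> str:
    """Pick the sense whose cue words overlap the context the most."""
    words = set(context.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(translate_spring("spring flowers bloom in april"))  # -> primavera
print(translate_spring("a spring of water on the hill"))  # -> manantial
```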

Read Full Post »

Spock – Vertical Search Done Right

Written by Alex Iskold / June 26, 2007 6:10 AM / 11 Comments


There has been quite a lot of buzz lately around a vertical search engine for people, called Spock. While still in private beta, the engine has already impressed users with its rich feature set and social aspects. Yet, there is something that has gone almost unnoticed – Spock is one of the best vertical semantic search engines built so far. There are four things that make their approach special:

  • The person-centric perspective of a query
  • Rich set of attributes that characterize people (geography, birthday, occupation, etc.)
  • Usage of tags as links or relationships between people
  • Self-correcting mechanism via user feedback loop

Spock’s focus on people

The only kind of search result you get from Spock is a list of people, and it interprets any query as if it were about people. So whether you search for democrats or ruby on rails or new york, the results will be lists of people associated with the query. In that sense, the algorithm is probably a flavor of the PageRank or frequency-analysis algorithms used by Google – but tailored to people.
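
One plausible way to picture “every query returns people” is an inverted index from tags to the people who carry them, with people ranked by how many query tags they match. The sketch below is a toy with made-up data, not Spock’s implementation:

```python
# Illustrative sketch of a person-centric index: query terms are looked up
# as tags, and people are ranked by how many terms match. Toy data only.

from collections import Counter

TAG_INDEX = {
    "democrats": ["Bill Clinton", "Barack Obama"],
    "ruby on rails": ["David Heinemeier Hansson"],
    "new york": ["Barack Obama", "Rudy Giuliani"],
}

def people_search(query: str) -> list:
    """Rank people by the number of query tags attached to them."""
    counts = Counter()
    for tag, people in TAG_INDEX.items():
        if tag in query.lower():
            counts.update(people)
    return [person for person, _ in counts.most_common()]

print(people_search("democrats from new york"))
# -> ['Barack Obama', 'Bill Clinton', 'Rudy Giuliani']
```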

Rich semantics, tags and relationships

As a vertical engine, Spock knows the important attributes that people have. Even in the beta stage, the set is quite rich: name, gender, age, occupation and location, to name a few. Perhaps the most interesting aspect of Spock is its usage of tags. Firstly, all frequent phrases that Spock extracts via its crawler become tags. In addition, users can add tags of their own. So Spock leverages a combination of automated tagging and people power.
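
The crawler’s “frequent phrases become tags” step can be approximated with a plain n-gram frequency count. A minimal sketch, with a threshold and sample text made up for the example:

```python
# Illustrative sketch: promote frequently repeated bigrams in crawled text
# to candidate tags. The threshold and text are made up for this demo.

from collections import Counter

def extract_tags(text: str, min_count: int = 2) -> list:
    """Return word bigrams that occur at least `min_count` times."""
    words = text.lower().split()
    bigrams = Counter(zip(words, words[1:]))
    return [" ".join(bg) for bg, n in bigrams.items() if n >= min_count]

crawl = ("james kirk is a starship captain . "
         "the starship captain james kirk commands the enterprise")
print(extract_tags(crawl))  # -> ['james kirk', 'starship captain']
```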

A special kind of tag in Spock is called a ‘relationship’ – and it’s the secret sauce that glues people together. For example, Chelsea is related to Clinton because she is his daughter, but Bush is related to Clinton because he is the successor to the title of President. The key thing here is that relationships are explicit in Spock. Taken together, these relationships weave a complex and realistic web of connections between people. Spock gives us a glimpse of how semantics emerge out of the simple mechanism of tagging.
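
Explicit relationships are, in effect, labeled edges between people. A tiny sketch of that data model, reusing the Clinton examples above (a toy representation, not Spock’s internals):

```python
# Illustrative sketch: relationships as labeled edges between people.
# A toy data model, not Spock's internal representation.

relationships = [
    ("Chelsea Clinton", "daughter of", "Bill Clinton"),
    ("George W. Bush", "successor of", "Bill Clinton"),
]

def related_to(person: str) -> list:
    """List everyone connected to `person`, along with the edge label."""
    found = []
    for src, label, dst in relationships:
        if person in (src, dst):
            other = dst if src == person else src
            found.append((other, label))
    return found

print(related_to("Bill Clinton"))
# -> [('Chelsea Clinton', 'daughter of'), ('George W. Bush', 'successor of')]
```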

Feedback loops

The voting aspect of Spock also harnesses the power of automation and people. It is a simple yet very interesting way to get feedback into the system. Spock is experimenting with letting people vote on existing “facts” (tags/relationships), and it re-arranges information to reflect the votes. To be fair, the system is not yet tuned to do this correctly all the time – it’s hard to know right from wrong. However, it is clear that some flavor of this approach will, in the near future, ‘teach’ computers what the right answer is.
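
One way to picture the feedback loop: each “fact” carries a score, votes move the score, and the display order simply follows it. A minimal sketch of that mechanism (the tags and votes are arbitrary):

```python
# Illustrative sketch: up/down votes adjust a tag's score, and tags are
# shown in score order. Arbitrary example data.

from collections import defaultdict

scores = defaultdict(int)

def vote(tag: str, up: bool) -> None:
    """Apply a single up or down vote to a tag."""
    scores[tag] += 1 if up else -1

def ranked_tags() -> list:
    """Tags ordered by community confidence, highest first."""
    return sorted(scores, key=scores.get, reverse=True)

vote("star trek", up=True)
vote("star trek", up=True)
vote("villain", up=False)
print(ranked_tags())  # -> ['star trek', 'villain']
```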

Limitations of Spock’s approach

The techniques we’ve discussed are very impressive, but they have limitations. The main problem is that Spock is likely to have much more complete information about celebrities and well-known people than about ordinary people. The reason is the amount of data: more people are going to be tagging and voting on the president of the United States than on ordinary people. Unless, of course, Spock breaks out and becomes so viral that a lot of local communities form – much like on Facebook. While that’s possible, at this point it does not seem too likely. But even if Spock just becomes a search engine that works best for famous people, it is still very useful and powerful.

Conclusion

Spock is fascinating because of its focus and its leverage of semantics. The use of tags as relationships and the feedback loop strike me as having great potential to grow a learning system organically, in the manner that learning systems evolve in nature. Most importantly, it is pragmatic and instantly useful.

Leave a comment or trackback on ReadWriteWeb and be in to win a $30 Amazon voucher – courtesy of our competition sponsors AdaptiveBlue and their Netflix Queue Widget.

2 TrackBacks

Listed below are links to blogs that reference this entry: Spock – Vertical Search Done Right.
TrackBack URL for this entry: http://www.readwriteweb.com/cgi-bin/mt/mt-tb.cgi/2309
» Weekly Wrapup, 25-29 June 2007 from Read/WriteWeb
The Weekly Wrapups have been a feature of Read/WriteWeb since the beginning of January 2005 (when they were called Web 2.0 Weekly Wrapups). Nowadays the Wrapup is designed for those of you who can’t keep up with a daily dose… Read More
» The Web’s Top Takeover Targets from Read/WriteWeb
This past year has been a very eventful one in the M&A arena, with many of web 2.0’s biggest names being snapped up. A few stand-outs include the likes of YouTube, Photobucket, Feedburner, Last.fm, and StumbleUpon. Yet, there still remains… Read More

Comments


  • Spock will also create a huge database of “ordinary” people. They’re aggregating Facebook, MySpace and LinkedIn, so they have lesser-known people, too. I was known to the system – there was not much detail, but it included my name, age, country and MySpace profile.

    If they start to index more resources, like domains (who owns which domains), blogs (there are millions of them…), more social networks or, best of all, the web in general, they’re well on the way to actually becoming a search engine for _everybody_.

    Also, don’t underestimate the fact that everybody will at least tag himself. That’s our ego! 🙂

    Posted by: Sebastian | June 26, 2007 6:46 AM

  • I agree that there’s huge potential for Spock, and that it is very well done.

    Potential downside? If Spock does hit, I can envision employers and recruiters making extensive use of it to check up on and get background on employees and prospects – which might not be such a good thing for some.

    Posted by: Chris | June 26, 2007 7:26 AM

  • Spock is gonna take a lot of money to market as a domain. The name is terrible. Spook is better. Spoke is better. You would think they would at least pick a common-sense vertical web address like mylocator.com or something. The world does not need another website whose purpose you have to explain. Vertical done right needs no explanation of its focus. Change the name. I like Spoke better.

    Posted by: steven emery | June 26, 2007 9:11 AM

  • What the fuk is this?!?
    Semantic who? Don’t they make antivirus?
    Why would they want to do a search engine? They can’t tell me who stole my screwdriver, but I know it was Claxton before he left, that POS.

    Posted by: Mike Hulubinka | June 26, 2007 10:50 AM

  • Pretty interesting technology. One of the default queries behind the log-in is “people killed by handguns.” I think the feedback-loop feature is a great quality-control mechanism, assuming it’s not terribly prone to abuse; it’s also a lot of fun to play with!

    I think I still have a couple of invitations if anyone is interested in trying it out.

    Posted by: Cortland Coleman | June 26, 2007 8:23 PM

  • I am not excited by Spock because its business objective is meaningless. It is a good tool to kill time. However, Google is a great tool to save time.

    Posted by: keanu | June 26, 2007 8:59 PM

  • Well, I would like to make an interesting comment, but when I went to their site it was down for maintenance.

    A portent?

    Posted by: Alan Marks | June 27, 2007 6:15 AM

  • I had the same experience as Alan, but now that Spock’s back up it appears that it’s invitation-only. As current users are able to invite others, it would be great if some generous person could send me an invitation! jason (at) talktoshiba.com

    Posted by: Jason | June 28, 2007 2:20 AM

  • Hi, all Spockers.

    Posted by: rmpal | July 3, 2007 5:22 AM

  • If you want free Spock invites, go to http://www.swapinvites.com/

    Posted by: Nathan | July 11, 2007 10:55 AM

  • Crawling the web does not always lead to good results… search on spock.com for “Christian” and just wonder at the results…

    Posted by: wayne | August 14, 2007 3:19 AM

Read Full Post »

Thoughts on Google Sites, IT department threat?

Google released Sites today, a centralized repository for sharing information and collaborative workspaces. There’s been much commentary around the blogosphere on whether Sites is a threat to Microsoft’s SharePoint product; MiramarMike gives an excellent comparison over here.

There’s been so much discussion post-launch that I wasn’t going to comment. However, a post over on RWW by Sarah Perez gave me some motivation to respond to her perspective. Sarah’s perspective is that, while Sites might be a reasonable product for ma-and-pa operations, it’s not suitable for the enterprise, and in fact Google’s strategy for encouraging enterprise users to adopt its product is counterproductive and somewhat sneaky.

Sarah quotes Dave Girouard, who runs Google’s enterprise unit, as saying this about what his company is doing: “We’re wrestling over who should have ultimate authority of the technology people use in the workplace. There’s no right or wrong answer so we have to respect everyone’s view.”

She then goes on to draw a stark conclusion from Dave’s words, saying:

Let’s read between the lines of that last statement…Google doesn’t think IT should have the ultimate authority about the tools people use to do their jobs. There’s “power to the people,” and then there’s a total coup-d’etat. Google’s opting for the latter.

I have to say I can’t agree with Sarah: Google is clearly empowering operational-level employees within an enterprise. In the event that their IT department lacks the funding (although, given that GApps is free, this is a non-starter anyway) or the time, operational and team-level personnel can deploy the broader Google Apps products to make the most of their collaboration potential. The way I see it, if IT departments were doing their jobs (and some are), there would be no need to have this discussion. They would be sufficiently user-centric to decide on the best product for their users’ needs, be it MS, Google or anything else.

In all this discussion around circumventing, or not, corporate IT departments, people seem to have lost sight of the real issue. Corporate IT’s role is to assess and implement solutions that provide the functionality those users require. It isn’t to build empires or create silos. Any success Google has within an enterprise setting (and I’m not going to wade into the argument about whether or not Google Apps is having enterprise-level success) would seem to me to be a comment on the efficacy of the IT department itself.

For too long, CIOs have been technology-centric on the one hand and compliance-driven on the other. Between cuddling up to the big software vendors and spending time worrying about their own skins with regard to Sarbanes-Oxley compliance, they’ve lost sight of the fact that their existing offerings to the business are lacking.

Rather than finding ways to block their users from making individualised and decentralised decisions, they should be partnering with the business units to truly assess their requirements and the best solutions to fulfil their needs.

Sarah quotes Joel Hruska of Ars Technica as saying “…IT administrators tend to fervently dislike the sudden appearance of unapproved applications, even if said software package promises world peace, actually delivers all those free iPods, and periodically spits gold doubloons out of the CD-ROM drive. Google’s approach seems predicated on the old adage that it’s always easier to get forgiveness than permission. On the one hand, Google Apps Team Edition could help facilitate group-level communication on projects, but the program could also engender a significant backlash from IT managers who aren’t at all thrilled at its sudden appearance. This is particularly true of companies with strict(er) IT policies, or companies already in the middle of deploying an alternative work collaboration system…Google seems to be betting that if it can build enough grassroots support for Google Apps, IT departments and corporations will have no choice but to embrace it as a provider. Such an approach may work beautifully in the consumer market, but there’s no guarantee corporations will be as flexible.”

And then decides that:

If anything, this strategy will drive enterprise IT even further from Google Apps, keeping the Apps program the sole province of the SOHO and small-medium business market.

And herein lies the rub: if enterprise IT continues to be prescriptive and protective of incumbents, it will eventually erode the value of the organisation, as younger, leaner, more agile and proactive organisations utilise whichever tools satisfy their needs.

4 Responses to “Thoughts on Google Sites, IT department threat?”


  1. Jim Donovan
    Ben – that is too simplistic and naive on so many levels. Yes, IT departments may have challenges in responsiveness, but they have first responsibility for business IT integrity, for many very sound reasons. The shambles caused by user-developed Excel and Access applications are legion, as were the skunk-works super-micro apps of the ’80s. Web apps are no different.
  2. Ben Kepes
    Jim – I accept that my post was a little heavy-handed. However, I have to say that there appears to be a real degree of empire-building and turf protection going on within some corporate IT departments. Granted, there are requirements on CIOs to ensure corporate safety, but I contend that many of their decisions are made for the wrong reasons.

    What I’m saying is that IT departments need to be involved and to embrace new offerings (within parameters). I liken it to a CEO directive of “let’s be as creative as we can about how we support the day-to-day business activities”… They also need to know when a bottom-up solution is acceptable, appropriate and most applicable.

  1. Thoughts on Google Sites, IT department threat? « The “Meta” Internet: The genesis of a “virtual” Silicon Valleys leveraging the power of the Internet.
  2. Microsoft to boost SaaS credibility? at diversity.net.nz


Read Full Post »

