Posts Tagged ‘distributed computing’

Wikia acquires Grub distributed search indexing system

By Ryan Paul | Published: July 30, 2007 – 08:17AM CT

Wikia, the company created by Wikipedia founder Jimmy Wales, has acquired the Grub distributed indexing system from LookSmart and is preparing to distribute Grub’s code under an open-source license. Wikia plans to use Grub for its user-driven search engine, which is still under development.

Originally created in 2000, Grub leverages the distributed computing model to crawl the web and index pages. Users install a specialized client application on their computers, which automatically performs indexing while the machine is idle and transmits the page data back to a centralized repository. In this way, volunteers contribute the raw computing power that performs the indexing.
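To make the client/server split described above concrete, here is a minimal Python sketch of a volunteer crawling model. It is not Grub's actual code or protocol: the `Coordinator` class, the `client_run` function, the batch-based work units, and the idle check are all hypothetical, and a dictionary stands in for the web so the example runs without network access.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Coordinator:
    """Hypothetical central repository: hands out URLs, stores returned page data."""
    pending: deque = field(default_factory=deque)
    index: Dict[str, str] = field(default_factory=dict)

    def next_batch(self, size: int) -> List[str]:
        # Pop up to `size` URLs that have not yet been assigned to a client.
        batch = []
        while self.pending and len(batch) < size:
            batch.append(self.pending.popleft())
        return batch

    def submit(self, results: Dict[str, str]) -> None:
        # Merge a client's crawl results into the central store.
        self.index.update(results)


def client_run(coordinator: Coordinator,
               fetch: Callable[[str], str],
               is_idle: Callable[[], bool],
               batch_size: int = 2) -> None:
    """Hypothetical volunteer client: crawl only while the host machine is idle."""
    while is_idle():
        batch = coordinator.next_batch(batch_size)
        if not batch:
            break  # no work left on the server
        # The expensive part (fetching pages) happens on the volunteer's machine.
        coordinator.submit({url: fetch(url) for url in batch})


if __name__ == "__main__":
    # A fake "web" so the sketch runs offline.
    web = {
        "http://a.example/": "alpha page",
        "http://b.example/": "beta page",
        "http://c.example/": "gamma page",
    }
    server = Coordinator(pending=deque(web))
    client_run(server, fetch=web.get, is_idle=lambda: True)
    print(sorted(server.index))
```

The point of the design is that the central repository only hands out work and stores results, while the costly fetching runs on volunteers' machines whenever they report being idle.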

Wikia is resurrecting Grub as an open-source project and hopes to work with the open-source software community to port the Grub client, which currently runs only on Windows, to other operating systems. The company also expects that Grub’s modular design and the availability of its source code will make it possible for users to add features and help improve the system’s performance.

In addition to leveraging volunteer computing power for automated indexing, Wikia’s search engine will also attempt to harness human effort for editing and refining the index. According to Wales, users of the Wikia search engine will add and remove links, remove spam, and police other users, much as Wikipedia’s participatory model works today.

“The desire to collaborate and support a transparent and open platform for search is clearly deeply exciting to both open source and businesses,” said Wales in a statement. “Look for other exciting announcements in the coming months as we collectively work to free the judgment of information from invisible rules inside an algorithmic black box.”

Wikia’s search engine isn’t yet available for use, but the project’s mission is articulated on the Wikia search page. Its stated goals of transparency, community, quality, privacy, and interoperability make the service seem promising at first glance, but despite its potential value, the company will face many problems when the search engine launches.

Search engine ranking has significant financial implications for many companies, so Wikia’s user-driven search engine is likely to face constant attempts at manipulation. Keeping spammers and search engine optimization hackers at bay is sure to be a taxing endeavor. And given how vehemently Wikipedia users have opposed funding Wikipedia with ads rather than donations, it’s not entirely clear that an ad-based commercial project like Wikia’s search engine will attract the same degree of user involvement.

Distributed computing is a highly unusual approach to indexing, but it’s also consistent with Wikia’s participatory model. Whether or not Wikia’s search engine succeeds, the company’s willingness to experiment with unconventional approaches could spur innovation and change the landscape of the search engine market.

Google and IBM team on cloud computing initiative for universities

By Jacqui Cheng | Published: October 08, 2007 – 02:03PM CT

Google and IBM announced today that the two companies have partnered to offer millions of dollars in resources to universities in order to promote cloud computing projects. The companies say that the goal is to improve students’ knowledge of parallel computing practices and better prepare them for the increasingly popular large-scale computing that takes place in the “real world,” such as in search engines, social networking sites, and scientific computing.

Both Google and IBM plan to provide several hundred computers as part of the initiative, a combination of IBM BladeCenter and System x servers and Google machines. The servers will run a variety of open-source software, which students will be able to access over the Internet to test parallel programming projects. Additionally, the companies, working with the University of Washington, have released a Creative Commons-licensed university curriculum focused on parallel computing, and IBM has developed Eclipse-compatible open-source software that will help students develop programs for clusters running Hadoop.
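Hadoop, the open-source framework mentioned above, implements the MapReduce programming model. The Python sketch below illustrates that model on a single machine and is not Hadoop’s API: a map step emits (word, 1) pairs for each document, a shuffle step groups the pairs by key, and a reduce step sums each group. The function names are hypothetical, and a thread pool merely stands in for the many machines a real cluster would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one document.
    return [(word, 1) for word in document.lower().split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    # Group every emitted value by its key, as the framework would between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key; here, a simple sum.
    return key, sum(values)


def word_count(documents: List[str], workers: int = 4) -> Dict[str, int]:
    # Threads stand in for cluster nodes: each document is mapped independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, documents))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
```

Because each map call touches only one document and each reduce call touches only one key, both phases can be spread across many machines without coordination between them, which is the kind of thinking the curriculum is meant to teach.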

Currently, only a select group of universities is piloting the program: the University of Washington, Carnegie Mellon University, the Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley, and the University of Maryland. The companies hope to expand the program in the future and grow the cluster to more than 1,600 processors. As examples of projects already run on the cluster, Google says that University of Washington students used it to scan the millions of edits made to Wikipedia to identify spam, and to organize news by geographic location.

The idea for the program came from Google senior software engineer Christophe Bisciglia, who said that while interviewing close to a hundred college students during his time at Google, he had noticed a consistent pattern: despite their extreme talent, these job candidates “sort of stalled” when asked to think about algorithms the way that Google does. “They just didn’t have the background to think about computation in a fundamentally distributed manner,” Bisciglia told Ars.

Bisciglia then began working with his alma mater, the University of Washington, during his 20-percent time (the paid time that Google allows employees to spend on their own projects) to develop a curriculum that would better prepare students for a changing industry.

Bisciglia said that the ultimate purpose of the program is to “start closing the gap between industry and academia,” and that there’s a need to “break the single-server mindset.” With such an explosion of content and users on the Internet, he said, no one machine is going to be powerful enough to meet any company’s needs, yet until now students haven’t had much of an opportunity to explore parallel computing before being dumped into the real world. “Our goal is that, once we shake out the bugs and understand the needs of the schools and communities, we can bring on more schools as we learn more and make more resources available,” he told Ars.

Of course, the cloud computing initiative isn’t designed just to offer resources to students; Google and IBM have a vested interest in making sure that students at these top universities keep coming to their companies after graduating. Google doesn’t make much of an effort to hide this fact, either: “In order to most effectively serve the long-term interests of our users, it is imperative that students are adequately equipped to harness the potential of modern computing systems and for researchers to be able to innovate ways to address emerging problems,” said Google CEO Eric Schmidt in a statement.

The pairing may seem odd at first blush, but both Google and IBM recognize that they bring two sets of expertise to the table that can make the project succeed. IBM’s experience running data centers and Google’s obvious experience running web apps on giant clusters complement each other.