By Jacqui Cheng | Published: October 08, 2007 – 02:03PM CT
Both Google and IBM plan to provide several hundred computers as part of the initiative, a combination of IBM BladeCenter and System x servers and Google machines. The servers will run a variety of open-source software, which students will be able to access through the Internet to test parallel programming projects. Additionally, the companies, in conjunction with the University of Washington, have made available a Creative Commons-licensed university curriculum focused on parallel computing. IBM has also developed Eclipse-compatible open-source software that will help students develop programs for clusters running Hadoop.
Currently, only a select group of universities is piloting the program: the University of Washington, Carnegie Mellon University, the Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley, and the University of Maryland. The companies hope to expand the program in the future and grow the cluster to over 1,600 processors. As examples of projects already run on the cluster, Google says University of Washington students have used it to scan the millions of edits made to Wikipedia to identify spam, and to organize news by geographic location.
The idea for the program came from Google senior software engineer Christophe Bisciglia, who said that while interviewing close to a hundred college students during his time at Google, he noticed a consistent pattern: despite their extreme talent, these job candidates "sort of stalled" when asked to think about algorithms the way Google does. "They just didn't have the background to think about computation in a fundamentally distributed manner," Bisciglia told Ars.
Bisciglia then began working with his alma mater, the University of Washington, to develop the curriculum during his 20-percent time (the paid time Google allows employees to spend on their own projects), with the aim of better preparing students for a changing industry.
Bisciglia said that the ultimate purpose of the program is to "start closing the gap between industry and academia," and that there's a need to "break the single-server mindset." With such an explosion of content and users on the Internet, he said, no single machine is going to be powerful enough to meet any company's needs, yet until now students haven't had much opportunity to explore parallel computing before being dumped into the real world. "Our goal is that, once we shake out the bugs and understand the needs of the schools and communities, we can bring on more schools as we learn more and make more resources available," he told Ars.
Of course, the cloud computing initiative isn't designed just to offer resources to students: Google and IBM have a vested interest in making sure that students at these top universities keep coming to their companies after graduating. Google doesn't make much of an effort to hide this fact, either: "In order to most effectively serve the long-term interests of our users, it is imperative that students are adequately equipped to harness the potential of modern computing systems and for researchers to be able to innovate ways to address emerging problems," said Google CEO Eric Schmidt in a statement.
The pairing may seem odd at first blush, but both Google and IBM recognize that they bring two sets of expertise to the table that can make the project succeed. IBM's experience running data centers and Google's obvious experience running web apps on giant clusters complement each other.