Google Revealed: The IT Strategy That Makes It Work
A unique mix of internally developed software, open source, made-to-order hardware, and people management is the secret behind the search engine.
By Thomas Claburn, InformationWeek
Aug. 28, 2006
In Building 43 at Google’s Mountain View, Calif., headquarters is a video screen that depicts the world as seen in Google Earth. Across a revolving globe, streams of colorful pixels, like sparks from a Roman candle, mark the geographic origin of queries coming in to Google’s search engine. It’s a real-time representation of Google as the nexus of human curiosity.
Google is different. And it’s different not only because its thinking is original and its applications unique–witness search queries morphed into a lobby display of bursting color–but because the company’s unconventional IT strategy makes it so. Commodity hardware and free software hardly seem like the seeds of an empire, yet Google has turned them into an unmatched distributed computing platform that supports its wildly popular search engine, plus a burgeoning number of applications. We used to call them consumer applications, but Google changed that. Businesses also use them because, well, Google is different.
The IT infrastructure behind Google’s Web services doesn’t matter much to the millions of people conducting searches, but it’s everything to the hundreds of engineers dedicated to Google’s mission of organizing the world’s information and making it “universally accessible and useful.” That calls for an IT plan that matches the company’s business vision in scope and ambition.
Choice is always better than control, Merrill says.
Photo by Jeffery Newbury
Google managers tend to be reticent on the subject of IT strategy; they’re loath to talk about specific vendors or products, and they clam up when asked about their servers and data centers. But a day spent with some of the company’s IT leaders reveals there’s more to Google’s IT operations than a search engine running on a massive server farm. Behind the seeming simplicity is a mash-up of internally developed software, made-to-order hardware, artificial intelligence, obsession with performance, and an unorthodox approach to people management.
There’s a lesson in Google’s IT philosophy for other companies: Shun the herding instinct that leads toward the same systems and software everyone else is using. There may well be competitive advantages in doing things your own way.
“Culture drives the way you do things,” says Douglas Merrill, VP of engineering and Google’s de facto CIO. “To the extent, like us, your organizational culture is unusual in important ways, you will have to build different ways of running your traditional systems.”
Google’s great IT advantage is its ability to build high-performance systems that are cost efficient (we didn’t say cheap) and that scale to massive workloads. Because of that, IT consultant Stephen Arnold argues, Google enjoys huge cost advantages over competitors such as Amazon, eBay, Microsoft, and Yahoo. Google’s programmers are 50% to 100% more productive than their peers at other Web companies, a result of the custom libraries Google developed to support programming of massively parallel systems, Arnold says. He estimates the company’s competitors have to spend four times as much to keep up.
Pimp My Server
How does Google do it? For one thing, Merrill says, “we build hardware.” Google doesn’t manufacture computer systems, but it does order them to its own specifications, then installs and tunes them like something out of MTV’s Pimp My Ride. “What it comes down to is we’re very good at buying commodity servers and using them to their fullest, to the point where they’re almost so damn hot they’ll melt,” open source program manager Chris DiBona says.
That hands-on approach, born of the frugality of a garage startup, persists because Google’s scale demands it. Google has between 200,000 and 450,000 servers spread among up to 65 data centers, depending on how you define them and who’s doing the counting. And those numbers continue to rise.
The company won’t discuss these estimates; it considers such numbers to be a competitive advantage. In fact, one of the things Google likes about open source software is that it facilitates secrecy. “If we had to go and buy software licenses, or code licenses, based on seats, people would absolutely know what the Google infrastructure looks like,” DiBona says. “The use of open source software, that’s one more way we can control our destiny.”
Scale works in Google’s favor. The marginal advantage of custom-built servers becomes significant when multiplied by hundreds of thousands of machines. The company is constructing a 30-acre data center along the Columbia River in The Dalles, Ore., where it can get low-priced hydroelectric power for computing and cooling (see story, “Google Goes Its Own Way In The Data Center”).
Open source software lets Google control its own destiny, DiBona says.
Google organizes its machines, which run Linux, into “cells,” which DiBona describes as a kind of disk drive for Internet services. (Not to be confused with Gdrive, the long-rumored Google hosted storage service. “There is no Gdrive,” a spokeswoman insists.) Software programs reside on racks of inexpensive computers, and programmers decide how much redundancy to give them. The cells take the place of commercial storage equipment; DiBona says Google’s cells are cheaper to create and maintain, and he hints they can handle more data, too.
No level of minutiae escapes Google’s attention. For years, the company’s engineers have studied the inner workings of microprocessors, and as Google continues to scale up, chips tuned to its unique needs could become a necessity. In a paper published in an industry journal last year, distinguished engineer Luiz Barroso said key workloads at Google suffered from single-core designs in recent years. Many server-side apps, such as serving the Google search index, don’t process in parallel well at the instruction level on such chips.
The arrival of more chip-level parallelism as Advanced Micro Devices, Intel, and Sun Microsystems build multiple cores onto their chips has been a boon, says Barroso, a former chip designer at Digital Equipment and Compaq.
Google has even considered designing its own computer chips, but such a bold move may be unnecessary given industry trends. “Designing a microprocessor is a complex and costly task,” says Urs Holzle, senior VP of operations. Google prefers to work with chip manufacturers to make sure they understand its applications and design chips that are a good fit. It’s been advocating designs that focus on aggregate throughput and performance per watt rather than single-thread peak performance. “Recent trends in multicore CPUs very much follow that direction,” says Holzle.
Custom Tailored
To wring every ounce of performance from its hardware, Google writes custom software–lots of it. Major innovations include MapReduce, a programming model to simplify processing and create large data sets; BigTable, a system for storing and managing massive amounts of data; Sawzall, an interpreted programming language for analyzing large data sets in a distributed computing environment; Google File System, a distributed file system for data-intensive applications; and Google Workqueue, a system that groups queries and schedules them for distributed processing.
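To make the MapReduce idea concrete, here is a minimal single-machine sketch of the programming model in Python. It is an illustration only, not Google's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real system would spread the map and reduce calls across many machines and handle failures along the way.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, 1) pair for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, the job a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: combine every value emitted for one key."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a dict of doc_id -> text."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())
```

The programmer writes only the map and reduce functions; the grouping step in the middle is what lets the work be split across machines, because each key can be reduced independently of the others.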
It’s in tools like Sawzall that Google’s obsessive focus on computational efficiency becomes clear. Not every company tackles productivity at such a fundamental level, but for Google it made sense to develop a programming language specifically to deal with data sets that are too large to fit in a conventional relational database. Even though other programming tools could be used to deal with the problem, Google engineers developed a custom solution for the sake of efficiency. Google engineers contend Sawzall programs are a fraction of the size of equivalent MapReduce programs in C++ and significantly easier to write.
Such concerns explain why Google isn’t content with the standard Linux kernel; it runs a modified kernel tuned to its needs. By tinkering with the low-level behavior of Linux, Google engineers have solved data corruption and bottleneck problems, while increasing overall system reliability. The kernel alterations also made Google’s computer clusters faster by making them communicate more efficiently. Of course, Google experiences the occasional system glitch, and, when it does, millions of users can be affected. Three years ago, a system failure hobbled 20% of search traffic for 30 minutes.
Google created its own Web server instead of using the open source Apache Web server, which underpins more than 60% of Web sites. Google’s Web server can run on more machines and balance workloads among servers more effectively than Apache for Google’s large code base, which contains lots of dependencies among programs, DiBona says. Google’s handling of software such as the Common Gateway Interface, the standard for linking databases to dynamic Web pages, may be harder to use than Apache’s, but it runs faster. “If we can eke out 10% to 20% better performance, we can save a lot of power, AC, and people,” DiBona says.
Google built its own CRM system to support its business of selling Internet ads billed by a mixture of bid price and click-throughs. But Google isn’t dogmatic about building its own tools. For accounting, it uses Oracle Financials.
Sometimes, value can be bought off the shelf, Merrill says, holding up a plastic fork as an example. But there are times when off-the-shelf software won’t do. “Our culture is pretty deeply embedded in many of our processes,” he says. “So what we don’t want to do is buy a tool, which by extension changes the cultural aspects of the way we do things.”
Keep It Interesting
Google doesn’t disclose how much it spends on IT. Susquehanna Financial Group estimates Google invested about $300 million in IT in the first half of this year, or about 30% of the company’s overall capital expenditures during that period, according to analyst Marianne Wolk. For the past few years, she says, Google was spending roughly 50% of capital expenditures on IT, but that percentage has diminished as the company increases spending in other areas, including land, as it expands.
Google’s unorthodox approach to managing its Ph.D.s drove its decision not to budget research and development separately, as most tech companies do. “You end up in many companies with this divide between research and engineering,” explains Alan Eustace, senior VP of engineering and research. By dividing those budgets, he says, “you’re pretty much guaranteeing institutionally that you won’t be solving interesting problems.”
IT management at Google is decentralized. The company has neither a CIO nor CTO, but it’s brimming with senior-level engineers and other technologists. They include Bill Coughran, VP of engineering for systems infrastructure, who oversees the distributed computing programs that power Google’s online applications, and Eustace, who’s responsible for product R&D. Sergey Brin isn’t just Google’s co-founder–his day-to-day job is president of technology. Merrill, brought in as senior director of IS three years ago, now is responsible for internal engineering and worldwide support.
Google employs a matrix management system where managers have many direct reports, and engineers report to multiple people. Engineers get most of their direction and critiques from the project leaders with whom they work. Since engineers can change projects every three months, Google eschews traditional approaches to project management and performance appraisals. Like other things at Google, artificial intelligence and computer automation perform some of the grunt work. “Our goal is to automate as many things as we can because it makes unfun things not happen,” Merrill says. “Nobody wants to have a boring job, right?”
A tracking system automatically pulls information on job applicants, gives a hiring manager a job candidate’s resumé, offers questions to ask, and sends the manager an e-mail after the interview asking what he or she thought of the candidate. Job interviews can involve logic questions, writing code, talking about software architecture, and generally proving to Google’s brain trust that the applicant is a fast learner, since the company doesn’t keep people working on the same problems for very long.
Lots of small, short-lived projects mean traditional project management software based on task lists isn’t right for Google. For one thing, techies aren’t very good at cataloging how they spend their hours. What they are good at, it turns out, is writing up a few short sentences or snippets about what they do each day. Those get compiled in a database along with periodic updates from project leaders about a team’s deliverables. The project system tags the input by topic and routes it to the appropriate people. “This is not hard AI,” Merrill says. Still, who else manages workers like this?
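Google hasn’t said how its project system tags and routes snippets, so the following Python sketch is purely hypothetical. It assumes simple keyword matching, and the topic lists, addresses, and function names (`TOPIC_KEYWORDS`, `SUBSCRIBERS`, `tag_snippet`, `route`) are invented for illustration. It is meant only to show why Merrill can call this kind of automation “not hard AI.”

```python
import re
from collections import defaultdict

# Hypothetical topic vocabulary; a real system would learn or curate this.
TOPIC_KEYWORDS = {
    "search": {"index", "query", "ranking"},
    "storage": {"disk", "replication", "filesystem"},
}

# Hypothetical mapping from topic to the people who follow it.
SUBSCRIBERS = {
    "search": ["search-lead@example.com"],
    "storage": ["storage-lead@example.com"],
}


def tag_snippet(text):
    """Return the sorted list of topics whose keywords appear in the snippet."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(topic for topic, keywords in TOPIC_KEYWORDS.items()
                  if words & keywords)


def route(snippets):
    """Deliver each (author, text) snippet to everyone subscribed to its topics."""
    inbox = defaultdict(list)
    for author, text in snippets:
        for topic in tag_snippet(text):
            for person in SUBSCRIBERS.get(topic, []):
                inbox[person].append((author, text))
    return dict(inbox)
```

A snippet that matches no topic is simply not delivered in this sketch; a production system would presumably need a fallback for that case.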
Performance reviews are handled in a similarly technocratic way. Google’s “Perf” system lets managers write e-mails–again read by a computer before any human–describing what a worker did on a project that was good or bad. Come review time, peers get an e-mail asking them to compare the employee to other Google people. Perf breaks up the answers, measures who’s being compared with whom, and–get this–makes the answers public. The way Merrill figures it, techies like open-air back patting. Presumably, the process airs some dirty laundry, too, but Merrill says that would happen anyway. “We have to protect our culture as we’re growing fast,” he says. “That’s what keeps us up nights.”
Google’s approach isn’t without its detractors. One marketing person who came to Google in 2004 as part of an acquisition grew frustrated by lack of resources and support, and quit. “I think Google is a great place to be from an IT engineer standpoint,” the former employee says. “I don’t know that it’s quite as good as people think from the business and marketing side.”
The company has a ways to go before its marketing savvy matches its engineering. Apart from its search engine and advertising system, Google’s wide variety of online applications have seen only modest adoption (see story, “In Depth: Google Aims At Microsoft Office’s Weak Spot With Desktop Suite”). Its Gmail service has yet to seriously challenge long-established free e-mail services from Yahoo and Microsoft. The same is true for its online financial portal, Google Finance. And Google Maps remains a distant third to MapQuest and Yahoo Maps.
Culture Of Choice
Google employees use Linux, Mac OS, and Windows on desktop computers, depending on their needs and desires. Many use homegrown programs such as Google Desktop, Google Earth, the acquired Writely word processor, and the recently launched Google Spreadsheets. In general, if an employee wants certain software, he or she can request it through the company intranet without jumping through a lot of hoops for approval.
Merrill is evasive when asked what kinds of commercial PC software are used at Google. “More important than what we put on each desktop is how we think about what to put on each desktop,” he says obliquely. “Google’s philosophy is that choice is always better than control. Tightly centralized control gets in the way of innovation.”
He then takes a jab at CIOs–which he describes as a title used by “old-world companies”–at other companies. “Most people in my job try to control. ‘Here are the three things you can buy,’” Merrill explains. “I try to control as little as I possibly can but make it easy to work within parameters that I know how to work with.”
Merrill sees a distinction between tools that tell you something and tools that stop you from doing something. For example, he observes that some financial services institutions block instant messaging because of the way they interpret regulations. “We don’t think that’s the right approach here,” he says.
The right approach, as Merrill sees it: Talk a lot; use data, not intuition; automate wherever you can.
Collective Insight
VP of engineering Adam Bosworth last year wrote that Google’s success in making a more relevant search was based on “leveraging the wisdom of crowds,” referring to the company’s PageRank algorithm. (James Surowiecki’s book, The Wisdom Of Crowds, was published in 2004 by Random House.) Company founders Larry Page and Sergey Brin built the business on PageRank, which analyzes the human-generated link structure of the Web to determine the relative importance of a Web page. As PageRank sees it, the more people link to a given page, the more important that page is likely to be.
This turns out to be the perfect division of labor between man and machine: Evaluating content is easy for people, and analyzing large data sets is easy for computers. By marrying collective intelligence with automation, Page and Brin built a company fueled by artificial intelligence. “AI is a great tool for helping people make better decisions,” Merrill says. “It’s not so good at making complex decisions.”
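The link analysis behind PageRank can be sketched in a few lines of Python. What follows is a simplified textbook version, not Google’s production code: the `pagerank` function name and the tiny three-page link graph in the usage note are hypothetical. The 0.85 damping factor is the value suggested in Page and Brin’s original paper, and the even spreading of rank from pages with no outgoing links is one common convention among several. The sketch also shows a detail the one-line summary glosses over: a link counts for more when it comes from a page that is itself highly ranked.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict of page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

Given the hypothetical graph `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up ranked highest because two pages link to it, and `a` outranks `b` because its single inbound link comes from the highly ranked `c`, while `b` has no inbound links at all.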
The wisdom of the crowd, farmed and refined by machine, remains critical to Google. As Merrill puts it, “All of us together are smarter than any of us individually.” That insight may not be as surprising now that it has been reinforced by the likes of Wikipedia and Digg.com, but it’s still mostly lip service at many other companies.
At Google, however, examples abound, such as the way the company decides on new employees. “No one can hire anyone here,” Merrill insists. “Hiring decisions are made by public groups. We all hire everyone.”
That faith in group intelligence manifests itself in the lunch line. Google provides free meals to employees partly as a perk and to enhance productivity, but also to encourage interaction. It’s about the pollination of ideas over salads and sandwiches. “If you want people to talk, if you want people to engage, how do you do that?” Merrill asks. “You give them lunch.”
Google has an expression to describe this open discourse: Live out loud. “Everything that’s done privately is done publicly here,” he says. (As if to make the point, Merrill took off his T-shirt during our photo shoot, showing off his tattoos.) “We make decisions in public. We expect people to debate. You’re supposed to engage. You’re supposed to disagree.”
There are, of course, limits. Merrill concedes that some things need to remain private. “Customer data privacy is obviously critical to us, so that stuff is protected a lot,” he says. “But our belief is that some of the things that are private in normal businesses aren’t really private.”
Normal businesses? That would be 99% of other companies. The challenge for Google is to remain different–which is part of its competitive advantage–while staying true to its mission to organize the world’s information and make it universally, rather than selectively, accessible and useful. Because you can be sure that other companies are watching and learning from Google’s every move.
— with Aaron Ricadela and Charles Babcock
Copyright © 2007 CMP Media LLC