Welcome to a Scientific American experiment in “networked journalism,” in which readers—you—get to collaborate with the author to give a story its final form.
The article, below, is a particularly apt candidate for such an experiment: it’s my feature story on “Science 2.0,” which describes how researchers are beginning to harness wikis, blogs and other Web 2.0 technologies as a potentially transformative way of doing science. The draft article appears here, several months in advance of its print publication, and we are inviting you to comment on it. Your inputs will influence the article’s content, reporting, perhaps even its point of view.
So consider yourself invited. Please share your thoughts about the promise and peril of Science 2.0. Just post your input in the Comment section below. To help get you started, here are some questions to mull over:
- What do you think of the article itself? Are there errors? Oversimplifications? Gaps?
- What do you think of the notion of “Science 2.0?” Will Web 2.0 tools really make science much more productive? Will wikis, blogs and the like be transformative, or will they be just a minor convenience?
- Science 2.0 is one aspect of a broader Open Science movement, which also includes Open-Access scientific publishing and Open Data practices. How do you think this bigger movement will evolve?
- Looking at your own scientific field, how real is the suspicion and mistrust mentioned in the article? How much do you and your colleagues worry about getting “scooped”? Do you have first-hand knowledge of a case in which that has actually happened?
- When young scientists speak out on an open blog or wiki, do they risk hurting their careers?
- Is “open notebook” science always a good idea? Are there certain aspects of a project that researchers should keep quiet, at least until the paper is published?
–M. Mitchell Waldrop
The explosively growing World Wide Web has rapidly transformed retailing, publishing, personal communication and much more. Innovations such as e-commerce, blogging, downloading and open-source software have forced old-line institutions to adopt whole new ways of thinking, working and doing business.
Science could be next. A small but growing number of researchers–and not just the younger ones–have begun to carry out their work via the wide-open blogs, wikis and social networks of Web 2.0. And although their efforts are still too scattered to be called a movement–yet–their experiences to date suggest that this kind of Web-based “Science 2.0” is not only more collegial than the traditional variety, but considerably more productive.
“Science happens not just because of people doing experiments, but because they’re discussing those experiments,” explains Christopher Surridge, editor of the Web-based journal, Public Library of Science On-Line Edition (PLoS ONE). Critiquing, suggesting, sharing ideas and data–communication is the heart of science, the most powerful tool ever invented for correcting mistakes, building on colleagues’ work and creating new knowledge. And not just communication in peer-reviewed papers; as important as those papers are, says Surridge, who publishes a lot of them, “they’re effectively just snapshots of what the authors have done and thought at this moment in time. They are not collaborative beyond that, except for rudimentary mechanisms such as citations and letters to the editor.”
The technologies of Web 2.0 open up a much richer dialog, says Bill Hooker, a postdoctoral cancer researcher at the Shriners Hospital for Children in Portland, Ore., and the author of a three-part survey of open-science efforts in the group blog, 3 Quarks Daily. “To me, opening up my lab notebook means giving people a window into what I’m doing every day. That’s an immense leap forward in clarity. In a paper, I can see what you’ve done. But I don’t know how many things you tried that didn’t work. It’s those little details that become clear with open notebook, but are obscured by every other communication mechanism we have. It makes science more efficient.” That jump in efficiency, in turn, could have huge payoffs for society, in everything from faster drug development to greater national competitiveness.
Of course, many scientists remain highly skeptical of such openness–especially in the hyper-competitive biomedical fields, where patents, promotion and tenure can hinge on being the first to publish a new discovery. From that perspective, Science 2.0 seems dangerous: using blogs and social networks for your serious work feels like an open invitation to have your online lab notebooks vandalized–or worse, have your best ideas stolen and published by a rival.
To Science 2.0 advocates, however, that atmosphere of suspicion and mistrust is an ally. “When you do your work online, out in the open,” Hooker says, “you quickly find that you’re not competing with other scientists anymore, but cooperating with them.”
Rousing Success
In principle, says PLoS ONE’s Surridge, scientists should find the transition to Web 2.0 perfectly natural. After all, since the time of Galileo and Newton, scientists have built up their knowledge about the world by “crowd-sourcing” the contributions of many researchers and then refining that knowledge through open debate. “Web 2.0 fits so perfectly with the way science works, it’s not whether the transition will happen but how fast,” he says.
The OpenWetWare project at MIT is an early success. Launched in the spring of 2005 by graduate students working for MIT biological engineers Drew Endy and Thomas Knight, who collaborate on synthetic biology, the project was originally seen as just a better way to keep the two labs’ Web sites up to date. OpenWetWare is a wiki–a collaborative Web site that can be edited by anyone who has access to it; it even uses the same software that underlies the online encyclopedia Wikipedia. Students happily started posting pages introducing themselves and their research, without having to wait for a Webmaster to do it for them.
But then, users discovered that the wiki was also a convenient place to post what they were learning about lab techniques: manipulating and analyzing DNA, getting cell cultures to grow. “A lot of the ‘how-to’ gets passed around as lore in biology labs, and never makes it into the protocol manuals,” says Jason Kelly, a graduate student of Endy’s who now sits on the OpenWetWare steering committee. “But we didn’t have that.” Most of the students came from a background in engineering; theirs was a young lab with almost no mentors. So whenever a student or postdoc managed to stumble through a new protocol, he or she would write it all down on a wiki page before the lessons were forgotten. Others would then add whatever new tricks they had learned. This was not altruism, notes steering-committee member Reshma Shetty. “The information was actually useful to me.” But by helping herself, she adds, “that information also became available around the world.”
Indeed, Kelly points out, “Most of our new users came to us because they’d been searching Google for information on a protocol, found it posted on our site, and said ‘Hey!’ As more and more labs got on, it became pretty apparent that there were lots of other interesting things they could do.”
Classes, for example. Instead of making do with a static Web page posted by a professor, users began to create dynamically evolving class sites where they could post lab results, ask questions, discuss the answers and even write collaborative essays. “And all stayed on the site, where it made the class better for next year,” says Shetty, who has created an OpenWetWare template for creating such class sites.
Laboratory management benefited too. “I didn’t even know what a wiki was,” recalls Maureen Hoatlin of the Oregon Health & Science University in Portland, where she runs a lab studying the genetic disorder Fanconi anemia. But she did know that the frenetic pace of research in her field was making it harder to keep up with what her own team members were doing, much less Fanconi researchers elsewhere. “I was looking for a tool that would help me organize all that information,” Hoatlin says. “I wanted it to be Web-based, because I travel a lot and needed to access it from wherever I was. And I wanted something my collaborators and group members could add to dynamically, so that whatever I saw on that Web page would be the most recently updated version.”
OpenWetWare, which Hoatlin saw in the spring of 2006, fit the bill perfectly. “The transparency turned out to be very powerful,” she says. “I came to love the interaction, the fact that people in other labs could comment on what we do and vice versa. When I see how fast that is, and its power to move science forward–there is nothing like it.”
Numerous others now work through OpenWetWare to coordinate research. SyntheticBiology.org, one of the site’s most active interest groups, currently comprises six laboratories in three states, and includes postings about jobs, meetings, discussions of ethics, and much more.
In short, OpenWetWare has quickly grown into a social network catering to a wide cross-section of biologists and biological engineers. It currently encompasses laboratories on five continents, dozens of courses and interest groups, and hundreds of protocol discussions–more than 6,100 Web pages edited by 3,000 registered users. A May 2007 grant from the National Science Foundation launched the OpenWetWare team on a five-year effort to transform OpenWetWare into a self-sustaining community independent of its current base at MIT. The grant will also support development of many new practical tools, such as ways to interface biological databases with the wiki, along with creation of a generic version of OpenWetWare that can be used by other research communities such as neuroscience, as well as by individual investigators.
Skepticism Persists
For all the participants’ enthusiasm, however, this wide-open approach to science still faces intense skepticism. Even Hoatlin found the openness unnerving at first. “Now I’m converted to open wikis for everything possible,” she says. “But when I originally joined I wanted to keep everything private”–not least to keep her lab pages from getting trashed by some random hacker. She did not relax until she began to understand the system’s built-in safeguards.
First and foremost, says MIT’s Kelly, “you can’t hide behind anonymity.” By default, OpenWetWare pages are visible to anyone (although researchers have the option to make pages private). But unlike the oft-defaced Wikipedia, the system will let users make changes only after they have registered and established that they belong to a legitimate research organization. “We’ve never yet had a case of vandalism,” Kelly says. Even if vandalism did occur, the wiki automatically maintains a copy of every version of every page posted: “You could always just roll back the damage with a click of your mouse.”
Unfortunately, this kind of technical safeguard does little to address a second concern: Getting scooped and losing the credit. “That’s the first argument people bring to the table,” says Drexel University chemist Jean-Claude Bradley, who created his independent laboratory wiki, UsefulChem, in December 2005. Even if incidents are rare in reality, Bradley says, everyone has heard a story, which is enough to keep most scientists from even discussing their unpublished work too freely, much less posting it on the Internet.
However, the Web provides better protection than the traditional journal system, Bradley maintains. Every change on a wiki gets a time-stamp, he notes, “so if someone actually did try to scoop you, it would be very easy to prove your priority–and to embarrass them. I think that’s really what is going to drive open science: the fear factor. If you wait for the journals, your work won’t appear for another six to nine months. But with open science, your claim to priority is out there right away.”
Under Bradley’s radically transparent “open notebook” approach, as he calls it, everything goes online: experimental protocols, successful outcomes, failed attempts, even discussions of papers being prepared for publication. “A simple wiki makes an almost perfect lab notebook,” he declares. The time-stamps on every entry not only establish priority, but allow anyone to track the contributions of every person, even in a large collaboration.
Bradley concedes that there are sometimes legitimate reasons for researchers to think twice about being so open. If work involves patients or other human subjects, for example, privacy is obviously a concern. And if you think your work might lead to a patent, it is still not clear that the patent office will accept a wiki posting as proof of your priority. Until that is sorted out, he says, “the typical legal advice is: do not disclose your ideas before you file.”
Still, Bradley says the more open scientists are, the better. When he started UsefulChem, for example, his lab was investigating the synthesis of drugs to fight diseases such as malaria. But because search engines could index what his team was doing without needing a bunch of passwords, “we suddenly found people discovering us on Google and wanting to work together. The National Cancer Institute contacted me wanting to test our compounds as anti-tumor agents. Rajarshi Guha at Indiana University offered to help us do calculations about docking–figuring out which molecules will be reactive. And there were others. So now we’re not just one lab doing research, but a network of labs collaborating.”
Blogophobia
Although wikis are gaining, scientists have been strikingly slow to embrace one of the most popular Web 2.0 applications: Web logging, or blogging.
“It’s so antithetical to the way scientists are trained,” Duke University geneticist Huntington F. Willard said at the April 2007 North Carolina Science Blogging Conference, one of the first national gatherings devoted to this topic. The whole point of blogging is spontaneity–getting your ideas out there quickly, even at the risk of being wrong or incomplete. “But to a scientist, that’s a tough jump to make,” says Willard, head of Duke’s Institute for Genome Sciences & Policy. “When we publish things, by and large, we’ve gone through a very long process of drafting a paper and getting it peer reviewed. Every word is carefully chosen, because it’s going to stay there for all time. No one wants to read, ‘Contrary to the result of Willard and his colleagues…’.”
Still, Willard favors blogging. As a frequent author of newspaper op-ed pieces, he feels that scientists should make their voices heard in every responsible way possible. Blogging is slowly beginning to catch on; because most blogs allow outsiders to comment on the individual posts, they have proved to be a good medium for brainstorming and discussions of all kinds. Bradley’s UsefulChem blog is an example. Paul Bracher’s Chembark is another. “Chembark has morphed into the water cooler of chemistry,” says Bracher, who is pursuing his Ph.D. in that field at Harvard University. “The conversations are: What should the research agencies be funding? What is the proper way to manage a lab? What types of behavior do you admire in a boss? But instead of having five people around a single water cooler you have hundreds of people around the world.”
Of course, for many members of Bracher’s primary audience–young scientists still struggling to get tenure–those discussions can look like a minefield. A fair number of the participants use pseudonyms, out of fear that a comment might offend some professor’s sensibilities, hurting a student’s chances of getting a job later. Other potential participants never get involved because they feel that time spent with the online community is time not spent on cranking out that next publication. “The peer-reviewed paper is the cornerstone of jobs and promotion,” says PLoS ONE’s Surridge. “Scientists don’t blog because they get no credit.”
The credit-assignment problem is one of the biggest barriers to the widespread adoption of blogging or any other aspect of Science 2.0, agrees Timo Hannay, head of Web publishing at the Nature Publishing Group in London. (That group’s parent company, Macmillan, also owns Scientific American.) Once again, however, the technology itself may help. “Nobody believes that a scientist’s only contribution is from the papers he or she publishes,” Hannay says. “People understand that a good scientist also gives talks at conferences, shares ideas, takes a leadership role in the community. It’s just that publications were always the one thing you could measure. Now, however, as more of this informal communication goes on line, that will get easier to measure too.”
Collaboration the Payoff
The acceptance of any such measure would require a big change in the culture of academic science. But for Science 2.0 advocates, the real significance of Web technologies is their potential to move researchers away from an obsessive focus on priority and publication, toward the kind of openness and community that were supposed to be the hallmark of science in the first place. “I don’t see the disappearance of the formal research paper anytime soon,” Surridge says. “But I do see the growth of lots more collaborative activity building up to publication.” And afterwards as well: PLoS ONE allows users not only to annotate and comment on the papers it publishes online, but also to rate the papers’ quality on a scale of 1 to 5.
Meanwhile, Hannay has been taking the Nature group into the Web 2.0 world aggressively. “Our real mission isn’t to publish journals, but to facilitate scientific communication,” he says. “We’ve recognized that the Web can completely change the way that communication happens.” Among the efforts are Nature Network, a social network designed for scientists; Connotea, a social bookmarking site patterned on the popular site del.icio.us, but optimized for the management of research references; and even an experiment in open peer review, with pre-publication manuscripts made available for public comment.
Indeed, says Bora Zivkovic, a circadian rhythm expert who writes at Blog Around the Clock, and who is the Online Community Manager for PLoS ONE, the various experiments in Science 2.0 are now proliferating so rapidly that it is almost impossible to keep track of them. “It’s a Darwinian process,” he says. “About 99 percent of these ideas are going to die. But some will emerge and spread.”
“I wouldn’t like to predict where all this is going to go,” Hooker adds. “But I’d be happy to bet that we’re going to like it when we get there.”
Comments
So, Hakia should do something really spectacular to beat Google with the semantic approach. It should actually be able to understand complex sentences better than Google, and as such be a search engine for more complex tasks, for example for questions like ‘I need drivers for Geforce 8800, but not the latest version’. Currently, compared to Google, it doesn’t deliver.
Posted by: franticindustries | December 7, 2006 12:36 PM
I don’t think I’d feel comfortable asking “What were the causes of the American Civil War?” and have the search engine only spit back one result answer (or, one viewpoint).
Posted by: Josh | December 7, 2006 12:58 PM
In terms of a single answer, I think if you are looking for a quick answer – possibly, but otherwise you would definitely want more results.
The other thought that occurs to me is that we might not necessarily need a new way of inputting the question so much as we need new ways of getting the answer. So in a way, I view vertical search engines, like Retrevo, as approaching the same problem but from a more pragmatic and better angle.
Alex
Posted by: Alex Iskold | December 7, 2006 1:02 PM
We are still developing, it will CONTINUE TO IMPROVE as many of the meaning associations will form in time, like connecting the neurons inside the human brain during childhood. hakia is like a TWO-year old child on the cognitive scale. But it grows EXPONENTIALLY — much faster than a human.
Cheers,
Melek
Posted by: melek pulatkonak | December 7, 2006 2:05 PM
Alex
Posted by: Alex Iskold | December 7, 2006 2:19 PM
Posted by: melek pulatkonak | December 7, 2006 2:25 PM
Posted by: Juha | December 7, 2006 3:58 PM
Posted by: Paul Walsh | December 7, 2006 4:31 PM
Alex
Posted by: Alex Iskold | December 7, 2006 5:15 PM
Posted by: Richard MacManus | December 7, 2006 6:34 PM
Hakia rocks, it’s a really good search experience! Cheers.
Posted by: Abhishek Sharma | December 8, 2006 2:33 AM
The query “What is the capital of Finland?”, could show Helsinki as an answer and provide related answers regarding history, population, etymology, other capitals etc.
For this capability Hakia should not only be able to do semantic searches, but entity extraction as well, since RDF and XML schemas are not that widespread at the moment.
If they can manage to do this, people won’t hesitate to abandon Google, especially because the Google brand is losing its value rapidly because of SEO, spamming and privacy intrusions…
Posted by: Gert-Jan van Engelen | December 8, 2006 4:04 AM
Why did the US attack Iraq?
and
Why did Israel attack Lebanon?
It gave absolutely unrelated results, which confirms that it is as good as a text search. However, when I tried the query – “Who is Mahatama Gandhi?” – it immediately responded with a remark “See below the Mahatma Gandhi resume by hakia. What else do you want to know about Mahatma Gandhi?”
My hunch is that the Hakia guys have set up a word filter before the search query gets executed on its DB (call it a ‘semantic filter’ if you’d like). If it contains words like ‘Who’ or ‘What’ it is set to return the ‘resumes’ and ‘galleries’ for the rest of the search terms. But that isn’t what semantics is about – the engine still does not ‘understand’ my question – that’s just a slightly ‘domain restricted’ search being performed.
I could as well have a dropdown for domain (who, what etc) before the search box and restrict the search queries myself!
While Hakia is not bad – I won’t give up my Google for it!
Posted by: Nikhil Kulkarni | December 8, 2006 8:25 AM
Posted by: michal frackowiak | December 8, 2006 2:05 PM
Posted by: Phillip Rhodes | December 8, 2006 2:07 PM
Alex
Posted by: Alex Iskold | December 8, 2006 2:26 PM
a. have demographic – based parsing logic, not just language – based.
b. know the demographics of the user submitting the query.
Posted by: Ernesto | December 8, 2006 2:31 PM
Personalize( Semantic Search ) ==> Really cool stuff.
Alex
Posted by: Alex Iskold | December 8, 2006 2:36 PM
If Google, an upstart, can do it to Yahoo, it can happen again.
Posted by: Shinderpal jandu | December 8, 2006 2:49 PM
There are many ways to work search engines but I’m quite surprised we keep seeing the same thing over and over again. What we are missing are real innovations, not a second runner-up in the same clothes with a different name.
Posted by: Sal | December 8, 2006 2:55 PM
what is 5 plus 5? enough said.
Posted by: Dave | December 8, 2006 3:01 PM
Posted by: Viksit | December 8, 2006 3:53 PM
Currently, it wouldn’t pass a TRUE Turing Test — just mimics the foreign language copied from book to carry on conversation argument proposed by (insert name here, I forget it at the moment…)
^Wow… I just referred to like 5 things I learned last quarter in my freshman computer science classes… that felt good. Hope my thoughts make sense. Keep up the work Hakia, I really would be impressed to see success here, I just think it would have to incorporate some AI which is not looking good (from my eyes, anyway).
Posted by: Augie | December 8, 2006 9:08 PM
The answer to the last question you offered: Is semantical search fundamentally better than text search? depends greatly upon what you think semantical means in a search and retrieval context.
In a word though, the answer is a resounding Yes.
I think, in its most common and general usage (among peoples) semantics refers to the interpretation of the significance of the relationships and interactions of subjects and objects in a situational context.
For example, the semantics of the state of affairs in modern day Iraq range over a state of civil war to extreme cases of outside insurgencies intended to deceive and delude. When the semantics are cloudy and unclear, judgments and decisions about what and how to name particular aspects of the state of affairs can also be murky. Thereby interdependent judgments or decisions become delayed or the subject of further debate. Ideally you want to present a situation such that a uniform perception emerges, with semantics (significance) that drives or guides interpretations such that those that are relevant and those with the same validity or authority prevail.
As the Bush administration has demonstrated, the process, the presentation, the semantics– can become political and highly charged. When questions of significance persist, that is, questions ranging over the signifier and signified in a given situation, uncertainty, lack of clarity and disarray blur and obscure any significance and generally erode confidence and delay action.
This is not the kind of semantics the Semantic Web and AI technologies proclaim. In their quest to share and exchange information, they want just enough semantics to normalize data labels between systems so that they are able to exchange information and be sure they are referring to the same items in the data exchange. They want to use named references, with authority of course. In fact, they strive for clear and unambiguous semantics –a foreign concept to the Bush administration.
But semantics has to do with the significance of interpretation. What is significant in our experience of the search and retrieval application? What is of significance in the results of the search engine? Relevance. The benefit of semantic search is greater relevance. For Hakia to be relevant, it has to offer more relevance than Google. A semantic search engine should also offer more– in my opinion.
A modern language semantic search engine should offer more than relevance. It should offer insight. Rather than fixing semantics to simple categories for easy exchange, a truly semantical search engine should aid and assist one while exploring topics. It should help to relate language to abstract ideas instead of just connecting the keywords, names and nouns.
Posted by: Ken Ewell | December 8, 2006 11:32 PM
Much easier than combining keywords with booleans to try to simplify natural language to “search engine” language!
PS — No offense to you, Mr Ewell — I really do respect that your technological insights and opinions are worth 10 times my own because of the knowledge gap; I guess I just got really sick of seeing more politically charged comments in non-related areas… I’m just sick of politics all-together right now, I think. Not trying to start a flame-war or anything!
Posted by: Auggie | December 9, 2006 1:36 AM
I am impressed
Posted by: priya | December 9, 2006 11:42 AM
Posted by: Alex Iskold | December 10, 2006 6:15 AM
what is the administrative capital of the United States of America
what is the administrative capital of the USA
what is the administrative capital of the US
Unfortunately, Hakia provided irrelevant answers to all four questions. Google got 4/4.
Given the apparently overwhelming power of Google’s indexing algorithm and the extent of their dataset, a semantic-based search facility such as Hakia may have to seek a qualitatively different area of search in which to make a contribution.
Posted by: Graham Higgins | December 10, 2006 7:33 AM
Hakia – doing good work, but it’s too early to say something concrete. In addition, I would not like to accept that Google doesn’t have semantic features in their search algorithm. I’m sure they are working on it or looking out for something good (startup kid).
Posted by: Dhruba Baishya | December 16, 2006 7:24 PM
Why don’t people tell their salaries?
Posted by: Anonymous | January 3, 2007 2:14 AM
El barco attravesta una cerradua – un vuelo de cerraduras – La estacion de ressorte – jogar de puente
The last is particularly annoying. My daughter plays for England, and when I try to search for “Bridge” I am overwhelmed with sites on civil engineering.
I specifically tested these.
with Hakia
The locks on the Grand Union Canal
Spring flowers (primavera) Springs in Gloucestershire (mamanthal)
Bridge tournaments
The results on the whole were satisfactory – much better than Google. Understand is a difficult word to define. My definition (bueno espagnol) is the difference between Primavera, Ressorte, Mamanthal. In other words can we use our “understanding” in an operational way. My view is that precise definition + a large enough database = Turing. To some extent Hakia appears to do this. It must be the future. The fly in the ointment is what Google is doing.
Posted by: Ian Parker | January 6, 2007 5:27 AM