Archive for January 25th, 2008

Five reasons no one will replace Google

“I’ve received 33,000+ hits and counting to this post,” says the blogger who wrote “Wikipedia 3.0: The End of Google?” on Monday. His piece got blogged all over, promoted to the Digg front page, and fueled the starry-eyed bloggers searching for doom to herald for Google. (It was also just a troll.) Kudos to him, but he, and everyone who believed him, was wrong.

The blogger’s main premise was as follows: The Semantic Web, a logic-based version of the Internet (and an old idea), could render Google obsolete with an artificial intelligence system that provides real answers instead of keyword-based responses.

Sure it could, if Google didn’t plan to innovate for the next decade. Google has five advantages that will keep all but the most determined innovators from beating it to artificial intelligence.

  • Google knows semantics. Its entire business drives it toward pulling meaning from context. Better semantics make better ad placement and more precise search results. That’s the reasoning behind contextual ads, topical search results, and the closely guarded and ever-changing search algorithm.
  • Google has the smartest people in the world. Or damn close to it. Google’s increasingly discriminating hiring process weeds out all but the top engineers — executives are fond of saying that Google only hires people smarter than half its employees. As one tech exec said, “Yahoo’s morning bus may have wifi, but it doesn’t have any PhD’s on it.”
  • Google has Marissa Mayer. All “Marissa is a robot” jokes aside, Senior VP Marissa Mayer, one of the most powerful Google executives after founders Larry Page and Sergey Brin, is a titan of artificial intelligence. For her Bachelor’s and Master’s at Stanford, she specialized in A.I., and she holds several patents in the field. Her knowledge will not be lost in her role as Google’s product gatekeeper — it’s Marissa who decides what products are ready for release.
  • Google is filthy rich. And don’t think clickfraud will bring them down — today, Google launches GBuy, a payment system that trumps pay-per-click advertising with pay-per-sale, meanwhile bringing in the dollars of would-be buyers who don’t trust vendors, but do trust Google. All this income gives Google a lot more room to play than its most ambitious competitors.
  • Google says it’s working on AI. The co-founders already said that they’re building a sharper artificial intelligence. Their new ambient sound translator can already identify a TV show from five seconds of computer-captured sound. Google plans to use the system for even more contextualized ads and content. Why this isn’t the biggest tech news of the year is a mystery.
  • Google is not distracted. The company’s major competitors are Microsoft and Yahoo. The former is plagued by unwieldy plans for an operating system, software suite, and struggling media network. The latter is approaching media company status with an expanding network of original and outsourced content. While both Microsoft and Yahoo are making valuable progress in other fields, neither is innovating in search anywhere near the rate of Google. That’s why over the past year, Google is the only engine with a growing market share in the U.S., and why Google could soon become China’s top engine as well. And Google will stay on top — by beating everyone else to the world’s first global A.I. system.

Refuted: Wikipedia 3.0: The end of Google? [Evolving Trends]

7:30 AM ON WED JUN 28 2006
BY NICK DOUGLAS
Read More:

A.I., GOOGLE, MICROSOFT, SEMANTIC WEB, TOP, WIKIPEDIA, YAHOO

Read Full Post »

Semantic Web

Reference.com 

Semantic Web

Wikipedia, the free encyclopedia

The Semantic Web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily. It derives from W3C director Sir Tim Berners-Lee‘s vision of the Web as a universal medium for data, information, and knowledge exchange.

At its core, the semantic web comprises a philosophy, a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that have yet to be implemented or realized. Other elements of the semantic web are expressed in formal specifications. Some of these include Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.

Purpose

Humans are capable of using the Web to carry out tasks such as finding the Finnish word for “car”, to reserve a library book, or to search for the cheapest DVD and buy it. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web.

For example, a computer might be instructed to list the prices of flat screen HDTVs larger than a given screen size with 1080p resolution at shops in the nearest town that are open until 8pm on Tuesday evenings. Today, this task requires search engines that are individually tailored to every website being searched. The semantic web provides a common standard (RDF) for websites to publish the relevant information in a more readily machine-processable and integratable form.

Tim Berners-Lee originally expressed the vision of the semantic web as follows:

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize. (Tim Berners-Lee, 1999)

Semantic publishing will benefit greatly from the semantic web. In particular, the semantic web is expected to revolutionize scientific publishing, such as real-time publishing and sharing of experimental data on the Internet. This simple but radical idea is now being explored by W3C HCLS group’s Scientific Publishing Task Force.

Tim Berners-Lee has further stated:

People keep asking what Web 3.0 is. I think maybe when you’ve got an overlay of scalable vector graphics – everything rippling and folding and looking misty – on Web 2.0 and access to a semantic Web integrated across a huge space of data, you’ll have access to an unbelievable data resource. (Tim Berners-Lee, “A ‘more revolutionary’ Web”)

Relationship to the Hypertext Web

Markup

Many files on a typical computer can be loosely divided into documents and data. Documents, like mail messages, reports and brochures, are read by humans. Data, like calendars, address books, playlists and spreadsheets, are presented using an application program which lets them be viewed, searched and combined in many ways. Currently, the World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms. Metadata tags, for example, provide a method by which computers can read the content of web pages.

The semantic web takes the concept further; it involves publishing the data in a language, Resource Description Framework (RDF), specifically for data, so that it can be manipulated and combined just as can data files on a local computer.

HTML describes documents and the links between them. RDF, by contrast, describes arbitrary things such as people, meetings, or airplane parts.

For example, with HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as “this document’s title is ‘Widget Superstore'”. But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text “X586172” is something that should be positioned near “Acme Gizmo” and “€ 199”, etc. There is no way to say “this is a catalog” or even to establish that “Acme Gizmo” is a kind of title or that “€ 199” is a price. There is also no way to express that these pieces of information are bound together in describing a discrete item, distinct from other items perhaps listed on the page.
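To make the contrast concrete, here is a minimal sketch in Python of the catalog facts that HTML alone cannot assert, written as subject-predicate-object triples in the spirit of RDF. The `ex:` identifiers and property names (`ex:name`, `ex:retailPrice`, `ex:currency`) are made up for illustration and do not come from any real vocabulary, and plain tuples stand in for a real RDF library.

```python
# Hypothetical identifiers: the "ex:" prefix and property names are illustrative only.
ITEM = "ex:item/X586172"

triples = {
    (ITEM, "rdf:type", "ex:ConsumerProduct"),
    (ITEM, "ex:name", "Acme Gizmo"),
    (ITEM, "ex:retailPrice", "199"),
    (ITEM, "ex:currency", "EUR"),
}

def describe(subject, graph):
    """Collect every property asserted about one subject into a dict."""
    return {p: o for s, p, o in graph if s == subject}

item = describe(ITEM, triples)
print(item["ex:name"], item["ex:retailPrice"], item["ex:currency"])
```

Because every fact names its subject explicitly, the name and the price are bound to one discrete item rather than to nearby spans of text.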

See also: Semantic HTML, Linked Data.

Descriptive and extensible

The semantic web addresses this shortcoming, using the descriptive technologies Resource Description Framework (RDF) and Web Ontology Language (OWL), and the data-centric, customizable Extensible Markup Language (XML). These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e. to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and facilitating automated information gathering and research by computers.
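As a rough illustration of what "processing knowledge instead of text" can mean, the sketch below applies a single inference rule in the style of RDF Schema: if a thing has a type, and that type is a subclass of another class, the thing also has the broader type. The class names are hypothetical, and a real reasoner handles far more rules than this one.

```python
def infer_types(triples):
    """Return the triples plus rdf:type facts implied by rdfs:subClassOf.

    The rule is applied repeatedly until no new fact appears (a fixed point).
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for thing, cls in typed:
            for sub, sup in subclass:
                new_fact = (thing, "rdf:type", sup)
                if cls == sub and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred

# Hypothetical class hierarchy, for illustration only.
facts = {
    ("ex:X586172", "rdf:type", "ex:Gizmo"),
    ("ex:Gizmo", "rdfs:subClassOf", "ex:ConsumerProduct"),
    ("ex:ConsumerProduct", "rdfs:subClassOf", "ex:Product"),
}
closure = infer_types(facts)
print(("ex:X586172", "rdf:type", "ex:Product") in closure)
```

Nothing in the input says the item is a product; the machine derives that from the stated hierarchy.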

Skeptical reactions

Practical feasibility

Some critics question the basic feasibility of a complete or even partial fulfillment of the semantic web. Some develop their critique from the perspective of human behavior and personal preferences, which ostensibly diminish the likelihood of its fulfillment (see e.g., metacrap). Other commentators object that there are limitations that stem from the current state of software engineering itself. (see e.g., Leaky abstraction).

Where semantic web technologies have found a greater degree of practical adoption, it has tended to be among core specialized communities and organizations for intra-company projects. The practical constraints toward adoption have appeared less challenging where domain and scope are more limited than that of the general public and the World Wide Web.

An unrealized idea

The original 2001 Scientific American article (from Berners-Lee) described an expected evolution of the existing Web to a Semantic Web. Such an evolution has yet to occur. Indeed, a more recent article from Berners-Lee and colleagues stated that: “This simple idea, however, remains largely unrealized.” Nonetheless, the recognized authorities in the Semantic Web keep asserting the feasibility of the original idea, and sometimes they even claim that many of the components of the initial vision have been already deployed.

Censorship and privacy

Enthusiasm about the semantic web could be tempered by concerns regarding censorship and privacy. For instance, text-analyzing techniques can now be easily bypassed by using other words, metaphors for instance, or by using images in place of words. An advanced implementation of the semantic web would make it much easier for governments to control the viewing and creation of online information, as this information would be much easier for an automated content-blocking machine to understand. In addition, the issue has also been raised that, with the use of FOAF files and geo location meta-data, there would be very little anonymity associated with the authorship of articles on things such as a personal blog.

Doubling output formats

Another criticism of the semantic web is that it would be much more time-consuming to create and publish content because there would need to be two formats for one piece of data: one for human viewing and one for machines. With this being the case, it would be much less likely for companies to adopt these practices, as it would only slow down their progress. However, many web applications in development are addressing this issue by creating a machine-readable format upon the publishing of data or the request of a machine for such data. The development of microformats has been one reaction to this kind of criticism.

Specifications such as eRDF and RDFa allow arbitrary RDF data to be embedded in HTML pages. The GRDDL (Gleaning Resource Descriptions from Dialects of Language) mechanism allows existing material (including microformats) to be automatically interpreted as RDF, so publishers only need to use a single format, such as HTML.
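The single-format idea can be sketched with Python's standard `html.parser`: the page below is ordinary HTML for humans, and the same markup carries RDFa-style `property` attributes that a program can read. This is a deliberately simplified reader, not a conformant RDFa or GRDDL processor; the `ex:` property names and the sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from RDFa-style `property` attributes.

    Simplification: the value is the `content` attribute when present,
    otherwise the first non-blank text that follows the opening tag.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            self.pairs.append((prop, attrs["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None

# Hypothetical catalog snippet: readable by people, annotated for machines.
page = (
    '<div about="#X586172">'
    '<span property="ex:name">Acme Gizmo</span> '
    '<span property="ex:retailPrice" content="199">EUR 199</span>'
    '</div>'
)
extractor = PropertyExtractor()
extractor.feed(page)
print(extractor.pairs)
```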

Components

XML, XML Schema, RDF, OWL, SPARQL

The semantic web comprises the standards and tools of XML, XML Schema, RDF, RDF Schema and OWL. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the semantic web:

  • XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
  • XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
  • RDF is a simple language for expressing data models, which refer to objects (“resources“) and their relationships. An RDF-based model can be represented in XML syntax.
  • RDF Schema is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized-hierarchies of such properties and classes.
  • OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
  • SPARQL is a protocol and query language for semantic web data sources.
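To give a feel for what a query language over RDF data does, here is a small triple-pattern matcher in Python. It is not SPARQL syntax and not a SPARQL engine; it only mimics the core idea that terms beginning with `?` are variables to be bound against the stored triples. The television data and the `ex:` names are invented for the example.

```python
def match(pattern, graph):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable already bound to a different value
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding

# Hypothetical shop data, for illustration only.
graph = {
    ("ex:tv1", "ex:resolution", "1080p"),
    ("ex:tv2", "ex:resolution", "720p"),
    ("ex:tv3", "ex:resolution", "1080p"),
    ("ex:tv1", "ex:price", "899"),
}
full_hd = sorted(b["?tv"] for b in match(("?tv", "ex:resolution", "1080p"), graph))
print(full_hd)
```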

The intent is to enhance the usability and usefulness of the Web and its interconnected resources through:

  • servers which expose existing data systems using the RDF and SPARQL standards. Many converters to RDF exist from different applications. Relational databases are an important source. The semantic web server attaches to the existing system without affecting its operation.
  • documents “marked up” with semantic information (an extension of the HTML tags used in today’s Web pages to supply information for Web search engines using web crawlers). This could be machine-understandable information about the human-understandable content of the document (such as the creator, title, description, etc., of the document) or it could be purely metadata representing a set of facts (such as resources and services elsewhere in the site). (Note that anything that can be identified with a Uniform Resource Identifier (URI) can be described, so the semantic web can reason about animals, people, places, ideas, etc.) Semantic markup is often generated automatically, rather than manually.
  • common metadata vocabularies (ontologies) and maps between vocabularies that allow document creators to know how to mark up their documents so that agents can use the information in the supplied metadata (so that Author in the sense of ‘the Author of the page’ won’t be confused with Author in the sense of a book that is the subject of a book review).
  • automated agents to perform tasks for users of the semantic web using this data
  • web-based services (often with agents of their own) to supply information specifically to agents (for example, a Trust service that an agent could ask if some online store has a history of poor service or spamming).
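The point about shared vocabularies can be illustrated with a short sketch: two properties that both mean "author" in everyday speech stay distinct once each is expanded to a full URI in its own namespace. The `dc` prefix below is the Dublin Core element set; the `rev` namespace and its `bookAuthor` property are hypothetical, chosen only to show the idea.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",  # Dublin Core element set
    "rev": "http://example.org/review#",       # hypothetical review vocabulary
}

def expand(qname):
    """Expand a prefixed name such as 'dc:creator' into a full URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local

page_author = expand("dc:creator")      # who wrote the page
book_author = expand("rev:bookAuthor")  # who wrote the book under review
print(page_author)
print(book_author)
```

An agent comparing the two URIs sees different properties, so the page's author is never mistaken for the reviewed book's author.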

Projects

Neurocommons

The Neurocommons is an open RDF database developed by Science Commons. It was compiled from major life sciences databases with a focus on neuroscience. It is accessible via a web-based front end using the SPARQL query language, both at its original location and at the DERI mirror location.

FOAF

A popular application of the semantic web is Friend of a Friend (or FoaF), which describes relationships among people and other agents in terms of RDF.
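As a minimal sketch of why the vocabulary is called "Friend of a Friend", the Python below stores `foaf:knows` statements as triples and walks two steps through them. The people are invented, and plain strings stand in for the URIs a real FOAF file would use.

```python
# Invented people; in real FOAF data each would be identified by a URI.
knows = {
    ("alice", "foaf:knows", "bob"),
    ("bob", "foaf:knows", "alice"),
    ("bob", "foaf:knows", "carol"),
    ("carol", "foaf:knows", "dave"),
}

def friends_of_friends(person, triples):
    """People reachable in exactly two foaf:knows steps, excluding direct friends."""
    direct = {o for s, p, o in triples if s == person and p == "foaf:knows"}
    second = {o for s, p, o in triples if s in direct and p == "foaf:knows"}
    return second - direct - {person}

print(friends_of_friends("alice", knows))
```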

SIOC

The SIOC Project – Semantically-Interlinked Online Communities provides a vocabulary of terms and relationships that model web data spaces. Examples of such data spaces include, among others: discussion forums, weblogs, blogrolls / feed subscriptions, mailing lists, shared bookmarks, image galleries.

SIMILE

SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments) is a joint project, conducted by the MIT Libraries and MIT CSAIL, which seeks to enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata, and services.

Linking Open Data

The Linking Open Data project is a community-led effort to create openly accessible and interlinked RDF data on the Web. The data in question takes the form of RDF data sets drawn from a broad collection of data sources. There is a focus on the Linked Data style of publishing RDF on the Web.

The project is one of several sponsored by the W3C‘s Semantic Web Education & Outreach Interest Group (SWEO).

Tools

Browsers

A semantic web Browser is a form of Web User Agent that expressly requests RDF data from Web Servers using the best practice known as “Content Negotiation”. These tools provide a user interface that enables data-link oriented navigation of RDF data by dereferencing the data links (URIs) in the RDF Data Sets returned by Web Servers.
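Content negotiation comes down to the `Accept` header of an HTTP request. The sketch below builds such a request with Python's standard `urllib.request`, asking for RDF first and HTML only as a fallback. The URI is a placeholder, the quality weights are an arbitrary choice, and the request is constructed but deliberately not sent.

```python
from urllib.request import Request

def rdf_request(uri):
    """Build (but do not send) a GET request that prefers RDF over HTML."""
    accept = "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.1"
    return Request(uri, headers={"Accept": accept})

# Placeholder URI, for illustration only.
req = rdf_request("http://example.org/people/alice")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

A server that supports negotiation can answer the same URI with RDF for this client and with an HTML page for an ordinary browser.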

Services

Notification Services

Semantic Web Ping Service

The Semantic Web Ping Service is a notification service for the semantic web that tracks the creation and modification of RDF based data sources on the Web. It provides Web Services for loosely coupled monitoring of RDF data. In addition, it provides a breakdown of RDF data sources tracked by vocabulary that includes: SIOC, FOAF, DOAP, RDFS, and OWL.

Piggy Bank

Another freely downloadable tool is Piggy Bank, a plug-in for Firefox. Piggy Bank works by extracting or translating web scripts into RDF information and storing this information on the user’s computer. This information can then be retrieved independently of the original context and used in other contexts, for example by using Google Maps to display information. Piggy Bank works with a new service, Semantic Bank, which combines the idea of tagging information with the new web languages. Piggy Bank was developed by the Simile Project, which also provides RDFizers, tools that can be used to translate specific types of information, for example weather reports for US zip codes, into RDF. Efforts like these could ease a potentially troublesome transition between the web of today and its semantic successor.

Wikipedia, the free encyclopedia © 2001-2006 Wikipedia contributors (Disclaimer)
This article is licensed under the GNU Free Documentation License.
Last updated on Tuesday January 01, 2008 at 03:04:36 PST (GMT -0800)

 http://ontoworld.org/wiki/Main_Page

Main Page

Semantics to the people!

About this wiki

Ontoworld.org runs on Semantic MediaWiki and thus is a true semantic wiki not just with respect to its content. Semantic features are used in many places, such as on this very page: e.g. the lists of events and portals above are computed automatically from the contents of the wiki. In other places, semantic data serves as a basis to enable reuse in external tools. For example, the wiki employs the FOAF vocabulary in descriptions of people, and via RDF export this information can be evaluated in external tools. For more information, go to the Semantic MediaWiki portal page.

Welcome! This is ontoworld.org, the wiki for the Semantic Web community. Our mission is to provide a knowledge repository and platform for advertising events, spreading news, and announcing new developments. It is a wiki: everybody can quickly edit its content, even without logging in. So look around and participate!
If you are new to this wiki you may want to start browsing the contents on the right. Editing pages works as on Wikipedia, but we also have a starters guide within this wiki. Be sure to check out the page about yourself (yes, it might even be there already!).
News

  • July 5 2007. Semantic MediaWiki receives the third prize of the annual do it.software-awards, granted to software products that successfully carry scientific developments into practice. The SMW-team thanks all contributors and supporters!
  • June 12 2007. Simile’s Exhibit toolkit now provides Semantic MediaWiki source code as an output format. It thus can also be used for converting e.g. RDF or JSON into SMW. Try it at the presidents demo (click “Copy All”).
  • April 29 2007. Ontoworld has been attacked by spam bots forcing us to install a simple captcha extension. When entering a new URL on a page, you now have to prove your human intelligence in a simple way. Registered users with a confirmed email address are not affected for now.
  • April 28 2007. Semantic MediaWiki 0.7 has been released and installed on this site. Get it at SourceForge.
  • February 15 2007. Ontoworld now uses the (almost) latest developers versions of MediaWiki and Semantic MediaWiki, so that all upcoming features can be tested.
  • November 06 2006. Further ISWC meta-data has been added. The readable ISWC timetable now refers to the wiki page of each paper.
  • November 03 2006. Parts of the ISWC2006 metadata have been imported. Especially, every accepted paper now has a wiki article that can also be edited for further comments and references.
  • news archive …

People

The wiki should now contain pages for many community members, either written by themselves or by others. The semantic features of this wiki also create a FOAF file with each person’s page.

To go to your page, just type your name into the below field and search.

For an overview of the people in this wiki, go to the people portal.

Events

You can find information about many events and calls for papers within this wiki. Using semantic annotation, it is possible to query for particular events.

Upcoming events: OnAV08 (Barcelona, 4 March 2008), OWLED 2008 (Gaithersburg MD, 1 April 2008), SWKM2008 (Beijing, 22 April 2008), WWW2008 (Beijing, 22 April 2008), SeMMA2008 (Teneriffe, 1 June 2008) full list

Upcoming submission deadlines: WSSG’2008 (31 January 2008), OWLED 2008 (15 February 2008), SemWiki2008 (22 February 2008), SemBPM (1 March 2008), SeMMA2008 (7 March 2008), SIWN 2008 (2 April 2008), KS 2008 (14 April 2008) full list

Organising an event? Advertise it here by quickly creating an article! Just enter the event’s abbreviation in the field below to get an edit box with further documentation:

Topics

This site is also a place to publish and discuss actual research. One way of doing so is via a community portal for your specific topic. At the moment, this wiki contains community portals for

Why not add your own?

If you build software, you should definitely make a page about your tool as well, and put up links in appropriate places.

Do you think that items on this page are out of date? You can clean its cache to manually update all dynamic parts to the latest data from the wiki.


Evolving Trends

January 13, 2007

Self-Aware Text

(this post was updated at 12:10am, Jan 15, 2007)

Enabling self-organizing text

—Summary of an interesting model for realizing self-organizing text (excerpted from this cognitively deranged, subvertly racially biased but otherwise technically sound source)—

Spin glasses are materials with chaotically oriented atomic spins which can reach neither a ferromagnetic equilibrium (spins aligned) nor a paramagnetic one (spins canceling in pairs), because of long-range spin interactions between magnetic trace atoms (Fe) and the conduction electrons of the host material (Cu). Because these effects reverse repeatedly with distance, no simple state fully resolves the dynamics, and spin glasses thus adopt a large variety of [globally] disordered states [with short range order.] Modelling the transition to a spin glass [i.e. simulated annealing] has close parallels in neural nets, particularly the Hopfield nets consisting of symmetrically unstable circuits. Optimization of a task is then modelled in terms of constrained minimization of a potential energy function. However the problem of determining the global minimum among all the local minima in a system with a large number of degrees of freedom is intrinsically difficult. Spin glasses are also chaotic and display sensitive dependence. Similar dynamics occurs in simulations of continuous fields of neurons.

Annealing is a thermodynamic simulation of a spin glass in which the temperature of random fluctuations is slowly lowered, allowing individual dynamic trajectories to have a good probability of finding quasi-optimal states. Suppose we start out at an arbitrary initial state of a system and follow the topography into the nearest valley, reaching a local minimum. If a random fluctuation now provides sufficient energy to carry the state past an adjacent saddle, the trajectory can explore further potential minima. Modelling such processes requires the inclusion of a controlled level of randomness in local dynamical states, something which in classical computing would be regarded as a leaky, entropic process. The open environment is notorious as a source of such [controlled level of randomness], which may have encouraged the use of chaotic systems in the evolutionary development of the vertebrate brain.

—End of Summary—
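For readers who want to see the annealing idea in motion, here is a minimal sketch of simulated annealing on a tiny spin glass, using only Python's standard library. The Gaussian couplings, the geometric cooling schedule, and all parameter values are assumptions picked for the example and are not taken from the quoted source. The run returns the lowest-energy state it happened to visit, which is a quasi-optimal state and not necessarily the global minimum.

```python
import math
import random

def random_couplings(n, seed=1):
    """Upper-triangular matrix of random couplings J[i][j] for i < j."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) if j > i else 0.0 for j in range(n)] for i in range(n)]

def energy(spins, J):
    """E = -sum over i<j of J[i][j] * s_i * s_j, with spins s_i in {-1, +1}."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j] for i in range(n) for j in range(i + 1, n))

def anneal(J, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    n = len(J)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    e = energy(spins, J)
    best, best_e = spins[:], e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        # Energy change if spin i is flipped.
        delta = 2 * spins[i] * sum(
            J[min(i, j)][max(i, j)] * spins[j] for j in range(n) if j != i
        )
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            spins[i] = -spins[i]
            e += delta
            if e < best_e:
                best, best_e = spins[:], e
    return best, best_e

J = random_couplings(8)
best, best_e = anneal(J)
print(best, round(best_e, 3))
```

The uphill acceptance step is the "controlled level of randomness" the summary describes: at high temperature the state can hop over saddles, and as the temperature falls it settles into a valley.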

Imagine the interaction between random words in the English language having two properties: aligned and non-aligned. If you throw the whole set of words into a heated spin-glass alloy (e.g. Cu-Fe), where the words replace the atoms and where word-word interactions replace spin-spin interactions, and then let it cool slowly (i.e. anneal it), then the system (of word-word interactions) should theoretically self-organize into the lowest potential energy state it could find.

The spin glass model (from the summary quoted above) implements an optimization process that is also a self-organizing process. It finds the local energy minimum associated with a metastable state of the system, which in turn organizes the local interactions between atomic spins (or words) so as to minimize discordant interactions (disorder) in the short range. In the case of word-word interactions, this generates text that is garbage in the long range (because interactions are globally disordered) but well-formed in the short range (because interactions are mostly aligned, or ordered, there).
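One possible reading of this idea, and only a toy one, is sketched below in Python: words take the place of spins, a hand-made table of "aligned" adjacent word pairs takes the place of the couplings, and annealing swaps word positions to lower the energy. The score table, the swap move, and the cooling schedule are all assumptions invented for the sketch; the post itself does not specify any of them.

```python
import math
import random

# Hypothetical alignment scores for ordered adjacent word pairs; unlisted pairs score 0.
ALIGNED = {
    ("the", "cat"): 1.0, ("cat", "sat"): 1.0, ("sat", "on"): 1.0,
    ("on", "the"): 1.0, ("the", "mat"): 1.0,
}

def text_energy(words):
    """Lower energy means more adjacent word pairs are 'aligned'."""
    return -sum(ALIGNED.get(pair, 0.0) for pair in zip(words, words[1:]))

def anneal_words(words, steps=3000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current = list(words)
    e = text_energy(current)
    best, best_e = current[:], e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a swap
        new_e = text_energy(current)
        if new_e <= e or rng.random() < math.exp((e - new_e) / t):
            e = new_e
            if e < best_e:
                best, best_e = current[:], e
        else:
            current[i], current[j] = current[j], current[i]  # undo rejected swap
    return best, best_e

scrambled = ["mat", "on", "the", "sat", "cat", "the"]
ordered, ordered_e = anneal_words(scrambled)
print(" ".join(ordered), ordered_e)
```

Since the scores only look at neighbouring words, any order this finds is short-range order, which matches the "well-formed in the short range" claim and says nothing about long-range sense.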

This idea is pretty raw, incomplete, and may not be the most proper use (or abuse) of the spin glass model (see References.)

However, in line with evolution’s preference for such a model for the brain, I find it useful to inject a controlled level of noise (randomness) into the thinking process.

Well, after having some apple crumble, I realize now (randomness works) that the reason this model will work well is because it will generate many well-formed sentences in each state of the system so there is bound to be a percentage of sentences that will actually make sense!

Having said that, this adaptation of the [SK] spin-glass model is pretty rough and needs more thinking to nail down, but the basic idea is good!

From Self-Organizing to Self Aware

What if instead of simply setting the rules and letting order emerge out of chaos (at least in the short range), as implied above, what if each word was an intelligent entity? What if each word knew how to fit itself with other words and within a sentence such that the words work collaboratively and competitively with each other to generate well-formed sentences and even whole articles?

The words would have to learn to read. 🙂

[insert your Web X.0 fantasy]

Reference

  1. Spin Glass Theory and Beyond

Images

Short range ordered regions in 2D state space of a spin glass.

Posted by Marc Fawzi

Tags:

web 3.0, semantic web, artificial intelligence, AI, statistical mechanics, stochastic, optimization, simulated-annealing, self-organization, spin glass

6 Comments »

  1. […] A good example of such structural/sensory duality is the article ”Self Aware Text“     […]

    Pingback by Thinking About Music is Like Dancing About Architcture « Evolving Trends — January 14, 2007 @ 6:27 am

  2. […] Self Aware Text […]

    Pingback by Patterns of Survival « Evolving Trends — January 14, 2007 @ 7:06 am

  3. what?

    just kidding, i get it – not sure i could ever have the guts to implement any of it.

    Comment by Phill — January 15, 2007 @ 6:33 am

  4. I’ve seen the phrase enough times but I never knew what a spin glass actually was before. Thanks!

    Comment by scalefree — January 17, 2007 @ 12:03 pm

  5. The spin glass model, as a high level concept, has captivated me since I was 16. I got into it after reading an exciting article about a spin-glass based neural network using simulated annealing.

    Comment by evolvingtrends — January 18, 2007 @ 5:27 am

  6. […] Self-Aware Text (took two nights to get right) […]

    Pingback by Sensoria 3.0: The End of Your Mind? « Evolving Trends — September 24, 2007 @ 11:34 pm


Evolving Trends

January 7, 2007

Designing a better Web 3.0 search engine

This post discusses the significant drawbacks of current quasi-semantic search engines (e.g. hakia.com, ask.com, et al.) and examines the potential future intersection of Wikipedia, Wikia Search (the recently announced search-engine-in-development by Wikipedia’s founder), a future semantic version of Wikipedia (aka Wikipedia 3.0), and Google’s PageRank algorithm to shed some light on how to design a better semantic search engine (aka Web 3.0 search engine).

Query Side Improvements

Semantic “understanding” of search queries (or questions) determines the quality of relevant search results (or answers.)

However, current quasi-semantic search engines like hakia and ask.com can barely understand the user’s queries because they have chosen free-form natural language as the query format. Reasoning about natural language search queries can be accomplished by: a) Artificial General Intelligence or b) statistical semantic models (which introduce some inaccuracy when constructing internal semantic queries). But a better approach at this early stage may be to guide the user through selecting a domain of knowledge and staying consistent within the semantics of that domain.

The proposed approach implies an interactive search process rather than a one-shot search query. Once the search engine confirms the user’s “search direction,” it can formulate an ontology (on the fly) that specifies a range of concepts that the user could supply in formulating the semantic search query. There would be a minimal amount of input needed to arrive at the desired result (or answer), determined by the user when they declare “I’ve found it!”
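To make the interactive process a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy `ONTOLOGY` data, the `SearchSession` class, and its method names are hypothetical, and a real engine would build the ontology on the fly rather than read it from a hard-coded table. The user picks a domain, the engine suggests only the concepts that would still narrow the results, and the loop ends when the user decides they have found it.

```python
from dataclasses import dataclass, field

# Hypothetical toy data: each domain maps concepts to the documents that mention them.
ONTOLOGY = {
    "astronomy": {
        "planet": {"doc1", "doc2", "doc3"},
        "ring": {"doc2", "doc3"},
        "moon": {"doc1", "doc3"},
    },
    "cooking": {
        "pasta": {"doc4", "doc5"},
        "sauce": {"doc5"},
    },
}


@dataclass
class SearchSession:
    """One multi-step search that stays inside a single domain."""
    domain: str
    chosen: list = field(default_factory=list)

    def candidates(self):
        """Documents in the domain that match every concept chosen so far."""
        concepts = ONTOLOGY[self.domain]
        docs = set().union(*concepts.values())
        for concept in self.chosen:
            docs &= concepts[concept]
        return docs

    def suggestions(self):
        """Unchosen concepts that would narrow the results without emptying them."""
        current = self.candidates()
        return sorted(
            concept
            for concept, docs in ONTOLOGY[self.domain].items()
            if concept not in self.chosen and 0 < len(current & docs) < len(current)
        )

    def refine(self, concept):
        """Record the user's next concept and return the narrowed result set."""
        if concept not in ONTOLOGY[self.domain]:
            raise ValueError(f"{concept!r} is not part of the {self.domain!r} domain")
        self.chosen.append(concept)
        return self.candidates()


session = SearchSession("astronomy")
print(session.suggestions())   # concepts the user may pick next
print(session.refine("ring"))  # results after one refinement step
```

The point of the design is that the user never types free-form language: every input is a concept the engine already knows how to interpret within the chosen domain.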

Information Side Improvements

We are beginning to see search engines that claim they can semantic-ize arbitrary unstructured “Wild Wild Web” information. Wikipedia pages, constrained to the Wikipedia knowledge management format, may be easier to semantic-ize on the fly. However, at this early stage, a better approach may be to use human-directed crawling that associates the information sources with clearly defined domains/ontologies. An explicit, publicized preference for those information sources (including a future semantic version of Wikipedia, a la Wikipedia 3.0) that have embedded semantic annotations (using, e.g., RDFa, http://www.w3.org/TR/xhtml-rdfa-primer/, or microformats, http://microformats.org) will lead to improved semantic search.
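As a rough illustration of what a preference for annotated sources could look like, the sketch below scores a page by the share of its elements that carry RDFa attributes or common microformat class names, then sorts sources by that score. The scoring rule, the function names, and the short attribute and class lists are assumptions made for this example, not a description of how any existing engine ranks pages.

```python
from html.parser import HTMLParser

# A few attributes defined by RDFa and a few root class names used by microformats.
# Both lists are deliberately short and illustrative.
RDFA_ATTRS = {"about", "property", "typeof", "resource", "vocab"}
MICROFORMAT_CLASSES = {"vcard", "vevent", "hentry", "hreview"}


class AnnotationCounter(HTMLParser):
    """Counts elements, and how many of them carry a semantic annotation."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.annotated = 0

    def handle_starttag(self, tag, attrs):
        self.elements += 1
        names = {name for name, _ in attrs}
        classes = set()
        for name, value in attrs:
            if name == "class" and value:
                classes.update(value.split())
        if names & RDFA_ATTRS or classes & MICROFORMAT_CLASSES:
            self.annotated += 1


def annotation_score(html):
    """Fraction of elements with an RDFa attribute or microformat class (0.0 to 1.0)."""
    counter = AnnotationCounter()
    counter.feed(html)
    counter.close()
    return counter.annotated / counter.elements if counter.elements else 0.0


def prefer_annotated(sources):
    """Order (url, html) pairs so the most heavily annotated sources come first."""
    return sorted(sources, key=lambda source: annotation_score(source[1]), reverse=True)


pages = [
    ("http://example.org/plain", "<div><p>Hello</p></div>"),
    ("http://example.org/rdfa", '<div typeof="Person"><span property="name">Ada</span></div>'),
]
for url, _ in prefer_annotated(pages):
    print(url)
```

In a human-directed crawl, a score like this would only be one signal; the domain or ontology a source is assigned to would still come from the people directing the crawl.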

How can we adapt the currently successful Google PageRank algorithm (for ranking information sources) to semantic search?

One answer is that we would need to design a ‘ResourceRank’ algorithm (referring to RDF resources) to manage the semantic search engines’ “attention bandwidth.” A less radical option may be to design a ‘FragmentRank’ algorithm, which would rank at the page-component level (e.g., paragraph, image, Wikipedia page section, etc.).
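One way to picture ‘FragmentRank’ is to run the familiar PageRank power iteration on a graph whose nodes are page fragments instead of whole pages. The sketch below does exactly that and nothing more. It is an assumption about what such an algorithm might look like, not a specification: the function name, the fragment identifiers, and the way links between fragments are obtained are all hypothetical.

```python
def fragment_rank(links, damping=0.85, iterations=50):
    """PageRank-style power iteration over fragments.

    `links` maps a fragment id to the list of fragment ids it links to.
    Returns a dict of fragment id -> score, with scores summing to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by fragments with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical fragment ids of the form "page#component".
links = {
    "pageA#intro": ["pageB#section2"],
    "pageA#image1": ["pageB#section2"],
    "pageB#section2": ["pageA#intro"],
}
scores = fragment_rank(links)
for fragment in sorted(scores, key=scores.get, reverse=True):
    print(fragment, round(scores[fragment], 3))
```

A ‘ResourceRank’ variant would presumably use the same iteration with RDF resources as nodes and RDF statements as edges, but that is speculation on top of the sketch.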

Related

  1. Wikipedia 3.0: The End of Google?
  2. Search By meaning

Update

  1. See relevant links under comments

Posted by Marc Fawzi and ToxicWave


Tags:

web 3.0, semantic web, ontology, reasoning, artificial intelligence, AI, hakia, ask.com, pagerank, google, semantic search, RDFa, ResourceRank, RDF, Semantic Mediawiki, Microformats

15 Comments »

  1. I found the following links at http://wiki.ontoworld.org/index.php/SemWiki2006

    1) http://wiki.ontoworld.org/wiki/Harvesting_Wiki_Consensus_-_Using_Wikipedia_Entries_as_Ontology_Elements
    “The English version of Wikipedia contains now more than 850,000 entries and thus the same amount of URIs plus a human-readable description. While this collection is on the lower end of ontology expressiveness, it is likely the largest living ontology that is available today. In this paper, we (1) show that standard Wiki technology can be easily used as an ontology development environment for named classes, reducing entry barriers for the participation of users in the creation and maintenance of lightweight ontologies, (2) prove that the URIs of Wikipedia entries are surprisingly reliable identifiers for ontology concepts, and (3) demonstrate the applicability of our approach in a use case.”

    2) http://wiki.ontoworld.org/wiki/Extracting_Semantic_Relationships_between_Wikipedia_Categories
    “We suggest that semantic information can be extracted from Wikipedia by analyzing the links between categories. The results can be used for building a semantic schema for Wikipedia which could improve its search capabilities and provide contributors with meaningful suggestions for editing the Wikipedia pages. We analyze relevant measures for inferring the semantic relationships between page categories of Wikipedia.”

    3) http://wiki.ontoworld.org/wiki/From_Wikipedia_to_Semantic_Relationships:_a_Semi-automated_Annotation_Approach

    Comment by SeH.999 — January 7, 2007 @ 8:45 pm

  2. Thanks for the relevant links.

    Marc

    Comment by evolvingtrends — January 7, 2007 @ 9:02 pm

  3. What if you had an AI which used stochastic models and had feedback mechanisms so that it could use evolutionary programming to learn which results were best? Combining Yahoo and Google (people and robots)…?

    Comment by Sam Jackson — January 8, 2007 @ 2:18 pm

  4. > What if you had an AI which used stochastic models…

    in a way, the data set (wikipedia pages + wild-wild-web pages) is itself stochastic.

    re feedback mechanism: if google knows what search results you visit, then they can feedback visited pages into pagerank. but in a directed, multi-step search process, the way the user narrows results is explicit, yielding a _much richer_ feedback loop. not just in terms of which results are chosen, but in the _particular way_ sets of results answer the search ‘problem’.

    re evolutionary programming: useful (along with neural networks) as a possible method that the search-engine uses to optimize its operating parameters, in the crawl or result-fetching stages.

    merging/unifying the crawl and results processes together, you can imagine a human supervised-learning process where the engine learns how to crawl _and_ fetch/present results for randomly-generated, historical, or real-time queries. this way, everyone that uses the engine unknowingly trains it.

    “Using the knowledge linked to by URL u, I can answer search ‘directions’ according to Ontology o”

    Comment by SeH.999 — January 8, 2007 @ 8:30 pm

  5. My line of thought precisely. Although I wonder if that would open it up to a whole new realm of blackhat SEO with click farms in china or on zombie armies? Something for Google et al to try to work out, I guess.

    Comment by Sam Jackson — January 8, 2007 @ 9:23 pm

  6. Google has no future.

    Money does not buy the future. It only glues you to the present, and the present becomes the past.

    The future is not for sale. It’s for those who can claim it.

    Money obeys the future, not vice versa.

    Marc

    Comment by evolvingtrends — January 9, 2007 @ 4:02 am

  7. Well, there’s a saying that goes: money talks, bullshit walks.

    However, the problem with Google is bigger than money can fix.

    Google is stuck with a technology and a business model that are less optimal than what is possible today (never mind what will be possible in two or three years), so they either distribute all their profits as dividends and start over with Google 3.0 using a new technology and a new business model (i.e. disrupt themselves) or submit to the fact that their technology and business model are, like all technologies and business models, not immune to disruption.

    But that’s just one view. Another view could be that they will last forever or for a very long time. They may very well last forever or a very long time but definitely not as the dominant search engine. Anyone who thinks so is contradicting nature and idolizing Google.

    Nature is all about survival of the fittest.

    Google’s technology and business model are not the fittest, by design.

    Who will undermine Google?

    That’s the $300B question.

    My answer is: Google itself.

    It’s like being on a seesaw, over a cliff. For now, the mountain side is weighed down by mass misconception and by the competitors’ sub-mediocre execution.

    Speaking of execution, let me inject the word “Saddam” here so Google starts associating this blog with Saddam execution videos. Do you see how dumb Google is???

    It’s not about semantic vs non-semantic design. It’s about bad design vs good design. You can undermine a bad design a lot easier than a good design.

    It’s time to come up with a good one!

    There are private companies competing with NASA (the organization that put a man on the moon 38 years ago) and they’re succeeding at it … Why shouldn’t we have an X Prize for the first company to come up with a P2P search engine that beats Google (i.e. The People’s Google)?

    Time for breakfast, again.

    Marc
    P.S. I do have to believe in breakfast in order to exist.

    Comment by evolvingtrends — January 9, 2007 @ 11:57 am

  8. I agree with your vision. But there are many technical difficulties. For example, on-the-fly ontology generation is a very hard problem. Especially if you want to play it on the user side, I doubt whether it might work. We will have new search models (other than Google and Yahoo) for Semantic Web. But the time is not ready for the revolution yet.

    Anyway, I believe your thoughts are great. Recently I will post a new article about web evolution. I think you might be interested in reading it. 😉

    Comment by Yihong Ding — January 9, 2007 @ 1:26 pm

  9. No one can say the “time is not ready,” especially not a semantic web researcher. The time is always ready. The question is whether or not we’re ready. I believe we are 🙂 …

    Things already in motion.

    Comment by evolvingtrends — January 10, 2007 @ 5:59 am

  10. > But there are many technical difficulties. For example, on-the-fly ontology generation is a very hard problem.

    Any elementary algorithm can generate on-the-fly ontologies, the question is how useful, reusable, and accurate they are.

    Think along the lines of “Fluid Ontologies,” “Fluid Knowledge,” or “Evolving Ontologies.” These may be a killer app for the semantic web, because the ‘rigid’ binding of OWL (or OWL-like) ontologies to data yields a relatively narrow range of expression.

    > But the time is not ready for the revolution yet.

    The time has always been “ready for the revolution yet”, but it has never been ready for people to state that it hasn’t. 😉

    Comment by SeH.999 — January 11, 2007 @ 4:38 pm

  11. http://blog.wired.com/monkeybites/2007/01/wikiseek_launch.html
    Tuesday, 16 January 2007
    SearchMe Launches Wikiseek, A Wikipedia Search Engine
    Topic: search

    The search engine company SearchMe has launched a new service, Wikiseek, which indexes and searches the contents of Wikipedia and those sites which are referenced within Wikipedia. Though not officially a part of Wikipedia, TechCrunch reports that Wikiseek was “built with Wikipedia’s assistance and permission”

    Because Wikiseek only indexes Wikipedia and sites that Wikipedia links to, the results are less subject to the spam and SEO schemes that can clutter up Google and Yahoo search listings.

    According to the Wikiseek pages, the search engine “utilizes Searchme’s category refinement technology, providing suggested search refinements based on user tagging and categorization within Wikipedia, making results more relevant than conventional search engines.”

    Along with search results Wikiseek displays a tag cloud which allows you to narrow or broaden your search results based on topically related information.

    Wikiseek offers a Firefox search plugin as well as a Javascript-based extension that alters actual Wikipedia pages to add a Wikiseek search button (see screenshot below). Hopefully similar options will be available for other browsers in the future.

    SearchMe is using Wikiseek as a showcase product and is donating a large portion of the advertising revenue generated by Wikiseek back to Wikipedia. The company also claims to have more niche search engines in the works.

    If Wikiseek is any indication, SearchMe will be one to watch. The interface has the simplicity of Google, but searches are considerably faster — lightning fast, in fact. Granted, Wikiseek is indexing far fewer pages than Google or Yahoo. But if speed is a factor, niche search engines like Wikiseek may pose a serious threat to the giants like Google and Yahoo.

    Steve Rubel of Micro Persuasion has an interesting post about the growing influence of Wikipedia and how it could pose a big threat to Google in the near future. Here are some statistics from his post:

    The number of Wikipedians who have edited ten or more articles continues its hockey stick growth. In October 2006 that number climbed to 158,000 people. Further, media citations rose 300% last year, according to data compiled using Factiva. Last year Wikipedia was cited 11,000 times in the press. Traffic is on the rise too. Hitwise says that Wikipedia is the 20th most visited domain in the US.

    While Wikiseek will probably not pose a serious threat to the search giants, Wikipedia founder Jimmy Wales is looking to compete with the search giants at some point. While few details have emerged, he has announced an as-yet-unavailable new search engine, dubbed Search Wikia, which aims to be a people-powered alternative to Google.

    With numbers like the ones cited above, Wikipedia may indeed pose a threat to Google, Yahoo and the rest.

    Comment by Tina — January 16, 2007 @ 7:39 pm

  12. Copying the Wikipedia 3.0 vision in a half assed way is more about leveraging the hype to make a buck than moving us forward.

    However, I’d give any effort a huge benefit of the doubt just for trying.

    🙂

    Comment by evolvingtrends — January 17, 2007 @ 2:17 am

  13. […] Jan 7, ‘07: Also make sure to check out “Designing a Better Web 3.0 Search Engine.” […]

    Pingback by Wikipedia 3.0: The End of Google? « Evolving Trends — March 2, 2007 @ 10:31 pm

  14. […] turned up a short counter-point blog post about their approach by Marc Fawzi and […]

    Pingback by Blank (Media) Slate » Blog Archive » Promise of a Better Search with Hakia — March 9, 2007 @ 5:33 pm

  15. […] Now see this Evolving Trends article that preceded the description from the above. Designing a Better Web 3.0 Search Engine. […]

    Pingback by Hakia, Google, Wikia (Revision 2) « Evolving Trends — September 26, 2007 @ 10:08 pm


Google Answers: The End of Google Co-Op? Nov 30, 2006

Evolving Trends

Sorry, no posts matched your criteria!  

I have looked for it and there is a copy in:

twopointouch

web 2.0, blogs and social media


Seeking Answers

November 30th, 2006 by Ian Delaney

Google Answers has been closed while Yahoo! Answers goes from strength to strength. The key difference between the two is that Google’s service paid vetted ‘experts’ to produce results, while Yahoo allows anyone to pitch in. The whole thing leaves a lot of questions. I’m not sure whether the stats prove an uncomplicated victory for social search and crowdsourced problem-solving, for a start. I’ve really no idea which service produces better answers, being one issue. It probably depends on the question. ‘What’s a good Italian restaurant in Cardiff?’ will work well with the Yahoo! model because it has a wider reach. On the other hand, you might not want to trust folk wisdom for a solution to matters that require a specialised knowledge.

It does show that a free-for-all, give-and-take knowledge source is very addictive and, presumably, helpful enough. Involving people like Stephen Hawking and Oprah Winfrey bought Yahoo! a vital share of attention Google never bothered with. Also, as Brady Forrest points out, Yahoo!’s model could scale organically, while Google’s required the recruitment and vetting of answerers, a time-consuming and distracting business.

Is this victory analogous to what will happen in the battle between the Wikipedia and the Britannica? It seems very similar on face value. Not entirely, though, since their business models are different: Wikipedia survives on charitable donations and drubbing the opposition when it comes to traffic is not nearly as helpful as it has been to Yahoo!

[I interviewed Steven Taylor, RVP of Yahoo! UK here, back in August and he talked a little about the Answers service]


4 responses so far ↓
  • 1 Marc Fawzi Nov 30, 2006 at 4:35 pm
    Of course, I went ahead and published the Web 3.0 angle! http://evolvingtrends.wordpress.com/2006/11/30/googleanswers-the-end-of-googleco-op/

  • 2 Ian Delaney Dec 1, 2006 at 1:36 am
    LOL. Of course. I wouldn’t expect anything less. – Hey you aren’t ’stuck at the hilton’ or ‘travelling mexico’ anymore. Does that mean you’re finally home?
  • 3 wuyasea operator Dec 1, 2006 at 8:19 pm
    sorry to hear google answers closing down. I personally liked it very much. enough for the bad news. now the good news: http://www.wuyaSea.com welcomes all google answer refugees.

    WuyaSea.com allow you to ask question for free or for a fee (minimum $2). any category, any topic.
    WuyaSea.com also allow you to answer any questions, our commission now is 20% + $0.20.

    http://www.wuyaSea.com
    wuyaSea Operator

  • 4 Ian Delaney Dec 1, 2006 at 8:26 pm
    Interesting. How much do you pay the answer providers and how do you vet them?


Evolving Trends

November 25, 2006

Plagiarism By Meaning (The New York Times and Web 3.0)

Here is an entry from this blog that chronicles the coining of the term Web 3.0 on this blog in relation to the Semantic Web and AI agents, and which predated by over five (5) months the use of the term by the New York Times in this same context: http://evolvingtrends.wordpress.com/web-30/

Here is the Evolving Trends article that was the first article to coin, in a highly publicized way, the term Web 3.0 in the context of the Semantic Web and AI agents:

http://evolvingtrends.wordpress.com/2006/06/26/wikipedia-30-the-end-of-google/

And here is the Johnny-come-lately Web 3.0 article by the New York Times that does the same thing but in a different way …

http://www.nytimes.com/2006/11/12/business/12web.html

This boils down to plagiarism by meaning (as opposed to plagiarism of words, sentences, etc.)

Sort of like how the three Abrahamic religions plagiarized the meaning of each other to create competing religions from the same ideas and coinages.

Welcome to the new agent (or rather ‘age’) of decentralized religion.

Update

The NYT article made it into the Wikipedia entry on Web 3.0 but the Wikipedia Cult/Cabal led by Jimmy Wales rejected mention of the Evolving Trends article on the basis that it is a blog entry. That is despite the fact that millions of people read it and that it defined the term Web 3.0 well before the NYT article did, in the same context. This is all while Jimmy Wales plotted his own venture that attempts to leverage the good will of the people (not to be confused with that of the corrupt Wikipedia administrators) to build a user-enhanced search engine, ultimately not that different from what was proposed in the Evolving Trends article. The two events are NOT directly connected. It’s just that being the dictator of all human knowledge (which he has clearly demonstrated by recruiting, promoting and protecting corrupt administrators that censor knowledge on an arbitrary or even malicious basis) does not bode well for his effort to leverage people’s good will for his own commercial gain.

Narcissists are everywhere, but especially at the top.

It’s time to flatten the pyramid and adopt P2P and mesh technologies.

Who needs centralized media when we have blogs, and who needs a centralized (and censored) Wikipedia when we can have a distributed one with user-rated entries (like Google’s Knol, but distributed, not centralized)?

But remember that there is no wisdom to the crowd.

Only individuals are capable of wise decisions.

So remember to lead and not follow.

If you have to follow, follow your intuition.

Related

  1. Wikipedia 3.0: The End of Google?

Posted by Marc Fawzi


Tags:

Web 3.0, Semantic Web, New York Times, DecentraliZed Religion, DecentraliSed Religion, R3LIGION

2 Comments »

  1. […] Plagiarism By Meaning (The New York Times and Web 3.0) […]

    Pingback by Wikipedia 3.0: The End of Google? « Evolving Trends — November 27, 2006 @ 4:51 am

  2. […] And we inadvertently got Jimmy Wales’ into it: here (also see this) […]

    Pingback by Hakia, Google, Wikia « Evolving Trends — January 18, 2008 @ 6:08 pm
