Posts Tagged ‘Web’

Theorem: “The Limit of Artificial Intelligence”.



The limits of Artificial Intelligence are not set by the use of machines themselves (biological systems could even be used to reach this goal), but by the logic that is used to construct it. That logic does not contemplate the concept of time: it is purely formal and metonymic, and lacks metaphor. This is what Gödel’s theorems remark: the final tautology of every metonymic mathematical construction or language, which leads to inconsistencies. The construction of Artificial Intelligence is an undecidable problem.


This consistent logic is completely opposed to the logic that makes inconsistent use of time, which is inherent to the human unconscious. The use of time is built on lack rather than on positive things; it is based on denials and absences, and this is impossible to reflect in a machine, because the required self-awareness is acquired through absence.


The problem with Artificial Intelligence is that we are trying to build an intelligent system to replace our way of thinking, at least in information search. But the special nature of the human mind is its use of metaphor, which is what lets human beings reach a conclusion; therefore the halting problem, or stop of calculation, does not exist in the human mind.


If we suppose, as a theorem, that it is possible to construct a machine with an intelligence whose capabilities are similar to human intelligence, then, treating it as a theorem, we can prove it false with a counterexample: the particular case of the Turing machine and “the halting problem”, or stop of calculation.
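The counterexample invoked here is the classic diagonal argument. A minimal sketch in Python may make it concrete; the function names are illustrative only, and `halts` is the hypothetical universal decider that, by this very argument, cannot actually be implemented:

```python
# Sketch of the halting-problem contradiction.
# Assume, for contradiction, that halts(f, arg) always returns
# True when f(arg) terminates and False when it loops forever.

def halts(f, arg):
    # Hypothetical universal decider -- cannot actually be written.
    raise NotImplementedError

def paradox(f):
    # Do the opposite of whatever `halts` predicts about f(f).
    if halts(f, f):
        while True:       # predicted to halt -> loop forever
            pass
    else:
        return            # predicted to loop -> halt immediately

# paradox(paradox) would halt if and only if it does not halt,
# so no such `halts` can exist: the problem is undecidable.
```

The contradiction arises only for the assumed decider; nothing in the sketch depends on any particular machine architecture.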


So all efforts directed toward Artificial Intelligence are doomed to failure a priori if the aim is to extend our human way of thinking into machines, for machines lack metaphorical speech: they are only a mathematical construction, which will always be tautological and metonymic, and which lacks the use of metaphor, the very thing that leads to the conclusion or “stop”.


From Logic to Ontology: The limit of “The Semantic Web”
Francisco Antonio Cerón García
Royal Spanish Society of Physics

The limits of the Semantic Web (http://en.wikipedia.org/wiki/Semantic_Web) are not set by the use of machines themselves (biological systems could even be used to reach this goal), but by the logic (http://en.wikipedia.org/wiki/Logic) that is used to construct it. That logic does not contemplate the concept of time: it is purely formal and metonymic, and lacks metaphor. This is what Gödel’s incompleteness theorems (http://en.wikipedia.org/wiki/Gödel’s_incompleteness_theorems) remark: the final tautology of every metonymic construction or mathematical language (http://en.wikipedia.org/wiki/Mathematical_logic), which leads to inconsistencies. The construction of the Semantic Web is an undecidable problem (http://en.wikipedia.org/wiki/Undecidable_problem).
This consistent logic is completely opposed to the logic that makes inconsistent use of time (http://en.wikipedia.org/wiki/Jacques_Lacan), which is inherent to the human unconscious. The use of time is built on lack rather than on positive things; it is based on denials and absences, and this is impossible to reflect in a machine, because the required self-awareness is acquired through absence.
The problem is that we are trying to build an intelligent system to replace our way of thinking, at least in information search. But the special nature of the human mind is its use of time, which is what lets human beings reach a conclusion; therefore the halting problem (http://en.wikipedia.org/wiki/Halting_problem), or stop of calculation, does not exist in the human mind.
So all efforts directed toward the Semantic Web are doomed to failure a priori if the aim is to extend our human way of thinking into machines, for machines lack metaphorical speech: they are only a mathematical construction, which will always be tautological and metonymic, and which lacks the use of time, the very thing that leads to the conclusion or “stop”.
As a demonstration of this, suppose it is possible to construct the Semantic Web as a language with capabilities similar to human language, which has the use of time. Treating this as a theorem, we can prove it false with a counterexample: the particular case of the Turing machine (http://en.wikipedia.org/wiki/Turing_machine) and “the halting problem”.
One solution to the problem of building the Semantic Web could be the use of non-formal or inconsistency logic (https://methainternet.wordpress.com/2008/01/27/non-formal-or-inconsistency-logic-lacans-logic-godels-incompleteness-theorems).


See the Complete Report with images at:






Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities


Mills Davis, Managing Director, Project 10X http://www.project10x.com




Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 and Multibillion Dollar Market Opportunities

Dear reader, Project10X is pleased to announce publication of a comprehensive, ground-breaking 720-page study of semantic technologies and their market impact entitled Semantic Wave 2008: Industry Roadmap to Web 3.0 and Multibillion Dollar Market Opportunities. This report charts the evolution of the internet from Web 2.0 to Web 3.0, the emergence of semantic technologies for consumer and enterprise applications, and the growth of multi-billion dollar markets for Web 3.0 products and services. It is must reading for investors, technology developers, and enterprises in the public and private sector who want to better understand semantic technologies, the business opportunities they present, and the ways Web 3.0 will change how we use and experience the internet for pleasure and profit.

Enjoy this free summary of Project10X’s Semantic Wave 2008 Report, and be sure to order your copy of the Semantic Wave 2008 Report. See ordering information on page 27!

Mills Davis, Washington, DC, USA

Executive Summary



What is the semantic wave?

A tidal wave of four Internet growth stages.

The semantic wave embraces four stages of internet growth. The first stage, Web 1.0, was about connecting information and getting on the net. Web 2.0 is about connecting people — putting the “I” in user interface, and the “we” into Webs of social participation. The next stage, Web 3.0, is starting now. It is about representing meanings, connecting knowledge, and putting these to work in ways that make our experience of the internet more relevant, useful, and enjoyable. Web 4.0 will come later. It is about connecting intelligences in a ubiquitous Web where both people and things reason and communicate together.


Project10X’s Semantic Wave 2008 Report tells the story of Web 3.0. Over the next decade, Web 3.0 will spawn multi-billion dollar technology markets that will drive trillion dollar global economic expansions to transform industries as well as our experience of the internet. The Semantic Wave 2008 report examines drivers and market forces for adoption of semantic technologies in Web 3.0 and maps opportunities for investors, technology developers, and public and private enterprises.


What is the Evolution of the Internet to 2020?



How is Web 3.0 different from previous stages of internet evolution?

Knowledge computing drives new value creation and solves problems of scale and complexity.

The basic shift occurring in Web 3.0 is from information-centric to knowledge-centric patterns of computing. Web 3.0 will enable people and machines to connect, evolve, share, and use knowledge on an unprecedented scale and in new ways that make our experience of the internet better.

Web growth continues to accelerate. Dimensions of net expansion include communications bandwidth, numbers of people connected, numbers and kinds of devices that are IP-aware, numbers of systems and applications, quantities of information, and types of media. As the internet expands, needs world-wide are outstripping the capacities and capabilities of current information and communications technologies (ICT) and architectures.

[Figure: Web 3.0 — The Internet Grows a Knowledge Plane]


Information-centric patterns of computing have reached the limit of what they can provide to cope with problems of scale, complexity, security, mobility, rich media interaction, and autonomic behavior.

Web 3.0 will solve these problems and lay a foundation for the coming ubiquitous Web of connected intelligences. The Web 3.0 solution, simply put, is to give the internet a knowledge space. In the following topics we identify key characteristics of this knowledge space, sketch out how its semantic computing works, and examine how Web 3.0 knowledge-centric patterns of computing drive new value creation.


What semantic technologies will power Web 3.0?

Digital tools that represent and reason about meanings, theories, and know-how separately from documents, data, and program code.

The key notion of semantic technology is to represent meanings and knowledge (e.g., knowledge of something, knowledge about something, and knowledge of how to do something) separately from content or behavior artifacts, in a digital form that both people and machines can access and interpret. As a platform, Web 3.0 will embrace all semantic technologies and open standards that can be applied on top of the current Web. It is not restricted just to current Semantic Web standards.
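The idea of keeping meanings separate from program code can be sketched with subject-predicate-object triples, the style of representation standardized by RDF. A minimal stdlib-only sketch; the facts below are invented for illustration:

```python
# Minimal triple store: facts live as data, not as program code.
triples = set()

def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the non-None fields."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Example facts (hypothetical, for illustration only).
add("SemanticWave2008", "type", "Report")
add("SemanticWave2008", "author", "MillsDavis")
add("MillsDavis", "role", "ManagingDirector")

# Any program -- or person -- can now interpret the same facts.
print(query(subject="SemanticWave2008"))
```

Because the knowledge sits in a neutral data form, a different reasoner or renderer can use it without the original author's code.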

Web 3.0 will encompass a broad range of knowledge representation and reasoning capabilities including pattern detection, deep linguistics, ontology and model based inferencing, analogy and reasoning with uncertainties, conflicts, causality, and values. The figure below depicts a spectrum of progressively more capable forms of knowledge representation that spans tag collections (or folksonomies); to dictionaries, taxonomies and thesauri; to schemas and conceptual models; to ontologies and theory-based logics; to axiologies (value-based reasoning); and entirely new uses barely tapped. Reasoning requires knowledge representation. We choose more powerful forms of representation to enable more powerful kinds of reasoning and problem solving. The integration of social Web and semantic technologies in Web 3.0 allows new synergy that lowers the cost of data and knowledge creation, and raises the computational value of gathering it.
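The claim that reasoning requires representation can be illustrated with one of the simplest inferences a taxonomy supports and a bare tag collection does not: transitivity over subclass links. A small sketch with an invented hierarchy:

```python
# A tag collection supports only lookup; a taxonomy also supports
# inference. Here: "is-a" transitivity over subclass links.
subclass_of = {
    "Folksonomy": "Taxonomy",        # invented example hierarchy
    "Taxonomy": "KnowledgeModel",
    "Ontology": "KnowledgeModel",
    "KnowledgeModel": "Representation",
}

def ancestors(cls):
    """All classes reachable by following subclass links upward."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result

# The fact "a Folksonomy is a Representation" was never stated
# directly; it is inferred from the chain of subclass links.
print(ancestors("Folksonomy"))
```

A richer ontology adds properties, constraints, and rules on top of this, enabling correspondingly richer inference.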

From Searching to Knowing — Spectrum of Knowledge Representation and Reasoning Capabilities (ranging from weak semantics to strong semantics)


How will Web 3.0 systems connect data, services and applications?

First, they’ll integrate knowledge about these applications, content sources, and process flows. Then they’ll execute it.

In order to connect systems, integrate information, and make processes interoperable, the first step is to integrate the knowledge about these systems, content sources, and process flows. Today, people do this offline, manually. This approach does not scale. In Web 3.0 both people and applications will connect knowledge in real time using automated and semi-automated methods. Web 3.0 approaches will scale.

Semantically modeled, machine-executable knowledge lets us connect information about people, events, locations, times — in fact, any concept that we want to — across different content sources and application processes. Instead of disparate data and applications on the Web, we get a Web of interrelated data and interoperable applications. Recombinant knowledge is represented as concepts, relationships and theories that are sharable and language neutral. Semantic technologies provide the means to unlock knowledge from localized environments, data stores, and proprietary formats so that resources can be readily accessed, shared, and combined across the Web.
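The "Web of interrelated data" idea can be shown in a few lines: when independent sources express facts as concept-level triples, merging them is just a set union, and a query can then span both sources. All names below are invented for illustration:

```python
# Two independent sources describe overlapping concepts.
# Because both use concept-level triples, merging is a set union.
source_a = {("Paris", "locatedIn", "France"),
            ("Paris", "population", "2.1M")}
source_b = {("EiffelTower", "locatedIn", "Paris"),
            ("France", "partOf", "Europe")}

merged = source_a | source_b   # interoperability by construction

# A cross-source question: what is located in something in France?
in_france = {s for (s, p, o) in merged
             if p == "locatedIn" and o == "France"}
result = [s for (s, p, o) in merged
          if p == "locatedIn" and o in in_france]
print(result)  # links a fact from source_b through source_a
```

The answer depends on one fact from each source; neither source alone could produce it.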

In today’s Web, each device has an operating system (OS) that provides walled access to its content through a hierarchical file system. Limitations of OS platforms are spurring development of semantic desktops to provide meaning-based, concept-level search, navigation, and integration across varied content sources and applications found on PCs and other devices.

Applications running on OS platforms provide access to the information they have knowledge of, but do not combine easily with others, unless such link-ups have been planned and agreed to in advance by developers. The need to overcome these limitations of OS platforms, including the need for human labor to research and code interfaces, is fueling interest in:

Web-tops — platforms spanning multiple OSs connected over the internet,

Mash-ups — two or more data sources or works combined to become a new data source or work,

Context-aware mobility — dynamic composition and personalization of services across devices, networks, locations, and user circumstances, and

Semantic service-oriented architectures — using machine-interpretable descriptions of policies and services to automate discovery, negotiation, adaptation, composition, invocation, and monitoring of Web services.

In Web 3.0, these sorts of capabilities will become intrinsic features of the knowledge space’s semantic fabric, and no longer mere one-off hacks or the result of mutually exclusive platform and service plays.



Where do the shared meanings and knowledge in Web 3.0 come from?

From both people and machines. And, to start with, from the Web itself.

Knowledge exists in many forms in today’s Web. All computing processes represent some type of knowledge in some way in order to process information, for example: knowledge about how information is organized in order to search it; rules that tell a computer program how to make a decision; or action steps to take to complete a task.

The problem is that existing knowledge on the Web is fragmented and difficult to connect. It is locked in data silos and operating system file system formats. Knowledge is hidden in object-oriented black boxes and layers of stack architecture. It is embedded in program code and squirreled away in proprietary algorithms.

Web 3.0 changes this. The convergence of pattern discovery, deep linguistics, and ontological symbolic reasoning technologies makes it feasible to automatically extract embedded and intrinsic knowledge from today’s Web. Evolution of semantic social computing will enable communities to create, curate, and share knowledge in human-readable and machine-executable forms.

The diagram below contrasts knowledge-centric and information-centric patterns of computing. In Web 3.0, end-user development will increase as computers help generate intelligent services and manage application functionality, security, versioning and changes autonomically.


What Are Knowledge-centric Patterns of Computing?

Who develops software, behaviors, structures, and content?
Information-centric: Producers and enterprises are developers.
Knowledge-centric: Prosumers (consumers) and peer-to-peer producers (groups, knowledge communities) do it themselves.

How are different expressions of knowledge handled?
Information-centric: Separate technologies for documents (data, content), models, and behaviors. Closed semantics, hardwired.
Knowledge-centric: Unified platforms handle documents, models & behaviors interchangeably, including pictures & natural language. Massive open local semantics, available everywhere.

Where do knowledge & logic in the system come from?
Information-centric: At design time, from people. At new release, from people. No run-time learning.
Knowledge-centric: At design time, from people. At run time, from user input and from system learning.

What are the patterns for system learning?
Information-centric: No system learning. No autonomics. New knowledge requires new version of code.
Knowledge-centric: System learns and evolves from use by people. Machine observes & learns from environment. Autonomics — self* learning and adaptation.

What are the patterns for knowledge representation and computation?
Information-centric: Process-centric, cycle time intensive. Directional algorithms and procedures. Embedded knowledge — logic, structure locked in code. Relational operators. First-order logic.
Knowledge-centric: Data-centric, storage-intensive. Semantic operators. Sequence-neutral graph reasoning. External declarative knowledge structures. Semantic and value-based reasoning with full spectrum of logic.

What are the patterns for underlying infrastructure?
Information-centric: Predefined configurations. Black-box objects. Stacks. Single processors. Local stores.
Knowledge-centric: Adaptive, self-optimizing configurations. Ubiquitous semantic webs, meshes & grids. Transparent semantic agents. Multi-core, multi-threaded processors. Federated stores and processes. Semantic ecosystems and social autopoiesis (self-organization, planning, etc.).

What are the patterns for security?
Information-centric: Separate role-based security for each system. Black boxes, lack of transparency, and human intervention make network security problematic.
Knowledge-centric: Autonomic identity and security with concept-level granularity across all IP entities, relationships, services, etc. Building block transparency = security by design.

What are the patterns for versioning and change management?
Information-centric: Manual change management and versioning. Human architected. Central planning. Brittle.
Knowledge-centric: Automated change management & versioning. Autonomic intellectual property, emergent behaviors, self-managed. Robust.


What new capabilities will Web 3.0 knowledge-centric computing enable?

Systems that know, learn, and can reason as humans do.

When knowledge is encoded in a semantic form, it becomes transparent and accessible at any time to a variety of reasoning engines.

Previously, if knowledge was in a document or set of documents, then it was fixed when published, in a form only humans could read. Or, if knowledge was encoded in a computer program, then it was opaque and hidden in objects or in procedures that were fixed at design time, and hence a “black box” whose logic was not visible to any other process that had not been pre-programmed with common knowledge.

In Web 3.0, knowledge lives, evolves and is stored transparently (as “glass boxes”). It can be used, validated, added to, and combined with other knowledge at run time by multiple systems. This enables a system to “learn” to do things that the system designer did not anticipate. This is an important shift from IT as it has been practiced until now.

Web 3.0 systems will be designed so that they get better with use and scale. Their architectures will enable learning. One way is that their users can evolve them by adding knowledge and capabilities to them. Another way is that systems may learn by themselves how to respond to changes in their environments.

2007, 2008 Copyright MILLS•DAVIS. All rights reserved

Report Summary & Prospectus


How will Web 3.0 overcome the fragmentation of information, processes, and application functionality?

By interrelating the myriad forms of language that people and machines use to encode thoughts, share meanings, and connect knowledge.

Until now, knowledge on the Web has been expressed in separate forms such as documents, imagery, patterns, structural models, and program code. Computers that produced these artifacts mostly have been used as electronic pencils, with little (if any) understanding of what the writing meant, and no ability to interpret other ways of expressing the same idea (such as through graphics, images, video, computer languages, formal languages, and other natural languages).

In Web 3.0, the myriad forms of language in which knowledge is expressed begin to get interrelated, connected, and made interchangeable with each other, for example: combining knowledge from one or more sources, or from one or more formats, or from one time and place with other contexts.

To illustrate, policies are typically written out as documents. But this same knowledge can be modeled as a data structure or as decision rules. Also, policies can be hard coded into software objects and procedures. Using semantic technologies we can represent, connect, and manage the knowledge from all of these different forms at the level of concepts, and maintain each artifact. This sort of “transemantic” or multi-lingual capability leads to computer systems that can:

Capture knowledge from different sources such as sensors, documents, pictures, graphics, and other data and knowledge resources,

Interpret and interrelate these different ways of expressing ideas with each other,

Share what they know with people and machines, and

Re-express and communicate what they know in different contexts, information formats, and media.
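The policy example above (the same policy expressed as a document, as decision rules, or hard-coded in software) can be sketched minimally: the rules live as declarative data interpreted by a generic engine, so the knowledge stays accessible in concept form rather than being locked in code. The rule contents here are invented for illustration:

```python
# A policy kept as declarative data (not hard-coded in a program),
# so the same knowledge could be rendered as a document, checked
# by an engine, or translated into another form. Rules invented.
policy_rules = [
    {"if": {"role": "guest"},  "then": {"max_downloads": 1}},
    {"if": {"role": "member"}, "then": {"max_downloads": 10}},
]

def evaluate(policy, facts):
    """Generic engine: apply every rule whose conditions match."""
    decision = {}
    for rule in policy:
        if all(facts.get(k) == v for k, v in rule["if"].items()):
            decision.update(rule["then"])
    return decision

print(evaluate(policy_rules, {"role": "member"}))
```

Because the rules are data, the same list could also be pretty-printed as a human-readable policy document, which is the "maintain each artifact" point.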


How Do Humans Encode Thoughts and Share Knowledge and Meaning?



Natural language: Documents, speech, stories
Visual language: Tables, graphics, charts, maps, illustrations, images
Formal language: Models, schema, logic, mathematics, professional and scientific notations
Behavior language: Software code, declarative specifications, functions, algorithms
Sensory language: User experience, human-computer interface

Source: Project10X

How does Web 3.0 tap new sources of value?

By modeling knowledge, adding intelligence, and enabling learning.

The value drivers for Web 3.0 are huge. The table below highlights five categories of challenges, semantic capabilities that address these needs, and the value drivers associated with these semantic capabilities. Semantic technologies have the potential to drive 2-3 orders of magnitude improvements in capabilities and life cycle economics through cost reductions, improved efficiencies, gains in effectiveness, and new functionalities that were not possible or economically feasible before now. New sources of value include:

1. Value from knowledge modeling — Semantic models are sharable, recombinant, and executable. To model first, then execute the knowledge reduces time, risk, and cost to develop and evolve services and capabilities. Semantic model-based approaches achieve added development economies through use of (a) shared knowledge models as building blocks, (b) autonomic software techniques (goal-oriented software with self-diagnostic and self-management capabilities such as self-configuration, self-adaptation, self-optimization, etc.), and (c) end-user and do-it-yourself life-cycle development methodologies (rather than requiring intervention by IT professionals). Knowledge that is sharable, revisable, and executable is key for applications where facts, concepts, circumstances, and context are changing and dynamic.

2. Value from adding intelligence — A working definition of intelligence is the ability to acquire, through experience, knowledge and models of the world (including other entities and self), and use them productively to solve novel problems and deal successfully with unanticipated circumstances. A key new source of value is adding intelligence to the user interface, to applications, and to infrastructure. An intelligent system or agent is a software program that learns, cooperates, and acts autonomously. It is autonomic and capable of flexible, purposeful reasoning action in pursuit of one or more goals. An intelligent user interface (UI) knows about a variety of things such as system functionality, tasks users might want to do, and ways information might be presented or provisioned. Intelligent UIs know about the user (via user models), which enables tailoring system behavior and communications. Adding intelligence helps users perform tasks, while making working with the computer more helpful, and as invisible as possible. As a result, systems do more for the user, yield more relevant results with less effort, provide more helpful information and interaction, and deliver a more enjoyable user experience. Adding intelligence can produce ten-fold gains in communication effectiveness, service delivery, user productivity, and user satisfaction.

How Do Semantic Technologies Drive Value?

1. Development: Complexity, labor-intensity, solution time, cost, risk
Semantic capabilities: Semantic automation of “business need-to-capability-to-simulate-to-test-to-deploy-to-execute” development paradigm
Value drivers: Semantic modeling is business rather than IT centric, flexible, less resource intense, and handles complex development faster.

2. Infrastructure: Net-centricity, scalability; resource, device, system, information source, communication intensity
Semantic capabilities: Semantic enablement and orchestration of transport, storage, and computing resources; IPv6, SOA, WS, BPM, EAI, EII, Grid, P2P, security, mobility, system-of-systems
Value drivers: In the semantic wave, infrastructure scale, complexity, and security become unmanageable without semantic solutions.

3. Information: Semantic interoperability of information formats, sources, processes, and standards; search relevance, use context
Semantic capabilities: Composite applications (information & applications in context powered by semantic models), semantic search, semantic collaboration, semantic portals
Value drivers: Semantic interoperability, semantic search, semantic social computing, and composite applications & collaborative knowledge management become “killer apps.”

4. Knowledge: Knowledge automation, complex reasoning, knowledge commerce
Semantic capabilities: Executable domain knowledge-enabled authoring, research, simulation, science, design, logistics, engineering, virtual manufacturing, policy and decision support
Value drivers: Executable knowledge assets enable new concepts of operation, super-productive knowledge work, enterprise knowledge superiority, and new intellectual property.

5. Behavior: Systems that know what they’re doing
Semantic capabilities: Robust adaptive, autonomic, autonomous system behaviors; cognitive agents, robots, games, devices, and systems that know, learn, and reason as humans do
Value drivers: Semantic wave systems learn and reason as humans do, using large knowledgebases, and reasoning with uncertainty and values as well as logic.


3. Value from learning — Machine learning is the ability of computers to acquire new knowledge from past cases, experience, exploration, and user input. Systems that learn increase in value during their lifetime. Their performance improves. They get better with use, and with scale. In addition to new or improved capabilities, systems that learn during operation may improve system life cycle economics by (a) requiring less frequent upgrading or replacement of core software components, and (b) enabling new incremental extensions to revenue models through add-on knowledgeware and software-as-a-service.


4. Value from semantic ecosystem — An ecosystem is a self-sustaining system whose members benefit from each other’s participation via symbiotic relationships (positive-sum relationships). Principal drivers for semantic infrastructure and ecosystem include the economics of mobility, scale, complexity, security, interoperability, and dynamic change across networks, systems, and information sources. These problems are intractable at Web scale without semantics. The corollary is the need to minimize the human labor needed to build, configure, and maintain ultra-scale, dynamic infrastructure.

Semantic ecosystems that emerge in Web 3.0 will consist of dynamic, evolvable systems consisting of ensembles (societies) of smart artifacts. This means a shift in design focus from static, performance-driven design to: (a) design for robustness & resilience; (b) design for uncertainties; (c) design for distributed, autonomous pervasive adaptation; (d) design for organically growing systems; and (e) design for creating self-evolving services.

Current systems including the internet are designed to operate with predefined parameters. Change spells trouble. Mobility is a problem. Semantic ecosystems will be future-proof, able to grow dynamically, evolve, adapt, self-organize, and self-protect. Web 3.0 will lay the foundations for the ubiquitous Web including autonomic intellectual property, Web-scale security and identity management, and global micro-commerce in knowledge-based assets. The value vector for semantic infrastructure is 2-4 orders of magnitude gains in capability, performance, and life cycle economics at Web scale.



Semantic Wave Technology Trends

The Semantic Wave 2008 Report examines over 100 application categories & more than 270 companies pursuing semantic products and services.


A broad range of semantic technologies will power Web 3.0. The technology section of Project10X’s Semantic Wave 2008 Report examines Web 3.0 technology themes from multiple perspectives. It shows how innovations in each area will drive development of new categories of products, services, and solution capabilities. Technology perspectives include:

Semantic user experience — concerns how I experience things, demands on my attention, my personal values.

Semantic social computing — concerns our lived culture, intersubjective shared values, & how we collaborate and communicate.


Semantic Technology Perspectives

Semantic applications and things — concerns objective things such as product structure & behavior viewed empirically.

Semantic infrastructure — concerns interobjective network-centric systems and ecosystems.

Semantic development — concerns Webs of meanings, systems that know and can share what they know, and architectures of learning, which make semantic solutions different.

Semantic Wave 2008 spotlights trends in each of these areas and examines the role of semantic technologies in over 100 application categories. An addendum to the report surveys more than 270 companies that are currently researching and developing semantic wave technology products and services.


Technology Trend 1—Semantic User Experience

Intelligent user interfaces drive gains in user productivity & satisfaction.

The Semantic Wave 2008 Report explores the impact of semantic technologies on user experience. User experience is the sum of interactions and overall satisfaction that a person has when using a product or system. Semantic user experience is the addition of intelligence and context-awareness to make the user interface more adaptive, dynamic, advisory, proactive, autonomic, and autonomous, and the resulting experience easier, more useful, and more enjoyable.

Attention is the limited resource on the internet — not disk capacity, processor speed or bandwidth. Values shape user experience. Simplicity, versatility and pleasurability are the new watchwords. Context is king. Mobility, wireless, and video are the new desktop. Seamless services anytime, anywhere. Users are prosumers, creating content, participating in peer production, taking control of consumption. Trends in user interface (UI) are towards personal avatars; context-aware, immersive 3D interaction; and reality browsing and augmented reality.

Identity is information used to prove the individuality of a person as a persisting entity. The trend is towards semantic avatars that enable individuals to manage and control their personal information, wherever it is across the net. Context is information that characterizes the situation of an entity (person, object, event, etc.). Context-awareness is using this knowledge to sense, predict, interpret, and respond to a situation.

Web 3.0 browsers will understand semantics of data, will broker information, and automatically interpret metadata. The emerging display landscape (depicted above) will be semantically connected and contextually aware. It will unify displaying and interacting, and will personalize experience. Reality browsing is querying the physical world live and up close from anywhere. Augmented reality is bringing the power of the Web to the point of decision, by combining real world and computer generated data. Semantic rich internet applications will exploit higher bandwidth content dimensionality, context sensitivity, and expanded reasoning power for dynamic visualization and interaction in the UI.

2007, 2008 Copyright MILLS•DAVIS. All rights reserved

Report Summary & Prospectus 13



Technology Trend 2 — Semantic Social Computing

Collective knowledge systems become the next “killer app.”

The Semantic Wave 2008 Report explores the role of semantic technologies in the evolution of social computing. Social computing is software and services that support group interaction. Semantic social computing adds an underlying knowledge representation to data, processes, services, and software functionality.

Semantic technologies will enrich many categories of social applications, including instant messaging, e-mail, bookmarking, blogging, social networking, wikis, user-driven "communitainment", and do-it-yourself applications and services. For example, semantic technologies will enable social computing applications to provide concept-based rather than language-based search and navigation across most standard applications, document types, and file formats, regardless of where these resources reside on the net, be it a desktop, mobile device, or server.
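To make "concept-based rather than language-based" concrete, here is a toy sketch (the vocabulary and documents are assumptions invented for illustration, not from the report): surface terms are first mapped to concept identifiers, so a query for "car" matches a document that only ever says "automobile":

```python
# Hypothetical concept index: several surface terms map to one concept ID,
# so matching happens on meaning rather than on shared keywords.
CONCEPTS = {
    "car": "C:automobile", "automobile": "C:automobile", "auto": "C:automobile",
    "doctor": "C:physician", "physician": "C:physician",
}

def to_concepts(text):
    """Map each known word in the text to its concept identifier."""
    return {CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS}

docs = {
    "d1": "automobile repair manual",
    "d2": "find a physician near you",
}

def concept_search(query):
    """Return IDs of documents sharing at least one concept with the query."""
    wanted = to_concepts(query)
    return [doc_id for doc_id, text in docs.items() if wanted & to_concepts(text)]

print(concept_search("car"))  # matches d1 despite no shared keyword
```

Production systems derive the concept mapping from ontologies and thesauri rather than a hand-written dictionary, but the search step works on the same principle.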

A key trend in Web 3.0 is toward collective knowledge systems where users collaborate to add content, semantics, models, and behaviors, and where systems learn and get better with use. Collective knowledge systems combine the strengths of social Web participation with semantic Web integration of structure from many sources. Key features of Web 3.0 social computing environments include (a) user-generated content; (b) human-machine synergy; (c) increasing returns with scale; and (d) emergent knowledge. Incorporating new knowledge as the system runs is what enables Web 3.0 systems to get smarter.

Technology Trend 3 — Semantic Applications

New capabilities, concepts of operation, & improved life cycle economics.

The Semantic Wave 2008 Report examines the emerging role of semantic technologies in more than 100 consumer and enterprise application categories. Semantic applications put knowledge to work. Areas covered in the report include: (a) semantics in commercial off-the-shelf software such as ERP, CRM, SCM, PLM, and HR; (b) ontology-driven discovery in law, medicine, science, defense, intelligence, research, investigation, and real-time document analysis; (c) risk, compliance, and policy-driven processes such as situation assessment, exceptions, fraud, case management, and emergency response; (d) knowledge-intensive processes such as modeling & simulation, acquisition, design, engineering, and virtual manufacturing; (e) network & process management such as diagnostics, logistics, planning, scheduling, security, and event-driven processes; (f) adaptive, autonomic, & autonomous processes such as robotics and intelligent systems; and (g) systems that know, learn & reason as people do, such as e-learning, tutors, advisors, cognitive agents, and games.

Key trends toward semantic applications are:

From knowledge in paper documents, to digital documents, to knowledge (semantic models), to semantic agents;


From static and passive functional processes, to active, adaptive, and dynamic processes, to autonomic, to autonomous processes;

From programmer-encoded interpretations of meaning and logic at design time, to computer interpretation of meaning and logic at run time;

From smart program code to smart data;

From search to knowing; and

From reasoning with SQL, to first-order logic, to complex reasoning with uncertainty, conflict, causality, and values for the purposes of discovery, analysis, design, simulation, and decision-making.
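As a small illustration of reasoning beyond what a single SQL query expresses (the facts and the one rule below are invented for the example, not drawn from the report), a few lines of Python can forward-chain a transitive subclass rule over subject-predicate-object triples until no new facts appear:

```python
# Facts are subject-predicate-object triples; a rule derives new triples
# until a fixed point is reached.
facts = {
    ("Pfizer", "isA", "PharmaCompany"),
    ("PharmaCompany", "subClassOf", "Company"),
    ("Company", "subClassOf", "Organization"),
}

def infer(facts):
    """Forward-chain one rule: isA propagates up subClassOf links."""
    facts = set(facts)
    while True:
        new = {(s, "isA", sup)
               for (s, p, c) in facts if p == "isA"
               for (c2, p2, sup) in facts if p2 == "subClassOf" and c2 == c}
        if new <= facts:          # fixed point: nothing new was derived
            return facts
        facts |= new

closed = infer(facts)
print(("Pfizer", "isA", "Organization") in closed)  # True
```

Real semantic stores implement this kind of entailment (e.g., RDF Schema subclass reasoning) at scale, but the fixed-point loop above is the essential mechanism.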




Technology Trend 4 — Semantic Infrastructure

A knowledge space solves problems of scale, complexity and security.

The Semantic Wave 2008 Report examines the role of semantic technologies in infrastructure. Infrastructure is the basic features of a system, such as networks, facilities, services, and installations, that are needed for the functioning of internet-based communities. By adding a knowledge dimension to this underlying structure, semantic infrastructures provide solutions to problems of integration, interoperability, parallelism, mobility, ubiquity/pervasiveness, scale, complexity, speed, power, cost, performance, autonomics, automation, intelligence, identity, security, ease of programming, and ease of use.

Information and communications technology (ICT) has reached the limits of what it can do with stack architecture, object orientation, first-order logic, and fixed, embedded knowledge (i.e., in code) with no learning, or with architected development versus emergent solutions. Semantic technologies provide the first path forward to overcome the limitations of these existing approaches. Trends toward semantic infrastructure include:

Computing diverges into declarative (brain) and procedural (sensory organs) lines of development.

Storage moves from flat files, to centralized "bases" with relational operators, to federated "spaces" with native semantic operators. The trend is toward high-performance semantic processing at scale and representations that support nearly unlimited forms of reasoning.

Transport moves from dial-up, to broadband, to video bandwidth. Mobility is the new platform, and semantic technologies are needed to deliver seamless, customizable, context-aware services, any time, anywhere.

Processor technology goes parallel, multi-core, multi-threaded, and specialized.

Displays become a landscape of interoperable devices of differing characteristics, sizes, and capabilities. Boundaries between virtual and real dissolve in planned and unplanned ways. The trend is towards immersive experience and reality browsing.

Longer term, the trend is towards everything becoming connected, somewhat intelligent, somewhat self-aware, socially autopoietic, and autonomically capable of solving problems of complexity, scale, security, trust, and change management.


Technology Trend 5 — Semantic Development

Semantic modeling reduces time, risk, and cost to develop solutions.

The Semantic Wave 2008 Report explores trends in methodology and practices for semantic software and solution development.

A development life cycle is a conceptual model used in project management that describes the stages involved in a system or application development project, typically running from a discovery, feasibility, and planning stage through maintenance of the completed application. Conventional development methodologies include the waterfall model; rapid application development (RAD); the fountain model; the spiral model; build-and-fix; and synchronize-and-stabilize.

Semantic solution development departs from conventional development. It deals with: (a) webs of meanings and knowledge from diverse provenance, (b) systems that know and can share what they know, and (c) architectures of learning.

Semantic solutions emerge from a techno-social collaboration that also supports do-it-yourself development. The process is business- and user-driven versus IT- and developer-driven. The collective knowledge developed is both human- and machine-interpretable. Different skills are required, including those of domain experts, semantic modelers, and semantic user experience designers.

Knowledge is extracted and modeled separately from documents, schemas, or program code so it can be managed across these different forms of expression, shared between applications, aligned and harmonized across boundaries, and evolved. For example, requirements, policies, and solution patterns are expressed as semantic models that execute directly and can be updated with new knowledge as the application runs.
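A minimal sketch of the idea that policies execute as data rather than code (the claim-approval rule and thresholds below are hypothetical, invented for illustration): because the rule set is plain data, it can be extended while the application runs, and the next decision immediately reflects the new knowledge:

```python
# Policies live as data, not program code, so they can change at run time.
rules = [
    {"if": {"amount_over": 10000}, "then": "manual_review"},
]

def decide(claim):
    """Return the action of the first rule whose condition the claim meets."""
    for rule in rules:
        limit = rule["if"].get("amount_over")
        if limit is not None and claim["amount"] > limit:
            return rule["then"]
    return "auto_approve"

claim = {"amount": 4000}
print(decide(claim))             # auto_approve

# Update the model without redeploying code: tighten the review threshold.
rules.insert(0, {"if": {"amount_over": 2500}, "then": "manual_review"})
print(decide(claim))             # manual_review
```

Semantic platforms express such models in richer formalisms (ontologies, rule languages) rather than Python dictionaries, but the separation of executable knowledge from program code is the same design choice.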

The semantic solution development process is model-driven and knowledge-centric rather than procedural and document-based. Semantic solutions may have zero code. Build cycles are fast, iterative, and non-invasive. Semantic solution development typically entails less time, cost, and risk to deploy, maintain, and upgrade.


Executive Summary 17



Semantic Wave Markets

The Semantic Wave 2008 Report sizes markets and presents 150 case studies in 14 horizontal and vertical market sectors.

The market section of Project10X’s Semantic Wave 2008 Report examines the growth of supply and demand for products, services, and solutions based on semantic technologies. Specifically, the report segments and discusses semantic wave markets from horizontal and vertical perspectives:

Horizontal market sectors include: Research and development; Information and communication technologies; Consumer internet; and Enterprise horizontal.

Vertical market sectors include: Advertising, content, entertainment; Defense, intelligence, security; Civilian agencies, state & local government; Education, training; Energy, utilities; Financial services; Health, medical, pharma, life sciences; Information & communications technology; Manufacturing; Professional services; Transportation, logistics; and other services.

Horizontal and vertical market sectors each present multi-billion-dollar opportunities in the near- to mid-term. The study sizes markets. It presents 150 case studies in 14 horizontal and vertical sectors that illustrate the scope of current market adoption.

Semantic technologies are spreading out and penetrating into all areas of information and communications technology, all economic sectors, and most categories of application. There are powerful economic drivers. Development and adoption are already global in scope. Market momentum is building. The sweet spot for cross-over and market acceleration is only about a year out.


Source: Project10X


What Are Semantic Wave Markets?



Market Trend 1—Research & Development

Semantic technologies are a significant and growing focus in global R&D.


The maturation of R&D investments made in the public and private sectors over the past decade is one reason why semantic technologies and Web 3.0 are entering mainstream markets. The diagram below highlights semantic technology areas which are receiving international R&D funding estimated to be more than $2B per year through the end of the decade.

Public sector investment has been significant and is growing in North America, Europe, and Asia. Countries recognize the strategic importance of semantic technologies in the emerging global knowledge economy and are seeking competitive advantage through public sector investments. Historically, it is worth noting that public sector investment to develop ICT technologies has a strong track record, having spawned billion-dollar industries repeatedly over the past 40 years.

The Semantic Wave 2008 Report provides summarized examples of public sector R&D programs from organizations such as: DARPA, Air Force Research Laboratories, NASA, and NSF.

Private sector firms accelerate semantic technology R&D. Commercial investment is now global. Private sector motivations for R&D are nearer-term and focus on return on investment. Semantic Wave 2008 predicts that both consumer-internet and enterprise-oriented investments in semantic technology will increase significantly through the end of the decade.


What Are Semantic R&D Trends?

Market Trend 2—Information & Communication Technology

ICT semantic technology markets will exceed $10 billion in 2010.

The global ICT market is $3.5 trillion and will be $4.3 trillion by 2010. Growth in the E7 countries (China, Brazil, Korea, India, Russia, Mexico, and Taiwan) is currently around 20 percent per year. The market for semantic technologies is currently a tiny fraction of global ICT spending. But growth is accelerating.


Who Are the Semantic Technology Suppliers?

Semantic Wave 2008 profiles more than 270 companies that provide semantic technology R&D, services, and products. Most are small boutique firms or start-ups. But a significant number of established ICT companies have entered the semantic space. Overall, we estimate markets for semantic technologies in ICT will exceed $10B in 2010.



[Table: alphabetical directory of the 270+ semantic technology suppliers profiled in the report, including 7 Degrees, Adobe Systems, Autonomy, BBN Technologies, Cycorp, Franz Inc., Google, Hewlett-Packard, IBM, MetaWeb Technologies, Microsoft Corporation, Mondeca, Oracle, Powerset, Radar Networks, SAP, SAS Institute, Sirma Group–Ontotext, Sun Microsystems, Thomson Reuters, TopQuadrant, Yahoo!, and Zepheira, among others. Source: Project10X]

Market Trend 3—Consumer Internet

Consumer content, entertainment & advertising dollars will build Web 3.0.

Semantic Wave 2008 examines the growth of internet and mobile advertising, content, and entertainment to 2012, as well as the growth of internet-based commerce and consumer electronics. Driving forces in consumer internet markets are huge. The increasing flow of advertising dollars, content, entertainment, and commerce to the internet will fuel the build-out of Web 3.0.

Consumers account for 25 percent of global ICT spending. One billion people around the globe now have access to the internet. Nearly 50 percent of all U.S. internet access is via always-on broadband connections. Mobile devices outnumber desktop computers by a factor of two. By 2017, telecom industry projections are that there will be 5 billion mobile internet users and more than 7 billion internet-enabled devices. The internet is now a mass medium for content, entertainment, and advertising as well as knowledge exchange and business efficiency. It is growing rapidly and is taking market share (i.e., money) away from other media.

Semantic technologies are strategic for consumer internet markets as enabling technology, and as a means for competitive differentiation. We’re in the midst of a “user revolution.” Context, social nets, and relationships are king. Consumers are prosumers. They create content, engage in peer production, participate in usites (sites with user-created content), and enjoy themselves with communitainment (users simultaneously communicating and participating in entertainment activities with each other as part of social networks). The “long tail” makes semantic advertising into a killer app — a better way to target, match, and bring together buyer and seller.



Market Trend 4—Enterprise Horizontal

Middleware, services, processes, search, and collaboration go semantic.

As shown in the diagram below, just about everywhere one can look in an enterprise, someone somewhere is applying semantic technologies to some problem. Drivers of enterprise business value are all strong — new capability, life cycle ROI, performance, and strategic edge.

Semantic Wave 2008 examines enterprise markets for semantic technologies, including twelve categories of commercial off-the-shelf (COTS) software packages that are estimated to represent a combined software product and service revenue of more than $160 billion in 2010. The report projects the transition of these market segments from conventional to semantic COTS technologies. Amongst the first tier of large ICT technology providers, the areas being targeted first relate to the internal stack, or plumbing, for suites of applications, because these changes make few demands on customers while establishing a semantic application framework that developers can use as a foundation. Service-oriented architecture becomes semantic SOA. Changes that impact application concept of operations and user interface come next.


Also, enterprise software has a long tail. There are an estimated 56 million firms worldwide, including 1.5 million with more than 100 employees, and around 80,000 businesses with more than 1,000 employees. The transition to semantic software technologies will facilitate mass customization of commercial off-the-shelf solutions, enabling software vendors to address more levels of the market with sustainable solutions.


What Semantic Technologies Are Being Employed in Enterprise?

[Diagram: enterprise application areas going semantic. Source: Project10X]

Business management: risk management, regulatory compliance, content integration, question answering, real-time auditing; crisis and emergency management (system and network outages); business continuity.

Defense, intelligence: sense-making, data & fraud detection, money laundering, reasoning, inference, anti-terrorism, security, business intelligence, decision support, case management.

Mergers & acquisitions: data & systems integration, enterprise architecture, ontology-driven information systems, semantic interoperability, semantic web services, virtual data center.

Customer service: customer service automation, customer self-service, personalized information on-demand, 360°-view of customer, field service operations, integrated CRM and PLM platform.

Supply chain: supply chain integration, design, sourcing optimization, integration & interoperation, CPFR.

Input management: capture, classification, tagging, routing, data & content consolidation, data cleaning.

Output management: enterprise publishing platform, auto-generation of content & media, auto-language versioning, cross-media, semantic portals.

Discovery: aggregation, auto-classification, meta-search, federated query, smart search, intelligent domain research.

Planning: dynamic planning, scheduling, routing, optimization; adaptive systems; autonomic systems; autonomous products/services.

Production operations: design advisors, simulation-based acquisition, virtual manufacturing.


Market Trend 5—Industry Verticals

150 case studies make the case that semantic wave markets are here.

Semantic Wave 2008 examines semantic technology adoption in industry verticals. The report summarizes 150 case examples in fourteen industry sectors. The table below highlights some of the semantic application case examples in eight vertical industry sectors. Each industry has both horizontal and vertical needs. Applications are diverse. Nearly three-fourths of the case examples come from private industry. A little more than one-fourth are public sector. Collectively, they make a strong case that semantic wave markets are here and now.


What Are Industry Vertical Markets for Semantic Technologies?



ADVERTISING, CONTENT & ENTERTAINMENT (BBC; Bertelsmann; Dentsu; Disney; Elsevier; Associated Press; NZZ): Digital asset management; rich media interoperability; content mining; mapping of concepts across content libraries; accelerated creation of new derivative information products; identification and extraction of information types, such as chemical compounds and classes for science; rapid development of custom news feeds; skills curation and collaboration.

EDUCATION & TRAINING (Industry; universities; governments; ETH Zurich): E-learning; simulation learning tools (“learning by doing”); semantic collaboration environments; digital library services; rapidly customized coursework content; automated scoring; publication streamlining.

ENERGY & UTILITIES (Air Liquide America; Air Products; BP; GE Infrastructure Water & Process Technologies; Shell Oil; Statoil): Energy exploration; processing real-time remote sensor data; power distribution through “common information models”; multi-agent technologies; corporate portals across departments and disciplines; adaptive data warehousing; multi-format document access; knowledge-based service reporting; proposal management; integrating information across operating units; product and market segmentation; scenario validation.

FINANCIAL SERVICES (Citigroup; Ameriprise Financial; Aon; Fireman’s Fund Insurance; UBS; Credit-Suisse; Swiss Re; Munich Re; Bank Vontobel): Risk and compliance management; due diligence; security and surveillance; analytical dashboards and composite applications; case management; auditing transparency; trend analysis; regulation and policy management; document and contract analysis; business rules for investment strategies; sales and customer service; risk scoring; new business acquisition; policy-based computing and application monitoring; loan processing; analyst productivity suites.

HEALTH, MEDICINE, PHARMA & LIFE SCIENCES (National Library of Medicine; Amgen; Biogen; Eli Lilly; GSK; Novartis; Pfizer; Healthline; Partners Clinical Informatics; University of Texas; Mayo Clinic; ImpactRX; Cleveland Clinic; AstraZeneca): Meta-searching and clustering; enterprise search; scientific discovery; translational medicine; clinical knowledge bases; reasoning and decision support; healthcare supply chain planning; in silico drug discovery; integrated biosurveillance; lexical standardization; market intelligence; patient records; drug development cost reduction.

MANUFACTURING (Emerson Motors; General Dynamics; General Motors; BAE Systems; Rockwell Automation; Proctor & Gamble; EniTechnologies; Siemens): R&D; supply chain; customer support; product modeling; design and fabrication; design-to-order; document life cycle management; virtual manufacturing; international market and scenario simulation and visualization; robotics and autonomous systems; speech recognition; automobile telematics and automation; quality improvement; enterprise knowledge management; inventory optimization; maintenance and repair management; competitive intelligence; intellectual capital management; portfolio management; customer self-service.

TRANSPORTATION & LOGISTICS (Tankers International; SouthWest Airlines): Cargo management; shipment tracking; contract review & management; logistics outsourcing management; logistics cycle emulation; network routing and scheduling.

PUBLIC SECTOR (DoD Finance & Accounting Service; GSA; National Communications System Continuity Communications Working Group; FAA; OMB; National Geospatial Intelligence Agency; National Institutes of Health; National Cancer Institute; National Center for Biomedical Ontology; Dept. of Health and Human Services; Defense Information Systems Agency; Defense Logistics Agency; U.S. Army; XVIII Airborne Corps; Dept. of Education; Internal Revenue Service; National Biological Information Infrastructure; NSA; CIA; DIA; Dept. of Homeland Security): Semantic Service Oriented Architecture (SSOA); modeling IT environments; federated queries across databases; process management; geospatial information interoperability; predictive analytics; document parsing and entity extraction; mapping biological networks and biomarkers; unified medical ontologies; clinical care support; net-centric data services; knowledge navigation; speed of command; combat information distribution; nested networks; educational and training gateways; grant application processing; tax code navigation; expert systems; integrated defense information access; relationship analytics and social network analysis; pattern recognition; emergency management; immigration; infrastructure protection; international trade.

Source: Project10X


About the Author


Mills Davis is founder and managing director of Project10X — a Washington, DC based research consultancy specializing in next-wave semantic technologies and solutions. The firm’s clients include technology manufacturers, global 2000 corporations, and government agencies.

Mills served as principal investigator for the Semantic Wave 2008 research program. A noted consultant and industry analyst, he has authored more than 100 reports, whitepapers, articles, and industry studies.

Mills is active in both government and industry-wide technology initiatives that are advancing semantic technologies. He serves as co-chair of the Federal Semantic Interoperability Community of Practice (SICoP). Mills is a founding member of the AIIM interoperable enterprise content management (iECM) working group, and a founding member of the National Center for Ontology Research (NCOR). Also, he serves on the advisory boards of several new ventures in the semantic space.

In addition to his research and consulting practice, Mills is currently directing development of a community-based collaborative semantic magazine that is dedicated to aggregating, linking, and making sense of all things Web 3.0.



Sample Pages from Semantic Wave 2008: Industry Roadmap to Web 3.0



In the preceding pages we introduced the thesis and highlighted some findings and conclusions from our new research report — Semantic Wave 2008: Industry Roadmap to Web 3.0. We hope this brief overview will encourage you to read the full report. As you can see from the sample pages, Semantic Wave 2008 is no ordinary research report. It is written to be understood by a broad audience and contains a great many figures and illustrations.

Semantic Wave 2008 explains the new semantic technology and gives perspective on emerging patterns and keys to success. It gauges both technology and market readiness. By mapping the frontier, it shows where the tough problems are, and where to look for breakthroughs. But, most importantly, Semantic Wave 2008 profiles significant opportunities for executives, developers, designers, entrepreneurs, and investors: what to build, what to buy, and why. For this, SW2008 is simply the most comprehensive resource available anywhere at this crucial time.

The technology section of the report examines five strategic technology themes and shows how innovations in these areas are driving development of new categories of products, services, and solution capabilities. Themes include: executable knowledge, semantic user experience, semantic social computing, semantic applications, and semantic infrastructure. The study examines the role of semantic technologies in more than 100 application categories. An addendum to the report surveys more than 270 companies that are researching and developing semantic technology products and services.


The market section of the report examines the growth of supply and demand for products, services, and solutions based on semantic technologies. Specifically, the report segments and discusses semantic wave markets from five perspectives: research and development, information and communication technology, consumer internet, enterprise horizontal, and industry verticals. Viewed as horizontal and vertical market sectors, each presents multi-billion-dollar opportunities in the near- to mid-term. The study presents 150 case studies in 14 horizontal and vertical sectors that illustrate the scope of current market adoption.

In addition to the main report, there are two addenda: a supplier directory and an annotated bibliography.

Specifications for the Semantic Wave 2008 report and a topic outline follow this page.


Semantic Wave 2008: Industry Roadmap to Web 3.0

Report Specifications

Format: PDF — color and B&W
Pages: 400
Figures: 290
Vendors: 270
Applications: 110
Market sectors: 14
Case examples: 150
Price: $3495 USD
Availability: Nov 1, 2007

Report Outline

1 Introduction
2 Semantic Wave
2.1 Strategic Vision
2.2 Web 1.0
2.3 Web 2.0
2.4 Web 3.0
2.5 Web 4.0
3 Semantic Technologies
3.1 Technology Themes & Perspectives
3.2 Knowledge
3.3 Semantic User Experience
3.4 Semantic Social Computing
3.5 Semantic Applications
3.6 Semantic Infrastructure
4 Semantic Markets
4.1 Market View
4.2 Research and Development
4.3 Information & Communications Technology
4.4 Consumer Internet
4.5 Enterprise Horizontal
4.6 Industry Verticals
Addenda
A Suppliers
B Bibliography

Yes, our organization wishes to purchase the Semantic Wave 2008 Report: Industry Roadmap to Web 3.0.

Check box to specify request and method of payment:

                    We are ordering on-line by secure credit card transaction. Click here to pay by credit card.

                    We are paying by invoice for $3495 USD.  Click here to pay by invoice.

                    We are paying by check for $3495 USD. Payment is enclosed with this completed order form.

                    We want to discuss access to the report for our entire organization, including presentation on site.


Authorized Contact Information

Name, Title, Organization, Address, City, State/Province, Zip/Postal Code, Country, Business Phone, Business Fax, Mobile Phone, Email, URL, Authorized Signature.

Send completed form to:
Mills Davis, Managing Director
Project10X
2853 Ontario Road NW #501
Washington, DC 20009 USA
Vox: 202-667-6400
Cell: 202-255-6655
Fax: 1-800-713-8049
Email: mdavis@project10X.com
URL: www.project10X.com

For additional information, please contact Karen Aiken, 408-234-9054, or Bojana Fazarinc, 408-334-4708.


Semantic Exchange

The Semantic Wave 2008 Executive Summary is published in association with the Semantic Exchange — a collaborative industry news, research, and education initiative about all things web 3.0 and semantic web. Semantic Exchange is sponsored by the industry-leading organizations presented here.




Aduna offers enterprise search solutions based on guided exploration: during the search process, users receive guidance in the form of contextual hints for further exploration, with user-friendly visualization to maintain an overview. Aduna also provides RDF(S) metadata storage and retrieval using Sesame, an open source RDF database with support for RDF Schema inferencing and querying.


Celtx is the world's first fully integrated solution for media pre-production and collaboration powered by semantic technologies. This engaging, standards-based software for the production of film, video, theatre, animation, radio and new media replaces old-fashioned 'paper, pen & binder' media creation with a digital approach to writing and organizing that's more complete, simpler to work with, and easier to share.

The industry-leading Operational Intelligence Software Suite for proactive management of risk and compliance. Serving the public sector, retail, and healthcare industries, our operational risk and compliance solutions address theft, fraud and criminal investigations, improper payments, intelligence fusion, emergency management, and audit compliance. The platform delivers real-time analytics, case management and dynamic dashboard technologies for detection, investigation, assessment and monitoring.

A leading supplier of semantic software and infrastructure for knowledge management. Its semantic framework integrates subject and process ontologies with business rules to deliver intelligent e-forms, knowledge bases over heterogeneous information sources, advisory services, auto-classification, decision trees, calculators, and semantic search and navigation.


CHECKMi semantic solutions improve the control and agility, and reduce the cost, of service-oriented agent and semantic grid computing. The CHECKMi:Mate is a product platform for networking semantic software agents together to power information analytic services and deliver secure business processing.

empolis, The Information Logistics Company, offers enterprise content and knowledge management solutions for company-wide information logistics and for improving business processes. empolis' core competencies are information management, service management, product & catalog management and media management. empolis consistently relies on open standards such as XML, Java, OWL and RDF.

Develops a semantic application server and a suite of SOA-based semantic middleware toolkits which dramatically simplify and accelerate the building of scalable, fluid applications that incorporate the most advanced semantic techniques and integrate easily with other industry-leading technologies.

Developer of a large-scale "ontology of the universe," common sense knowledge base, and associated reasoning systems for knowledge-intensive applications. The Cyc KB provides a deep layer of understanding that is divided into thousands of "microtheories", each of which is essentially a bundle of assertions that share a common set of assumptions about a particular domain of knowledge, a particular level of detail, a particular interval in time, etc.

Semantic software that discovers, classifies and interprets text information. Its patent-pending technology, Cogito, enables organizations to extract, discover and understand the connections in their strategic information sets – the thousands of files, e-mails, articles, reports and web pages they access every day – and to automatically understand the meaning of any text written in natural language. Cogito improves business decisions in real time for the majority of corporate functions.


Whether you are a newcomer to semantic technologies or already have experience with them, the goal of Semantic Exchange is to help you better keep up with the rapid pace of technology and infrastructure development, connect with the people and companies making the next stage of the internet happen, and understand the breadth of applications across consumer and enterprise industry sectors.


iQser provides semantic middleware for the integration and evaluation of data in networks of information. The iQser GIN server does not need to modify or migrate information; it virtually integrates the pieces of information from various data sources. Semantic analytics to link and interpret diverse sources of information are fully automatic. Reasoning capabilities over linked data resources are extensive.



KBSI provides advanced R&D, products, and solutions in areas such as artificial intelligence and expert systems, geometric reasoning, computer-aided design and manufacturing, manufacturing systems design and analysis, enterprise integration, process modeling, computer-aided software development, systems simulation, business process design and development, and total quality management.


The Metatomix Semantic Platform integrates data, uncovers and defines information relationships, and provides meaning and actionable insight to applications. It does so by creating a real-time virtual integration layer that non-invasively accesses data from any source (static and dynamic) and allows it to be understood and leveraged by practically any application.

Kirix Strata is a "data browser" — a fusion of a web browser and a built-in relational database. Strata brings the sensibilities and simplicity of a web browser to the world of tabular data, making it easy to access, view and use data from any source, even the Web. Strata provides business analysts, researchers and IT professionals a tool for extremely quick ad hoc analysis and reporting, whether working with local data, database systems or back-end business intelligence systems.

Mondeca provides software solutions that leverage semantics to help organizations obtain maximum return from their accumulated knowledge, content and software applications. Its solutions are used by publishing, media, industry, tourism, sustainable development and government customers worldwide.


Ontotext is a semantic technology lab of Sirma Group. Ontotext researches and develops core technology for knowledge discovery, management, and engineering, the Semantic Web, and web services.

KFI's Mark 3 Associative Knowledge Platform enables cost-effective, long-term retention and application of enterprise knowledge, providing intelligent business process, integrated financial simulations, knowledge-based training, and complex decision support.

Leading European supplier of knowledge generation solutions that use semantic technologies and artificial intelligence to extract knowledge from varied Internet sources and integrate it with business processes. Applications include early warning, risk management, marketing and sales tools, media monitoring, asset management, corporate governance and compliance, issue management, anti-money-laundering, know-your-client, executive search, project management, and other business applications.

Ontos creates semantic web solutions for publishers and media providers that provide better search and navigation through related content. Ontos portals create on-the-fly views of information aggregated from the Internet. Advertising links to related content in a meaningful way. Web widgets enrich website pages with intelligent content, resulting in a more compelling experience that attracts readers, increases page views, and enhances search engine optimization. Increasing traffic on the website leads to more ads and more revenue.



Semantic Exchange educational activities will include a series of monthly webinars, briefings, publications, and media articles. Also, we're planning a "smart innovators laboratory" where public and private sector organizations can gain access to people, research, and technologies, and can conduct pilot tests to prove out the benefits of semantic solutions.


Project10X is a premier industry research, education, and consulting firm specializing in next-wave semantic technologies, solutions, markets, and business models. Project10X publishes the Semantic Wave research series, including the Semantic Wave 2008 Report. The firm provides educational and training services, and consults with technology manufacturers, global 2000 corporations, government agencies, and technology start-ups. Project10X is directing the Semantic Exchange industry education initiative.


RiverGlass meaning-based search and discovery helps people and organizations locate and make sense of information relevant to their areas of interest. RiverGlass moves beyond keyword searching and tagging into the meaning of the data to deliver focused, relevant search results that zero in on key pieces of information about people, locations, and events of interest and the relationships among them. Connecting the dots sparks insights and improves decision-making.


Semantic Arts is a USA-based consulting firm that helps large firms transform their enterprise architectures. Our specialty is reducing complexity through the intelligent use of Semantic Technology and Service Oriented Architecture.


Twine is the first application on the Radar Networks Semantic Web platform. Twine helps users leverage and contribute to the collective intelligence of their friends, colleagues, groups and teams.


Develops W3C and OMG standards-compliant, semantically aware, knowledge-based software products that facilitate business information interoperability, terminology normalization and context resolution across web-based and enterprise information systems. The Visual Ontology Modeler™ (VOM) UML-based ontology modeling environment supports frame-based knowledge representation and the construction of component-based ontologies that capture and represent concepts, resources and processes.


Leading supplier of semantic research solutions. SI services help people use the internet to read just the information they're interested in, use their computer to help reason about it, and then report it just the way they want – easily, quickly, and automatically.


The Reinvent media group's mission is to enlighten and connect people through words and visual information. Its resources include domain names, global advertising networks, virtual cities, semantic technology and a venture capital arm. The group's vision is to evolve from basic advertising services into a knowledge engine that provides useful content and relevant information to all people.


Semantech is a professional services firm that provides enterprise-level semantic solutions that unify process, logic, data, and user experience through semantic integration and agile model driven design.

Semantica software provides semantic-network-theory-based knowledge capture, representation, management, transfer and visualization. Semantica products capture what experts know, organize it, and visually represent it the way that humans store information in long-term memory. In addition to concept mapping, the product suite integrates with natural language processing and geospatial integration technologies.

And, Semantic Exchange will be bringing you an open collaborative industry news, research, and education portal about all things web 3.0 and semantic web. The site is part semantic community wiki, part internet magazine, part technology showcase for new capabilities, and part knowledge outfitter where you can gain access to both commercial and open source tools, widgets, building blocks, and solution blueprints.


Semantic System ag manufactures hardware technology for intelligent computer systems. Its first-generation computer chip "thinks" like a biological brain, making it possible to run complex thought and analysis processes in hardware and obtain results equivalent to those produced manually by skilled humans.


The Talis Platform is an open technology platform for mass collaboration and human-centric, information-rich applications. It combines Semantic Web, information retrieval, collective intelligence, and behavioral mining technologies, which can be accessed through a suite of RESTful web services. The Talis Platform provides data management, organization and analysis components that can learn and understand patterns of behavior and present them through an API to be interwoven into applications.


TopQuadrant provides products, services, knowledge, training programs and methods to help organizations integrate data and processes and harness the knowledge distributed across systems and parties. TopQuadrant helps customers implement new capabilities for integration, policy management, search, enterprise architecture and model-driven applications. The TopBraid product suite provides an enterprise-level platform for developing and deploying semantic applications.

Semantic Universe's mission is to raise awareness and explain the usage of semantic technologies in business and consumer settings. Projects by Semantic Universe include the annual Semantic Technology Conference (www.semantic-conference.com) and the SemanticReport newsletter (www.semanticreport.com).

Leading supplier of fully automated, real-time contextual targeting services for both advertisements and web pages. Natural Language Processing (NLP) and Machine Learning (ML) technologies automate the establishment of semantic signatures with contextual attributes that enable high-precision customer targeting and media placement.

WAND provides structured multi-lingual vocabularies with related tools and services to power precision search and classification applications on the internet, ranging from custom travel, jobs-and-skills, and medical taxonomies to its cornerstone product and service taxonomies. In addition to licensing its taxonomies for integration into third-party applications, WAND builds precision online horizontal and vertical business directory applications.

Ontology-driven technology solutions for information management projects. Professional services and semantic middleware for the government, media, and financial services industries. The Semaphore semantic processor provides description-logic-based automatic classification and categorization for rule-based taxonomies, thesauri, and ontologies; intelligent guided search; taxonomy management; and dynamic profiling and recommendation software for intranet, internet and portal applications.

OpenCalais automatically creates rich semantic metadata for content using natural language processing, machine learning and other methods. You can make graphs with this metadata that give you the ability to improve site navigation, provide contextual syndication, tag and organize your content, create structured folksonomies, filter and de-duplicate news feeds, or analyze content to see if it contains what you care about.

Enterprise 2.0 solution that combines the Knowledge Plaza platform with a new knowledge management methodology, Enterprise Social Search, to leverage colleagues' expertise in finding relevant information, reward individuals who share their knowledge, and make each person's valuable information accessible for the benefit of the group.




See the Complete Report with images at:





Mashware: The Future of Web Applications

Antero Taivalsaari

Sun Microsystems Laboratories

SMLI TR-2009-181  February 2009


The massive popularity of the World Wide Web is turning the web browser from a document viewer into a platform for desktop-style web applications. Web applications require no installation or manual upgrades, and they can be deployed instantly worldwide. This deployment model is extremely powerful, and will dramatically change the way people develop and use software, allowing worldwide application development and instant deployment without middlemen or distributors.

In this paper we present our vision for the future of web applications. A key observation in the paper is that web applications do not have to live by the same constraints that characterized the evolution of conventional desktop applications. The ability to instantly publish software worldwide, and the ability to dynamically combine code and content available from countless web sites and developers all over the planet, will open up entirely new possibilities for software development. We believe that this will lead to a new software development approach that can be referred to as mashware, or software as a mashup. In this paper we provide an introduction to mashware, analyze the emerging mashup development technologies, and discuss the technical challenges and obstacles that still remain.


Sun Labs
16 Network Circle
Menlo Park, CA 94025
email address: antero.taivalsaari@sun.com

© 2009 Sun, Sun Microsystems, the Sun logo, Java, Java ME, Java SE, JavaScript, Java Platform, and Sun Labs Lively Kernel are trademarks or registered trademarks of Sun Microsystems, Inc. or its subsidiaries in the United States and other countries.

All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. UNIX is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company, Ltd.

Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the copyright owner.

For information regarding the SML Technical Report Series, contact Jeanie Treichel, Editor-in-Chief <jeanie.treichel@sun.com>. All technical reports are available online on our website, http://research.sun.com/techrep/.

Mashware: The Future of Web Applications

Antero Taivalsaari


Sun Microsystems Laboratories P.O. Box 553 (TUT) FI-33101 Tampere, Finland

1. Paradigm Shift

The software industry is currently experiencing a major disruption. In the past few years, the World Wide Web has become the de facto deployment environment for new software systems and applications. Software applications that were previously targeted to specific operating systems, CPU architectures or devices, are now written for the Web, to be used from web browsers from all over the world. A typical example of this trend is iExpense, Sun’s new web-based expense reporting application that is now used by Sun’s employees worldwide. Outside Sun, there are numerous examples such as Google Docs, Desktoptwo.com, Microsoft Live Mesh, NetSuite.com, Salesforce.com, Webex.com, or various web-based e-mail and instant messaging clients. The recent release of Google’s Chrome web browser (http://www.google.com/chrome) – specifically designed to enable the efficient execution of web applications in the web browser – also confirms this trend.

Web applications have major benefits. In particular, they require no installation or manual upgrades, and they can be deployed and shared instantly worldwide, with no middlemen or distributors. This instant worldwide deployment aspect is extremely powerful and will dramatically change the way people develop, deploy and use software. Ultimately, it will cause a paradigm shift in the software industry – in the same fashion as the Web has already transformed the sharing and distribution of documents, books, photos, music, videos and so many other artifacts. Ordinary computer users will eventually run the majority of software from the Web, instead of using conventional, binary, desktop-bound applications. The long-term implications of this paradigm shift will be at least as significant as the dramatic transformations that are currently taking place in the entertainment and publishing industries.

2. Evolution of the World Wide Web – from Web Pages to Web Applications

The World Wide Web has undergone a number of evolutionary phases. Initially, web pages were simple textual documents with limited user interaction capabilities based on hyperlinks. Soon, graphics support and form-based data entry were added. Gradually, with the introduction of DHTML – the combination of HTML, Cascading Style Sheets (CSS), the JavaScript™ scripting language, and the Document Object Model (DOM) – it became possible to create increasingly interactive web pages with built-in support for advanced graphics and animation. Numerous plug-in components – such as Flash, RealPlayer and Shockwave – were then introduced to make it possible to build web pages with visually rich, interactive multimedia content.

At the high level, the evolution of web pages can be presented in three phases or generations, as illustrated in Figure 1:

1) Simple, “classic” web pages with text and static images only

2) Animated multimedia pages with plug-ins

3) Rich Internet Applications

In the first phase, web pages were truly pages, i.e., page-structured documents that contained primarily text with some interspersed static images, without animation or any interactive content. Navigation between pages was based simply on hyperlinks, and a new web page was loaded from the web server each time the user clicked on a link. There was no need for asynchronous network communication or any more advanced communication protocols between the browser and the web server. Some pages were presented as forms, with simple textual fields and the possibility to use basic widgets such as buttons, radio buttons or pull-down menus. These types of web pages and forms are still very common, characterized by visually simple web sites such as the main page of the Google search engine (http://www.google.com/).


Figure 1: Evolution of web usage

In the second phase, web pages became increasingly interactive, with animated graphics and plug-in components that allowed richer content to be displayed. This phase coincided with the commercial takeoff of the Web, when companies realized that they could create commercially valuable web sites by displaying advertisements or by selling merchandise or services over the Web. Navigation was no longer based solely on hyperlinks, and communication between the browser and the server became increasingly complicated. The JavaScript scripting language, introduced in Netscape Navigator version 2.0B in December 1995, made it possible to build animated, interactive content more easily. The use of plug-in components such as Flash, QuickTime, RealPlayer and Shockwave spread rapidly, allowing advanced animations, movie clips and audio tracks to be inserted in web pages. In this phase, the Web started moving in directions that were unforeseen by its designers, with web sites behaving more like multimedia presentations rather than conventional pages. Content cross-linking and advertisements became commonplace. Web sites such as Yahoo (http://www.yahoo.com/) or YouTube (http://www.youtube.com/) are good examples of the second phase in web evolution. Car manufacturers such as Cadillac (http://www.cadillac.com/) or Toyota (http://www.toyota.com/) also utilize such technologies to create highly visual, multimedia-rich vehicle presentations on the Web.

In the past few years, we have been in the middle of another major evolutionary step towards desktop-style web applications, also known as Rich Internet Applications (RIAs) or simply as web applications. The technologies intended for the creation of such applications are also often referred to collectively as “Web 2.0” technologies. Web 2.0 is mostly a marketing term, surrounded by a lot of hype, but the underlying trends are dramatically changing the way people will perceive and use the Web and software more generally.

In short, Web 2.0 technologies combine two important characteristics or features: collaboration and interaction. By collaboration, we refer to the “social” aspects that allow a vast number of people to collaborate and share the same services, applications and data over the Web. However, an equally important, but publicly less noted aspect of Web 2.0 technologies is interaction. Web 2.0 technologies make it possible to build web sites that behave much like desktop applications, for example, by allowing web pages to be updated one user interface element at a time, rather than requiring the entire page to be updated each time something changes. Web 2.0 systems often eschew link-based navigation and utilize direct manipulation techniques familiar from desktop-style applications instead. Perhaps the best known example of such a system today is Google Docs (http://docs.google.com) – a web-based office productivity suite that provides support for desktop-style word processing and spreadsheet use over the Web.
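The element-at-a-time update pattern described above can be sketched in a few lines of JavaScript. This is an illustrative sketch only: the page and the quote service are mocked as plain objects and promises so it runs outside a browser, and all names (`page`, `fetchQuote`, `updateQuote`) are hypothetical. In a real Web 2.0 page, `page` would be the DOM and `fetchQuote` an asynchronous XMLHttpRequest-style call.

```javascript
// Mock of a rendered page; only one element will be updated.
const page = { header: "My Portfolio", quote: "loading..." };

// Hypothetical asynchronous server call, mocked as a resolved promise.
function fetchQuote(symbol) {
  return Promise.resolve({ symbol, price: 42.5 });
}

// Update a single user interface element without reloading the page:
// the header and the rest of the page state are left untouched.
async function updateQuote(symbol) {
  const data = await fetchQuote(symbol);
  page.quote = `${data.symbol}: ${data.price}`;
  return page;
}

updateQuote("JAVA").then(p => console.log(p.quote)); // prints "JAVA: 42.5"
```

The point of the sketch is the asymmetry with first-phase pages: only the `quote` field changes, while a link-based page would have refetched and re-rendered everything.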

Note that the three phases discussed above are not mutually exclusive. Rather, web pages representing all three phases coexist on the Web today. The majority of commercial web pages today represent the second phase. However, the trend towards web applications is becoming increasingly clear, with new web application development technologies and systems being introduced frequently.

3. Beyond Rich Internet Applications: Software as a Mashup

What does the future of web-based software look like beyond Rich Internet Applications? So far, the trend toward RIA has revolved primarily around making the Web a better place for desktop-style applications such as word processors, spreadsheets, e-mail or instant messenger applications.

However, an important realization is that web applications do not have to live by the same constraints that characterized the evolution of conventional desktop software. The ability to instantly publish software worldwide, and the ability to dynamically combine code and content available from numerous web sites and developers all around the planet will open up entirely new possibilities for software development. We believe that this will lead to a new software development approach that can be referred to as mashware, or software as a mashup.

Mashups. The trend toward mashups is already visible in other domains. In web terminology, a mashup is a web site that combines (“mashes up”) content from more than one source (from multiple web sites) into an integrated experience. Mashups are content aggregates that leverage the power of the Web to support worldwide sharing of content that conventionally would not have been easily accessible or reusable in different contexts or from different locations. Typical examples of mashups are web sites that combine photographs or maps taken from one site with other data that is overlaid on top of the map or photo. Some of the archetypical mashups are listed below:

- Chicago Police Department crime statistics mashup (http://chicago.everyblock.com/crime/). This site displays Chicago area crime statistics in various formats based on ZIP codes, dates, type of crime, etc.

- Parking availability mashups (e.g., http://www.parkingcarma.com/). This service displays the availability of parking spaces in various U.S. cities visually, and allows pre-reservation of parking spaces.

- Traffic tracking and congestion mashups (e.g., http://dartmaps.mackers.com/). Traffic tracking and congestion services are available for various cities throughout the world already. The web site mentioned above displays the location of commuter trains in the city of Dublin, Ireland.

- Real estate sales and rental mashups (e.g., http://www.housingmaps.com/). These mashups display houses or apartments for sale or rental, or provide information about recent real estate sales.
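The common thread in these examples is a join across independently published data sources. A minimal JavaScript sketch of that idea follows; both "sites" are mocked here as in-memory arrays, and all names and values (`listings`, `geocodes`, `mashup`) are hypothetical stand-ins for, say, a classifieds feed and a mapping service.

```javascript
// Data from "site A": a classifieds feed of rental listings (mocked).
const listings = [
  { id: 1, address: "12 Main St", price: 1200 },
  { id: 2, address: "9 Oak Ave", price: 950 },
];

// Data from "site B": a geocoding/map service (mocked).
const geocodes = [
  { address: "12 Main St", lat: 41.88, lon: -87.63 },
  { address: "9 Oak Ave", lat: 41.90, lon: -87.65 },
];

// The mashup joins the two sources on a shared key (the address),
// producing records ready to overlay on a map.
function mashup(listings, geocodes) {
  const byAddress = new Map(geocodes.map(g => [g.address, g]));
  return listings.map(l => ({ ...l, ...byAddress.get(l.address) }));
}

console.log(mashup(listings, geocodes));
```

Each merged record now carries both the listing details and the coordinates, which is exactly what a site like housingmaps.com renders as map markers.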


Figure 2 contains an illustration of a typical mashup in action. The map in Figure 2 shows an interactive, real-time display of locations of suburban commuter trains in the city of Dublin, Ireland. The mashup runs in a web browser, and is built around Google’s map service.


Figure 2: Dartmaps – a web mashup displaying the locations of commuter trains in Dublin

Mashups are by no means limited to maps or photos with overlays. In principle, the content can be anything as long as it can be meaningfully combined with other information available on the Web, e.g., price comparison information combined with product specifications, latest product news and user reviews or blogs. The key aspect is that the content must be available in a format that can be reused easily in other contexts. Textual representations such as HTML, XML, CSV (Comma-Separated Value format) or JavaScript source code, and standardized image and video formats such as GIF, JPEG, PNG and MPEG-4 play a crucial role in enabling the reuse of content in different contexts.
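To make the reuse point concrete: a small CSV feed can be re-parsed into structured records with a few lines of JavaScript and then combined with data from any other source. The feed content and the `parseCsv` helper below are illustrative assumptions, not part of the original text.

```javascript
// A tiny CSV "feed" as it might arrive from another site (mocked).
const csv = "symbol,price\nJAVA,42.5\nSUNW,5.1";

// Parse CSV text into an array of objects keyed by the header row.
// (Naive parser for illustration: no quoting or escaping handled.)
function parseCsv(text) {
  const [header, ...rows] = text.trim().split("\n").map(r => r.split(","));
  return rows.map(r => Object.fromEntries(r.map((v, i) => [header[i], v])));
}

console.log(parseCsv(csv));
```

Because the result is ordinary structured data, it can be joined, filtered, or overlaid just like the map examples above; that is what "available in a format that can be reused easily" means in practice.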

Software as a Mashup. When applied to software development, mashups give rise to a new way of developing software that can be referred to as mashware: software as a mashup. By mashware, we refer to a form of mashup development in which web applications can be composed by dynamically combining code originating from web sites from all over the world. For instance, the user interface widgets of an application might be downloaded from one site, storage features from another site, the localization capabilities from a third site, and so on, based on the availability of best components for each purpose. The code is combined dynamically in the client, i.e., in the web browser executing the code. The general idea is depicted in Figure 3.


Figure 3: Mashware – software as a mashup

In Figure 3, it is assumed that the developer is building a new web application to visualize stock market information. The application consists of a main application – downloaded from the developer's own web server – that will dynamically download the other necessary components from other web sites. These components include: (1) the widget library used for presenting the user interface of the application, (2) a stock graph visualization library for creating stock graphs, (3) a stock quote / market data service interface available from a third site, and (4) localization (L10N) components for customizing the market data and the language for a specific country. All these components are downloaded to the browser from different web servers/sites and used dynamically without any static linking or preprocessing.
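The client-side composition described above can be sketched in JavaScript. In a real browser the loader would fetch source code from different origins (for example by injecting script elements); here the remote sites are mocked as an in-memory registry so that only the dynamic, on-demand composition logic is shown. All site names and component interfaces are hypothetical.

```javascript
// Mocked "remote sites", each publishing a component behind a
// well-defined interface (its "contract" to the outside world).
const remoteSites = {
  "widgets.example.com": () => ({ render: data => `[chart of ${data.length} points]` }),
  "quotes.example.com":  () => ({ getQuotes: () => [10, 12, 11] }),
  "l10n.example.com":    () => ({ currency: v => `$${v}` }),
};

// Dynamic, on-demand load: nothing is linked until the app asks for it.
function load(site) {
  return remoteSites[site]();
}

// The "main application" combines the three components in the client.
const widgets = load("widgets.example.com");
const market  = load("quotes.example.com");
const l10n    = load("l10n.example.com");

const quotes = market.getQuotes();
console.log(widgets.render(quotes), l10n.currency(quotes[0]));
// prints "[chart of 3 points] $10"
```

The essential property, as in the paper's vision, is that the main application depends only on each component's published interface, never on where or how the component was built.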

In general, our ideal client-side web application architecture would support highly interactive, visually rich desktop-style applications that utilize the computing power available on the client and in the web browser. The applications would be built out of pre-existing components that can be loaded dynamically from anywhere on the Web on an on-demand basis. The components would be published with well-defined interfaces, i.e., each site would publish a well-defined “contract” to the outside world, and the components would be delivered in a platform-independent format that does not require any static linking or advance binding (e.g., in source code format or in some portable intermediate format). Application execution would take place in the web browser (preferably without any plug-in components), enhanced with a security model that allows application components to be downloaded from anywhere on the Web. Application communication with the web server would take place asynchronously, using asynchronous HTTP(S), without blocking the user interface. No installation or manual upgrades would be required, since the applications and all the necessary components would be loaded dynamically from the Web.
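As a rough illustration of this on-demand composition, the sketch below simulates two remote component sites with an in-memory table of source strings. In a real browser the download-and-evaluate step would be performed by injecting script elements; the URLs and component names here are purely hypothetical:

```javascript
// Simulated remote sites: component source code keyed by URL. In practice
// these strings would be fetched over HTTP at runtime.
var remoteSources = {
  "http://widgets.example.com/ui.js":
      "({ makeButton: function (label) { return '[' + label + ']'; } })",
  "http://graphs.example.com/stock.js":
      "({ plot: function (quotes) { return 'plotted ' + quotes.length + ' quotes'; } })"
};

// On-demand loader: resolves a URL to a live component object at runtime,
// with no static linking or advance binding.
function loadComponent(url) {
  return eval(remoteSources[url]);   // stands in for download + evaluation
}

var ui = loadComponent("http://widgets.example.com/ui.js");
var graphs = loadComponent("http://graphs.example.com/stock.js");
```

The application binds to each component only through its published interface (`makeButton`, `plot`), which is exactly the “contract” role described above.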

This kind of an environment would make it possible for software developers to collaborate on an immensely large scale, allowing unparalleled sharing and reuse of software, data, layout and visualization information, or any other content across the planet. Applications would be truly web applications, consisting of components that are loaded dynamically from those web sites that provide the most applicable components for each purpose. If such massive-scale reuse were possible, the productivity of software development could potentially be improved dramatically. The Web could be the enabling factor that would finally make large-scale software reuse a reality rather than just a perpetual dream.

4. Mashware vs. Cloud Computing

“Cloudware: Software that runs in or comes from the network.”

— TechEncyclopedia (http://www.techweb.com/encyclopedia/)

Software that runs in the network or is downloaded dynamically from the network is generally referred to as cloudware. The broader, more commonly used term is cloud computing. Cloud computing refers to running applications within a network server or downloading the software from the network each time it is used. Cloud computing implies “thin client” computing, in which the majority of processing is performed on the server. For example, Google Apps (http://www.google.com/apps) and Salesforce.com (http://www.salesforce.com/) provide common business applications online that are accessed from a web browser, with the majority of the data located and the processing performed on the web server.

The mashware concept differs from the cloud computing vision in some important ways. First, the client is much “fatter” than in pure cloud computing. In mashware, the majority of computation – including the mashing up of the dynamically downloaded content – will take place on the client, utilizing the computing power available in the client computer or device. This is in contrast with cloud computing, in which the assumption is that the majority of computation-intensive processing will occur on the server.

Second, in mashware the client platform will offer rich libraries much like those libraries that were available to the developers of conventional desktop applications. In this sense, the mashware concept is a logical follow-up to the current trend towards Rich Internet Applications. However, the actual mashware applications will be rather different from conventional, desktop-centric Rich Internet Applications, since mashware applications extensively leverage the possibility to dynamically combine code and content from various web sites and developers.

A third important difference between mashware and cloud computing is that mashware applications are not necessarily bound to a single web server at a time. Rather, the mashware applications running in the web browser may communicate with a number of different web servers simultaneously. For security reasons – most notably the same-origin policy, which restricts scripts to communicating with the site they were loaded from – today’s web browsers do not usually allow this kind of behavior yet.
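A common workaround for this browser restriction is the JSONP technique: script elements are exempt from cross-site restrictions, so the page defines a callback function and asks the remote server to wrap its data in a call to that function. The sketch below simulates the network round-trip with a string; the service URL and ticker data are invented:

```javascript
// The page defines a global callback that will receive the remote data.
var received = null;
function handleQuotes(data) { received = data; }

// What a (hypothetical) server at
// http://quotes.example.com/?callback=handleQuotes would return.
// In a browser this text would arrive via a dynamically injected
// <script src="..."> element rather than being evaluated directly.
var response = 'handleQuotes({ "SUNW": 4.37, "GOOG": 307.65 })';

// The browser evaluates the returned script, which invokes the callback:
eval(response);
```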

The common aspect between mashware and cloud computing (and Rich Internet Applications) is that software is downloaded dynamically from the network by “pulling pieces out of the sky” (hence the reference to the “cloud”). The software is accessed on demand and used with a minimal amount of static linking and advance compilation. In many cases the software is delivered simply in source code form, and executed using a dynamic language such as JavaScript.

Mashware ≈ Rich Cloud Computing

In summary, given the characteristics summarized above, mashware could also be referred to as Rich Cloud Computing – it is a more client-oriented form of cloud computing in which the computing power and other capabilities of client devices are utilized more extensively. Given how much computing power is available in today’s personal computers, mobile phones and other client devices, it seems rational to harness the available power also when building applications that rely extensively on components delivered over the Web.

5. The Importance of Dynamic Languages and JavaScript

“You cannot build mashups out of concrete bricks.”

As summarized by Paulson in the paper “Developers Shift To Dynamic Languages” [Pau07], there is a trend toward dynamic programming languages in the software industry today. The attention that dynamic languages are receiving is remarkable, and is something that has not occurred since the early days of personal computers and the widespread use of the BASIC programming language in the late 1970s and early 1980s. The comeback of dynamic languages is largely due to the shorter lifecycles of applications and generally faster pace of application development. It is also caused by the need to flexibly “glue” together various types of systems and services, as is typical especially in web-based software development.

There are a number of dynamic languages in widespread use today, including Perl, PHP, Python and Ruby. However, the dynamic language that has received by far the most attention recently is JavaScript [Fla06] (statistics on the popularity of different programming languages are available, e.g., in Gartner’s user survey results published in 2006). JavaScript was originally created as a scripting language and included in version 2.0B of the Netscape Navigator web browser in 1995 to support relatively simple web content scripting and animation tasks. Since then, the use of the JavaScript language has spread to considerably larger tasks and applications. Today, it is not uncommon to have JavaScript applications that consist of tens or even hundreds of thousands of lines of code.

Because of its humble beginnings as a web scripting language, JavaScript has suffered from a reputation as a “toy” language. In reality, JavaScript is a general-purpose programming language that can be used for real programming and not just scripting. We have analyzed the use of the JavaScript language as a real programming language in an earlier paper [MiT07b]. Doug Crockford has written an excellent book about the good and bad characteristics of the JavaScript language [Cro08].

JavaScript and mashware. Some would say that the trend toward JavaScript is purely accidental, reflecting only the fact that JavaScript is the only language supported (in reasonably compatible forms) by all the major web browsers. However, there is a more fundamental transition occurring in the industry that is leading the developers increasingly towards dynamic languages and JavaScript especially.

The power of the World Wide Web stems largely from the absence of static bindings [MiT07a]. For instance, when a web site refers to another site or a resource such as a bitmap image, or when a JavaScript program accesses a certain function or DOM attribute, the references are resolved at runtime without static checking. It is this dynamic nature that makes it possible to flexibly combine content from multiple web sites and, more generally, for the Web to be “alive” and evolve constantly with no central planning or control. The movement to web-based software will inevitably lead developers increasingly toward dynamic programming languages.

Dynamic languages play a critical role in enabling the development of applications that combine content from different web sites and that can adapt to the evolving data formats and types used in different web sites. Dynamic languages serve as the glue that allows different types of data representations to be converted from one form to another easily. For instance, the string manipulation operations (especially those related to regular expressions) in JavaScript are well suited to data format conversions. Dynamic languages usually also provide convenient syntactic mechanisms for creating new objects and new types of objects at runtime. For example, the object literal notation in JavaScript [Fla06 p. 106] makes it very easy to construct new objects on the fly:

 var customer = { name: "John", age: 37, email: "john@doe.com" };

In a conventional, statically compiled object-oriented language such as C++ or the Java™ programming language, the definition of new objects is much more rigid; basically, all the different types/classes of objects must be defined ahead of program compilation and execution. This makes it more difficult for the software to adapt to changes that later occur in the formats that the application will have to process.
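To make the contrast concrete, the following sketch extends the customer object from the example above at runtime and uses a regular expression for a typical mashup format conversion. The added field and the date formats are illustrative only:

```javascript
// The customer object from the text:
var customer = { name: "John", age: 37, email: "john@doe.com" };

// Properties can be added or removed at runtime, so the program can adapt
// to changed data formats without recompilation or new class definitions:
customer.country = "US";     // hypothetical new field appearing in a feed
delete customer.email;       // field no longer provided by the source

// Regular expressions make typical data format conversions one-liners,
// e.g., turning a US-style MM/DD/YYYY date into ISO YYYY-MM-DD form:
function toIsoDate(usDate) {
  return usDate.replace(/(\d{2})\/(\d{2})\/(\d{4})/, "$3-$1-$2");
}
```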

The ability to deliver executable software in source code form also plays a critical role in enabling the development of software in the form of mashups. Unlike with programming languages that depend on static compilation, JavaScript applications can be easily distributed in source code form. The ability to deliver executable software in source code form makes it easier to reuse software in a piecemeal fashion, or tweak the code if some small changes are needed. Conventional programming languages that depend on binary files and static binding are poorly suited for this kind of operation. Compared to dynamic languages, binary files are like “bricks” that cannot easily be reused in contexts unforeseen by the designers.

Granted, dynamic languages and JavaScript have drawbacks too. The extremely dynamic, permissive, error-tolerant nature of the JavaScript language makes it difficult to catch errors at development time. As a general principle, JavaScript virtual machines do not report errors until absolutely necessary. This can lead to problems that are very difficult to trace and debug. For example, a spelling error in a variable name implicitly results in the creation of a new variable with the misspelled name. This is rarely the desired behavior. While such behavior enables the successful execution of code lines that contain spelling errors (and, more generally, allows new variables to be added to objects easily), it usually results in other, significantly more difficult errors later in the execution. By the time an error is finally reported, the actual problem lies elsewhere in the program and can be very difficult to pinpoint.
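A minimal sketch of this failure mode (the variable and property names are invented for illustration):

```javascript
var counter = { total: 0 };

function addSale(amount) {
  // Misspelled property name ("totaI" with a capital I instead of "total"):
  // instead of reporting an error, JavaScript silently creates a brand-new
  // property, leaving the real "total" untouched. The failure surfaces much
  // later, far from this line.
  counter.totaI = counter.total + amount;
}

addSale(100);
// counter.total is still 0; the update went to the misspelled property.
```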

Web applications are generally so dynamic that it is impossible to know statically if all the necessary structures will be available at runtime. In some cases the absence of (or changes in) the requested elements can lead to serious problems that are impossible to detect before execution. Consequently, web applications require significantly more testing (especially coverage testing) to make sure that all the possible application behaviors and paths of execution are tested comprehensively.
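One common defensive idiom is to check at runtime that a dynamically loaded component actually provides the expected interface before calling it. The component object below is a made-up stand-in for a remotely loaded library:

```javascript
// Hypothetical remotely loaded component whose shape may change between
// releases of the providing site.
var chartLib = { drawLine: function (pts) { return "line:" + pts.length; } };

// Because bindings are resolved only at runtime, a robust mashup verifies
// the interface before use and degrades gracefully when it is absent.
function render(lib, pts) {
  if (lib && typeof lib.drawBar === "function") {
    return lib.drawBar(pts);          // preferred, newer API
  } else if (lib && typeof lib.drawLine === "function") {
    return lib.drawLine(pts);         // fall back to an older API
  }
  return "unsupported";               // component missing or incompatible
}
```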

In general, the development style associated with dynamic languages is rather different from conventional software development. Since there is no way to detect during development time whether all the necessary components are present or have the expected functionality, applications have to be written and tested piece by piece, rather than by writing tens of thousands of lines of code ahead of the first execution. Such a stepwise development style is similar to the style used with programming languages that are specifically geared towards exploratory programming. Examples of such languages include Smalltalk [GoR83] and Self [UnS87].

Performance issues. Since dynamic languages are usually built around simple interpreters that must be able to evaluate source code at runtime, these languages have traditionally not been particularly fast. Until recently, web application performance was not much of an issue, since JavaScript was used for relatively simple scripting tasks. Furthermore, performance problems were largely masked by other issues such as network latency or the poor graphics performance of the web browser. It is only recently, as people have started writing more serious JavaScript applications, that performance issues have become more apparent.

JavaScript applications have traditionally run at least one or two orders of magnitude slower than corresponding applications written in programming languages such as C++ or Java. Yet the perception about JavaScript being a slow language is largely accidental, reflecting only the lack of attention that has been put into optimizing JavaScript virtual machines.

In the past year or so, several high-performance JavaScript engines have become available (in alphabetical order):

- Apple SquirrelFish (http://trac.webkit.org/wiki/SquirrelFish) and SquirrelFish Extreme
- Google V8 (http://code.google.com/p/v8/)
- Mozilla Tamarin (http://www.mozilla.org/projects/tamarin/)
- Mozilla TraceMonkey (https://wiki.mozilla.org/JavaScript:TraceMonkey)


These virtual machines are dramatically faster than the first-generation JavaScript engines that have been used in major web browsers for years. For instance, Google’s V8 virtual machine runs the SunSpider JavaScript benchmark approximately 18 times faster than the JScript engine in Microsoft Internet Explorer version 7. Based on the same benchmark, the performance difference between Google V8 and the SpiderMonkey virtual machine used in Mozilla Firefox is 8.5-fold, i.e., nearly an order of magnitude.

The race towards high-performance JavaScript engines will gradually change the perception about the performance of the JavaScript language and dynamic languages in general. The situation resembles the rapid evolution of Java virtual machines in the late 1990s when the Java virtual machine performance wars were at full height. With abundant, continually increasing processing power available in computers and mobile devices, we believe that JavaScript will eventually become the most widely used programming language not only in desktop computers but also in mobile devices, at least for end-user application development.

6. Towards Mashware: The Landscape of Mashup Development Technologies

The technological pieces needed for supporting mashware are already largely in place, even though the current web browser is not yet ideally suited for such applications. In this section we take a look at some of the key technologies and trends related to mashware.

6.1. Existing General-Purpose Web Application Development Technologies

The landscape of web application development technologies is still rather diverse, reflecting the rapidly evolving state of the art in the web development area. At the high level, the technologies for the development of client-side web applications can be grouped into three categories:

1) Custom runtime: technologies that require a custom runtime (outside the web browser)

2) Plugin-based: technologies that depend on web browser plug-in component(s)

3) Browser-based: technologies that run in the web browser without any add-on components

These categories, as well as various currently used Rich Internet Application (RIA) technologies, are illustrated in Figure 4. On the right are technologies – such as the Java and Java FX platforms – that require a custom runtime environment that is commonly used outside the web browser. In the middle are systems that run inside a web browser but require a special plug-in component in order to do so. On the left are technologies that run inside a standard web browser without any plug-ins or other add-on components.

Many technologies that belong in the first category, such as Adobe AIR [FPM08], Microsoft Silverlight [Mor08], or Sun’s Java FX [Wea07], also have plug-in based implementations available that allow these systems to be used inside a web browser. For instance, Microsoft Silverlight plug-ins are currently available for Internet Explorer, Mozilla Firefox and Apple Safari web browsers. In general, the technologies in these categories are not distinct but overlap at least to some degree.

Technologies that belong in the third category (shown on the left in Figure 4) – browser-based systems such as Ajax [CPJ05] and Google Web Toolkit [HaT07] that do not require any add-on components – can be divided further into two subcategories: (1) those systems that utilize the computing power of the client extensively and perform the majority of processing on the client, and (2) those systems that are server-centric and perform the majority of computation on the server (i.e., truly thin clients). Our focus in this paper is primarily on those technologies that leverage the client-side processing power extensively, as we believe that approach to be fundamental to mashware. A good example of such a client-side, purely browser-based web application technology is the Sun™ Labs Lively Kernel [IKU08].




[Figure 4 layout: left column – technologies that run in a standard web browser (no plug-ins needed, browser-based UI); middle column – technologies that require a browser plug-in (custom UI, platform-independent); right column – technologies that require a custom execution engine (run outside the browser, custom/native UI).]

Figure 4: Different categories of RIA technologies

Important tradeoffs. It is important to note that there are significant tradeoffs in the RIA technologies based on how closely integrated the technology is with the web browser. Custom runtimes or plugin-based systems can escape the limitations of the web browser (especially those related to the security model, supported API sets and the user interface features), and customize the user experience and interaction mechanisms much more easily. However, those benefits come at the expense of portability and ease of worldwide application deployment. If one wants to build truly portable “Open Web” applications and deploy those applications instantly worldwide without installation hassles, then any dependencies on technologies unsupported by the web browser must be eliminated. In other words, only those technologies that belong in the third (browser-based) category support the vision of fully portable, “zero-installation” web applications. Because of incompatibilities between different web browsers, there are still limitations in realizing the full potential of that vision, though.

We have provided a summary of several commonly used RIA systems in our previous paper [TMI08]. In that paper, we have also analyzed the use of the web browser as an application platform in detail. Therefore, we will not delve deeper into details here. However, it is interesting to note that there is a frontier forming between two types of RIA systems: (1) those that have been built around the Java programming language, such as Java, Java FX and Google Web Toolkit, and (2) those that have been built around different flavors of JavaScript: Ajax, Adobe Flash and AIR, Microsoft Silverlight, and Sun Labs Lively Kernel. Note that Google has another RIA technology – Google Gears – that has been designed to complement Ajax, i.e., it is built around JavaScript. The use of those RIA technologies that are based on neither Java nor JavaScript (such as Ruby on Rails) seems to be on the wane.

It should also be noted that the remarks about “thin” and “fat” web clients are not as clear-cut as presented in Figure 4. While it is true that custom runtimes and plugin-based systems provide support for much “fatter” API sets and other features, the web browser will gradually absorb more functionality and therefore make the browser-based systems fatter as well. For instance, the upcoming HTML 5 standard (http://www.w3.org/html/wg/html5/) will introduce many new features aimed at web application development, such as an immediate-mode 2D drawing API and drag-and-drop support. Yet the application development APIs offered by the standard web browser are still limited compared to full-fledged platforms such as the Java™ platform or Microsoft Silverlight.

6.2. Existing Mashup Development Technologies and Tools

In addition to general-purpose, “plain vanilla” web application development technologies discussed above, there are tools that have been designed specifically to support the development of mashups. The best known mashup development tools are (in alphabetical order):

- Google Mashup Editor (http://code.google.com/gme/)
- IBM Mashup Center (http://www.ibm.com/software/info/mashup-center/)
- IBM Project Zero (http://www.projectzero.org/)
- Intel Mash Maker (http://mashmaker.intel.com/)
- JackBe Presto (http://www.jackbe.com/)
- LiquidApps (http://www.liquidappsworld.com/)
- Microsoft Popfly (http://www.popfly.com/)
- Open Mashups Studio (http://www.open-mashups.org/)
- Yahoo Pipes (http://pipes.yahoo.com/)


These systems differ from general-purpose web application environments in the sense that they are geared much more towards ordinary users and not just professional software developers. Most of the systems include a visual programming tool that supports drag-and-drop construction of applications based on existing web content and services. Application development is done primarily via “programming by wire”, i.e., by connecting visual elements representing different web services to each other. Source code editing is usually possible, too, but is required only for advanced tasks that cannot be performed visually. The generated applications can be used in standalone mode or embedded in other web pages and services.

Figure 5 shows screen snapshots from two different mashup development tools: Microsoft Popfly and Yahoo Pipes. Both systems are built around a visual editor that allows the user to choose various services from a list or tree of components shown on the left, and connect those services by wiring them to each other visually and then filling in the necessary attributes. The resulting applications can be tested immediately without compilation or static linking.

Both Microsoft Popfly and Yahoo Pipes run inside the web browser, i.e., the user does not have to use a separate tool for development. Most of the work, including the development, testing/debugging and the execution of applications, occurs within the web browser. For Microsoft Popfly, more conventional development tools – built around Microsoft Visual Studio – are also available.


Figure 5: Examples of mashup development tools: Microsoft Popfly and Yahoo Pipes

In addition to mashup development tools, there are also “web mining” and scraping tools that allow information from web sites to be collected and processed in various ways, and then displayed graphically. These tools are more focused on finding suitable information from the Web rather than combining the information from various sites. A good example of a web mining tool is Metafy Anthracite (http://www.metafy.com/).

6.3. Summary of Popular Mashup Development Tools

In the fall of 2008, Prof. Tommi Mikkonen and Antero Taivalsaari arranged a seminar on Mashup Development Technologies at the Tampere University of Technology, Finland. About 25 Master’s degree and Ph.D. students participated in the seminar. During the seminar, the students took a detailed look at a number of popular mashup development tools, analyzing and dissecting their features, experimenting with existing demo applications, and building new mashups using the tools. Below we provide a summary of the five systems that proved most interesting and useful during the seminar. The systems are summarized in alphabetical order.

6.3.1. Google Mashup Editor (GME)

Google Mashup Editor (http://code.google.com/gme/) is an Ajax development framework and a set of associated tools based on widely used web technologies (HTML, CSS, JavaScript) and GME’s own declarative XML tags. The system runs in Mozilla Firefox and Microsoft Internet Explorer web browsers without plug-in components. The system introduces a simple graphics editor that allows the user to create mashups visually. Additionally, GME has a JavaScript API that provides more functionality and flexibility for manipulating data programmatically.

The mashup creation capabilities of the Google Mashup Editor are limited primarily to extracting data from newsfeeds. It is possible to pull data from external RSS and Atom feeds as well as from Google Base (http://www.google.com/base/). Furthermore, it is possible to write application- or user-specific feeds using the JavaScript API.

As with most other mashup creation environments today, all the code created with GME is uploaded and hosted on a server. With the Google Mashup Editor, the source code and uploaded resource files related to applications are stored using the open source project hosting feature of Google Code (http://code.google.com/); when a new GME project is created, a new repository is created on Google Code to store the source files and resources associated with the project. A Subversion interface is provided to edit the files outside the Google Mashup Editor.

Google Mashup Editor is still in beta stage, and its availability is limited to a fixed number of trial users. Based on our experiences during the seminar mentioned above, Google Mashup Editor is still rather rudimentary and limited in terms of functionality, especially when compared to some other mashup environments such as Microsoft Popfly and Yahoo Pipes. On the positive side, it is important to note that Google Mashup Editor does not require any web browser plug-ins, i.e., it is an “Open Web” environment that runs in a standard web browser. (Note: Just as this report was reaching publication, Google announced that the development of Google Mashup Editor will be discontinued.)

6.3.2. IBM Mashup Center

IBM Mashup Center (http://www.ibm.com/software/info/mashup-center/) is a suite of tools built around two key components: IBM Lotus Mashups and IBM InfoSphere MashupHub. IBM Lotus Mashups is a graphical, browser-based tool intended for creating web pages and widgets visually. Lotus Mashups includes (1) a visual editor that runs in a web browser; (2) a mashup catalog which facilitates the sharing and discovery of mashup components, with features such as user community ratings, tagging, and commenting; and (3) a set of out-of-the-box widgets that can be used for visualizing the information displayed in mashups.

The back-end storage capabilities of IBM Mashup Center are provided by IBM InfoSphere MashupHub. InfoSphere MashupHub contains an application server, built around the IBM WebSphere Application Server, that exposes REST APIs (see [Fie00, FiT02]) to clients that can access its services over a secure HTTPS connection. MashupHub includes an integrated database for storing mashup information. It also offers a visual browser-based IDE, written in Ajax and Dojo (http://www.dojotoolkit.org/), that is intended for the visual creation of feeds.

Note that the two components in IBM Mashup Center – Lotus Mashups and InfoSphere MashupHub – offer somewhat overlapping client-side functionality. Both tools offer visual editing capabilities, and it is not always easy to decide which tool to use for which purpose. Lotus Mashups is geared more towards the creation of web pages and towards displaying data using pre-fabricated widgets. In contrast, the visual editing capabilities of InfoSphere MashupHub are intended primarily for creating feeds and feed mashups, resembling the capabilities of Yahoo Pipes to some extent. During our Mashup Development Seminar, students found the overlap between the tools confusing. In general, IBM Mashup Center felt less integrated and less intuitive to use than many other tools evaluated in the seminar.

Perhaps the most unique feature of IBM Mashup Center – at least when compared to the other mashup development systems discussed here – is the feature that allows the user to easily upload and publish data from a local computer (e.g., from a local PC) and then mix and match that data with other information available on the Web. This makes it easy, e.g., to combine information in the users’ Excel spreadsheets with maps or other data services available on the Web.

6.3.3. Intel Mash Maker

Unlike all the other mashup development systems summarized in this paper, Intel Mash Maker (http://mashmaker.intel.com/) is a primarily client-oriented system: whereas the other systems perform the mashing up of data on the server, Intel Mash Maker combines content on the client (in the web browser). In general, Intel Mash Maker follows a different philosophy, in which “mashing is just browsing” and “mashups are personal”. This is in contrast with the other systems, in which the source code, data, and resources of the generated mashups reside on the servers of the mashup service provider. From this viewpoint, Intel Mash Maker is closest to our notion of mashware, as summarized in Sections 3 and 4.

Intel Mash Maker is implemented as a web browser plug-in that complements the existing web browser with additional web page content extraction, processing and visualization capabilities. Plug-ins are currently available for Mozilla Firefox and Microsoft Internet Explorer.

A central component in the Intel Mash Maker is a tool called the extractor (there can be several extractors depending on the type of the content to analyze). An extractor allows the contents of a web page to be parsed and analyzed, and then displayed as a data tree (an example of a data tree is shown in Figure 6 on the left). The data tree provides a component-oriented view of the contents of the web page, based on the underlying (X)HTML or XML content. Currently supported content data formats include XML, JSON, (X)HTML and RSS. The data tree allows the contents of the web page to be edited and processed in various ways, e.g., to create a subset of the page containing only the information that the user wants to see, without advertisements or other undesired information.

A typical Mash Maker usage scenario is as follows. When the user navigates to a web page, Mash Maker immediately analyzes the contents of the site by using an extractor. The system then builds a graphical view of the site, and allows the user to “tag” those items that the user wishes to reuse in other contexts. The user can also choose the items to reuse simply by picking the items directly from the web browser’s page view. The web page looks completely normal, except that yellow highlights will appear around or behind those items that have been selected. The user can then create widgets (implemented as iframes) that display only the specific content that was chosen earlier, and export those widgets to other places (e.g., to the user’s personal iGoogle web page).


Figure 6: Intel Mash Maker running in the Mozilla Firefox browser

Typical examples of Mash Maker mashups are regular web pages that contain city names. When the user navigates to such a page, the city names are automatically augmented with additional icons (shown next to the city names) that allow the user to fetch current weather information, travel information, and other information about that particular city. When such icons are clicked, the Mash Maker plug-in will automatically obtain the weather and/or traffic information, and display the information using the desired widget. The general idea here is very similar to Mozilla’s GreaseMonkey extension (http://en.wikipedia.org/wiki/Greasemonkey) that allows users to install scripts that make on-the-fly changes to HTML web pages.
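This kind of on-the-fly page rewriting can be sketched as a simple string transformation. In the browser, the same operation would be applied to the live DOM rather than to an HTML string; the city list and the weather-service URL below are invented for illustration:

```javascript
// Cities that the (hypothetical) extractor knows how to recognize.
var cities = ["Helsinki", "Tampere"];

// Augment every recognized city name with a weather-lookup link,
// in the style of a GreaseMonkey user script or a Mash Maker widget.
function augment(html) {
  for (var i = 0; i < cities.length; i++) {
    var re = new RegExp("\\b" + cities[i] + "\\b", "g");
    html = html.replace(re,
        cities[i] + ' <a href="http://weather.example.com/?q=' +
        cities[i] + '">[weather]</a>');
  }
  return html;
}
```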

Even though Mash Maker is a client-oriented system, it includes “community” capabilities such as sharing the created mashup with other users. For instance, it is possible to share the mashups so that when another user navigates to a web page that has previously been used as a target for mashups by a Mash Maker developer, the web browser will automatically display mashup icons that allow the user to use (and reuse) the mashups created by the other users earlier.

In our Mashup Development Seminar, the students found the Intel Mash Maker system quite useful and well integrated with existing browsers. The students complained about the lack of proper documentation, though. They also reported that while the system works well in simple examples, the scripting capabilities are insufficient even for relatively straightforward and frequently needed tasks such as date format conversions or removal of special characters during data extraction. For such tasks, it would be simpler to use an integrated scripting language rather than attempting to do everything visually.

6.3.4. Microsoft Popfly

Microsoft Popfly (http://www.popfly.com/) is a web-based service that consists of three closely coupled tools: (1) a web page creation tool, (2) a visual game development tool, and (3) a mashup development tool. All these tools run in the web browser, and have a web-based graphical user interface that has been implemented using the capabilities of the underlying Microsoft Silverlight plug-in. A Silverlight plug-in is required to run Popfly. Currently Silverlight plug-ins are available for Microsoft Internet Explorer, Mozilla Firefox, and Apple’s Safari browser.


Figure 7: Microsoft Popfly online game development tool

The Popfly game development tool is illustrated in Figure 7. The game development tool includes a visual tile scripting environment (see [IWC88]) that allows the behavior of objects in simple online games to be defined visually by manipulating graphical blocks that represent the behaviors and properties of game objects. This part of the Popfly environment is intended more for children than for professional software developers. Dozens of existing game templates are provided for starting new online game projects easily.

The Popfly mashup development tool was shown earlier in Figure 5. The mashup tool supports visual, interactive mashup creation inside the web browser. The tool consists of a number of “blocks” (displayed on the left in Figure 5) that represent hook-ups to various existing web services (such as Digg, Flickr, or different map, traffic or weather services). The system can be run in two basic modes: “Edit” and “Run”. In “Edit” mode, the user can connect different web services to each other by dragging the blocks representing them into the main display, drawing “wires” to connect those services, and then editing and filling in the relevant attributes, e.g., to determine which particular properties from the services will be included in the generated mashup. In “Run” mode, the created mashup can be tested inside the web browser immediately without compilation or linking.

All the Popfly tools support instant worldwide publishing of applications via the Popfly.com web service. Applications can be published as widgets and then embedded in other web pages using URLs. There are also wiki-like collaboration capabilities for users to share and discuss their projects. Projects can be “ripped” (copied) or “tweaked” (customized) by other users easily.

Since the Popfly environment has been built on top of the Silverlight Rich Internet Application platform, the environment provides a much more comprehensive set of APIs than is offered by the standard web browser and most other web development environments. A rich set of existing widgets and other GUI elements is provided.

For those users who want to build applications using a more conventional approach – outside the web browser and outside the Popfly web service – Microsoft Visual Studio support is also available. Programming is done primarily in JScript – Microsoft’s flavor of JavaScript. Debugging capabilities are provided both for those users who prefer using Microsoft Visual Studio and for those users who perform all the development using Popfly’s web-based interface.

During our Mashup Development Seminar, the students found the Popfly environment interesting and rather intuitive to use. Out of the systems evaluated in the seminar, Popfly proved to be the one that was most fun to use; students spent hours just viewing and playing with the existing sample applications.

On the negative side, Popfly’s dependence on the Microsoft Silverlight plug-in raised some questions during the seminar. The students also found a lot of bugs and compatibility issues depending on which web browser and which version of Silverlight they were using. Many of the demos crashed, froze, or refused to save themselves on the web server.

Given that all the applications, data and resources created with the web-based interface of Microsoft Popfly reside on Microsoft servers, there was also some discussion and concern about intellectual property ownership and trust more generally. Since most mashup development environments store the generated applications on a server that is controlled by the mashup service provider rather than by the mashup developer herself, this problem is general and not specific to Microsoft Popfly.

6.3.5. Yahoo Pipes

Yahoo Pipes (http://pipes.yahoo.com/) is a visual composition tool to aggregate, manipulate, and merge content from around the Web. Yahoo Pipes is able to read information from the Web in various different formats such as HTML, XML, JSON, CSV, RSS, Atom, RDF, Flickr, Google Base, Yahoo Local and Yahoo Search. The system can output results in various formats such as RSS 1.0, RSS 2.0, JSON and Atom. It can also create badges – visual mashup widgets that can be exported and embedded into other web sites (e.g., into iGoogle or My Yahoo).

As shown in Figure 5, Yahoo Pipes is a web-based system that has been built around a visual editor that runs inside a web browser. All the major web browsers are supported (except that for Microsoft Internet Explorer, at least version 7 is required). The visual editor consists of graphical elements that represent different kinds of data processing operations such as regular expressions, filters, sorting or looping instructions. By visually instantiating and then wiring those graphical elements to each other, the programmer can create powerful expressions for fetching and processing data from the Web in various ways. New mashups (known as pipes in Yahoo Pipes parlance) can be tested immediately by using the “Run Pipe…” operation. An integrated debugging panel is included, so that the behavior and the output of the pipe can be analyzed in different ways.
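The wiring model can be approximated in ordinary code as function composition over feed items. The operators and sample items below are illustrative and far simpler than the real Yahoo Pipes module set:

```javascript
// Sketch: a "pipe" as a chain of feed operators, each taking and
// returning an array of items -- the textual analogue of wiring
// Filter, Sort and Truncate modules together in the visual editor.
const filter = (pred) => (items) => items.filter(pred);
const sortBy = (key) => (items) =>
  [...items].sort((a, b) => (a[key] < b[key] ? -1 : 1));
const truncate = (n) => (items) => items.slice(0, n);

const pipe = (...ops) => (items) => ops.reduce((acc, op) => op(acc), items);

// Illustrative feed items, not real RSS data.
const feed = [
  { title: "Rain in Oslo", score: 2 },
  { title: "Sun in Helsinki", score: 9 },
  { title: "Fog in Tampere", score: 5 },
];

const topWeather = pipe(
  filter((item) => item.score > 3),
  sortBy("title"),
  truncate(2)
)(feed);
```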

As with most other mashup systems, the generated pipes are stored on the server (in this case on pipes.yahoo.com). The system includes hook-ups to various existing web services such as Flickr, Google Base and Yahoo’s own web services, so that information from those services can be accessed and processed easily.

In our Mashup Development Seminar, the students found Yahoo Pipes the easiest and most intuitive to use, at least out of the five systems summarized in this report. It was also the system with which it was easiest to create useful mashups for everyday use. The visual programming capabilities of Yahoo Pipes were the most comprehensive among the evaluated systems. However, the usage scope of Yahoo Pipes seemed somewhat more limited than that of some other systems. Based on the students’ experiences, Yahoo Pipes is particularly well suited to processing and creating new feeds. By creating badges, those new feeds can then quite easily be included and shown in other web sites.

6.4. Common Characteristics and Trends in Mashup Development Tools

In analyzing the different mashup development tools available today, some common themes and trends have started to emerge. Such trends include:


         Using the web browser not only to run applications/mashups but also to develop them. For instance, Google Mashup Editor, Microsoft Popfly and Yahoo Pipes use the web browser to host the development environment. Since the applications are developed inside the web browser, the user does not have to use any other tools besides the web browser itself. For many of the systems listed above, conventional IDEs are also available but intended primarily for professional developers.


         Using visual programming techniques to facilitate end-user development. Visual “tile scripting” and “program by wire” environments are provided, e.g., by Microsoft Popfly and Yahoo Pipes. These environments are intended for non-professional application developers, including children. Conventional source code editing is usually supported, too, but required only for advanced tasks that cannot be performed visually.

         Using the web server to host and share the created mashups. Most of the mashup development tools mentioned above store the created mashups and applications on a web server that is hosted by the service provider. For instance, Google Mashup Editor, Microsoft Popfly and Yahoo Pipes applications reside on the googlemashups.com, popfly.com and pipes.yahoo.com servers, respectively. Applications can be distributed simply by passing along a URL pointing to the application on the server.


         Direct hook-ups to various existing web services. Since the Web itself does not provide enough semantic information or well-defined interfaces to access information in various web sites in a generalized fashion, most of the mashup development tools include custom-built hook-ups to access data in various existing web services. Hook-ups are commonly provided to services such as Digg, Facebook, Flickr, Google Maps, Picasa, Twitter, Yahoo Traffic and various RSS newsfeeds.


It should also be mentioned that most of the above listed mashup development tools are still under development, e.g., in beta or some other pre-release stage, reflecting the rapidly evolving state of the art in mashup development. Nevertheless, many of the systems are already quite advanced and capable, and – perhaps most importantly – a lot of fun even for children to use. For the younger generation of users, those who spend the majority of their time using the web browser anyway, the browser-based application development approach and the possibility to “borrow” code and other content from various sources will seem quite natural.

7. Towards Mashware: Technical Challenges, Obstacles and Solutions

All the mashup development systems discussed above are geared towards mashing up general content on the Web, rather than towards creating applications that combine code from multiple web sites and services. In that sense, the systems discussed above do not fulfill our mashware vision yet. Out of the systems discussed in this paper, Intel Mash Maker comes closest to our vision in the sense that it performs content combination on the client, leveraging the computing power of the client computer and the web browser. However, the actual programming capabilities of Intel Mash Maker are among the most limited of the systems that we evaluated during our Mashup Development Seminar.

In the following subsections, we take a look at topics that we believe most fundamentally constrain the evolution of web technologies towards software as a mashup. Some key solutions are also proposed. The material in this section is an abbreviated summary of a workshop paper that was published earlier in 2008 [TaM08].

7.1. Lack of Modularity and Well-Defined Interfaces

When it comes to the development of mashups that combine content from different sites, major problems arise from the fact that web sites have no well-defined interfaces that would clearly describe which parts of the web site are intended for reuse. Even though a tremendous amount of valuable source code and data is available on the Web, only a fraction of it is available in a modular form that would make the code and data reusable in other contexts. For instance, most web sites do not offer technical documentation (e.g., a public interface specification) that would clearly state which parts of the site and its services are intended to be used externally by third parties, and which parts are implementation-specific and not intended for reuse at all. Only a small number of services, such as Google Maps and Flickr, offer a well-defined API through which these services can be used programmatically by other web sites.

In general, although a number of web interface description languages exist, such as the Web Service Description Language (WSDL, http://www.w3.org/TR/wsdl) or the Web Application Description Language (WADL, https://wadl.dev.java.net/), there is no single commonly accepted standard that is widely used by web sites today. In the absence of well-defined interfaces and a clean separation between the specification and implementation of web sites, there are rarely any guarantees that reused services will remain consistent or even available in the future. The only exceptions are those services that have been clearly designated (and properly documented) for reuse, such as the aforementioned Google Maps API [GiE06]. Without the mashup development tools mentioned in the previous subsection – and their customized hook-ups to various existing web services – the reuse of many commonly used web services would be nearly impossible. Even when using such tools, the resulting mashups are still fragile and ad hoc. We have discussed this topic in more detail in our earlier paper [TaM08].
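To make the missing piece concrete: even a very small machine-readable descriptor would let tools check calls against a site's published contract. The descriptor format, endpoint, and parameter names below are entirely hypothetical, deliberately much simpler than WSDL or WADL:

```javascript
// Sketch: a hypothetical, minimal interface descriptor for a web
// service, plus a check that a planned call matches the contract.
const mapServiceInterface = {
  endpoint: "/api/geocode",                  // hypothetical endpoint
  params: { address: "string", limit: "number" },
  stable: true,                              // designated for external reuse
};

function conformsTo(iface, call) {
  if (call.endpoint !== iface.endpoint) return false;
  // Every supplied parameter must be declared with the right type.
  return Object.entries(call.params).every(
    ([name, value]) => typeof value === iface.params[name]
  );
}

const ok = conformsTo(mapServiceInterface, {
  endpoint: "/api/geocode",
  params: { address: "Menlo Park, CA", limit: 1 },
});
```

A tool could use such a check to warn a mashup developer when a reused service no longer matches its published description.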

7.2. Absence of Fine-Grained Security Model

Another important problem in the creation of mashware is the absence of a suitable security model. These problems date back to conventions that were established early on in the design and historical evolution of the web browser. A central security-related limitation in the web browser is the Same Origin Policy that was introduced in Netscape Navigator version 2.0 (http://www.mozilla.org/projects/security/components/same-origin.html). The philosophy behind the same origin policy is simple: it is not safe to trust content loaded from arbitrary web sites. When a document containing a script is downloaded from a certain web site, the script is allowed to access resources only from the same web site but not from other sites. In other words, the same origin policy prevents a document or script loaded from one web site (“origin”) from getting or setting properties of a document from a different origin.
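The origin comparison at the heart of the policy is simple: two URLs share an origin only if their scheme, host, and port all match. A sketch of the rule (not the exact algorithm any particular browser uses):

```javascript
// Sketch: two URLs share an origin iff scheme, host and port match.
function defaultPort(protocol) {
  return protocol === "https:" ? "443" : "80";
}

function origin(urlString) {
  const url = new URL(urlString);
  return `${url.protocol}//${url.hostname}:${url.port || defaultPort(url.protocol)}`;
}

function sameOrigin(a, b) {
  return origin(a) === origin(b);
}
```

Under this rule, a script loaded from http://example.com may read http://example.com/data but not https://example.com/data (different scheme) or http://other.com/data (different host).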

The same origin policy makes it difficult to create and deploy mashups or other web applications that combine content (e.g., news, weather data, stock quotes, traffic statistics) from multiple web sites. Basically, most of the content mashing must be performed on the server. Special proxy arrangements are usually needed on the server side to allow networking requests to be passed on to external sites. When deploying web applications, the application developer must therefore be closely affiliated with the owner of the web server in order to make the arrangements for accessing the necessary sites.
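The client-side half of such a proxy arrangement can be sketched as a URL rewrite; the /proxy path and url query parameter below are made-up conventions, and the server must still be configured to fetch and forward the remote content:

```javascript
// Sketch: rewrite a cross-origin URL so the request goes to the
// page's own server, which then fetches the remote content on
// the client's behalf (keeping the browser's request same-origin).
function viaProxy(remoteUrl) {
  return "/proxy?url=" + encodeURIComponent(remoteUrl);
}

const weatherRequest = viaProxy("http://weather.example.org/today?city=Helsinki");
```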

7.3. Lack of Proper Namespace Isolation

An additional problem related to the absence of a proper security model is the lack of proper namespace isolation in JavaScript. By default, all the code that is downloaded into the JavaScript virtual machine shares the same namespace (including the DOM tree). Without the same origin policy that prevents the downloading of content from different web sites, code downloaded from one web site could interfere with code originating from other web sites. This would make it possible, e.g., to read private data that should not be visible to external users, or to inject malicious scripts into code loaded from other sites. Vulnerabilities of this kind – collectively known as cross-site scripting (XSS) issues – have been exploited to craft powerful phishing attacks and other browser exploits. The possibility of cross-site scripting vulnerabilities is the reason why the same origin policy discussed above was introduced in the first place.
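The shared-namespace problem, and the standard closure-based workaround, can be sketched in plain JavaScript (all names below are illustrative):

```javascript
// Without isolation: two scripts loaded into the same page share one
// global namespace, so the later one silently clobbers the first.
const globalScope = {};
globalScope.config = { user: "alice" };    // set by script A
globalScope.config = { trackingId: 42 };   // script B overwrites it

// Workaround: each script keeps its state inside a closure and
// exposes only an explicit interface; the captured state itself
// is unreachable from outside code.
function makeModule(secret) {
  return {
    greet: () => `hello, ${secret.user}`,  // secret stays private
  };
}
const moduleA = makeModule({ user: "alice" });
```

Closures provide voluntary isolation only; a proper security model would have to enforce such boundaries even against uncooperative code.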

The key observation arising from the problems discussed above is that there is a need for a more fine-grained security model for web applications. On the Web today, applications are second-class citizens that are at the mercy of the classic, “one-size-fits-all” sandbox security model of the web browser. Decisions about security are determined primarily by the site (origin) from which the application is loaded, and not by the specific needs of the application itself. The problems become even more apparent when attempting to develop mashups that would need to flexibly combine content from multiple sites. Even though many interesting proposals have been made [Cro06, YUM07, WFH07, JaW07, KBS08], currently there is no commonly accepted finer-grained security model for web applications or for the Web more generally.

7.4. Additional Problems in Web Application Development

Web application and mashup development is still tedious. In our earlier work, we have divided the problems in web application development into the following categories. These problem areas are not specific to mashup development, but apply to web development more generally:

1)        Usability and user interaction issues

2)        Networking and security issues

3)        Browser interoperability and compatibility issues

4)        Development style and testing issues

5)        Deployment issues

6)        Performance issues

Most of the problems can be traced back to the fact that the web browser was not really designed to be a general-purpose application platform. For a more detailed discussion of problems in these areas, refer to the earlier paper [TMI08].

7.5. Possible Solutions

So, what is still missing from realizing the mashware vision? Architecturally, the following main features would be needed:


Modularity support and proper interfaces with information hiding.

A mechanism to document and publish application interfaces (more generally: the public interfaces of a web site) in a standardized format.

A more fine-grained browser security model that provides controlled access to security-critical APIs and host platform resources, as well as allows applications to download components flexibly from various sites.

An execution engine inside the web browser that supports namespace isolation and modularity to allow content from different sites to run securely.

In principle, technologies for all these areas are already available. For instance, modularity mechanisms and interface description languages have been investigated for decades, starting from the seminal work by Parnas, Liskov, Zilles and others [Par72, LiZ74]. In the context of the Web, technologies and protocols such as REST [Fie00, FiT02], SOAP (http://www.w3.org/TR/soap) and WSDL (http://www.w3.org/TR/wsdl) are gradually making it possible to specify and use the interfaces of web sites in a portable and reusable fashion. Fine-grained security models and namespace isolation have been studied extensively, e.g., in the context of the Java Platform, Standard Edition (Java SE) [GED03] and the Java Platform, Micro Edition (Java ME). The latter platform has a lightweight, permission-based, certificate-based security model [RTV03] that could potentially be applicable also to web application development.

In general, the challenges in the areas discussed above are not only technological. The key problem is related to retrofitting proper modularity and security mechanisms into an architecture that was not really intended to be a full-fledged software platform in the first place. Standardization is a major challenge, since it is difficult to define a security solution that would be satisfactory to everybody while retaining backwards compatibility with the existing solutions. Also, many security models depend on application signing and/or security certificates; such solutions usually involve complicated business issues, e.g., related to who has the authority to issue security certificates. Therefore, it is likely that any resolutions in this area will still take years. Meanwhile, a large number of security groups and communities – such as the Web Application Security Consortium (WASC), the Open Web Application Security Project (OWASP), and the W3C Web Security Context Working Group – are working on the problem.

8. Conclusion

The World Wide Web is the most powerful medium for information sharing and distribution in the history of humankind. Therefore, it is not surprising that the use of the Web has spread to many new areas outside its original use, including the distribution of photographs, music, videos, and so on. We believe that the massive popularity of the Web will make it the de facto platform for software applications as well. As a consequence, the web browser will take over various roles that conventional operating systems used to have, e.g., in serving as a host environment for most commonly used end-user applications. For the average computer user, the web browser will effectively be the operating system; after all, most of the applications and services that they need will be available on the Web.

In this paper we have argued that the next logical step in the evolution of web applications is mashware: software as a mashup. In web terminology, a mashup is a web site that combines (“mashes up”) content from more than one source (from multiple web sites) into an integrated experience. Mashups are content aggregates that leverage the power of the Web to support worldwide sharing of content that conventionally would not have been easily accessible or reusable in different contexts or from different locations. By mashware, we refer to a form of client-side mashup development in which web applications can be composed in the web browser by dynamically combining code originating from web sites all over the world. This kind of approach will make it possible for software developers to collaborate on an immensely large scale, allowing unparalleled sharing and reuse of software, data, layout and visualization information, or any other content across the planet. In this paper we have discussed the technological background of mashware, analyzed existing technologies intended for mashup development, and provided a summary of the challenges and obstacles that still remain in this exciting new area.


The author would like to thank Prof. Tommi Mikkonen and all the students who participated in the Mashup Development Seminar that we arranged at the Tampere University of Technology in Fall 2008. Most of the comments and observations related to existing mashup development tools, especially those presented in Section 6.3, are based on the presentations given by students during the seminar.


CPJ05 Crane, D., Pascarello, E, James, D., Ajax in Action. Manning Publications, 2005.

Cro06 Crockford, D., The <module> Tag: A Proposed Solution to the Mashup Security Problem. http://www.json.org/module.html, October 30, 2006.

Cro08 Crockford, D., JavaScript: The Good Parts. O’Reilly Media, 2008.

Fie00   Fielding, R.T., Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation, University of California at Irvine, CA, USA, 2000.

FiT02 Fielding, R.T., Taylor, R.N., Principled Design of the Modern Web Architecture. ACM Transactions on Internet Technology, Vol. 2, No. 2, May 2002, pp. 115-150.

Fla06   Flanagan, D., JavaScript: The Definitive Guide (5th Edition). O’Reilly Media, 2006.

FPM08            Freedman, C., Peters, K., Modien, C., Lucyk, B., Manning, R., Professional Adobe AIR: Application Development for the Adobe Integrated Runtime. Wrox Publishing, 2008.

GiE06 Gibson, R., Erle, S., Google Maps Hacks. O’Reilly Media, 2006.

GoR83            Goldberg, A., Robson, D., Smalltalk-80: the Language and Its Implementation. Addison-Wesley, 1983.

GED03            Gong, L., Ellison, G., Dageforde, M., Inside Java™ 2 Platform Security: Architecture, API Design, and Implementation (2nd Edition). Addison-Wesley (Java Series), 2003.

HaT07             Hanson, R., Tacy, A., GWT in Action: Easy Ajax with the Google Web Toolkit. Manning Publications, 2007.

IKU08             Ingalls, D., Palacz, K., Uhler, S., Taivalsaari, A., Mikkonen, T., The Lively Kernel – A Self-Supporting System on a Web Page. In Proceedings of the Self-Supporting Systems Conference (Potsdam, Germany, May 15-16), Lecture Notes in Computer Science LNCS 5146, Springer-Verlag, 2008, pp. 31-50.

IWC88            Ingalls, D., Wallace, C., Chow, Y-Y., Ludolph, F., Doyle, K., Fabrik: A Visual Programming Environment. In Proceedings of the OOPSLA’88 Conference (San Diego, California, September 25-30), ACM SIGPLAN Notices, Vol. 23, No. 11, 1988, pp. 176-190.

JaW07             Jackson, C., Wang, H., Subspace: Secure Cross-Domain Communication for Web Mashups. In Proceedings of the 16th International World Wide Web Conference (Banff, Canada, May 8-12), 2007, pp. 611-619.

KBS08            Keukelaere, F., Bhola, S., Steiner, M., Chari, S., Yoshihama, S., SMash: Secure Component Model for Cross-Domain Mashups on Unmodified Browsers. In Proceedings of the 17th International World Wide Web Conference (Beijing, China, April 21-25), 2008, pp. 535-544.

LiZ74 Liskov, B.H., Zilles, S.N., Programming with Abstract Data Types. In Proceedings of ACM SIGPLAN Conference on Very High Level Languages, ACM SIGPLAN Notices, Vol. 9, No. 4, April 1974, pp. 50-59.

MiT07a           Mikkonen, T., Taivalsaari, A., Web Applications: Spaghetti Code for the 21st Century. In Proceedings of the 6th ACIS International Conference on Software Engineering Research, Management and Applications (SERA’2008, Prague, Czech Republic, August 20-22), IEEE Computer Society, 2008, pp. 319-328. (An earlier, longer version published as Sun Labs Technical Report TR-2007-166, Sun Microsystems Laboratories, June 2007.)

MiT07b           Mikkonen, T., Taivalsaari, A., Using JavaScript as a Real Programming Language. Sun Labs Technical Report TR-2007-168, Sun Microsystems Laboratories, October 2007.

Mor08             Moroney, L., Introducing Microsoft Silverlight 2.0. Microsoft Press, 2008.

Pau07 Paulson, L.D., Developers Shift to Dynamic Programming Languages, IEEE Computer, Vol. 40, No. 2, February 2007, pp. 12-15.

Par72 Parnas, D.L., On the Criteria to be Used in Decomposing Systems into Modules. Communications of the ACM, Vol. 15, No. 12, Dec. 1972, pp. 1053-1058.

RTV03            Riggs, R., Taivalsaari, A., Van Peursem, J., Huopaniemi, J., Patel, M., Uotila, A., Programming Wireless Devices with the Java™ 2 Platform, Micro Edition (2nd Edition). Addison-Wesley (Java Series), 2003.

TMI08             Taivalsaari, A., Mikkonen, T., Ingalls, D., Palacz, K., Web Browser as an Application Platform. In Proceedings of the 34th Euromicro Conference on Software Engineering and Advanced Applications (SEAA’2008, Parma, Italy, September 3-5), IEEE Computer Society, 2008, pp. 293-302. (An earlier, longer version published as Sun Labs Technical Report TR-2008-175, January 2008.)

TaM08            Taivalsaari, A., Mikkonen, T., Mashups and Modularity: Towards Secure and Reusable Web Applications. In Proceedings of 1st Workshop on Social Software Engineering and Applications (SoSEA’2008, L’Aquila, Italy, September 16), 2008.

UnS87             Ungar, D., Smith, R.B., Self: The Power of Simplicity. In Proceedings of the OOPSLA’87 Conference (Orlando, Florida, October 4-8), 1987, pp. 227-241.

Wea07            Weaver, J.L., JavaFX Script: Dynamic Java Scripting for Rich Internet/Client-Side Applications. Apress, 2007.

WFH07           Wang, H. J., Fan, X., Howell, J., Jackson, C., Protection and Communication Abstractions for Web Browsers in MashupOS. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (Stevenson, WA, USA, October 14-17), Operating Systems Review, Vol. 41, No. 6, 2007, pp. 1-16.

YUM07           Yoshihama, S., Uramoto, N., Makino, S., Ishida, A., Kawanaka, S., De Keukelaere, F., Security Model for the Client-Side Web Application Environments. IBM Tokyo Research Laboratory presentation, May 24, 2007.

About the Author

Dr. Antero Taivalsaari is a Principal Investigator and Senior Staff Engineer at Sun Labs. Antero is best known for his seminal role in the design of the Java™ Platform, Micro Edition (Java ME platform) – one of the most popular commercial software platforms in the world, with over two billion devices deployed so far. Antero has received Sun’s Chairman’s Award for Innovation twice (in 2000 and 2003) for his work on Java ME technology. Since August 2006, Antero has been co-leading the Lively Kernel project with Dan Ingalls to bring lively, desktop-style user experience to the world of web programming. For more information, refer to http://www.taivalsaari.com/.



Sun Microsystems Laboratories 16 Network Circle Menlo Park, CA 94025

Mashware: The Future of Web Applications. Antero Taivalsaari. SMLI TR-2009-181


There is No Web 3.0, There is No Web 2.0 – There is Just the Web

Written by Josh Catone / April 24, 2008 4:57 PM

Something struck me while listening to Tim O’Reilly’s keynote speech at the Web 2.0 Expo yesterday: glancing at my notes after he walked off stage, I noticed that his current definition for Web 2.0 is a lot like the definition he’s given for Web 3.0. Based on this, plus past comments from O’Reilly that I dug up via a few web searches, I am forced to one conclusion: Tim O’Reilly, the man credited with popularizing the term Web 2.0, doesn’t actually believe it exists. For O’Reilly, there is just the web right now. 1.0, 2.0, 3.0 — it’s all the same ever-changing web.

Let’s first take a look at Tim O’Reilly’s widely used and accepted compact definition for Web 2.0 circa 2006 (way, way back in the dark ages of a year and a half ago):

Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them. (This is what I’ve elsewhere called “harnessing collective intelligence.”)

We can perhaps simplify that even further: Web 2.0 is the web as a platform and collective intelligence (or, leveraging of user created data). Now let’s look at Tim’s definition of Web 3.0 (which actually predates his last Web 2.0 definition):

Recently, whenever people ask me “What’s Web 3.0?” I’ve been saying that it’s when we apply all the principles we’re learning about aggregating human-generated data and turning it into collective intelligence, and apply that to sensor-generated (machine-generated) data.

Which we can simplify to mean: the leveraging of the things we created in Web 2.0. And here’s the Web 2.0 definition he had up on a slide yesterday during his keynote:


  • The Internet is the platform
  • Harnessing the collective intelligence
  • Data as the “Intel Inside”
  • Software above the level of a single device
  • Software as a service


O’Reilly talked about Web 2.0 in terms of taking user-generated data and turning it into user-facing services. So now we’re starting to see a lot of overlap between the two definitions. He’s also brought in a lot of Web 3.0 definitions that other people have given and used them as part of this broader definition of Web 2.0. For example, Eric Schmidt of Google talked about Web 3.0 in terms of software as a service and cloud computing. Our own Alex Iskold talked about Web 3.0 in terms of web sites being turned into platforms. And so on.

“For ‘Web 3.0’ to be meaningful we’ll need to see a serious discontinuity from the previous generation of technology … I find myself particularly irritated by definitions of ‘Web 3.0’ that are basically descriptions of Web 2.0,” Tim O’Reilly once said, which is mildly ironic given that his current Web 2.0 definition basically eclipses his old Web 3.0 definition. But in reality, I think O’Reilly is saying that the versioning doesn’t really matter — the web is the web.

“The points of contrast [between Web 2.0 and Web 3.0] are actually the same points that I used to distinguish Web 2.0 from Web 1.5. (I’ve always said that Web 2.0 = Web 1.0, with the dot com bust being a side trip that got it wrong.),” wrote O’Reilly last fall. In other words, the versioning of the web is silly. Web 1.0, 2.0, or 3.0 is all really just whatever cool new thing we’re using the web to accomplish right now.

And he has a point. A couple of days ago, we wrote about the history of the term Web 3.0 and noted that the term itself doesn’t really matter, what matters is the discussions we have when trying to define it. “It is the discussion that is helpful rather than coming to any accepted definition. Some might argue that version numbers are silly on the web, that Web 2.0 and Web 3.0 are just marketing ploys, and that we shouldn’t use terms that are so nebulous and difficult to define. Those are all fair points. But at the same time, the discussions we have about defining the next web help to solidify our vision of where we’re going — and you can’t get there until you decide where you want to go,” we wrote.

Web 2.0 and Web 3.0 — they don’t really exist. They’re just arbitrary numbers assigned to something that doesn’t really have versions. But the discussions those terms have prompted have been helpful, I think, in figuring out where the web is going and how we’re going to get there; and that’s what is important.

So next time someone asks me what we cover on ReadWriteWeb, maybe I won’t use the term “Web 2.0” in my reply, I’ll just tell them that we write about the web, what you can do with it now, and what you’ll be able to do with it in the future.

Read Full Post »

The Future of the Desktop

Written by Guest Author / August 18, 2008 3:22 PM / 35 Comments

Everything is moving to the cloud. As we enter the third decade of the Web we are seeing an increasing shift from native desktop applications towards Web-hosted clones that run in browsers. For example, a range of products such as Microsoft Office Live, Google Docs, Zoho, ThinkFree, DabbleDB, Basecamp, and many others now provide Web-based alternatives to the full range of familiar desktop office productivity apps. The same is true for an increasing range of enterprise applications, led by companies such as Salesforce.com, and this process seems to be accelerating. In addition, hosted remote storage for individuals and enterprises of all sizes is now widely available and inexpensive. As these trends continue, what will happen to the desktop and where will it live?

This is a guest post by Nova Spivack, founder and CEO of Twine. This is the final version of an article Spivack has been working on in his public Twine.

Is the desktop of the future going to just be a web-hosted version of the same old-fashioned desktop metaphors we have today?

No. There have already been several attempts at copying the old-fashioned “files and folders” desktop interface to the Web, but they have not caught on. Imitation desktops to date have at best been clunky, slow copies of the real thing; others have been overly slick. But one thing they all have in common: none of them has nailed it. People don’t want to manage all their information on the Web in the same interface they use to manage data and apps on their local PC. The Web is an entirely different medium than the desktop and it requires a new kind of interface. The desktop of the future – what some have called “the Webtop” – still has yet to be invented.

The desktop of the future is going to be a hosted web service

Is the desktop even going to exist anymore as the Web becomes increasingly important? Yes, there has to be some kind of place that we consider to be our personal “home” and “workspace” — but it’s not going to live on any one device.

As we move into a world that is increasingly mobile, where users often work across several different devices in the course of their day, we need unified access to our applications and data. This requires that our applications and data do not reside on local devices anymore, but rather that they will live in the cloud and be accessible via Web services.

The painful process of using synchronization utilities to keep data on our different devices in sync will finally be a thing of the past. Similarly, an entire class of applications for remote-PC access will become extinct. Instead, all devices will sync with the cloud, where your applications, data and desktop workspace state will live as a unified, hosted service. Your desktop will appear on whatever device you log in to, just as you left it wherever you last accessed it. This shift harkens back to previous attempts to revive thin-client computing – such as Sun Microsystems’ Java Desktop – but this time it is going to actually become mainstream.

The Browser is Going to Swallow Up the Desktop

It’s a classic embrace-and-extend story – the Web browser began as just another app on the desktop and has quickly embraced and extended every other application to become the central tool on everyone’s desktop. All that remains is the desktop itself – and the browser is quickly making inroads there as well. In particular Firefox, with its easy extensibility and huge range of add-ons, is rapidly displacing the remaining features of the desktop.

If these trends continue, will the browser eventually swallow up or simply replace the desktop? Yes. In fact, it will probably happen very soon. There just isn’t any reason to have a desktop outside the browser anymore. What we think of as “the desktop” is really just a perspective on our information and applications – it’s really just another “page” or context in our digital lives. This could easily exist within a browser. So instead of launching the browser from the desktop, it makes more sense to launch the desktop from the browser. In this way of thinking, the desktop is really just our home page – the place where we do our work and keep up with our world.

The focus of the desktop will shift from information to attention

As our digital lives evolve out of the old-fashioned desktop into the browser-centric Web environment we will see a shift from organizing information spatially (directories, folders, desktops, etc.) to organizing information temporally (feeds, lifestreams, microblogs, timelines, etc.). The Web is constantly changing and the biggest challenge is not finding information, it is keeping up with it.

The desktop of the future is going to be more concerned with helping users manage information overload – particularly the overload caused by change. In this respect, it is going to feel more like an RSS feed reader or a social news site than a directory. The focus will be on helping the user manage and keep up with all the stuff flowing in and out of their environment. The interface will be tuned to help the user understand what the trends are, rather than just how things are organized.

Users are going to shift from acting as librarians to acting as daytraders

As we move into an era where content creation and distribution become almost infinitely cheap, the scarcest resource will no longer be storage or bandwidth; it will be attention. The pace of information creation and distribution continues to accelerate with no end in sight, yet the cognitive capabilities of the individual human brain are finite and we are already at our limits.

In order to cope with the overwhelming complexity of our digital lives, we are going to increasingly rely on tools that help us manage our attention more productively — rather than tools that simply help us manage our information.

It is a shift from the mindset of being librarians to that of being daytraders. In the PC era we were all focused on trying to manage the information on our computers — we were acting as librarians. Filing things was a big hassle, and finding them was just as difficult. But today filing information is really not the problem: Google has made search so powerful and ubiquitous that many Web users don’t bother to file anything anymore – instead they just search again when they need it. The librarian problem has been overcome by the brute force of Web-scale search. At least for now.

Instead we are now struggling to cope with a different problem – the problem of filtering for what is really important or relevant now and in the near-future. With limited time and attention, we have to be careful what we look for and what we pay attention to. This is the mindset of the daytrader. Bet wrong and you could end up wasting your precious resources, bet right and you could find the motherlode before the rest of the world and gain valuable advantages by being first. Daytraders are focused on discovering and keeping track of trends. It’s a very different focus and activity from being a librarian, and it’s what we are all moving towards.

The Webtop will be more social and will leverage and integrate collective intelligence

The Webtop is going to be more socially oriented than desktops of today — it will have built-in messaging and social networking, as well as social-media sharing, collaborative filtering, discussions, and other community features.

The social dimension of our lives is becoming perhaps our most important source of information. We get information via email from friends, family and colleagues. We get information via social networks and social media sharing services. We co-create information with others in communities. And we team up with our communities to filter, rate and redistribute content.

The social dimension is also starting to play a more important role in our information management and discovery activities. Instead of remaining solitary, those activities are becoming more communal. For example, many social bookmarking and social news sites use community sentiment and collaborative filtering to help highlight what is most interesting, useful or important.

Sites such as Digg, Reddit, Mixx, Slashdot, Delicious, StumbleUpon, Twine, and many others, show that collective intelligence may be the most powerful way to help individuals and groups filter content and manage their attention more productively. The power of many trumps the power of one.

The desktop of the future is going to have powerful semantic search and social search capabilities built-in

Our evolving Webtop is going to have more powerful search built-in. It will of course provide best-of-breed keyword search capabilities, but this is just the beginning.

It will also combine social search and semantic search. On the social search dimension, users will be able to search their information and rank it via attributes of their social graph (for example, “find documents about x and rank them by how many of my friends liked them.”)
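A minimal sketch of such a friend-weighted ranking, using a hypothetical in-memory index (the data structures and function name here are illustrative, not any real service’s API):

```python
# Hypothetical social search: rank documents that match a keyword by
# how many of the searcher's friends have "liked" each one.

def social_search(documents, likes, friends, keyword):
    """Return IDs of documents mentioning `keyword`, ranked by friend likes.

    documents: dict of doc_id -> text
    likes:     dict of doc_id -> set of user names who liked it
    friends:   set of the searcher's friends
    """
    matches = [doc_id for doc_id, text in documents.items()
               if keyword.lower() in text.lower()]
    # Score each match by the overlap between its likers and our friends.
    return sorted(matches,
                  key=lambda d: len(likes.get(d, set()) & friends),
                  reverse=True)

docs = {"a": "Notes about x", "b": "More about x", "c": "Unrelated"}
likes = {"a": {"sue"}, "b": {"sue", "bob"}}
result = social_search(docs, likes, friends={"sue", "bob"}, keyword="x")
print(result)  # → ['b', 'a']: two friends liked "b", only one liked "a"
```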

Semantic search, on the other hand, will enable more granular search and navigation of information along a potentially open-ended network of properties and relationships. You will be able to search in a highly structured way: for example, for products you once bookmarked that have a price of $10.95 and are on sale this week, or for documents you read in the last month that were authored by Sue and related to project X. The semantics of the future desktop will be open-ended. That is to say, users as well as other application and information providers will be able to extend it with custom schemas, new data types, and custom fields for any piece of information.
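The kind of field-level query described above could be sketched like this, assuming bookmarked items carry typed attributes (the `structured_search` helper and the sample data are invented for illustration):

```python
# Hypothetical structured ("semantic") search over bookmarked items,
# where each item carries typed fields rather than just free text.

bookmarks = [
    {"type": "product", "name": "Widget", "price": 10.95, "on_sale": True},
    {"type": "product", "name": "Gadget", "price": 10.95, "on_sale": False},
    {"type": "document", "name": "Plan", "author": "Sue", "project": "X"},
]

def structured_search(items, **criteria):
    """Return items whose fields match every keyword criterion."""
    return [item for item in items
            if all(item.get(field) == value
                   for field, value in criteria.items())]

hits = structured_search(bookmarks, type="product", price=10.95, on_sale=True)
print([h["name"] for h in hits])  # → ['Widget']
```

Because the criteria are just field/value pairs, new custom fields added later are queryable with no change to the search code, which is the open-ended quality the passage describes.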

Interactive shared spaces will replace folders

Forget about shared folders — that is an outmoded paradigm. Instead, the new metaphor will be interactive shared spaces. These shared spaces will be more like wikis than folders. They will be permission-based environments where one or many contributors can meet, interact synchronously or asynchronously, to work on information and other tasks together.

There are many kinds of shared spaces already in existence, including discussion forums, blogs, social network profiles, community sites, file sharing tools, conferencing tools, version control systems, and groupware. But as we move into Web 3.0 these will begin to converge. We will store information in them, we will work on information there, we will publish and distribute information through them, we will search across them, and we will interact with others around them.

Our next-generation shared spaces will be nestable and linkable like folders, but they will be far more powerful and dynamic, and they will be accessible via HTTP and other APIs such as SPARQL, enabling data to be moved in and out of them easily by other applications around the Web.

Any group of two or more individuals will be able to participate in a shared space that will appear on their individual desktops, for a particular purpose. These new shared spaces will not only provide richer semantics in the underlying data, social network, and search, but they will also enable groups to seamlessly and collectively add, organize, track, manage, discuss, distribute, and search for information of mutual interest.

The Portable Desktop

The underlying data in the future desktop, and in all associated services it connects, will be represented using open-standard data formats. Not only will the data be open, but the semantics of the data – the schema that defines it – will also be defined in an open way. The value of open linked-data and open semantics is that data will not be held prisoner anywhere: it will be portable and will be easy to integrate with other data. The emerging Semantic Web and Data Portability initiatives provide a good set of open standards for enabling this to happen.

Due to open-standards and data-portability, your desktop and data will be free from “platform lock-in.” This means that your Webtop might even be portable to a different competing Webtop provider someday. If and when that becomes possible, how will Webtop providers compete to add value?

The Smart Desktop

One of the most important aspects of the coming desktop is that it’s going to be smart. It’s going to have to be. Users simply cannot handle the complexity of their information landscapes anymore – they need help. There is a range of tasks that the desktop should automate for users, including organizing information, reminding users when necessary, resolving data conflicts, managing versioning, maintaining data quality, backing up data, prioritizing information, and gathering relevant information and suggesting it when appropriate.

Most other features of the future desktop will be commodities – but intelligence will still be difficult to provide, and so it will be the last remaining frontier in which competing Webtop providers will be able to differentiate their offerings.

The Webtop is going to learn and help you to be more productive. As you use it, it’s going to adjust to your interests, relationships, current activities, information and preferences. It will adaptively self-organize to help you focus your attention on what is most important to whatever context you are in.

When you are reading something on a trip to Milan, it may organize itself to be more contextually relevant to that time and place. When you later return home to San Francisco, it will automatically adapt and shift to your home context. When you do a lot of searches about a certain product, it will realize your context and intent have to do with that product and will adapt to help you with that activity for a while, until your behavior changes.

Your desktop will actually be a semantic knowledge base on the back-end. It will encode a rich semantic graph of your information, relationships, interests, behavior and preferences. You will be able to permit other applications to access part or all of your graph to datamine it and provide you with value-added views and even automated intelligent assistance.

For example, you might allow an agent that cross-links things to see all your data: it would go and add cross links to relevant things onto all the things you have created or collected. Another agent that makes personalized buying recommendations might only get to see your shopping history across all shopping sites you use.
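The idea of granting agents access to only part of your graph can be sketched as a simple permission filter (the agent names and data categories below are purely hypothetical):

```python
# Sketch of permission-scoped access to a personal data graph: each
# agent sees only the categories of data the user has granted it.

personal_graph = {
    "documents": ["trip-notes", "project-plan"],
    "shopping":  ["bought: camera", "bought: lens"],
    "contacts":  ["sue", "bob"],
}

grants = {
    "cross_linker": {"documents", "shopping", "contacts"},  # sees everything
    "recommender":  {"shopping"},                           # shopping only
}

def agent_view(agent, graph, grants):
    """Return only the slices of the graph this agent may read."""
    allowed = grants.get(agent, set())
    return {k: v for k, v in graph.items() if k in allowed}

print(sorted(agent_view("recommender", personal_graph, grants)))  # → ['shopping']
```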

Your desktop may also function as a simple personal assistant at times. You will be able to converse with your desktop eventually — through a conversational agent interface. While on the road you will be able to email or SMS in questions to it and get back immediate intelligent answers. You will even be able to do this via a voice interface.

For example, you might ask, “where is my next meeting?” or “what Japanese restaurants do I like in LA?” or “What is Sue Smith’s phone number?” and you would get back answers. You could also command it to do things for you — like reminding you to do something, or helping you keep track of an interest, or monitoring for something and alerting you when it happens.

Because your future desktop will connect all the relationships in your digital life — relationships connecting people, information, behavior, preferences and applications — it will be the ultimate place to learn about your interests and preferences.

Federated, open policies and permissions

This rich graph of meta-data that comprises your future desktop will enable the next generation of smart services to learn about you and help you in an incredibly personalized manner. It will also, of course, be rife with potential for abuse, and privacy will be a major concern.

One of the biggest enabling technologies that will be necessary is a federated model for sharing meta-data about policies and permissions on data. Information that is considered to be personal and private in Web site X should be recognized and treated as such by other applications and websites you choose to share that information with. This will require a way for sharing meta-data about your policies and permissions between different accounts and applications you use.

The semantic web provides a good infrastructure for building and deploying a decentralized framework for policy and privacy integration, but it has yet to be developed, let alone adopted. For the full vision of the future desktop to emerge a universally accepted standard for exchanging policy and permission data will be a necessary enabling technology.

The personal cloud

One way to think of the emerging Webtop is as your personal cloud. It will not just be a cloud of data, it will be a compute cloud as well. When you need to store or retrieve information it will provide that service. When you need to do computations, it will provide that to you as well. The cost of harnessing the capabilities of your cloud may be based on a monthly subscription or it may be metered, or it may be ad-supported.

Your personal cloud will have a center – provided by your main Webtop provider, where your address will live — but most of its services will be distributed in other places, and even federated among other providers. Yet from an end-user perspective it will function as a seamlessly integrated service. You will be able to see and navigate all your information and applications, as if they were in one connected space, regardless of where they are actually hosted. You will be able to search your personal cloud from any point within it. It will look and feel like a single cohesive service.

The WebOS

No discussion of the future of the desktop would be complete without delving into the topic of the WebOS. The shift from desktop to Webtop – the move from a local desktop to a hosted desktop – is a necessary step towards the entire operating system moving to the Web as well. Many of the services that comprise an operating system are already available as Web services, but they are not yet integrated into a single cohesive WebOS. However, it seems clear that the major players are aware of this opportunity and are positioning their services to capture it. Just as the desktop OS wars were won by capturing the “high ground” of the desktop, I would not be surprised if the same principle holds in the battle to own the WebOS. Whoever wins the Webtop will win the whole stack.

Who is most likely to own the future desktop?

When I think about what the future desktop is going to look like it seems to be a convergence of several different kinds of services that we currently view as separate.

It will be hosted on the cloud and accessible across all devices. It will place more emphasis on social interaction, social filtering, and collective intelligence. It will provide a very powerful and extensible data model with support for both unstructured and arbitrarily structured information. It will enable almost peer-to-peer like search federation, yet still have a unified home page and user-experience. It will be smart and personalized. It will be highly decentralized yet will manage identity, policies and permissions in an integrated cohesive and transparent manner across services.

By cobbling together a number of different services that exist today you could build something like this in a decentralized fashion. As various services integrate with each other it may simply emerge on its own. But is that how the desktop of the future will come about? Or will it be provided as a new application from one player – perhaps one with a lot of centralized market power and the ability to launch something like this on a massive scale? Or – just as with the previous desktop hits of the past, will it come from a little-known upstart with a disruptive technology? It’s hard to predict, but one thing is certain: it is going to happen relatively soon and will be an interesting process to watch.

Image via Arnaldo Licea

Read Full Post »

Top-Down: A New Approach to the Semantic Web

Written by Alex Iskold / September 20, 2007 4:22 PM / 17 Comments

Earlier this week we wrote about the classic approach to the semantic web and the difficulties with that approach. While the original vision of a layer on top of the current web that annotates information in a way that is “understandable” by computers is compelling, there are technical, scientific and business issues that have been difficult to address.

One of the technical difficulties that we outlined was the bottom-up nature of the classic semantic web approach. Specifically, each web site needs to annotate information in RDF, OWL, etc. in order for computers to be able to “understand” it.

As things stand today, there is little reason for web site owners to do that. The tools that would leverage the annotated information do not exist and there has not been any clearly articulated business and consumer value. Which means that there is no incentive for the sites to invest money into being compatible with the semantic web of the future.

But there are alternative approaches. We will argue that a more pragmatic, top-down approach to the semantic web not only makes sense, but is already well on the way toward becoming a reality. Many companies have been leveraging existing, unstructured information to build vertical, semantic services. Unlike the original vision, which is rather academic, these emergent solutions are driven by business and market potential.

In this post, we will look at the solution that we call the top-down approach to the semantic web, because instead of requiring developers to change or augment the web, this approach leverages and builds on top of current web as-is.

Why Do We Need The Semantic Web?

The complexity of the original vision of the semantic web and the lack of clear consumer benefits make the whole project unrealistic. The simple question “Why do we need computers to understand semantics?” remains largely unanswered.

While some of us think that building AI is cool, the majority of people think that AI is a little bit silly, or perhaps even unsettling. And they are right. AI for the sake of AI does not make any sense. If we are talking about building intelligent machines, and if we need to spend money and energy annotating all the information in the world for them, then there needs to be a very clear benefit.

Stated the way it is, the semantic web becomes a vision in search of a reason. What if the problem was restated from the consumer point of view? Here is what we are really looking forward to with the semantic web:


  • Spend less time searching
  • Spend less time looking at things that do not matter
  • Spend less time explaining what we want to computers


A consumer focus and clear benefit for businesses needs to be there in order for the semantic web vision to be embraced by the marketplace.

What If The Problem Is Not That Hard?

If all we are trying to do is to help people improve their online experiences, perhaps full “understanding” of semantics by computers is not even necessary. The best online search tool today is Google, which is an algorithm based, essentially, on statistical frequency analysis and not semantics. Solutions that attempt to improve on Google by focusing on generalized semantics have so far found it difficult to do so.

The truth is that the understanding of natural language by computers is a really hard problem. We have the language ingrained in our genes. We learn language as we grow up. We learn things iteratively. We have the chance to clarify things when we do not understand them. None of this is easily replicated with computers.

But what if it is not even necessary to build the first generation of semantic tools? What if, instead of trying to teach computers natural language, we hard-wired into computers the concepts of everyday things like books, music, movies, restaurants, stocks and even people? Would that help us be more productive and find things faster?

Simple Semantics: Nouns And Verbs

When we think about a book we think about a handful of things – title and author, maybe genre and the year it was published. Typically, though, we couldn’t care less about the publisher, edition and number of pages. Similarly, recipes provoke thoughts about cuisine and ingredients, while movies make us think about the plot, director, and stars.

When we think of people, we also think about a handful of things: birthday, where they live, how we’re related to them, etc. The profiles found on popular social networks are great examples of simple semantics based around people:

Books, people, recipes and movies are all examples of nouns. The things that we do on the web around these nouns, such as looking up similar books, finding more people who work for the same company, getting more recipes from the same chef and looking up pictures of movie stars, are similar to verbs in everyday language. These are contextual actions that are based on an understanding of the noun.
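Hard-wiring a noun and its verbs might look something like this sketch, with a fixed `Book` schema and a “similar books” action defined for it (all names and data here are illustrative):

```python
# Sketch of "hard-wired" simple semantics: a noun type (Book) with a
# fixed schema, plus a verb (contextual action) that makes sense for it.

from dataclasses import dataclass

@dataclass
class Book:  # a "noun" with the handful of fields people actually care about
    title: str
    author: str
    genre: str
    year: int

CATALOG = [
    Book("Dune", "Frank Herbert", "sci-fi", 1965),
    Book("Hyperion", "Dan Simmons", "sci-fi", 1989),
    Book("Emma", "Jane Austen", "novel", 1815),
]

def similar_books(book, catalog):  # a "verb" defined for the Book noun
    """Find books in the same genre by a different author."""
    return [b for b in catalog
            if b.genre == book.genre and b.author != book.author]

dune = CATALOG[0]
print([b.title for b in similar_books(dune, CATALOG)])  # → ['Hyperion']
```

No natural-language understanding is involved: the application simply knows, in advance, what fields a book has and what actions are sensible for it.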

What if semantic applications hard-wired understanding and recognition of the nouns and then also hard-wired the verbs that make sense? We are actually well on our way to doing just that. Vertical search engines like Spock, Retrevo, ZoomInfo, the page annotating technology from Clear Forrest, Dapper, and the Map+ extension for Firefox are just a few examples of top-down semantic web services.

The Top-Down Semantic Web Service

The essence of a top-down semantic web service is simple – leverage existing web information, apply specific, vertical semantic knowledge and then redeliver the results via a consumer-centric application. Consider the vertical search engine Spock, which scans the web for information about people. It knows how to recognize names in HTML pages and it also looks for common information about people that all people have – birthdays, locations, marital status, etc. In addition, Spock “understands” that people relate to each other. If you look up Bush, then Clinton will show up as a predecessor. If you look up Steve Jobs, then Bill Gates will come up as a rival.

In other words, Spock takes simple, everyday semantics about people and applies it to the information that already exists online. The result? A unique and useful vertical search engine for people. Further, note that Spock does not require the information to be re-annotated in RDF and OWL. Instead, the company builds adapters that use heuristics to get the data. The engine does not actually have full understanding of semantics about people, however. For example, it does not know that people like different kinds of ice cream, but it doesn’t need to. The point is that by focusing on simple semantics, Spock is able to deliver a useful end-user service.

Another, much simpler, example is the Map+ add-on for Firefox. This application recognizes addresses and provides a map popup using Yahoo! Maps. It is the simplicity of this application that precisely conveys the power of simple semantics. The add-on “knows” what addresses look like. Sure, sometimes it makes mistakes, but most of the time it tags addresses in online documents properly. So it leverages existing information and then provides direct end-user utility by mashing it up with Yahoo! Maps.
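A toy version of that idea, assuming a deliberately naive pattern for US-style street addresses (real address recognition is far more involved; this regex is only illustrative and is not Map+’s actual implementation):

```python
# Heuristic address detection: scan text for strings that "look like"
# street addresses. A real recognizer handles far more formats; this
# simple pattern shows the flavor of the approach.

import re

ADDRESS = re.compile(
    r"\b\d{1,5}\s+"             # street number
    r"(?:[A-Z][a-z]+\s)+"       # one or more capitalized street-name words
    r"(?:St|Ave|Rd|Blvd|Dr)\b"  # a common street suffix
)

text = "Meet me at 1600 Pennsylvania Ave, or at the cafe on Main St."
print(ADDRESS.findall(text))  # → ['1600 Pennsylvania Ave']
```

Note that “Main St.” is not matched because it lacks a street number, which is exactly the kind of occasional miss the article describes: the heuristic is usually right, not always right.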

The Challenges Facing The Top-Down Approach

Despite being effective, the somewhat simplistic top-down approach has several problems. First, it is not really the semantic web as it is defined; instead, it is a group of semantic web services and applications that create utility by leveraging simple semantics. So proponents of the classic approach would protest, and they would be right. Another issue is that these services do not always get semantics right because of ambiguities. Because the recognition is algorithmic and not based on an underlying RDF representation, it is not perfect.

It seems to me that it is better to have simpler solutions that work 90% of the time than complex ones that never arrive. The key questions here are: How exactly are mistakes handled? And, is there a way for the user to correct the problem? The answers will be left up to the individual application. In life we are used to other people being unpredictable, but with computers, at least in theory, we expect things to work the same every time.

Yet another issue is that these simple solutions may not scale well. If the underlying unstructured data changes, can the algorithms be changed quickly enough? This is always an issue with things that sit on top of other things without an API. Of course, if more web sites had APIs, as we have previously suggested, the top-down semantic web would be much easier and more certain.


While the original vision of the semantic web is grandiose and inspiring, in practice it has been difficult to achieve because of engineering, scientific and business challenges. The lack of a specific and simple consumer focus makes it mostly an academic exercise. In the meantime, existing data is being leveraged by applying simple heuristics and making assumptions about particular verticals. What we have dubbed top-down semantic web applications have been appearing online, improving end-user experiences by leveraging semantics to deliver real, tangible services.

Will the bottom-up semantic web ever happen? Possibly. But at the moment the precise path to get there is not quite clear. In the meantime, we can all enjoy a better online experience and get where we need to go faster, thanks to simple top-down semantic web services.

Read Full Post »

Social Graph & Beyond: Tim Berners-Lee’s Graph is The Next Level

Written by Richard MacManus / November 22, 2007 5:55 PM / 12 Comments

Tim Berners-Lee, inventor of the World Wide Web, today published a blog post about what he terms the Graph, which is similar (if not identical) to his Semantic Web vision. Referencing both Brad Fitzpatrick’s influential post earlier this year on Social Graph, and our own Alex Iskold’s analysis of Social Graph concepts, Berners-Lee went on to position the Graph as the third main “level” of computer networks. First there was the Internet, then the Web, and now the Graph – which Sir Tim labeled (somewhat tongue in cheek) the Giant Global Graph!

Note that Berners-Lee wasn’t specifically talking about the Social Graph, which is the term Facebook has been heavily promoting, but something more general. In a nutshell, this is how Berners-Lee envisions the 3 levels (a.k.a. layers of abstraction):

1. The Internet: links computers
2. Web: links documents
3. Graph: links relationships between people and/or documents — “the things documents are about” as Berners-Lee put it.

The Graph is all about connections and re-use of data. Berners-Lee wrote that Semantic Web technologies will enable this:

“So, if only we could express these relationships, such as my social graph, in a way that is above the level of documents, then we would get re-use. That’s just what the graph does for us. We have the technology — it is Semantic Web technology, starting with RDF OWL and SPARQL. Not magic bullets, but the tools which allow us to break free of the document layer.”
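To make the quote concrete: a social-graph relationship expressed "above the level of documents" is just an RDF triple. The sketch below uses the standard FOAF vocabulary in Turtle syntax; the people and `example.org` URIs are made up for illustration.

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/people/alice#me>
    a foaf:Person ;
    foaf:name "Alice" ;
    foaf:knows <http://example.org/people/bob#me> .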

Sir Tim also notes that as we go up each level, we lose more control but gain more benefits: “…at each layer — Net, Web, or Graph — we have ceded some control for greater benefits.” The benefits are what happens when documents and data are connected – for example being able to re-use our personal and friends data across multiple social networks, which is what Google’s OpenSocial aims to achieve.

What’s more, says Berners-Lee, the Graph has major implications for the Mobile Web. He said that longer term “thinking in terms of the graph rather than the web is critical to us making best use of the mobile web, the zoo of wildly differing devices which will give us access to the system.” The following scenario sums it up very nicely:

“Then, when I book a flight it is the flight that interests me. Not the flight page on the travel site, or the flight page on the airline site, but the URI (issued by the airlines) of the flight itself. That’s what I will bookmark. And whichever device I use to look up the bookmark, phone or office wall, it will access a situation-appropriate view of an integration of everything I know about that flight from different sources. The task of booking and taking the flight will involve many interactions. And all throughout them, that task and the flight will be primary things in my awareness, the websites involved will be secondary things, and the network and the devices tertiary.”


I’m very pleased Tim Berners-Lee has appropriated the concept of the Social Graph and married it to his own vision of the Semantic Web. What Berners-Lee wrote today goes way beyond Facebook, OpenSocial, or social networking in general. It is about how we interact with data on the Web (whether it be mobile or PC or a device like the Amazon Kindle) and the connections that we can take advantage of using the network. This is also why Semantic Apps are so interesting right now, as they take data connection to the next level on the Web.

Overall, unlike Nick Carr, I’m not concerned whether mainstream people accept the term ‘Graph’ or ‘Social Graph’. It really doesn’t matter, so long as the web apps that people use enable them to participate in this ‘next level’ of the Web. That’s what Google, Facebook, and a lot of other companies are trying to achieve.

Incidentally, it’s great to see Tim Berners-Lee ‘re-using’ concepts like the Social Graph, or simply taking inspiration from them. He never really took to the Web 2.0 concept, perhaps because it became too hyped and commercialized, but the fact is that the Consumer Web has given us many innovations over the past few years. Everything from Google to YouTube to MySpace to Facebook. So even though Sir Tim has always been about graphs (as he noted in his post, the Graph is essentially the same as the Semantic Web), it’s fantastic he is reaching out to the ‘web 2.0’ community and citing people like Brad Fitzpatrick and Alex Iskold.

Related: check out Alex Iskold’s Social Graph: Concepts and Issues for an overview of the theory behind Social Graph. This is the post Tim Berners-Lee referenced. Also check out Alex’s latest post today: R/WW Thanksgiving: Thank You Google for Open Social (Or, Why Open Social Really Matters).


Semantic Travel Search Engine UpTake Launches

Written by Josh Catone / May 14, 2008 6:00 AM / 8 Comments

According to a comScore study done last year, booking travel over the Internet has become something of a nightmare for people. It’s not that using any of the booking engines is difficult, it’s just that there is so much information out there that planning a vacation is overwhelming. According to the comScore study, the average online vacation plan comes together through 12 travel-related searches and visits to 22 different web sites over the course of 29 days. Semantic search startup UpTake (formerly Kango) aims to make that process easier.

UpTake is a vertical search engine that has assembled what it says is the largest database of US hotels and activities — over 400,000 of them — from more than 1,000 different travel sites. Using a top-down approach, UpTake looks at its database of over 20 million reviews, opinions, and descriptions of hotels and activities in the US and semantically extracts information about those destinations. You can think of it as Metacritic for the travel vertical, but rather than just arriving at an aggregate rating (which it does), UpTake also attempts to figure out some basic concepts about a hotel or activity based on what it learns from the information it reads: is the hotel family-friendly, would it be good for a romantic getaway, is it eco-friendly, and so on.

“UpTake matches a traveler with the most useful reviews, photos, etc. for the most relevant hotels and activities through attribute and sentiment analysis of reviews and other text, the analysis is guided by our travel ontology to extract weighted meta-tags,” said President Yen Lee, who was co-founder of the CitySearch San Francisco office and a former GM of Travel at Yahoo!

What UpTake isn’t, is a booking engine like Expedia, a meta price search engine like Kayak, or a travel community. UpTake is strictly about aggregation of reviews and semantic analysis and doesn’t actually do any booking. According to the company only 14% of travel searches start at a booking engine, which indicates that people are generally more interested in doing research about a destination before trying to locate the best prices. Many listings on the site have a “Check Rates” button, however, which gets hotel rates from third party partner sites — that’s actually how UpTake plans to make money.

The way UpTake works is by applying its specially created travel ontology, which contains concepts, relationships between those concepts, and rules about how they fit together, to the 20 million reviews in its database. The ontology allows UpTake to extract meaning from structured or semi-structured data by telling its search engine things like “a pool is a type of hotel amenity and kids like pools.” That means hotels with pools score some points when evaluating if a hotel is “kid friendly.” The ontology also knows, though, that a nude pool might be inappropriate for kids, and thus that would take points away when evaluating for kid friendliness.
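The rule-based scoring described above can be sketched in a few lines of Python. To be clear, the attributes, rules, and weights below are invented for illustration; they are not UpTake's actual ontology.

```python
# Minimal sketch of ontology-guided theme scoring (illustrative only).
# Each rule says: if a hotel has this attribute, adjust the theme score.
KID_FRIENDLY_RULES = {
    "pool": +2,        # "a pool is a type of hotel amenity and kids like pools"
    "playground": +3,
    "casino": -2,
    "nude_pool": -5,   # overrides the generic pool bonus for families
}

def theme_score(attributes, rules):
    """Sum the weights of every rule triggered by the hotel's attributes."""
    return sum(weight for attr, weight in rules.items() if attr in attributes)

family_resort = {"pool", "playground", "wifi"}
adults_only = {"pool", "nude_pool", "casino"}

print(theme_score(family_resort, KID_FRIENDLY_RULES))  # 5
print(theme_score(adults_only, KID_FRIENDLY_RULES))    # -5
```

A real ontology would, of course, chain such rules through concept hierarchies rather than a flat dictionary, but the principle — domain rules turning extracted attributes into theme scores — is the same.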

A simplified example ontology is depicted below.

In addition to figuring out where destinations fit into vacation themes — like romantic getaway, family vacation, girls getaway, or outdoor — the site also does sentiment matching to determine if users liked a particular hotel or activity. The search engine looks for sentiment words such as “like,” “love,” “hate,” “cramped,” or “good view,” and knows what they mean and how they relate to the theme of the hotel and how people felt about it. It figures that information into the score it assigns each destination.
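The sentiment-word matching can be sketched the same way. The lexicon and weights here are made up for the example; a production system would use a far larger vocabulary tied back to the ontology.

```python
# Illustrative sketch of sentiment-phrase matching over review text.
import re

SENTIMENT_LEXICON = {
    "love": +2, "like": +1, "good view": +1,
    "hate": -2, "cramped": -1,
}

def sentiment_score(review):
    """Score a review by summing the weights of known sentiment phrases."""
    text = review.lower()
    score = 0
    for phrase, weight in SENTIMENT_LEXICON.items():
        score += weight * len(re.findall(re.escape(phrase), text))
    return score

print(sentiment_score("We love this place, good view from every room."))  # 3
print(sentiment_score("Hate it. The rooms were cramped."))                # -3
```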


Yesterday, we looked at semantic, natural language processing search engine Powerset and found in some quick early testing that the results weren’t that much different from Google’s. “If Google remains ‘good enough,’ Powerset will have a hard time convincing people to switch,” we wrote. But while semantic search may feel rather clunky for the broader global web, it makes a lot of sense in specific verticals. The ontology is a lot more focused, and the site also isn’t trying to answer specific questions, but rather attempting to semantically determine general concepts, such as romantic appeal or overall quality. The upshot is that the results are tangible and useful.

I asked Yen Lee what UpTake thought about the top-down vs. the traditional bottom-up approach. Lee told me that he thinks the top-down approach is a great way to lead into the bottom-up Semantic Web. Lee thinks that top-down efforts to derive meaning from unstructured and semi-structured data, as well as efforts such as Yahoo!’s move to index semantic markup, will provide an incentive for content publishers to start using semantic markup on their data. Lee said that many of UpTake’s partners have already begun to ask how to make it easier for the site to read and understand their content.

Vertical search engines like UpTake might also provide the consumer face for the Semantic Web that can help sell it to consumers. Being able to search millions of reviews and opinions and have a computer understand how they relate to the type of vacation you want to take is the sort of palpable evidence needed to sell the Semantic Web idea. As these technologies get better, and data becomes more structured, then we might see NLP search engines like Powerset start to come up with better results than Google (though don’t think for a minute that Google would sit idly by and let that happen…).

What do you think of UpTake? Let us know in the comments below.

