Semantic Web - draft 0.28

Purpose

This document describes my vision of a semantic (content-aware, intelligent) Web, and its development from simple meta-tag processing to globally distributed AI services. The text is gradually taking shape as I collect thoughts and study the resources.

The Web is probably the richest information repository in human history, but most of its information is passive and unstructured. The Web doesn't know what it carries and for what purpose, and the users cannot specify what they want from it. There are some sites that use structured information storage and queries, but they are just little islands of order in a chaotic sea of information, and they do not communicate with each other.

Since 1995, various proposals for meta-data representation and communication standards have been appearing, along with other services and tools that may eventually merge into the global Semantic Web. Hopefully, in the next few years we will see universal adoption of open standards for representation and sharing of meta-information.

The Web should be aware of the content and purpose of its documents and links, and interests of its users, and make the best use of all encoded knowledge. Open semantic standards and communication protocols will allow creation of various services for knowledge gathering, storage, and distribution, as well as user-friendly client-side utilities that would communicate with these services and provide intelligent content selection, processing, and representation functions.

I expect that the next generation of information services will do for Web semantics what HTML and HTTP have done for its communication layer: build a foundation for a global, intelligent, reactive knowledge exchange system.

In this paper, I will attempt to review existing efforts and proposals, suggest some additional areas for development, discuss the prospects of distributed knowledge-processing systems, and explore the technological and organizational efforts necessary for a coordinated implementation of this vision.

It may not be necessary to formulate these ideas as a proposal. I am quite sure that a system like this is as inevitable a step in the development of a civilization as the creation of a communication infrastructure. Most of the needed technologies are already in various stages of development, and the realization of a system similar to the one suggested here seems just a matter of time, whether I or anybody else comes up with such a proposal or not. However, implementation of technological inevitabilities needs visionary facilitation. Without a good vision, system developers tend to make stupid and costly mistakes, such as the DOS 640K memory limit, the year 2000 problem, or the campaign against "the commercialization of the Net". With a limited scope of interest, the participants either attempt to take proprietary control over personal data and encoding standards, or steer the development in the direction of their particular interests. As a result, the real process - the development of the global body of knowledge - gets distorted and slows down.

Semantic Standards

This is a list of some directions in the development of semantic standards.

Structured descriptions of items

Items may include people, documents, events, things for sale, organizations, etc.

A description of a person may include: name, address, gender, e-mail, home page, date of birth. The Open Profiling Standard (OPS) was just introduced for this purpose. Another example of an existing personal profile standard is the Geek Code. The Geek Code would probably have been a lot more successful if somebody had developed an interface for translating between this code and English, and some software for matching people based on this code.

A description of an event may include: type of event (concert, conference, etc.), start and end times, location, price, attendance conditions. See, for example, the event entry form from the Events directory.

A description of an item for sale may include: item description, owner, price, offer expiration date.
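
The descriptions above could be encoded as typed records. The following is only a minimal sketch, assuming Python dataclasses as the encoding; the class and field names are illustrative and are not taken from the Open Profiling Standard or any other existing standard.

```python
from dataclasses import dataclass, asdict
from datetime import date, datetime
from typing import Optional


@dataclass
class Person:
    name: str
    address: Optional[str] = None
    gender: Optional[str] = None
    email: Optional[str] = None
    home_page: Optional[str] = None
    date_of_birth: Optional[date] = None


@dataclass
class Event:
    event_type: str                      # e.g. "concert", "conference"
    start: datetime
    end: datetime
    location: str
    price: Optional[float] = None
    attendance_conditions: Optional[str] = None


@dataclass
class ItemForSale:
    description: str
    owner: Person
    price: float
    offer_expires: date

    def is_open(self, today: date) -> bool:
        """Return True while the offer has not yet expired."""
        return today <= self.offer_expires


# Example values are made up for illustration.
owner = Person(name="A. Seller", email="seller@example.org")
offer = ItemForSale("used modem", owner, 25.0, date(1998, 1, 31))
record = asdict(offer)  # nested plain dict, ready for any serialization
```

The point of the sketch is that once a description has named, typed fields, any service can check, compare, or re-serialize it without parsing free text.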

In practice, there is already a huge number of description forms for all kinds of items, and some meta-services, such as Submit-It for Web pages, translate a single description into many of them.
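
Translation between description forms can be as simple as renaming fields. The sketch below assumes two hypothetical target forms, "directory_a" and "directory_b", with made-up field names; it does not reproduce the actual forms used by Submit-It or any real directory.

```python
from typing import Dict

# Hypothetical field-name maps: canonical field -> field name on the target form.
FORM_FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "directory_a": {"title": "Title", "url": "URL", "summary": "Description"},
    "directory_b": {"title": "site_name", "url": "address", "summary": "abstract"},
}


def translate(description: Dict[str, str], form: str) -> Dict[str, str]:
    """Rename the fields of a canonical description for one target form.

    Canonical fields that the target form does not know are dropped;
    fields missing from the description are simply left out.
    """
    mapping = FORM_FIELD_MAPS[form]
    return {
        target: description[source]
        for source, target in mapping.items()
        if source in description
    }


page = {"title": "Semantic Web", "url": "http://example.org/", "keywords": "metadata"}
form_b = translate(page, "directory_b")
```

A shared description standard would make such per-form tables unnecessary, which is the argument of this section.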

Relations between items

Relations between documents: translation, response, table of contents, copyright statement, update, review, etc. The HTML specification defines a syntax for describing relations between documents, but doesn't really define a standard set of relation types.

Some suggestions on link types are available from W3C.
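
To illustrate how a client could read document relations from the HTML syntax mentioned above, here is a small sketch that collects the rel attribute of link and a elements using Python's standard html.parser module. The sample page and its relation names are made up for the example.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class LinkRelationParser(HTMLParser):
    """Collect (relation, target) pairs from <link> and <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.relations: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel")
        if href and rel:
            # rel may hold several space-separated relation names
            for relation in rel.split():
                self.relations.append((relation.lower(), href))


sample = (
    '<html><head><link rel="contents" href="toc.html"></head>'
    '<body><a rel="Copyright" href="legal.html">Legal</a>'
    '<a href="other.html">untyped link</a></body></html>'
)
parser = LinkRelationParser()
parser.feed(sample)
```

After feeding the sample, parser.relations holds the two typed links and ignores the untyped one. Without agreed relation names, however, each client still has to guess what a given rel value means.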

Relations between people and companies may be: employee, owner, member, founder, supporter, etc. - with descriptions of roles, periods of involvement, etc.

Relations between people may be: friend, spouse, employer, teacher, client, etc. The Six Degrees website attempts to collect these relations in a proprietary database.
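
Relations like these can be held as simple typed records and queried by pattern. This is only a sketch under the assumption that a relation is a subject, a relation name, an object, and an optional period of involvement; the names and data are invented for illustration.

```python
from typing import List, NamedTuple, Optional


class Relation(NamedTuple):
    subject: str
    relation: str                 # e.g. "employee", "friend", "founder"
    obj: str
    since: Optional[int] = None   # first year of involvement, if known
    until: Optional[int] = None   # last year of involvement, if known


def match(
    relations: List[Relation],
    subject: Optional[str] = None,
    relation: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Relation]:
    """Return all relations that agree with every argument that is not None."""
    return [
        r
        for r in relations
        if (subject is None or r.subject == subject)
        and (relation is None or r.relation == relation)
        and (obj is None or r.obj == obj)
    ]


known = [
    Relation("Alice", "employee", "Acme Corp", since=1994),
    Relation("Bob", "founder", "Acme Corp", since=1990),
    Relation("Alice", "friend", "Bob"),
]
acme_people = match(known, obj="Acme Corp")
```

An open format for such records would let any service answer these queries, rather than only the site that collected the data.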

Marked document elements

Item ontologies

Agents' interests descriptions

Besides descriptions of passive resources, many agents on the Web may need to specify what kind of resources they are looking for, and how they would like to see them delivered. Specialized information banks may be looking for new records of certain types, such as new restaurants in Boston, or PC parts for sale. People may be waiting for important news on their favorite topics, or movie recommendations from their friends. Web pages may be waiting for changes in their links, so they could update them in real time. Request brokers may be waiting for new request descriptions.

In all cases, these requests should be described in a formal format, together with a specification of the desired format of the expected data and the location of the agents that should receive it.
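
A formalized interest description might look like the following sketch. It assumes that records are flat dictionaries with a "type" field and that an interest is a required type plus exact-match constraints; the field names, the agent URL, and the matching rule are all assumptions made for the example, not part of any proposed standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interest:
    agent_url: str                       # where matching records should be delivered
    item_type: str                       # e.g. "restaurant", "item-for-sale"
    constraints: Dict[str, str] = field(default_factory=dict)
    delivery_format: str = "text/plain"  # desired format of the delivered data


def interested_agents(record: Dict[str, str], interests: List[Interest]) -> List[Interest]:
    """Return the interests whose type and constraints all match the record."""
    matches = []
    for interest in interests:
        if record.get("type") != interest.item_type:
            continue
        if all(record.get(key) == value for key, value in interest.constraints.items()):
            matches.append(interest)
    return matches


interests = [
    Interest("http://example.org/boston-guide", "restaurant", {"city": "Boston"}),
    Interest("http://example.org/parts-bank", "item-for-sale", {"category": "PC parts"}),
]
new_record = {"type": "restaurant", "name": "Sample Diner", "city": "Boston"}
recipients = interested_agents(new_record, interests)
```

A request broker built this way would forward the new restaurant record only to the first agent, in that agent's requested delivery format.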

A notable effort in this direction is represented by RDM (Resource Description Messages) - a mechanism to discover and retrieve metadata about network-accessible resources. It is based on Harvest's Summary Object Interchange Format (SOIF) - a syntax for transmitting resource descriptions and other kinds of structured objects.
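
As I understand the SOIF syntax, an object is written as a template type and URL followed by attribute lines of the form name{size}, a colon, a tab, and the value, where size is the length of the value in bytes. The function below is a simplified sketch based on that reading, not a complete or authoritative SOIF implementation; the function name and the sample attributes are illustrative.

```python
from typing import Dict


def to_soif(template_type: str, url: str, attributes: Dict[str, str]) -> str:
    """Serialize one object in a simplified SOIF-like layout.

    Assumption: each attribute line is 'Name{byte-count}:<TAB>value',
    with the byte count taken from the UTF-8 encoding of the value.
    """
    lines = [f"@{template_type} {{ {url}"]
    for name, value in attributes.items():
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


summary = to_soif(
    "FILE",
    "http://example.org/",
    {"Title": "Hello World", "Type": "HTML"},
)
```

The explicit byte count lets a receiver read values that contain newlines or braces without any escaping rules.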

Value-added services

Distributed Artificial Intelligence

The availability of vast amounts of information about people, things and events may also reduce the effort of testing many hypotheses about people's social and economic behavior and health conditions from long and expensive research programs to a few simple queries. All manual queries entered into such services may be used by the knowledge processing systems as hints about which items may be causally or semantically related.

Eventually, we may see a wide variety of networked knowledge processing servers collecting and generalizing data in their own areas and cooperating with each other for "interdisciplinary" problem solving, first with direct human involvement, and then increasingly on their own.

Some suggestions on how to implement a distributed reasoning system, using multiple specialized copies of CYC with KQML for communication between them, are available at The Cycic Friends Network.

Another interesting solution is represented by Agent Communication Language (ACL) developed within the ARPA Knowledge Sharing Effort. ACL has a vocabulary, an "inner language" called KIF (Knowledge Interchange Format), and an "outer" language called KQML (Knowledge Query and Manipulation Language). An ACL message is a KQML expression in which the arguments are terms or sentences in KIF formed from words in the ACL vocabulary.
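
The layering described above can be shown with a small sketch that wraps a KIF sentence in a KQML performative. KQML messages are parenthesized lists with keyword parameters such as :sender, :receiver, :language, :ontology and :content; the helper function, the agent names, the ontology name, and the KIF sentence below are illustrative assumptions.

```python
def kqml_message(performative: str, **params: str) -> str:
    """Format a KQML message as '(performative :keyword value ...)'.

    Python keyword names use underscores, so 'reply_with' becomes
    the KQML keyword ':reply-with'.
    """
    parts = [performative]
    for key, value in params.items():
        keyword = key.replace("_", "-")
        parts.append(f":{keyword} {value}")
    return "(" + " ".join(parts) + ")"


message = kqml_message(
    "ask-one",
    sender="buyer-agent",
    receiver="catalog-server",
    language="KIF",
    ontology="sales",
    reply_with="q1",
    content="(price widget ?p)",   # inner KIF sentence; ?p is a variable
)
```

The outer KQML layer says who is asking whom and in what language, while the inner KIF sentence carries the actual question, so either layer can be replaced without changing the other.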

References

I am not trying to compile a complete set of projects and materials here, just to list some resources I found most interesting. Many of them have links to other materials.

Organizations

ARPA-sponsored Intelligent Integration of Information (I^3) program
World-Wide Web Consortium
CYCorp - developer of CYC, a consensus commonsense knowledge base
Stanford Knowledge Systems Lab (KSL)
Information Society Project Office of the European Commission.
International Paleopsychology Group - studies of the history of the knowledge distribution systems in nature.

Proposed Standards

Meta Content Framework references
W3C spec for HTML links
Dublin Core standard resource types
SHOE - a set of Simple HTML Ontology Extensions defining semantic document annotations.
Universal Resource Names
XML - eXtensible Mark-up Language
AML - Agent Markup Language
KIF - Knowledge Interchange Format
Knowledge Sharing Effort at UMBC AgentWeb
KQML - Knowledge Query and Manipulation Language, a language and protocol for exchanging information and knowledge, developed as part of the ARPA Knowledge Sharing Effort.
Summary Object Interchange Format (SOIF) - syntax for transmitting resource descriptions and other kinds of structured objects.
Knowledge Representation Specification Language (KRSL) is used by the DARPA/Rome Laboratory Planning and Scheduling Initiative for specifying shared ontologies and includes built-in ontologies for time, measurement, resource, and planning operations.

Tools and services

Publications and reference materials

MCF Tutorial
XML, Java, and the future of the Web by Jon Bosak.
UMBC AgentNews Webletter
Standardization and the Global Information Society: The European Approach, COM(96)359 - Communication from the Commission to the Council and the Parliament; a document from the Information Society Project Office of the European Commission.
A Dictionary of HTML Meta-tags
A proposal for Metadata operations
Introduction to metadata architecture by Tim Berners-Lee.
Services and Metadata Representation for Distributed Information Discovery by Mark A. Sheldon, Ron Weiss, Bienvenido Vélez, and David K. Gifford.

Theoretical essays

In the course of human history, technology has taken over storage, transmission and processing of materials, energy and information. The next stage of technology is going to assume an increasing role in shaping the global flows of knowledge. The materials listed below suggest some ideas on how this process may go, and where it may lead.

Networking in the Mind Age - some ideas on long-term future of distributed intelligence.
Automated Collaborative Filtering and Semantic Transports
Global Brain mailing list
Seth Russell's AI Conjecture

Other

The @gency - a comprehensive list of resources on software agents.
Ontolingua server - an effort to develop technology to support collaborative construction and effective use of distributed large-scale repositories of highly expressive reusable ontologies. The server provides a distributed collaborative environment to browse, create, edit, modify, and use ontologies.
Seth Russell's AI links
Friedrich Hayek and followers - works on economy as a signaling system. Much of economic signaling goes through averaged price points though. Still, it led us to wonders of the market economy. With storage and processing of individual pieces of meaning we could do a lot more...
Moscow Libertarium attempts to extend the theory of economic signaling system, including secondary and derivative instruments, to social voting and decision-making schemes.
Project Aristotle(sm): Automated Categorization of Web Resources - overview of existing projects.
CRIT mediator - adds text-level and general backlinks to any document.
Fulcrum Knowledge Network
