W3C

Resource Description Framework (RDF):
Concepts and Abstract Data Model

W3C Working Draft 5 August 2002

This version:
http://www.ninebynine.org/wip/RDF-basics/2002-08-05/Overview.htm
Latest version:
http://www.ninebynine.org/wip/RDF-basics/Current/Overview.htm
Previous version:
http://www.ninebynine.org/wip/RDF-basics/2002-07-29/Overview.htm
Editors:
Graham Klyne (Clearswift and Nine by Nine)
Jeremy Carroll (Hewlett Packard Labs)
Series editor:
Brian McBride (Hewlett Packard Labs)
[[[Confirm series editor is OK with this]]]

Abstract

The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. This document defines the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that do not fall under the topics of formal semantics, XML serialization syntax or RDF schema and vocabulary definitions (which are each covered by a separate document in this series). These include: discussion of design goals, meaning of RDF documents, key concepts, character normalization and handling of URI references.

Status of this Document

This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).

This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to how the changes affect existing implementations and content.

This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.

1. Introduction

The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. This document defines the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that are not covered by separate normative documents in this series.

The normative documentation of RDF falls broadly into the following areas:

[[[NOTE: it is anticipated that some of the material in this document may be moved to other documents as part of the document review process.]]]

In section 2, some background to the design goals and rationale of RDF is presented. There is also some discussion of the intended implications of publishing an RDF document (section 2.3).

RDF is based on a graph syntax, which is typically serialized using XML. This graph syntax captures the fundamental structure of RDF, independently of any serialization syntax that may be used. The formal semantics of RDF are defined in terms of the graph syntax. The graph syntax is defined in section 3 of this document.

Section 4 presents some other technical issues that don't clearly fall into any of the more explicit areas noted above.

2. RDF background, rationale and concepts

RDF uses well-established ideas from various data and knowledge representation communities, with recognizable relationships to Conceptual Graphs, logic-based knowledge representation, frames, and relational databases [Sowa] [CG] [KIF] [Hayes] [Luger] [Gray].

RDF builds on XML, which provides a syntactic framework for representing documents and other information. RDF has a simple graph-based data model and formal semantics with a rigorously defined notion of entailment, which in turn provides a basis for well-founded deductions in RDF data.

The real value of RDF comes not so much from any single application, but from the possibilities for sharing data between applications. The value of information thus increases as it becomes accessible to more and more applications across the entire Internet.

2.1 Motivation

The development of RDF has been motivated by the following uses, among others:

2.2 Design goals

The design of RDF is intended to meet the following goals:

2.2.1 A simple data model

RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.

NOTE: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] or a textbook on logical semantics (e.g., [HUNTER] [DAVIS]) for more information about what logicians call "model theory".

2.2.2 Formal semantics and well-founded inference

RDF has a formal semantics which provides a sound basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.

2.2.3 Extensible URI-based vocabulary

The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URI references, or URIrefs). URIrefs are used for naming all kinds of things in RDF data. The only other kind of label that appears in RDF data is a literal string.

[[[Review this on resolution of datatypes issues]]]

2.2.4 XML-based syntax

RDF has a recommended XML serialization form [RDF-SYNTAX], which can be used to encode the data model for exchange of information between applications.

2.2.5 Use XML schema datatypes

RDF can be used with XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.

[[[Review this on resolution of datatypes issues]]]

2.2.6 Anyone can say anything about anything

To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to say anything about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making nonsensical or inconsistent assertions, and applications that build upon RDF must find ways to deal with conflicting sources of information. (This is where RDF departs from the XML approach to data representation, which is generally quite prescriptive and aims to present an application with information that is well-formed and complete for the application's needs.)

2.2.7 Universal expression of ground facts

Through its use of extensible URI-based vocabularies, RDF aims to provide for universal expression of ground facts; i.e. assertions of specific properties about specific named things.

RDF itself does not provide the machinery of inference, but provides the raw data upon which such machinery can operate. Other work is looking for ways to build more expressive forms on the basic capabilities of the RDF core language.

2.2.8 A basis for binding agreements

RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.

This goal is explored further in section 2.3 below.

2.3 Meaning of RDF documents

The RDF specification emphasizes the formal structure and meaning of RDF. But there is also a social dimension that is easily overlooked when dealing with such formal aspects.

2.3.1 Formal semantics

RDF is a language designed to support the Semantic Web, in much the same way that HTML is the language that supports the original Web. The Semantic Web aims for data to be shared and processed by automated tools as well as by people. To serve this purpose, certain meanings of RDF statements must be defined in a very precise manner; this is provided by the RDF Model Theory [RDF-SEMANTICS].

Model-theoretic semantics assumes that a language refers to a 'world', and describes the minimal conditions that such a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called 'interpretation theory'. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure. The RDF model theory is couched in the language of set theory simply because that is the normal language of mathematics - for example, the model theory assumes that names denote things in a set IR called the 'universe' - but the use of set-theoretic language is not supposed to imply that the things in the universe are set-theoretic in nature.

The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they preserve meaning.

The RDF model theory treats RDF as a simple assertional language, in which each triple makes a distinct assertion, and the meaning of any triple is not changed by adding other triples. Based on the semantics defined in the model theory, it is simple to translate an RDF graph into a logical expression with essentially the same meaning.
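The translation mentioned above can be illustrated with a minimal sketch. It assumes a graph with no unlabelled nodes, represents each triple as a tuple of plain strings, and renders each triple as a binary predicate applied to its subject and object. The function names and the "ex:" labels are hypothetical and are not taken from any RDF specification.

```python
from typing import Iterable, Tuple

# Assumption: a triple is (subject, predicate, object), all plain strings.
Triple = Tuple[str, str, str]


def triple_to_atom(triple: Triple) -> str:
    """Render one triple as a binary predicate applied to subject and object."""
    subject, predicate, obj = triple
    return f"{predicate}({subject}, {obj})"


def graph_to_formula(triples: Iterable[Triple]) -> str:
    """Render a graph as the conjunction of its triples, in sorted order."""
    return " AND ".join(sorted(triple_to_atom(t) for t in triples))


# Hypothetical example data.
graph = {
    ("ex:doc", "ex:creator", "ex:alice"),
    ("ex:doc", "ex:title", '"RDF Concepts"'),
}
formula = graph_to_formula(graph)

# Adding a triple only adds a conjunct; the existing conjuncts are unchanged.
larger = graph | {("ex:alice", "ex:name", '"Alice"')}
larger_formula = graph_to_formula(larger)
```

Because every triple contributes its own conjunct, each conjunct of the smaller formula also appears in the larger one, which reflects the statement that adding triples does not change the meaning of existing triples.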

2.3.2 Social meaning

RDF/XML documents, i.e. encodings of RDF graphs, can be used to make representations of claims or assertions about the 'real' world. RDF graphs may be asserted to be true, and such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provides the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and that distinguish assertions from other uses (e.g. citations, denials or illustrations).

[[[This needs reviewing...]]]

For example, a media type, application/rdf+xml [RDF-MIME-TYPE], is being registered to indicate the use of RDF/XML that might be published with the intent of being such an assertional representation (as distinguished from other XML or text that may just happen to look like RDF assertions).
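As a rough illustration, a receiving application might test whether content has been labelled with this media type before considering it as a candidate assertion. The sketch below uses Python's standard email.message module to parse a Content-Type header value. The function name is hypothetical, and the check says nothing about whether the publisher actually intends the content as an assertion.

```python
from email.message import Message

RDF_XML_MEDIA_TYPE = "application/rdf+xml"


def is_labelled_rdf_xml(content_type_header: str) -> bool:
    """Return True if a Content-Type header value names the RDF/XML media type."""
    message = Message()
    message["Content-Type"] = content_type_header
    # get_content_type() lower-cases the type and drops parameters such as charset.
    return message.get_content_type() == RDF_XML_MEDIA_TYPE
```

For instance, is_labelled_rdf_xml("application/rdf+xml; charset=utf-8") is true, while the same call with "text/xml" is false.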

2.3.3 Interaction between social and formal meaning

[[[Needs more work; adapt example by PatH? Merge with next section?]]]

Using RDF, 'received meaning' can be characterized as the social meaning of any logical consequences.

If you publish a graph G and G logically entails G', and we interpret G' using the same social conventions that everyone agrees could be reasonably used to interpret G, then you are asserting the content of G' as well.

Human publishers of RDF content commit themselves to the mechanically-inferred social obligations. The machines doing the inferences aren't expected to know about all these social conventions and obligations.

The social conventions used to interpret a graph may include assumed truths, for which no logical derivation is available, and socially accepted consequences whose rules of deduction are embedded in arbitrary decision-making processes.

Just as semantic web vocabulary gains currency through use, so semantic web deductions gain force through social acceptance. Semantic web deduction operates in a combination of logical and social (non-logical) dimensions.

To support logical entailments, formal RDF meaning is based on a model theory (see section 2.3.1). The notion of truth is crucial: a possible world may correspond to an RDF statement if and only if that statement is true in that world.

The RDF core language provides a way to make simple formal assertions, with no machinery for formalizing allowable inferences. Inferences are performed by processes, embedded in software implementations, whose validity is not formally demonstrable, and must be assumed or trusted to be socially acceptable. It is expected that semantic web languages layered on RDF will give formal expression to allowable inferences, thus allowing provable deductions by generic software modules to replace individual ad-hoc implementations.

2.3.4 Implications of asserting RDF

When an RDF graph is asserted in the web, its publisher is saying something about their view of the world. (The mechanism for deciding whether or not a graph is asserted is not defined here, but it is presumed that the publisher's intent will be clear in some way.)

When a user invokes an application, there is also a social and technical context of invocation that determines some set of RDF assertions that will be assumed to be true: the application itself, and any RDF files that are passed to it. Garbage-in, garbage-out applies: if the initial assumed facts are wrong or meaningless, the results will have little value. No specific mechanisms for deciding or evaluating the validity of any such assertions are defined here.

Noting that there is no single human opinion about the truth of some statements, the graph may further contain commentary for human interpreters to indicate the realm of human interpretation that should be applied. This means a graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information, or programmers writing software to perform specialized forms of deduction in the Semantic Web.

2.4 RDF concepts

RDF uses the following key concepts:

2.4.1 Graph data model

The underlying structure of any RDF expression is a directed labelled graph (or multigraph), which consists of nodes and labelled directed arcs that link pairs of nodes (these notions are defined more formally in section 3). The formal semantics for RDF is defined in terms of this graph syntax. An RDF expression is sometimes called an RDF graph. The graph can conveniently be represented as a set of triples, where each triple contains two node labels and an arc label:

Each arc corresponds to a statement that asserts a relationship between the nodes that it links. The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
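The set-of-triples view and the conjunctive reading can be sketched in a few lines of Python. This is an illustration only: it assumes that every node carries a label (no unlabelled nodes), that labels are plain strings, and that a set of known facts stands in for a 'world'. The example.org URIs and all names are hypothetical.

```python
from typing import Set, Tuple

# Assumption: (subject node label, arc label, object node label).
Triple = Tuple[str, str, str]

graph_a: Set[Triple] = {
    ("http://example.org/doc", "http://example.org/terms#creator", "Alice"),
}
graph_b: Set[Triple] = {
    ("http://example.org/doc", "http://example.org/terms#title", "RDF Concepts"),
    ("http://example.org/doc", "http://example.org/terms#creator", "Alice"),
}

# With no unlabelled nodes involved, combining two graphs is plain set union,
# so a statement made in both graphs appears only once.
combined = graph_a | graph_b


def holds_in(graph: Set[Triple], facts: Set[Triple]) -> bool:
    """Conjunction: the graph holds only if every one of its statements holds."""
    return all(triple in facts for triple in graph)
```

Here holds_in(graph_a, graph_b) is true because the single statement of graph_a is among the facts in graph_b, whereas holds_in(graph_b, graph_a) is false because the title statement is missing.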

2.4.2 URI-based vocabulary

Nodes in an RDF graph are labelled with URIs with optional fragment identifiers (URI references, or URIrefs), literal strings, or nothing at all. Arcs are labelled with URIrefs. (See [URIS], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 3.1.)

The label on a node indicates what that node is meant to represent. The label on an arc names the relationship that is asserted to hold between the nodes connected by that arc. Some URIrefs may indicate web resources, and a node thus labelled is presumed to denote that resource. Other URIrefs may represent abstract ideas or values rather than a retrievable Web resource. RDF thus leverages the universal naming space of URIs [URIS].
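The labelling rules in this section can be modelled with a few small types. The sketch below is one possible reading, not a normative data structure: the class and function names are hypothetical, and the test for a relative URI (no scheme component) is a simplification of the rules in [URIS].

```python
from dataclasses import dataclass
from typing import Tuple, Union
from urllib.parse import urlsplit


@dataclass(frozen=True)
class URIRef:
    """A URI with an optional fragment identifier; relative forms are rejected."""
    value: str

    def __post_init__(self) -> None:
        # Simplifying assumption: a reference without a scheme is relative.
        if not urlsplit(self.value).scheme:
            raise ValueError(f"relative URI reference not allowed: {self.value!r}")


@dataclass(frozen=True)
class Literal:
    """A literal string label."""
    value: str


class BlankNode:
    """A node with no label; each instance is distinct from every other."""


NodeLabel = Union[URIRef, Literal, BlankNode]


def make_triple(subject: NodeLabel, arc: URIRef, obj: NodeLabel) -> Tuple[NodeLabel, URIRef, NodeLabel]:
    """Build a triple, checking that the arc is labelled with a URIref."""
    if not isinstance(arc, URIRef):
        raise TypeError("arcs must be labelled with URIrefs")
    return (subject, arc, obj)
```

Two URIRef values with the same string compare equal, so they name the same node, while two BlankNode instances never do, which matches the idea of a node that carries no label at all.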

2.4.3 XML serialization syntax

RDF has a specific serialization syntax based on XML [RDF-SYNTAX].

Only the XML syntax is normatively specified and recommended for use to exchange information between Internet applications: other syntaxes for RDF graphs are possible, and may be widely used (e.g. [NOTATION3]), but are not covered by this recommendation.

2.5 RDF core URI vocabulary and namespaces

RDF uses URIs to label resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned by the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:

Used with the R