Resource Description Framework (RDF):
Concepts and Abstract Data Model

Editors' Working Draft 29 July 2002

This version:: http://www.ninebynine.org/wip/RDF-basics/2002-07-29/Overview.htm
Latest version:: http://www.ninebynine.org/wip/RDF-basics/Current/Overview.htm
Previous version:: http://www.ninebynine.org/wip/RDF-basics/2002-07-25/Overview.htm
Editors:: Graham Klyne (Clearswift and Nine by Nine); Jeremy Carroll (Hewlett Packard Labs)
Series editor:: Brian McBride (Hewlett Packard Labs)

Abstract

The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. This document defines the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that do not fall under the topics of formal semantics, XML serialization syntax or RDF schema and vocabulary definitions (which are each covered by a separate document in this series). These include: discussion of design goals, meaning of RDF documents, key concepts, character normalization and handling of URI references.

Status of this Document

The following text refers to the intended status of this document: it has not yet been approved by the RDF core working group or W3C director and has no status other than editors' working draft at this time.

This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).

This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to how the changes affect existing implementations and content.

This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.

1. Introduction
2. RDF background, rationale and concepts
3. Graph syntax
4. Additional technical considerations
5. Acknowledgments
6. References
- 6.1 Normative References
- 6.2 Informational References
Appendix Y: Change log

1. Introduction

The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. The normative documentation of RDF falls broadly into the following areas:

XML serialization syntax
Formal semantics
RDF vocabulary definition language (RDF schema)
Abstract graph syntax
Other technical details

This document addresses the last two of these items. The first three are covered by separate documents [RDF-SYNTAX] [RDF-SEMANTICS] [RDF-VOCABULARY].

In section 2, some background to the design goals and rationale of RDF is presented. There is also some discussion of the intended implications of publishing an RDF document (section 2.3).

RDF is based on a graph syntax, which is typically serialized using XML. This graph syntax captures the fundamental structure of RDF, independently of any serialization syntax that may be used. The formal semantics of RDF are defined in terms of the graph syntax. The graph syntax is defined in section 3 of this document

Section 4 presents a number of technical issues that don't clearly fall into any of the more explicit areas noted above.

2. RDF background, rationale and concepts

RDF uses well established ideas from various data and knowledge representation communities, with recognizable relationships to Conceptual Graphs, logic-based knowedge representation, frames, and relational databases [Sowa] [CG] [KIF] [Hayes] [Luger] [Gray].

RDF builds on XML, which provides a syntactic framework for representing documents and other information. It has a simple graph-based data model and formal semantics with a rigorously defined notion of entailment, which in turn provides a basis for well founded deductions in RDF data.

The real value of RDF comes not so much from any single application, but from the possibilities for sharing data between applications. The value of information thus increases as it becomes accessible to more and more applications across the entire Internet.

2.1 Motivation

The development of RDF has been motivated by the following uses, among others:

Web metadata: providing information about web resources and the systems that use them (e.g. content rating, capability descriptions, privacy preferences, etc.)
Applications that require open rather than constrained information formats (e.g. scheduling activities, describing organizational processes, annotation of Web resources, etc.)
To do for machine processable information (application data) what the World Wide Web has done for hypertext: to allow data to be processed outside the particular environment in which it was created, in a fashion that can work at Internet scale.
Interworking between applications: combining data from several applications to arrive at new information.
Automated processing of Web information by software agents: the Web is moving from having just human-readable information to being a world-wide network of cooperating processes. RDF provides a world-wide lingua franca for these processes.

2.2 Design goals

The design of RDF is intended to meet the following goals:

A simple data model
Formal semantics and well-founded inference
Extensible URI-based vocabulary
XML-based syntax
Use XML schema datatypes
Anyone can say anything about anything
A basis for legally binding agrements
Universal expression of ground facts

2.2.1 A simple data model

RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.

NOTE: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] or a textbook on logical semantics (e.g., [HUNTER] [DAVIS]) for more information about what logicians call "model theory".

2.2.2 Formal semantics and well-founded inference

RDF has a formal semantics which provides a sound basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.

2.2.3 Extensible URI-based vocabulary

The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URIrefs). URIrefs are used for naming all kinds of things in RDF data. The only other kind of label that appears in RDF data is a literal string.

2.2.4 XML-based syntax

RDF has an XML-based serialization form which, if used appropriately, allows a wide range of "ordinary" XML data to be interpreted as RDF [STRIPEDRDF].

2.2.5 Use XML schema datatypes

RDF can be used with XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.

[[[Add comment here if goal not fully achieved]]]

2.2.6 Anyone can say anything about anything

To allow operation at Internet scale, RDF is an open-world framework that allows anyone to say anything about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making nonsensical or inconsistent assertions, and applications that build upon RDF must find ways to deal with conflicting sources of information. (This is where RDF departs from the XML approach to data representation, which is generally quite prescriptive and aims to present an application with information that is well-formed and complete for the application's needs.)

2.2.7 Universal expression of ground facts

Through its use of extensible URI-based vocabularies, RDF aims to provide for universal expression of ground facts; i.e. assertions of specific properties about specific named things.

RDF itself does not provide the machinery of inference, but provides the raw data upon which such machinery can operate. Other work is looking for ways to build more expressive expressions on the basic capabilities of the RDF core language.

2.2.8 A basis for binding agrements

RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.

This goal is explored further in section 2.3 below.

2.3 Meaning of RDF documents

The RDF specification emphasizes the formal structure and meaning of RDF. But there is also a social dimension that is easily overlooked when dealing with such formal aspects.

2.3.1 Formal semantics

[[[These words adapted from Primer 7.1]]]

RDF is a language designed to support the Semantic Web, in much the same way that HTML is the language that supports the original Web. The Semantic Web aims for data to be shared and processed by automated tools as well as by people. To serve this purpose, certain meanings of RDF statements must be defined in a very precise manner; this is provided by the RDF Model Theory [RDF-SEMANTICS].

Model-theoretic semantics assumes that a language refers to a 'world', and describes the minimal conditions that such world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called 'interpretation theory'. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure. The RDF model theory is couched in the language of set theory simply because that is the normal language of mathematics - for example, the model theory assumes that names denote things in a set IR called the 'universe' - but the use of set- theoretic language is not supposed to imply that the things in the universe are set-theoretic in nature.

The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they preserve meaning.

The RDF model theory treats RDF as a simple assertional language, in which each triple makes a distinct assertion, and the meaning of any triple is not changed by adding other triples. Based on the semantics defined in the model theory, it is simple to translate an RDF graph into a logical expression with essentially the same meaning.

2.3.2 Social meaning

[[[Adapted words from DanBri/PatHayes]]]

RDF/XML documents, i.e. encodings of RDF graphs, can be used to make representations of claims or assertions about the 'real' world. RDF graphs may be asserted to be true, and such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provide the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and which distinguish assertions from other uses (e.g. citations, denals or illustrations).

A media type, application/rdf+xml is being registered for indicating the use of RDF/XML as an assertional representation in this way [RDF-MIME-TYPE].

2.3.3 Interaction between social and formal meaning

[[[Adapted words from PatHayes/Jos de Roo]]]

Using RDF, 'received meaning' can be characterized as the social meaning of any logical consequences.

If you publish a graph G and G logically entails G', and we interpret G' using the same social conventions that everyone agrees could be reasonably used to interpret G, then you are asserting that content of G' as well.

Human publishers of RDF content commit themselves to the mechanically-inferred social obligations. The machines doing the inferences aren't expected to know about all these social conventions and obligations.

The social conventions used to interpret a graph may include assumed truths, for which no logical derivation is available, and socially accepted consequences whose rules of deduction are embedded in arbitrary decision-making processes.

Semantic web vocabulary gains currency through use, so also do semantic web deductionshave force through social acceptance. Semantic web deduction operates in a combination of logical and social (non-logical) dimensions.

To support logical entailments, formal RDF meaning is based on a model theory (see section 2.3.1). The notion of truth is crucial: a possible world may correspond to some RDF if and only if the RDF statement is true in that world.

The RDF core language provides a way to make simple formal assertions, with no machinery for formalizing allowable inferences. Inferences are performed by processes, embedded in software implentations, whose validity is not formally demonstrable, and must be assumed or trusted to be socially acceptable. It is expected that semantic web languages layered on RDF will give formal expression to allowable inferences, thus to allow provable deductions by generic software modules to replace individual ad-hoc implemenations.

2.3.4 Implications of asserting RDF

When an RDF graph is asserted in the web, its publisher is saying something about their view of the world. (The mechanism for deciding whether or not a graph is asserted is not defined here, but it is presumed that the publisher's intent will be clear in some way.)

When a user invokes an application, there is also a social and technical context of invocation that determines some set of RDF assertions that will be assumed to be true: the application itself, and any RDF files that are passed to it. Garbage-in, garbage-out applies: if the initial assumed facts are wrong or meaningless, the results will have little value. No specfic mechanisms for deciding or evaluating the validity of any such assertions are defined here.

Noting that there is no single human opinion about the truth of some statements, the graph may further contain commentary for human interpreters to indicate the realm of human interpretation that should be applied. This means a graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF informaton, or programmers writing software to perform specialized forms of deduction in the Semantic Web.

2.4 RDF concepts

RDF uses the following key concepts:

Graph data model
URI-based vocabulary
XML serialization syntax

2.4.1 Graph data model

The underlying structure of any RDF expression is a directed labelled graph (or multigraph), which consists of nodes and labelled directed arcs that link pairs of nodes (these notions are defined more formally in section 3). The formal semantics for RDF is defined in terms of this graph syntax. An RDF expression is sometimes called an RDF graph. The graph can conveniently be represented as a set of triples, where each triple contains two node labels and an arc label:

Each arc corresponds to a startement that asserts a relationship between the nodes that it links. The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.

2.4.2 URI-based vocabulary

Nodes in an RDF graph are labelled with URIs with optional fragment identifiers (URI references, or URIrefs), literal strings, or nothing at all. Arcs are labelled with URIrefs. (See [URIS], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 3.1.)

The label on a node indicates what that node is meant to represent. The label on an arc names the relationship that is asserted to hold between the nodes connected by that arc. Some URIrefs may indicate web resources, and a node thus labelled is presumed to denote that resource. Other URIrefs may represent abstract ideas or values rather than a retreivable Web resource. RDF thus leverages the universal naming space of URIs [URIS].

2.4.3 XML serialization syntax

RDF has a specific serialization syntax based on XML. There are several ways in which a given RDF graph can be represented in XML: these various forms allow RDF to be represented in ways that are amenable to specific XML applications. In this way, XML application data can easily be designed to be accessible to generic RDF processors [XML-AS-RDF].

Only the XML syntax is normatively specified and recommended for use to exchange information between Internet applications: other syntaxes for RDF graphs are possible, and may be widely used (e.g. [NOTATION3]), but are not covered by this recommendation.

2.5 RDF core URI vocabulary and namespaces

RDF uses URIs to label resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:

http://www.w3.org/1999/02/22-rdf-syntax-ns# (conventionally associated with prefix rdf:)
http://www.w3.org/2000/01/rdf-schema# (conventionally associated with prefix rdfs:)

Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespaces [XML-NS] associated with the RDF core vocabulary terms.

NOTE: these namespace URIs are the same as those used in earlier RDF documents [RDF-MS] [RDF-SCHEMA]. The URIs have not been updated because the working group feels its work has been to clarify the earlier work rather than to change it.

Vocabulary terms in the rdf: namespace are listed in section 3.4 [[[check this]]] of the RDF syntax specification [RDF-SYNTAX].

Vocabulary terms defined in the rdfs: namespace are defined [[[where?]]] in the RDF schema vocabulary specification [RDF-VOCABULARY].

3. Graph syntax

This section defines the RDF graph syntax. The RDF graph is sometimes referred to as the (data) model of RDF (see the RDF Primer [RDF-PRIMER], and RDF Model & Syntax [RDF-MS]). In brief, the RDF graph is a directed graph with labelled edges and partially labelled nodes.

A goal of this section is the precise definition of equality between RDF graphs. This benefits interoperability (two conformant implementations are more likely to be practically interoperable if they have a precise conception of the way in which they are the same). It is required for the specification of the RDF Test Cases [RDF-TESTS], which depend on testing equality of RDF graphs for their execution. It is required by the RDF Model Theory [RDF-SEMANTICS] which assigns the same meaning to any pair of equal RDF graphs.

Note: Many RDF applications and frameworks do not need to implement RDF graph equality. They do need to respect equality when assigning meaning to RDF graphs.

The specification of the RDF graph commences with the labels used in the graph, which can be URI references, string literals, or XML literals; equality is defined for each. It then proceeds to describing arcs (triples), a complete graph and graph equality.

[[[This section, particularly how nodes and node labels are handled, is not completely in sync with the current Model Theory WD.]]]

3.1 URI References

A URI Reference Label within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:

is in Normal Form C [NFC] and
would produce a US-ASCII string that is an absolute URI reference in the form defined by [URIS], as modified by [RFC-2732], when:
- encoded in UTF-8, and
- with disallowed characters escaped according to the percent escaping algorithm below

The disallowed characters that must be %-escaped include all non-ASCII characters, the excluded characters listed in Section 2.4 of [URIS], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC-2732].

Disallowed characters must be escaped as follows:

Each disallowed character is converted to UTF-8 [RFC-2279] as one or more bytes.
Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.

Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings. A URI reference label is not equal to a string literal label or an XML literal label.

Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and constrained to be in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).

See the following test cases, per [RDF-TESTS]:

3.2 RDF Literals

An RDF literal is either an XML literal or a string literal.

Two RDF literals are equal if and only if they are either both XML literals and equal or both string literals and equal.

3.2.1 String Literals

A string literal label in an RDF graph is composed of a Unicode string [UNICODE] that is in Normal Form C [NFC], and a language identifier that is either null or as specified below.

Two string literals are equal if both components are equal. The Unicode string components are compared on a character by character basis. The language tag components are equal if both are null or if both are defined and equal as language identifiers.

Allowable language identifiers are the legal values for xml:lang as specified by section 2.12, Language Identification, in [XML], or null. Equality of language identifiers (as specified in [RFC-3066]) is defined by case insensitive character by character comparison.

Note: This direct comparison between language identifiers is appropriate for the purpose of defining equality between RDF graphs, but is linguistically naive. [RFC-3066] suggests more advanced comparison techniques.

Note: Literals beginning with a composing character (as defined by [CHARMOD]) are allowed however they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].

See the following test cases, per [RDF-TESTS]:

[[[Subject to WG disposition of test cases]]]

3.2.2 XML Literals

Within an RDF graph, an XML literal is a Unicode [UNICODE] string paired with a language identifier. The string is well-balanced, self-contained XML element content [XML].

An XML literal, with non-null language identifier, can be used to form an XML document by concatenating the five strings:

"<tag xml:lang='"
the language identifier of the XML literal
"'>"
the Unicode string of the XML literal
"</tag>"

The resulting Unicode string is then encoded in UTF-8.

When the language identifier is null, the corresponding XML document is formed by enclosing the Unicode string of the XML literal with "<tag>" and "</tag>" and encoding the resulting string in UTF-8.

No escaping is applied in either process. The choice of tag is arbitrary.

This resulting XML document corresponding to the XML literal is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].

Note: If compatibility with XML version 1.1 is desired, then XML literals in RDF graphs must be restricted to those that are fully normalized according to [XML 1.1].

The exclusive canonicalization of an XML literal is formed by:

Forming the XML document corresponding to the XML literal as above.
Taking the exclusive canonicalization without comments [XC14N] of the element content of the root element of the document.

If two XML literals are equal then:

The language identifiers are both null or both are defined and equal as language identifiers, per [RFC-3066].
The exclusive canonicalizations of the XML literal are equal UTF-8 strings, octet by octet.

This specification, above, gives necessary conditions for the equality of XML literals. The RDF Test Cases [RDF-TESTS] treat these necessary conditions as also sufficient.

Implementations are free to add additional sufficient conditions for equality. If two XML literals compare equal according to an implementation then they must compare equal according to this definition, but not conversely. In particular, XML comments may be treated as significant, and namespaces that are in scope but not visibly utilized (as defined by [XC14N]) may be treated as significant.

[[[ Is there a need for a longer non-normative appendix on implemenation issues for XML literals? This could discuss (a) minimal implementations, for which equality is not needed, and where the set of namespaces and namespace prefixes can be fixed in advance (b) the correct and incorrect use of character by character equality for XML literals. Should there be test cases for issue rdfms-xml-literal-namespaces? ]]]

See the following test cases, per [RDF-TESTS]:

[[[Subject to WG disposition of test cases]]]

3.3 Nodes

An RDF graph is defined using a set of nodes. Many of the nodes are blank, and some of the nodes are labelled with RDF literals or RDF URI references, i.e. there is a partial labelling function from the set of nodes to the union of the set of RDF literals and RDF URI references.

A tidy set of nodes is one in which no two nodes have equal labels. A tidy set of nodes may have any number of distinct blank nodes.

Two nodes are equal if and only if they are the same node. In particular, two different blank nodes are not equal.

3.4 RDF triples

An RDF triple describes an arc in an RDF graph. It contains three components:

a subject node
an predicate, labelling the arc, which is a URI reference, and
an object node

The set containing the subject and object nodes of a triple is tidy (per definition in section Nodes).

The subject must not be labelled with an RDF literal.

Two RDF triples are equal if and only if their subjects are equal, their predicates are equal, and their objects are equal.

3.5 RDF graph

An RDF graph is a set of RDF triples.

The set of nodes of an RDF graph is the set of nodes that are either subject or object of some triple in the graph.

The set of nodes of an RDF graph is tidy (per definition in section Nodes).

[[[Suggestions of a standard graph theory text which treats digraphs as primary would be welcome.]]]

Note: The definition of an RDF graph diverges from the definition of a directed graph in a standard text such as [[[missing ref]]] in that: (a) all nodes must be in at least one arc; (b) all the arcs are labelled; (c) some of the nodes are labelled; (d) labels on nodes are required to be distinct; (e) some labels are shared between nodes and arcs.

3.6 Graph Equality

Two RDF graphs are equal if and only if they are isomorphic. An RDF graph isomorphism is a directed graph isomorphism that respects the labels on both arcs and nodes.

An RDF Graph isomorphism I between two graphs G and G' is a bijection between the nodes of G and the nodes of G', such that:

I(n) is blank if and only if n is blank,
the label (if any) on I(n) equals the label on n, and
the triple (I(s),p,I(o)) is in G' if and only if the triple (s,p,o) is in G,

for all nodes n, s, o in G and all RDF URI references p.

4. Additional technical considerations

4.1 Character normalization

[[[This subsection normatively depends on CHARMOD, currently a last call working draft. If CHARMOD has not reached the appropriate recommendation status as this document progresses down the recommendation track, this section will be deleted.]]]

For the processing of character data that can be represented in different ways, RDF processors are required to conform to Early Uniform Normalization, as described by Character Model for the World Wide Web 1.0 [CHARMOD].

4.2 Fragment identifiers

How should RDF treat a URI reference with a fragment identifier? Conventional web architecture has that the meaning of a fragment identifier is dependent on the MIME type of a resource that is obtained by dereferencing the URI part. URIs without fragment identifiers are generally presumed to map to some resource for which a Web representation (or several) can be retrieved. But RDF has no concept of a fragment identifier separate from a URI: RDF treats a URI reference as an opaque identifier that denotes some resource [RDF-SEMANTICS]. Further, an RDF resource identifier may denote something that is not web-retrievable; e.g. a car, or a Unicorn.

This provides a handling of URI referencess and their denotation that is consistent with the RDF model theory and usage, and also with conventional web axioms. This approach somewhat extends the idea of a "fragment" or "view" beyond the common idea (when handling web documents) that it is a physical part of a containing document.

In view of this, it is reasonable to consider that URIs without fragment identifiers are most helpfully used for indicating web-retrievable resources (when used in RDF), and URIs with fragment identifiers are used for abstract ideas that don't have a direct web representation. This is not a hard-and-fast distinction, as the line between resources having or not having a web-retrievable representation is sometimes hard to draw precisely.

4.3 Forming a URI reference from a QName

The RDF/XML syntax uses QName syntax [XML-NS], section 3, to identify various resources, notably RDF properties. But the RDF graph syntax contains only URI references, and does not recognize QName forms.

Mostly, QNames are handled by the mapping between RDF/XML documents and RDF graph syntax. But there are some occasions where an RDF writer needs to know the correspondence between QNames and URI references (e.g. when using a typed node production). The mapping is described in [RDF-SYNTAX], sections 3.1.2 or 3.1.4.

5. Acknowledgments

This document is a product of extended deliberations by the RDFcore working group, whose members have included:

This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassilla and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working group members who contributed to this earlier work are:

6. References

6.1 Normative References

6.2 Informational References

Appendix Y: Change log

[[[For reviewers' reference. This appendix will be removed on final publication.]]]

Resource Description Framework (RDF):Concepts and Abstract Data Model

6.1 Normative References

6.2 Informational References

Resource Description Framework (RDF):
Concepts and Abstract Data Model