Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, meaning of RDF documents, key concepts, datatyping, character normalization and handling of URI references.
This is an editors' draft despite anything else said here.
This is a W3C RDF Core Working Group Last Call Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).
This document is in the Last Call review period, which ends on 31 January 2003. This document has been endorsed by the RDF Core Working Group.
This document is being released for review by W3C Members and other interested parties to encourage feedback and comments.
This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
In conformance with W3C policy requirements, known patent and IPR constraints associated with this Working Draft are detailed on the RDF Core Working Group Patent Disclosure page.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
Normative documentation of the RDF core falls into the following areas:
Within this document normative sections are explicitly labelled as such. Explicit notes are informative.
The framework is designed so that vocabularies can be layered on top of a core. The RDF core and RDF vocabulary definition (RDF schema) languages [RDF-VOCABULARY] are the first such vocabularies. Others (cf. OWL [OWL] and the applications in the primer [RDF-PRIMER]) are in development.
In section 2, the background rationale, design goals and fundamental concepts are introduced, followed (section 2.4) by an assertion that publication of an RDF document carries certain implications, and discussion of those implications.
RDF's abstract syntax is a graph, which can be serialized using XML (but which is quite distinct from XML's tree-based infoset [XML-INFOSET]). The abstract syntax captures the fundamental structure of RDF, independently of any concrete syntax used for serialization. The formal semantics of RDF are defined in terms of the abstract syntax. XML content of literals is described in section 3, and the abstract syntax is defined in section 4 of this document.
Section 5 discusses the role of fragment identifiers in URI references used with RDF.
RDF has an abstract syntax that reflects a simple graph-based data model, and formal semantics with a rigorously defined notion of entailment providing a basis for well founded deductions in RDF data.
The development of RDF has been motivated by the following uses, among others:
RDF is designed to represent information in a minimally constraining, flexible way. It can be used in isolated applications, where individually designed formats might be more direct and easily understood, but RDF's generality offers greater value from sharing. The value of information thus increases as it becomes accessible to more applications across the entire Internet.
The design of RDF is intended to meet the following goals:
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
Note: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] for more information about "model theory" as used in the literature of mathematics and logic.
RDF has a formal semantics which provides a dependable basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URI references, or URIrefs). URI references are used for naming all kinds of things in RDF.
The other kind of value that appears in RDF data is a literal.
RDF has a recommended XML serialization form [RDF-SYNTAX], which can be used to encode the data model for exchange of information among applications.
RDF can use values represented according to XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make simple assertions about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making assertions that are nonsensical or inconsistent with the world as people see it, and applications that build upon RDF need to find ways to deal with incomplete and conflicting sources of information. (This is where RDF departs from more prescriptive approaches to representing data in XML, which aim to present information that is well-formed and complete for an application's needs.)
RDF can represent arbitrary information that can be expressed as simple facts. (What constitutes a simple fact is discussed later, in section 2.3.5.)
RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.
This goal is explored further in section 2.4 below.
RDF uses the following key concepts:
The underlying structure of any expression in RDF can be viewed as a directed labelled graph, which consists of nodes and labelled directed arcs that link pairs of nodes (these notions are defined more formally in section 4). The RDF graph is a set of triples:
Each property arc represents a statement of a relationship between the nodes that it links, having three parts:
The direction of the arc is significant: it always points toward the object of a statement.
The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
Nodes in an RDF graph are URIs with optional fragment identifiers (URI references, or URIrefs), literals, or blank (having no separate form of identification). Arcs are labelled with URI references. (See [URI], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 4.4.)
The URI reference or literal used as a node identifies what that node represents. The label on an arc identifies the relationship between the nodes connected by the arc. The arc label may also be a node in the graph.
A blank node is an RDF graph node that is not a URI reference or a literal. In the RDF abstract syntax, a blank node is just a unique node that can be used in one or more RDF statements, and has no globally distinguishing identity.
A convention used by some linear representations of an RDF graph to allow several statements to reference the same blank node is to use a blank node identifier, which is a local identifier that can be distinguished from all URIs and literals. When graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers.
Note that blank node identifiers are not part of the RDF abstract syntax, and the representation of statements that use blank nodes is entirely dependent on the particular concrete syntax used.
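The re-allocation of blank node identifiers during a merge can be illustrated with a minimal sketch. It assumes a hypothetical linear representation in which triples are Python tuples and blank node identifiers are strings beginning with "_:"; the function names and the example.org property URI are illustrative only and are not part of RDF.

```python
from itertools import count

def is_blank(term):
    """Assumption: blank node identifiers are strings starting with '_:'."""
    return isinstance(term, str) and term.startswith("_:")

def merge(*graphs):
    """Merge graphs, giving each graph's blank nodes fresh identifiers so
    that blank nodes from different graphs are kept distinct."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renamed = {}  # old identifier -> fresh identifier, local to this graph

        def rename(term):
            if not is_blank(term):
                return term
            if term not in renamed:
                renamed[term] = f"_:b{next(fresh)}"
            return renamed[term]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged

# Two graphs that happen to use the same local identifier "_:x".
g1 = {("_:x", "http://example.org/name", "Alice")}
g2 = {("_:x", "http://example.org/name", "Bob")}

merged = merge(g1, g2)
subjects = {s for s, _, _ in merged}
naive_subjects = {s for s, _, _ in (g1 | g2)}
```

A plain set union of g1 and g2 would leave a single subject "_:x" carrying both names, which changes the meaning; the merged graph keeps two distinct blank nodes.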
Datatypes are used by RDF in the representation of values such as integers, floating point numbers and dates.
RDF uses the datatype abstraction defined by XML Schema Part 2: Datatypes [XML-SCHEMA2]. A datatype consists of a lexical space, a value space and a datatype mapping.
A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:
With one exception, the datatypes used in RDF have a lexical space consisting of a set of strings. The exception is rdf:XMLLiteral, whose lexical space also includes pairs of strings and language identifiers. The value obtained through its datatype mapping may depend on the language identifier.
For example, the datatype mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Value Space | {T, F} |
---|---|
Lexical Space | {"0", "1", "true", "false"} |
Datatype Mapping | {<"true", T>, <"1", T>, <"0", F>, <"false", F>} |
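The three components of a datatype can be written down directly. The following is an illustrative sketch only: the class name Datatype, the function is_consistent, and the use of Python's True and False to stand for the values T and F are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Datatype:
    lexical_space: frozenset  # set of strings
    value_space: frozenset    # set of values
    mapping: dict             # lexical form -> value (the set of pairs)

# xsd:boolean as tabulated above, with True/False standing for T/F.
XSD_BOOLEAN = Datatype(
    lexical_space=frozenset({"0", "1", "true", "false"}),
    value_space=frozenset({True, False}),
    mapping={"true": True, "1": True, "0": False, "false": False},
)

def is_consistent(dt):
    """Check that every pair in the mapping takes a member of the lexical
    space to a member of the value space."""
    return (set(dt.mapping) <= dt.lexical_space
            and set(dt.mapping.values()) <= dt.value_space)
```

Each value has two lexical representations here, so the mapping is many-to-one: distinct lexical forms may denote the same value.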
RDF predefines just one datatype rdf:XMLLiteral, used for embedding XML in RDF (see section 3).
There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with URI references. The predefined XML Schema datatypes [XML-SCHEMA2] are expected to be widely used for this purpose.
Certain XML Schema built-in datatypes are not suitable for use within RDF. For example, the QName datatype requires a namespace declaration to be in scope during the mapping, and is not recommended for use in RDF.
The defining authority of a URI which identifies a datatype is responsible for specifying the datatype's lexical space, value space and datatype mapping.
RDF provides no mechanism for defining new datatypes. XML Schema Datatypes [XML-SCHEMA2] provides an extensibility framework suitable for defining new datatypes for use in RDF.
Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals.
A literal may be the object of an RDF statement, but not the subject or the arc.
Literals may be plain or typed:
Continuing the example from section 2.3.3, the typed literals which can be defined using the XML Schema datatype xsd:boolean are:
Typed Literal | Datatype Mapping | Value |
---|---|---|
<xsd:boolean, "true"> | <"true", T> | T |
<xsd:boolean, "1"> | <"1", T> | T |
<xsd:boolean, "false"> | <"false", F> | F |
<xsd:boolean, "0"> | <"0", F> | F |
Some simple facts indicate a relationship between two objects. Such a fact may be represented as an RDF triple in which the predicate names the relationship, and the subject and object denote the two objects. A familiar representation of such a fact might be as a row in a table in a relational database. The table has two columns, corresponding to the subject and the object of the RDF triple. The name of the table corresponds to the predicate of the RDF triple. A further familiar representation may be as a two place predicate in first order logic.
Relational databases permit a table to have an arbitrary number of columns, a row of which expresses information corresponding to a predicate in first order logic with an arbitrary number of places. Such a row, or predicate, has to be decomposed for representation as RDF triples. A simple form of decomposition introduces a new blank node, corresponding to the row, and a new triple is introduced for each cell in the row. The subject of each triple is the new blank node, the predicate corresponds to the column name, and object corresponds to the value in the cell. The new blank node may also have an rdf:type property whose value corresponds to the table name.
As an example, consider Figure 5 from the [RDF-PRIMER]:
This information might correspond to a row in a table "STAFFADDRESSES", with a primary key STAFFID, and additional columns STREET, STATE, CITY and ZIP.
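The decomposition of a table row into triples described above can be sketched as follows. This is a minimal illustration, not a prescribed mapping: the function name, the base URI http://example.org/terms/, the "_:rowN" blank node identifiers and the cell values are all hypothetical; only the table and column names come from the text.

```python
from itertools import count

_row_ids = count()

def row_to_triples(table_name, row, base="http://example.org/terms/"):
    """Decompose one table row into triples that share a new blank node."""
    node = f"_:row{next(_row_ids)}"  # new blank node standing for the row
    # Optional rdf:type triple whose value corresponds to the table name.
    triples = [(node, "rdf:type", base + table_name)]
    # One triple per cell: the predicate corresponds to the column name,
    # the object to the value in the cell.
    for column, value in row.items():
        triples.append((node, base + column, value))
    return triples

# Illustrative (made-up) cell values for a STAFFADDRESSES row.
row = {
    "STAFFID": "S-001",
    "STREET": "1 Example Street",
    "CITY": "Exampleville",
    "STATE": "Example State",
    "ZIP": "00000",
}
triples = row_to_triples("STAFFADDRESSES", row)
```

The five cells yield five triples plus the rdf:type triple, all with the same blank node as subject, so the row is recovered as a conjunction of binary relationships.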
Thus, a more complex fact is expressed in RDF using a conjunction (logical-AND) of simple binary relationships. RDF does not provide means to express negation (NOT) or disjunction (OR). The expressive power of RDF corresponds to the existential-conjunctive (EC) subset of first order logic [Sowa].
Through its use of extensible URI-based vocabularies, RDF provides for expression of facts about arbitrary subjects; i.e. assertions of named properties about specific named things. A URI can be constructed for any thing that can be named, so RDF facts can be about any such things.
The ideas on meaning and inference in RDF are underpinned by the formal concept of entailment, as discussed in the RDF semantics document [RDF-SEMANTICS]. In brief, an RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if the truth of A is presumed or demonstrated then the truth of B can be inferred.
RDF uses URIs to identify resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned by the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:
Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespaces [XML-NS] associated with the RDF core vocabulary terms.
Note: these namespace URIs are the same as those used in earlier RDF documents [RDF-MS] [RDF-SCHEMA].
[[[NOTE FOR REVIEWERS: Some terms in these namespaces have been deprecated, some have been added, and some RDF schema terms have had their meaning changed. We invite community feedback regarding the relative costs of adopting these changes under the old namespace URIs vs creating new URIs for this revision of RDF.]]]
Vocabulary terms in the rdf: namespace are listed in section 5.1 of the RDF syntax specification [RDF-SYNTAX].
Vocabulary terms defined in the rdfs: namespace are defined in the RDF schema vocabulary specification [RDF-VOCABULARY].
There are two aspects to the meaning of an RDF graph. There is the formal meaning as determined by the RDF semantics [RDF-SEMANTICS]. This determines, with mathematical precision, the conclusions that can logically be drawn from an RDF graph. There is also the social meaning of the graph. It is the social meaning that affects what it means to people and how it interacts with human social institutions such as our systems of law.
RDF/XML expressions, i.e. encodings of RDF graphs, can be used to make claims or assertions about the 'real' world. Such expressions are said to be asserted.
Not every RDF/XML expression is asserted. Some may convey meaning that is partly determined by the circumstances in which they are used. For example, in English, a statement "I don't believe that George is a clown" contains the words "George is a clown", which, considered in isolation, has the form of an assertion that George exhibits certain comic qualities. However, considering the whole sentence, no such assertion is considered to be made.
When an RDF graph is asserted in the Web, its publisher is saying something about their view of the world. Such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provides the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and which distinguish assertions from other uses (e.g. citations, denials or illustrations).
The technical machinery includes protocols for transferring information (e.g. HTTP, SMTP) and file formats for encapsulating and labelling information (e.g. MIME, XML). A media type, application/rdf+xml [RDF-MIME-TYPE] indicates the use of RDF/XML as distinct from some other XML that happens to look like RDF. Issuing an HTTP GET request and obtaining data with a "200 OK" response code is a technical indication that the received data was published at the request URI; but data received with a "404 Not found" response cannot be considered to be similarly published information.
The social machinery includes the form of publication: publishing some unqualified statements on one's World Wide Web home page would generally be taken as an assertion of those statements. But publishing the same statements with a qualification, such as "here are some common myths", or as part of a rebuttal, would likely not be construed as an assertion of the truth of those statements. Similar considerations apply to the publication of assertions expressed in RDF.
An RDF graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information, or programmers writing software to perform specialized forms of deduction in the Semantic Web.
The social conventions surrounding use of RDF assume that any RDF URI reference gains its meaning from some defining individual, organization or context. This applies most notably to RDF predicate URI references.
These social conventions are rooted in the URI specification [URI] and registration procedures [URI-REG]. A URI scheme registration refers to a specification of the detailed syntax and interpretation for that scheme, from which the defining authority for a given URI may be deduced. In the case of http: URIs, the defining specification is the HTTP protocol specification [HTTP], which specifies how to use the HTTP protocol to obtain a resource representation from the host named in the URI; thus, the owner of the indicated DNS domain controls (observable aspects of) the URI's meaning.
Thus, the choice of terms used in published RDF is significant in determining its meaning, through reference to definitions asserted by the defining authorities for those terms.
However, even when a URI reference can be dereferenced as an RDF/XML document, its use within an asserted RDF graph does not implicitly assert the contents of the referenced document.
Human publishers of RDF content commit themselves to the mechanically-inferred social obligations.
The meaning of an RDF document includes the social meaning, the formal meaning, and the social meaning of the formal entailments. The assertion of an RDF graph G, when G logically entails G', includes the implicit assertion of G'. The implied assertion of G' should be interpreted using the same social conventions that are reasonably used to interpret the assertion of G.
Imagine two websites publishing the following RDF:
(A) http://insult.example.com/lexicon# asserts the following, and this is all that one can find on the website about that term:

A:Clown | rdf:type | rdfs:Class . |
A:Clown | rdfs:comment | "A class of foolish people, whose pronouncements are probably ill-considered and not to be taken seriously" . |

(B) http://AngloSaxon.example.org/lexicon# asserts:

B:Comic | rdfs:subClassOf | <http://insult.example.com/lexicon#Clown> . |
Imagine also a third, using the vocabulary previously defined by the first two.
C:JohnSmith | rdf:type | <http://AngloSaxon.example.org/lexicon#Comic> . |
Now, it follows by the formal RDF model theory that these three together entail:
C:JohnSmith | rdf:type | A:Clown . |
A:Clown | rdfs:comment | "A class of foolish people, whose pronouncements are probably ill-considered and not to be taken seriously" . |
Given this formal entailment, the social context of rdfs:comment is understood by referring to [RDF-VOCABULARY], which says it provides: "a human-readable description of a resource". Thus, the person identified as C:JohnSmith might reasonably consider himself to be insulted.
Moreover, since the publishers of the third Web site http://skunk.example.org/ link C:JohnSmith to the vocabulary previously defined to be insulting, it is they who have insulted C:JohnSmith.
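The formal step in the example above can be sketched as a small rule application. This is an illustration under stated assumptions, not the RDF model theory itself: it applies only the subclass rule (if x has type C and C is a subclass of D, then x has type D), it writes rdf:type and rdfs:subClassOf as abbreviated strings, and it assumes that the prefix C: expands to http://skunk.example.org/.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

A = "http://insult.example.com/lexicon#"
B = "http://AngloSaxon.example.org/lexicon#"
C = "http://skunk.example.org/"  # assumed expansion of the C: prefix

# The three published graphs, taken together.
graph = {
    (A + "Clown", RDF_TYPE, "rdfs:Class"),
    (B + "Comic", SUBCLASS_OF, A + "Clown"),
    (C + "JohnSmith", RDF_TYPE, B + "Comic"),
}

def subclass_closure(triples):
    """Repeatedly add (x rdf:type D) whenever (x rdf:type C) and
    (C rdfs:subClassOf D) are present, until nothing new appears."""
    inferred = set(triples)
    while True:
        new = {
            (x, RDF_TYPE, sup)
            for (x, p1, cls) in inferred if p1 == RDF_TYPE
            for (sub, p2, sup) in inferred if p2 == SUBCLASS_OF and sub == cls
        }
        if new <= inferred:
            return inferred
        inferred |= new

entailed = subclass_closure(graph)
```

The triple stating that JohnSmith has type Clown appears in entailed although no single publisher stated it, which is the point of the example.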
RDF provides for XML content as a possible literal value. This typically originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax [RDF-SYNTAX].
Such content is indicated in an RDF graph using a typed literal whose datatype is a special built-in datatype, rdf:XMLLiteral.
As part of the definition of this datatype, an ancillary definition is used.
The XML document corresponding to a pair ( str, lang ) is formed as follows:
Concatenate the five strings:
Encode the resulting Unicode string in UTF-8 to form the corresponding XML document.
No escaping is applied. The choice of rdf-wrapper is fixed but arbitrary.
The XML document corresponding to a string str is formed as the XML document corresponding to the pair (str, "").
Using this, the datatype rdf:XMLLiteral is defined as follows.
Reminder: All other datatypes have a lexical space that is a set of strings, and a mapping which maps strings to values.
Note: Not all values of this datatype are compliant with XML 1.1 [XML 1.1]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.
This section defines the RDF abstract syntax. The RDF abstract syntax is a set of triples, called the RDF graph.
This section also defines equality between RDF graphs. A definition of equality is needed to support the RDF Test Cases [RDF-TESTS] specification.
An RDF triple contains three components, called the subject, the predicate and the object.
The subject may not be an RDF literal.
Note: subjects and objects are otherwise unrestricted, since anything that is neither an RDF literal nor an RDF URI reference is treated as a blank node.
An RDF triple is conventionally written in the order subject, predicate, object.
The predicate is also known as the property of the triple.
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
The blank nodes of an RDF graph are those nodes that are not RDF literals or RDF URI references.
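These definitions translate almost directly into code. The following sketch is illustrative only: the class names URIRef, Literal and BNode and the example.org URIs are assumptions, and a literal is reduced to its lexical form for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URIRef:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical_form: str

class BNode:
    """A blank node: distinguished only by object identity, with no label."""

def nodes(graph):
    """The set of subjects and objects of the triples in the graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

def blank_nodes(graph):
    """Those nodes that are neither RDF literals nor RDF URI references."""
    return {n for n in nodes(graph) if not isinstance(n, (URIRef, Literal))}

ex = "http://example.org/"
b = BNode()
graph = {
    (URIRef(ex + "staff/1"), URIRef(ex + "address"), b),
    (b, URIRef(ex + "city"), Literal("Exampleville")),
}
```

Predicates are not counted as nodes unless they also occur as a subject or object, so this graph has three nodes, one of which is blank.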
Two RDF graphs G and G' are equal if there is a bijection M between the nodes of the two graphs, such that:
With this definition, there are the same number of blank nodes in the two graphs, and M shows how each blank node in G can be replaced with a new blank node to give G'.
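Graph equality can be tested by searching for such a bijection. The sketch below assumes that M must leave literals and URI references unchanged and must carry each triple (s, p, o) of G to a triple (M(s), p, M(o)) of G', and vice versa; this reading of the conditions, the string-based term representation with "_:" marking blank nodes, and the function names are assumptions made for the example. The search is brute force and is exponential in the number of blank nodes.

```python
from itertools import permutations

def is_blank(term):
    """Assumption: blank nodes are written as strings starting with '_:'."""
    return isinstance(term, str) and term.startswith("_:")

def nodes(graph):
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

def graphs_equal(g1, g2):
    """Try every bijection between the blank nodes of g1 and g2."""
    b1 = sorted(n for n in nodes(g1) if is_blank(n))
    b2 = sorted(n for n in nodes(g2) if is_blank(n))
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        m = dict(zip(b1, perm))  # candidate M, identity on all other terms
        mapped = {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1}
        if mapped == g2:
            return True
    return False

g1 = {("_:a", "ex:knows", "_:b"), ("_:b", "ex:name", "Bob")}
g2 = {("_:y", "ex:knows", "_:x"), ("_:x", "ex:name", "Bob")}
g3 = {("_:x", "ex:knows", "_:x"), ("_:x", "ex:name", "Bob")}
```

Here g1 and g2 differ only in the choice of blank node identifiers and compare as equal, whereas g3 has a single blank node and does not.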
A URI reference within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:
The disallowed characters that must be %-escaped are all non-ASCII characters and the excluded characters listed in Section 2.4 of [URI], with the exception of the number sign (#), the percent sign (%), and the square bracket characters re-allowed in [RFC-2732].
Disallowed characters must be escaped as follows:
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings.
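The interaction between escaping and equality can be illustrated with a short sketch. The escaping steps are not reproduced in this text, so the sketch assumes the conventional procedure of converting each disallowed character to UTF-8 and writing each resulting octet as %HH; the function names are hypothetical.

```python
# ASCII characters excluded by Section 2.4 of [URI], minus '#', '%', '[' and ']':
# control characters, space, and the delimiters and "unwise" characters.
DISALLOWED_ASCII = (
    {chr(c) for c in range(0x00, 0x21)} | {chr(0x7F)} | set('<>"{}|\\^`')
)

def escape_uri_ref(uri_ref):
    """Assumed procedure: UTF-8 encode each disallowed character and
    write every octet as %HH."""
    out = []
    for ch in uri_ref:
        if ord(ch) > 0x7F or ch in DISALLOWED_ASCII:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

def uri_refs_equal(a, b):
    """Equality is character by character on the Unicode strings."""
    return a == b

unescaped = "http://example.org/caf\u00e9"
escaped = escape_uri_ref(unescaped)
```

Because equality is defined on the Unicode strings themselves, unescaped and escaped are different RDF URI references even though they yield the same character sequence after escaping.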
Editors' Note: This section is in the scope of the TAG issue IRIEverywhere-27. The editors are expecting a resolution of this issue during the last call period. This may result in updates to this section.
Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and constrained to be in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).
Note: RDF URI references are compatible with International Resource Identifiers as defined by [XML Namespaces 1.1].
Note: The restriction to absolute URI references is found in this abstract syntax. When there is a well-defined base URI, concrete syntaxes, such as RDF/XML, may permit relative URIs as a shorthand for such absolute URI references.
A literal in an RDF graph contains three components, called the lexical form, the language identifier and the datatype URI.
The lexical form is present in all RDF literals; the language identifier and the datatype URI may be absent from an RDF literal.
A plain literal is one in which the datatype URI is absent.
A typed literal is one in which the datatype URI is present.
Note: Literals in which the lexical form begins with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
Note: When using the language identifier, care must be taken not to confuse language with locale. The language identifier only relates to human language text. Presentational issues, how to best represent typed data to the end-user, should be addressed in end-user applications.
Two literals are equal if and only if all of the following hold:
Note: RDF Literals are distinct and distinguishable from RDF URI references; e.g. http://example.org as an RDF Literal (untyped, without a language identifier) is not equal to http://example.org as an RDF URI reference.
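Syntactic equality of literals can be sketched as a component-wise comparison. The individual conditions are not reproduced in this text, so the sketch assumes that two literals are equal when their lexical forms, language identifiers and datatype URIs all agree exactly (absent components being equal only to absent components); the class names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class URIRef:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical_form: str
    language: Optional[str] = None   # None when the identifier is absent
    datatype: Optional[str] = None   # None for a plain literal

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# The generated __eq__ compares all three components and also requires
# both operands to be of the same class.
same = Literal("chat", language="fr") == Literal("chat", language="fr")
lang_differs = Literal("chat", language="fr") == Literal("chat", language="en")
form_differs = (Literal("1", datatype=XSD_BOOLEAN)
                == Literal("true", datatype=XSD_BOOLEAN))
literal_vs_uri = Literal("http://example.org") == URIRef("http://example.org")
```

Note that form_differs is False: the two typed literals denote the same value but are different literals, since this comparison is purely syntactic.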
The datatype URI refers to a datatype. For XML Schema built-in datatypes, URIs such as http://www.w3.org/2001/XMLSchema#int are used. The URI of the datatype rdf:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which URIs refer to datatypes.
The value associated with a typed literal is found by applying the datatype mapping associated with the datatype URI to the lexical form. Exceptionally, if the datatype is rdf:XMLLiteral and the literal has a language identifier, then the datatype mapping is applied to the pair formed by the lexical form and the language identifier.
If the lexical form is not in the lexical space of the datatype associated with the datatype URI, then no literal value can be associated with the typed literal. Such a case, while in error, is not syntactically ill-formed.
A typed literal for which the datatype does not map the lexical form to a value is not syntactically ill-formed.
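Finding the value of a typed literal amounts to a lookup in the datatype mapping. In this minimal sketch the registry DATATYPE_MAPPINGS and the function literal_value are hypothetical, only xsd:boolean is registered, and None is used to signal that no value can be associated with the literal.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: datatype URI -> datatype mapping.
DATATYPE_MAPPINGS = {
    XSD + "boolean": {"true": True, "1": True, "false": False, "0": False},
}

def literal_value(lexical_form, datatype_uri):
    """Apply the datatype mapping to the lexical form.

    Returns None when the lexical form is outside the lexical space:
    the literal is still syntactically well-formed, it just has no value.
    """
    mapping = DATATYPE_MAPPINGS.get(datatype_uri)
    if mapping is None:
        raise KeyError(f"no datatype known for {datatype_uri}")
    return mapping.get(lexical_form)

v_true = literal_value("true", XSD + "boolean")
v_one = literal_value("1", XSD + "boolean")
v_bad = literal_value("yes", XSD + "boolean")
```

The literals with lexical forms "true" and "1" have the same value although they are distinct literals, and "yes" has no value at all.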
Note: In application contexts, comparing the values of typed literals (see section 4.5.2) is usually more helpful than comparing their syntactic forms (see section 4.5.1). Similarly, for comparing RDF Graphs, semantic notions of entailment (see [RDF-SEMANTICS]) are usually more helpful than syntactic equality (see section 4.3).
The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all RDF URI references and the set of all literals are pairwise disjoint.
Otherwise, this set is arbitrary.
RDF makes no reference to any internal structure of blank nodes.
RDF uses an RDF URI Reference, which may include a fragment identifier, as a context free identifier for a resource. RFC 2396 [URI] states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.
These apparently conflicting views are reconciled by considering that, in an RDF graph, any RDF URI reference consisting of an absolute URI and a fragment identifier identifies the same thing as the fragment identifier does in an application/rdf+xml [RDF-MIME-TYPE] representation of the resource identified by the absolute URI component. Thus:
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior.
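Separating an RDF URI reference into its absolute URI and fragment identifier is a purely syntactic step, sketched here with the Python standard library. The helper name split_uri_ref is hypothetical, and the example URI is taken from the earlier lexicon example.

```python
from urllib.parse import urldefrag

def split_uri_ref(uri_ref):
    """Return (absolute URI, fragment identifier) for a URI reference.

    The fragment is the empty string when none is present.
    """
    absolute, fragment = urldefrag(uri_ref)
    return absolute, fragment

parts = split_uri_ref("http://insult.example.com/lexicon#Clown")
# parts == ('http://insult.example.com/lexicon', 'Clown')
```

Under the reconciliation described above, what the fragment identifies is then determined by an application/rdf+xml representation of the resource named by the absolute part, which this sketch does not attempt to retrieve.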
This document contains a significant contribution from Pat Hayes, Sergey Melnik and Patrick Stickler, under whose leadership the framework described in the RDF family of specifications for representing datatyped values, such as integers and dates, was developed.
The editors acknowledge valuable contributions from the following:
Jeremy Carroll thanks Oreste Signore, his host at the W3C Office in Italy and Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", part of the Consiglio Nazionale delle Ricerche, where Jeremy is a visiting researcher.
This document is a product of extended deliberations by the RDF Core Working Group, whose members have included:
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working Group members who contributed to this earlier work are: