Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
RDF Concepts and Abstract Syntax defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, key concepts, datatyping, character normalization and handling of URI references.
This version is a mess at the moment.
This version is modified from the last call version with changes as listed in the change log. These are substantive changes as approved by the RDF Core WG, and editorial changes made, typically in response to last call comments, at the discretion of the editors.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, key concepts, datatyping, character normalization and handling of URI references.
Normative documentation of the RDF core falls into the following areas:
Within this document, normative sections are explicitly labelled as such. Explicit notes are informative.
The framework is designed so that vocabularies can be layered on top of a core. The RDF core and RDF vocabulary definition (RDF schema) languages [RDF-VOCABULARY] are the first such vocabularies. Others (cf. OWL [OWL] and the applications in the primer [RDF-PRIMER]) are in development.
In section 2, the background rationale and design goals are introduced. Key concepts follow in section 3.
RDF's abstract syntax is a graph, which can be serialized using XML (but which is quite distinct from XML's tree-based infoset [XML-INFOSET]). The abstract syntax captures the fundamental structure of RDF, independently of any concrete syntax used for serialization. The formal semantics of RDF are defined in terms of the abstract syntax. XML content of literals is described in section 5, and the abstract syntax is defined in section 6 of this document.
Section 7 discusses the role of fragment identifiers in URI references used with RDF.
RDF has an abstract syntax that reflects a simple graph-based data model, and formal semantics with a rigorously defined notion of entailment, providing a basis for well-founded deductions in RDF data.
The development of RDF has been motivated by the following uses, among others:
RDF is designed to represent information in a minimally constraining, flexible way. It can be used in isolated applications, where individually designed formats might be more direct and easily understood, but RDF's generality offers greater value from sharing. The value of information thus increases as it becomes accessible to more applications across the entire Internet.
The design of RDF is intended to meet the following goals:
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
Note: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See [RDF-SEMANTICS] for more information about "model theory" as used in the literature of mathematics and logic.
RDF has a formal semantics which provides a dependable basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URI references, or URIrefs). URI references are used for naming all kinds of things in RDF.
The other kind of value that appears in RDF data is a literal.
RDF has a recommended XML serialization form [RDF-SYNTAX], which can be used to encode the data model for exchange of information among applications.
RDF can use values represented according to XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make statements about any resource.
In general, it is not assumed that complete information about any resource is available. RDF does not prevent anyone from making assertions that are nonsensical or inconsistent with other statements, or the world as people see it. Designers of applications that use RDF should be aware of this and may design their applications to tolerate incomplete or inconsistent sources of information.
RDF uses the following key concepts:
The underlying structure of any expression in RDF is a collection of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph (defined more formally in section 6). This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link (hence the term "graph").
Each triple represents a statement of a relationship between the things denoted by the nodes that it links. Each triple has three parts:
The direction of the arc is significant: it always points toward the object.
The nodes of an RDF graph are its subjects and objects.
The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the things denoted by subject and object of the triple. The assertion of an RDF graph amounts to asserting all the triples in it, so the meaning of an RDF graph is the conjunction (logical AND) of the statements corresponding to all the triples it contains. A formal account of the meaning of RDF graphs is given in [RDF-SEMANTICS].
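As an informal illustration of the structure just described, the following Python sketch models a triple as a three-part tuple and a graph as a set of such tuples. The `Triple` type, the `nodes` helper and the `http://example.org/` URIs are hypothetical names chosen for this example; they are not defined by RDF.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" would shadow a Python builtin


# Hypothetical namespace used only for this illustration.
EX = "http://example.org/"

# A graph is a set of triples, so repeating a triple adds nothing.
graph = {
    Triple(EX + "alice", EX + "knows", EX + "bob"),
    Triple(EX + "bob", EX + "knows", EX + "carol"),
    Triple(EX + "alice", EX + "knows", EX + "bob"),  # duplicate of the first
}


def nodes(g):
    """Return the nodes of a graph: the subjects and objects of its triples."""
    return {t.subject for t in g} | {t.obj for t in g}
```

In this sketch a predicate is not a node unless it also occurs as a subject or object of some triple.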
A node may be a URI with optional fragment identifier (URI reference, or URIref), a literal, or blank (having no separate form of identification). Properties are URI references. (See [URI], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 6.4.)
A URI reference or literal used as a node identifies what that node represents. A URI reference used as a predicate identifies the relationship between the nodes it connects. A predicate URI reference may also be a node in the graph.
A blank node is a node that is not a URI reference or a literal. In the RDF abstract syntax, a blank node is just a unique node that can be used in one or more RDF statements, and has no globally distinguishing identity.
A convention used by some linear representations of an RDF graph to allow several statements to reference the same unidentified resource is to use a blank node identifier, which is a local identifier that can be distinguished from all URIs and literals. When graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers. Note that such blank node identifiers are not part of the RDF abstract syntax, and the representation of triples containing blank nodes is entirely dependent on the particular concrete syntax used.
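The re-allocation of blank node identifiers on merging might be sketched as follows. This is only one possible approach, under the assumption that each input graph uses its own local labels; the `BNode` class and the `merge` function are hypothetical names, and triples are plain Python tuples.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class BNode:
    label: str  # local identifier, meaningful only within one source graph


def _rename(term, renamed, fresh):
    """Map a blank node to a fresh one; leave every other term unchanged."""
    if not isinstance(term, BNode):
        return term
    if term not in renamed:
        renamed[term] = BNode(f"m{next(fresh)}")
    return renamed[term]


def merge(*graphs):
    """Union of graphs, keeping blank nodes from different graphs distinct."""
    fresh = count()
    merged = set()
    for g in graphs:
        renamed = {}  # old blank node -> new blank node, scoped to this graph
        for s, p, o in g:
            merged.add((_rename(s, renamed, fresh), p, _rename(o, renamed, fresh)))
    return merged
```

A plain set union of two graphs that both happen to use the label `x` would wrongly treat the two blank nodes as one; the renaming step avoids that.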
Datatypes are used by RDF in the representation of values such as integers, floating point numbers and dates.
RDF uses the datatype abstraction defined by XML Schema Part 2: Datatypes [XML-SCHEMA2], and may be used with any datatype definition that conforms to this abstraction, even if not actually defined in terms of XML Schema.
A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:
With one exception, the datatypes used in RDF have a lexical space consisting of a set of strings. The exception is rdf:XMLLiteral, whose lexical space also includes pairs of strings and language identifiers. The value obtained through its datatype mapping may depend on the language identifier.
For example, the datatype mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Value Space | {T, F} |
---|---|
Lexical Space | {"0", "1", "true", "false"} |
Datatype Mapping | {<"true", T>, <"1", T>, <"0", F>, <"false", F>} |
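Informally, the table above can be written down as a Python dictionary, where each key/value entry is one pair of the datatype mapping. The names `XSD_BOOLEAN_MAPPING`, `lexical_space` and `value_space` are hypothetical, and Python's `True` and `False` stand in for the values 'T' and 'F'.

```python
# Value space members, named T and F as in the table.
T, F = True, False

# Each entry is one <lexical form, value> pair of the datatype mapping.
XSD_BOOLEAN_MAPPING = {"true": T, "1": T, "0": F, "false": F}

# The lexical space is the set of first elements of the pairs,
# the value space is the set of second elements.
lexical_space = set(XSD_BOOLEAN_MAPPING)
value_space = set(XSD_BOOLEAN_MAPPING.values())
```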
RDF predefines just one datatype rdf:XMLLiteral, used for embedding XML in RDF (see section 5).
There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with URI references. The predefined XML Schema datatypes [XML-SCHEMA2] are expected to be widely used for this purpose.
Certain XML Schema built-in datatypes are not suitable for use within RDF. For example, the QName datatype requires a namespace declaration to be in scope during the mapping, and is not recommended for use in RDF.
RDF provides no mechanism for defining new datatypes. XML Schema Datatypes [XML-SCHEMA2] provides an extensibility framework suitable for defining new datatypes for use in RDF.
Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals.
A literal may be the object of an RDF statement, but not the subject or the predicate (arc).
Literals may be plain or typed:
Continuing the example from section 3.3, the typed literals which can be defined using the XML Schema datatype xsd:boolean are:
Typed Literal | Datatype Mapping | Value |
---|---|---|
<xsd:boolean, "true"> | <"true", T> | T |
<xsd:boolean, "1"> | <"1", T> | T |
<xsd:boolean, "false"> | <"false", F> | F |
<xsd:boolean, "0"> | <"0", F> | F |
Some simple facts indicate a relationship between two objects. Such a fact may be represented as an RDF triple in which the predicate names the relationship, and the subject and object denote the two objects. A familiar representation of such a fact might be as a row in a table in a relational database. The table has two columns, corresponding to the subject and the object of the RDF triple. The name of the table corresponds to the predicate of the RDF triple. A further familiar representation may be as a two place predicate in first order logic.
Relational databases permit a table to have an arbitrary number of columns, a row of which expresses information corresponding to a predicate in first order logic with an arbitrary number of places. Such a row, or predicate, has to be decomposed for representation as RDF triples. A simple form of decomposition introduces a new blank node, corresponding to the row, and a new triple is introduced for each cell in the row. The subject of each triple is the new blank node, the predicate corresponds to the column name, and object corresponds to the value in the cell. The new blank node may also have an rdf:type property whose value corresponds to the table name.
As an example, consider Figure 5 from the [RDF-PRIMER]:
This information might correspond to a row in a table "STAFFADDRESSES", with a primary key STAFFID, and additional columns STREET, STATE, CITY and ZIP.
Thus, a more complex fact is expressed in RDF using a conjunction (logical-AND) of simple binary relationships. RDF does not provide means to express negation (NOT) or disjunction (OR).
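The simple decomposition described above might be sketched as follows. The function name `row_to_triples`, the `http://example.org/terms/` namespace and the cell values are assumptions made for illustration; only the table and column names come from the example, and the blank node is represented by an arbitrary tuple.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_ids = count()  # source of distinct blank nodes for this sketch


def row_to_triples(table, row, base="http://example.org/terms/"):
    """Decompose one table row into triples sharing a new blank node."""
    node = ("bnode", next(_ids))  # stands for the row itself
    # Optional rdf:type triple whose value corresponds to the table name.
    triples = {(node, RDF_TYPE, base + table)}
    # One triple per cell: the column name gives the predicate.
    for column, value in row.items():
        triples.add((node, base + column, value))
    return triples


# Hypothetical cell values, not taken from the primer.
row = {
    "STAFFID": "12345",
    "STREET": "1 Example Street",
    "STATE": "Example State",
    "CITY": "Exampleville",
    "ZIP": "00000",
}
triples = row_to_triples("STAFFADDRESSES", row)
```

The resulting set holds six triples, one for the type and one for each of the five cells, all with the same blank node as subject.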
Through its use of extensible URI-based vocabularies, RDF provides for expression of facts about arbitrary subjects; i.e. assertions of named properties about specific named things. A URI can be constructed for any thing that can be named, so RDF facts can be about any such things.
The ideas on meaning and inference in RDF are underpinned by the formal concept of entailment, as discussed in the RDF semantics document [RDF-SEMANTICS]. In brief, an RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if the truth of A is presumed or demonstrated then the truth of B can be inferred.
RDF uses URI references to identify resources and properties. Certain URI references are reserved for use by RDF, and should not be used in ways not supported by the RDF specifications. Specifically, URI references with the following leading substrings are reserved for definition by the RDF specifications:
Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespace names [XML-NS] associated with the RDF core vocabulary terms.
Note: these namespace names are the same as those used in earlier RDF documents [RDF-MS] [RDF-SCHEMA].
Vocabulary terms in the rdf: namespace are listed in section 5.1 of the RDF syntax specification [RDF-SYNTAX]. Some of these terms are defined by the RDF specifications to denote specific concepts. Others have purely syntactic purpose (e.g. rdf:ID is part of the RDF/XML syntax) and should not be used in RDF to denote any kind of resource.
Vocabulary terms defined in the rdfs: namespace are defined in the RDF schema vocabulary specification [RDF-VOCABULARY].
RDF provides for XML content as a possible literal value. This typically originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax [RDF-SYNTAX].
Such content is indicated in an RDF graph using a typed literal whose datatype is a special built-in datatype, rdf:XMLLiteral.
As part of the definition of this datatype, an ancillary definition is used.
The XML document corresponding to a pair ( str, lang ) is formed as follows:
Concatenate the five strings:
Encode the resulting Unicode string in UTF-8 to form the corresponding XML document.
No escaping is applied. The choice of rdf-wrapper is fixed but arbitrary.
The XML document corresponding to a string str is formed as the XML document corresponding to the pair (str, "").
Using this, the datatype rdf:XMLLiteral is defined as follows.
Reminder: All other datatypes have a lexical space being a set of strings, and a mapping which maps strings to values.
Note: Not all values of this datatype are compliant with XML 1.1 [XML 1.1]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.
This section defines the RDF abstract syntax. The RDF abstract syntax is a set of triples, called the RDF graph.
This section also defines equality between RDF graphs. A definition of equality is needed to support the RDF Test Cases [RDF-TESTS] specification.
An RDF triple contains three components:
An RDF triple is conventionally written in the order subject, predicate, object.
The predicate is also known as the property of the triple.
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
Two RDF graphs G and G' are equal if there is a bijection M between the nodes of the two graphs, such that:
With this definition, there are the same number of blank nodes in the two graphs, and M shows how each blank node in G can be replaced with a new blank node to give G'.
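A brute-force check of this kind of equality might look like the sketch below. It assumes that the bijection M maps blank nodes to blank nodes, leaves URI references and literals unchanged, and that a triple (s, p, o) is in G if and only if (M(s), p, M(o)) is in G'. The `BNode` class and the function names are hypothetical, and the search tries every pairing of blank nodes, so it is only practical for small graphs.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class BNode:
    label: str


def blank_nodes(g):
    """Blank nodes occurring as subject or object, in a stable order."""
    found = {t for s, _, o in g for t in (s, o) if isinstance(t, BNode)}
    return sorted(found, key=lambda b: b.label)


def graphs_equal(g1, g2):
    """True if some renaming of blank nodes turns g1 into exactly g2."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        m = dict(zip(b1, perm))  # candidate bijection on blank nodes
        # Terms that are not blank nodes are mapped to themselves.
        mapped = {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1}
        if mapped == g2:
            return True
    return False
```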
A URI reference within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:
The encoding consists of:
The disallowed octets that must be %-escaped include all those that do not correspond to US-ASCII characters, and the excluded characters listed in Section 2.4 of [URI], except for the number sign (#), percent sign (%), and the square bracket characters re-allowed in [RFC-2732].
Disallowed octets must be escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the 2-digit hexadecimal numeral corresponding to the octet value).
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings.
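The escaping step can be illustrated with a short Python sketch. It assumes the string is first encoded in UTF-8, and the set of excluded US-ASCII characters below is written out from the description above (control characters, space, and the delimiter and "unwise" characters, minus `#`, `%`, `[` and `]`); it should be checked against [URI] before being relied upon. The function name `escape_uri_reference` is hypothetical.

```python
# Excluded printable characters, with "#", "%", "[" and "]" re-allowed.
# Assumption: this list matches Section 2.4 of [URI]; verify before use.
_EXCLUDED = set('<>"{}|\\^` ')


def escape_uri_reference(uriref: str) -> str:
    """Percent-escape the disallowed octets of the UTF-8 encoding."""
    out = []
    for octet in uriref.encode("utf-8"):
        ch = chr(octet)
        is_control = octet <= 0x1F or octet == 0x7F
        if octet >= 0x80 or is_control or ch in _EXCLUDED:
            out.append(f"%{octet:02X}")  # %HH, two hexadecimal digits
        else:
            out.append(ch)
    return "".join(out)
```

Since equality of RDF URI references is character by character, a string and its escaped form are different RDF URI references whenever escaping changes the string.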
Editors' Note: This section is in the scope of the TAG issue IRIEverywhere-27. The editors are expecting a resolution of this issue during the last call period. This may result in updates to this section.
Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and constrained to be in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).
Note: RDF URI references are compatible with International Resource Identifiers as defined by [XML Namespaces 1.1].
Note: The restriction to absolute URI references is found in this abstract syntax. When there is a well-defined base URI, concrete syntaxes, such as RDF/XML, may permit relative URIs as a shorthand for such absolute URI references.
A literal in an RDF graph contains three components called:
The lexical form is present in all RDF literals; the language identifier and the datatype URI may be absent from an RDF literal.
A plain literal is one in which the datatype URI is absent.
A typed literal is one in which the datatype URI is present.
Note: Literals in which the lexical form begins with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
Note: When using the language identifier, care must be taken not to confuse language with locale. The language identifier only relates to human language text. Presentational issues, how to best represent typed data to the end-user, should be addressed in end-user applications.
Two literals are equal if and only if all of the following hold:
Note: RDF Literals are distinct and distinguishable from RDF URI references; e.g. http://example.org as an RDF Literal (untyped, without a language identifier) is not equal to http://example.org as an RDF URI reference.
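One way to picture literals and their equality is the following sketch, which assumes that two literals are equal exactly when their lexical forms, language identifiers and datatype URIs agree component by component, with an absent component matching only another absent component. Any case handling of language identifiers is left out. The `Literal` and `URIRef` classes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    language: Optional[str] = None      # None means "absent"
    datatype_uri: Optional[str] = None  # None means "absent"

    @property
    def is_plain(self) -> bool:
        """A plain literal has no datatype URI; otherwise it is typed."""
        return self.datatype_uri is None
```

Because the two classes are distinct, `Literal("http://example.org")` never compares equal to `URIRef("http://example.org")`, which mirrors the note above.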
The datatype URI refers to a datatype. For XML Schema built-in datatypes, URIs such as http://www.w3.org/2001/XMLSchema#int are used. The URI of the datatype rdf:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which URIs refer to datatypes.
The value associated with a typed literal is found by applying the datatype mapping associated with the datatype URI to the lexical form. Exceptionally, if the datatype is rdf:XMLLiteral and the literal has a language identifier, then the datatype mapping is applied to the pair formed by the lexical form and the language identifier.
If the lexical form is not in the lexical space of the datatype associated with the datatype URI, then no literal value can be associated with the typed literal. Such a case, while in error, is not syntactically ill-formed.
A typed literal for which the datatype does not map the lexical form to a value is not syntactically ill-formed.
Note: In application contexts, comparing the values of typed literals (see section 6.5.2) is usually more helpful than comparing their syntactic forms (see section 6.5.1). Similarly, for comparing RDF Graphs, semantic notions of entailment (see [RDF-SEMANTICS]) are usually more helpful than syntactic equality (see section 6.3).
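The difference between comparing values and comparing syntactic forms can be sketched as follows, using only the xsd:boolean mapping from section 3.3. The `DATATYPES` registry and the `literal_value` function are hypothetical, typed literals are written as (datatype URI, lexical form) pairs, and `None` is used to signal that no value is associated with the literal.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Registry of datatype mappings, keyed by datatype URI (illustrative only).
DATATYPES = {
    XSD + "boolean": {"true": True, "1": True, "false": False, "0": False},
}


def literal_value(datatype_uri, lexical_form):
    """Apply the datatype mapping; return None if no value is associated."""
    mapping = DATATYPES[datatype_uri]  # KeyError for an unknown datatype
    return mapping.get(lexical_form)


a = (XSD + "boolean", "true")
b = (XSD + "boolean", "1")
ill_typed = (XSD + "boolean", "yes")  # permitted syntactically, but has no value
```

Here `a` and `b` differ as syntactic forms but `literal_value(*a)` and `literal_value(*b)` agree, while `literal_value(*ill_typed)` is `None`.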
The blank nodes in an RDF graph are drawn from an infinite set. This set of blank nodes, the set of all RDF URI references and the set of all literals are pairwise disjoint.
Otherwise, this set is arbitrary.
RDF makes no reference to any internal structure of blank nodes.
RDF uses an RDF URI Reference, which may include a fragment identifier, as a context free identifier for a resource. RFC 2396 [URI] states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.
These apparently conflicting views are reconciled by considering that a URI reference in an RDF graph is treated with respect to the MIME type application/rdf+xml [RDF-MIME-TYPE]. Given an RDF URI reference consisting of an absolute URI and a fragment identifier, the fragment identifier identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the absolute URI component. Thus:
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior. Note that nothing here requires that an RDF application be able to retrieve any representation of resources identified by the URIs in an RDF graph.
This document contains a significant contribution from Pat Hayes, Sergey Melnik and Patrick Stickler, who led the development of the framework, described in the RDF family of specifications, for representing datatyped values such as integers and dates.
The editors acknowledge valuable contributions from the following: Frank Manola, Pat Hayes, Dan Brickley, Jos de Roo, Dave Beckett, Patrick Stickler, Peter F. Patel-Schneider, Jerome Euzenat, Massimo Marchiori, Tim Berners-Lee, Dave Reynolds and Dan Connolly.
Jeremy Carroll thanks Oreste Signore, his host at the W3C Office in Italy and Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", part of the Consiglio Nazionale delle Ricerche, where Jeremy is a visiting researcher.
This document is a product of extended deliberations by the RDFcore Working Group, whose members have included: Art Barstow (W3C), Dave Beckett (ILRT), Dan Brickley (ILRT), Dan Connolly (W3C), Jeremy Carroll (Hewlett Packard), Ron Daniel (Interwoven Inc), Bill dehOra (InterX), Jos De Roo (AGFA), Jan Grant (ILRT), Graham Klyne (Nine by Nine), Frank Manola (MITRE Corporation), Brian McBride (Hewlett Packard), Eric Miller (W3C), Stephen Petschulat (IBM), Patrick Stickler (Nokia), Aaron Swartz (HWG), Mike Dean (BBN Technologies / Verizon), R. V. Guha (Alpiri Inc), Pat Hayes (IHMC), Sergey Melnik (Stanford University) and Martyn Horner (Profium Ltd).
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working Group members who contributed to this earlier work are: Nick Arnett (Verity), Tim Berners-Lee (W3C), Tim Bray (Textuality), Dan Brickley (ILRT / University of Bristol), Walter Chang (Adobe), Sailesh Chutani (Oracle), Dan Connolly (W3C), Ron Daniel (DATAFUSION), Charles Frankston (Microsoft), Patrick Gannon (CommerceNet), R. V. Guha (Epinions, previously of Netscape Communications), Tom Hill (Apple Computer), Arthur van Hoff (Marimba), Renato Iannella (DSTC), Sandeep Jain (Oracle), Kevin Jones (InterMind), Emiko Kezuka (Digital Vision Laboratories), Joe Lapp (webMethods Inc.), Ora Lassila (Nokia Research Center), Andrew Layman (Microsoft), Ralph LeVan (OCLC), John McCarthy (Lawrence Berkeley National Laboratory), Chris McConnell (Microsoft), Murray Maloney (Grif), Michael Mealling (Network Solutions), Norbert Mikula (DataChannel), Eric Miller (OCLC), Jim Miller (W3C, emeritus), Frank Olken (Lawrence Berkeley National Laboratory), Jean Paoli (Microsoft), Sri Raghavan (Digital/Compaq), Lisa Rein (webMethods Inc.), Paul Resnick (University of Michigan), Bill Roberts (KnowledgeCite), Tsuyoshi Sakata (Digital Vision Laboratories), Bob Schloss (IBM), Leon Shklar (Pencom Web Works), David Singer (IBM), Wei (William) Song (SISU), Neel Sundaresan (IBM), Ralph Swick (W3C), Naohiko Uramoto (IBM), Charles Wicksteed (Reuters Ltd.), Misha Wolf (Reuters Ltd.) and Lauren Wood (SoftQuad).
We divide these into substantive and editorial. The substantive changes also list consequential editorial changes. Editorial changes are those which do not result in any change in the meaning of an RDF document or the behaviour of an RDF application.