Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. This document describes the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that do not fall under the topics of formal semantics, XML serialization syntax or RDF schema and vocabulary definitions (which are each covered by a separate document in this series).
This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).
This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to how the changes affect existing implementations and content.
This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. The normative documentation of RDF falls broadly into the following areas:
This document addresses the last two of these items. The first three are covered by separate documents ([RDF-SYNTAX], [RDF-SEMANTICS], [RDF-VOCABULARY]).
In section 2, some background to the design goals and rationale of RDF is presented. There is also some discussion of the intended implications of publishing an RDF document (section 2.3).
RDF is based on a graph syntax, which is typically serialized using XML. This graph syntax captures the fundamental structure of RDF, independently of any serialization syntax that may be used. The formal semantics of RDF are defined in terms of the graph syntax. The graph syntax is defined in section 3 of this document.
Section 4 presents a number of technical issues that don't clearly fall into any of the more explicit areas noted above.
RDF uses well established ideas from various data and knowledge representation communities, with recognizable relationships to Conceptual Graphs, logic-based knowledge representation, frames, and relational databases [Sowa,CG,KIF,Hayes,Luger,Gray].
RDF builds on XML, which provides a syntactic framework for representing documents and other information. It has a simple graph-based data model and formal semantics with a rigorously defined notion of entailment, which in turn provides a basis for well founded deductions in RDF data.
The real value of RDF comes not so much from any single application, but from the possibilities for sharing data between applications. The value of information thus increases as it becomes accessible to more and more applications across the entire Internet.
The development of RDF has been motivated by the following uses, among others:
The design of RDF is intended to meet the following goals:
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
NOTE: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] or a textbook on logical semantics (e.g. [HUNTER,DAVIS]) for more information about what logicians call "model theory".
RDF has a formal semantics which provides a sound basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URIrefs). URIrefs are used for naming all kinds of things in RDF data. The only other kind of label that appears in RDF data is a literal string.
RDF has an XML-based serialization form which, if used appropriately, allows a wide range of "ordinary" XML data to be interpreted as RDF [STRIPEDRDF].
RDF can be used with XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
[[[Add comment here if goal not fully achieved]]]
To allow operation at Internet scale, RDF is an open-world framework that allows anyone to say anything about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making nonsensical or inconsistent assertions, and applications that build upon RDF must find ways to deal with conflicting sources of information. (This is where RDF departs from the XML approach to data representation, which is generally quite prescriptive and aims to present an application with information that is well-formed and complete for the application's needs.)
Through its use of extensible URI-based vocabularies, RDF aims to provide for universal expression of ground facts; i.e. assertions of specific properties about specific named things.
RDF itself does not provide the machinery of inference, but provides the raw data upon which such machinery can operate. Other work is looking for ways to build more expressive expressions on the basic capabilities of the RDF core language.
RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.
This goal is explored further in section 2.3 below.
The RDF specification emphasizes the formal structure and meaning of RDF. But there is also a social dimension that is easily overlooked when dealing with such formal aspects.
[[[These words adapted from Primer 7.1]]]
RDF is a language designed to support the Semantic Web, in much the same way that HTML is the language that supports the original Web. The Semantic Web aims for data to be shared and processed by automated tools as well as by people. To serve this purpose, certain meanings of RDF statements must be defined in a very precise manner; this is provided by the RDF Model Theory [RDF-SEMANTICS].
Model-theoretic semantics assumes that a language refers to a 'world', and describes the minimal conditions that a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called 'interpretation theory'. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure. The RDF model theory is couched in the language of set theory simply because that is the normal language of mathematics - for example, the model theory assumes that names denote things in a set IR called the 'universe' - but the use of set-theoretic language is not supposed to imply that the things in the universe are set-theoretic in nature.
The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they preserve meaning.
The RDF model theory treats RDF as a simple assertional language, in which each triple makes a distinct assertion, and the meaning of any triple is not changed by adding other triples. Based on the semantics defined in the model theory, it is simple to translate an RDF graph into a logical expression with essentially the same meaning.
[[[Adapted words from DanBri/PatHayes]]]
RDF/XML documents, i.e. encodings of RDF graphs, can be used to make representations of claims or assertions about the world. RDF graphs may be asserted to be true, and such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provides the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and which distinguish assertions from other uses (e.g. citations, denials or illustrations).
A media type, application/rdf+xml has been registered for indicating the use of RDF/XML as an assertional representation in this way (see section 3.7).
To support logical entailments, formal RDF meaning is based on a model theory (see section 2.3.1). The notion of truth here is crucial: a possible world may correspond to some RDF if and only if the RDF statement is true in that world. This leads to consideration of what makes a statement be true:
It is presumed here that any interesting statement about the world or human affairs must ultimately depend on assumed truths. Once such an assumed truth has been accepted into one's worldview, other interesting truths may be deduced by logical means. Just as semantic web vocabulary gains currency through use, so semantic web deductions ultimately have force through acceptance by people. There is a combination of logical and social (non-logical) dimensions in which semantic web deduction must operate.
The RDF core language provides a way to make simple formal assertions, with little or no machinery for formalizing allowable inferences. Inferences are performed by processes, embedded in software implementations, whose validity is not formally demonstrable and must be assumed or trusted to be valid (in relation to the world and/or human affairs). It is expected that semantic web languages layered on RDF will give formal expression to allowable inference, allowing provable deductions by generic software modules to replace the individual ad hoc implementations.
When an RDF graph is asserted in the web, its publisher is saying something about their view of the world. (The mechanism for deciding whether or not a graph is asserted is not defined here, but it is presumed that the publisher's intent will be clear in some way -- social convention or logical deduction.)
When a user invokes an application, there is also a social and technical context of invocation that determines some set of RDF assertions that will be assumed to be true: the application itself, and any RDF files that are passed to it. Garbage-in, garbage-out applies: if the initial assumed facts are wrong or meaningless, the results will have little value. No specific mechanisms for deciding or evaluating the validity of any such assertions are defined here.
An assertion tells us something about "the world" and human affairs, through the normal model theoretic possible-world constraint mechanisms. Some of the truths that are asserted may be logical truths that can be evaluated using logical machinery. Others may be assumed truths that cannot be evaluated logically, but can be determined by human interpretation. So when one asserts an RDF graph, one is stating a constraint on the real world, saying that both the logically testable and humanly interpretable truths in the graph are indeed true in that world.
In accordance with appropriately sanctioned logical entailment, it is intended that inferences may be used to deduce new RDF statements with the same force of assertion as the explicit statements from which they are derived.
Noting that there is no single human opinion about the truth of some statements, the graph may further contain commentary for human interpreters to indicate the realm of human interpretation that should be applied. This means a graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information, or programmers writing software to perform specialized forms of deduction in the Semantic Web.
RDF uses the following key concepts:
The underlying structure of any RDF expression is a directed labelled graph (or multigraph), which consists of nodes and labelled directed arcs that link pairs of nodes. The formal semantics for RDF is defined in terms of this graph syntax. An RDF expression is sometimes called an RDF graph. The graph can conveniently be represented as a set of triples, where each triple contains two node labels and an arc label:
Each arc corresponds to a statement that asserts a relationship between the nodes that it links. The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
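As an informal illustration of the triple view of a graph, the following Python sketch holds a graph as a set of (subject, predicate, object) tuples and renders its meaning as a conjunction. The example.org URIrefs, the literal value, and the `as_conjunction` helper are hypothetical and are not defined by this specification; blank nodes are left out for brevity.

```python
# Hypothetical example data: these URIrefs are illustrative only.
DOC = "http://example.org/doc"
CREATOR = "http://example.org/terms#creator"
TITLE = "http://example.org/terms#title"
ALICE = "http://example.org/people#alice"

# A graph held as a set of (subject, predicate, object) triples.
graph = {
    (DOC, CREATOR, ALICE),
    (DOC, TITLE, "An example document"),
}


def as_conjunction(triples):
    """Render each triple as predicate(subject, object) and join with AND."""
    atoms = sorted(f"{p}({s}, {o})" for (s, p, o) in triples)
    return " AND ".join(atoms)


print(as_conjunction(graph))
```

Because the graph is a set, adding a triple that is already present leaves it unchanged, and adding a new triple only adds one more conjunct without altering the existing ones.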
Nodes in an RDF graph are labelled with URIs with optional fragment identifiers (URIrefs), literal strings, or nothing at all. Arcs are labelled with URIrefs.
The label on a node indicates what that node is meant to represent. The label on an arc names the relationship that is asserted to hold between the nodes connected by that arc. Some URIrefs may indicate web resources, and a node thus labelled is presumed to denote that resource. Other URIrefs may represent abstract ideas or values rather than a retrievable Web resource. RDF thus leverages the universal naming space of URIs [URIS].
RDF has a specific serialization syntax based on XML. There are several ways in which a given RDF graph can be represented in XML: these various forms allow RDF to be represented in ways that are amenable to specific XML applications. In this way, XML application data can easily be designed to be accessible to generic RDF processors [XML-AS-RDF].
Other syntaxes for RDF graphs are possible (e.g. [NOTATION3]), but only the XML syntax is normatively specified and recommended for use to exchange information between Internet applications.
RDF uses URIs to label resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned by the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:
Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespaces [XML-NS] associated with the RDF core vocabulary terms.
NOTE: these namespace URIs are the same as those used in earlier RDF documents [RDF-MS, RDF-SCHEMA]. The URIs have not been updated because the working group feels its work has been to clarify the earlier work rather than to change it.
The vocabulary terms are listed here using QName syntax. The corresponding URI reference is formed by concatenating the URI corresponding to the prefix (see above) with the given local name.
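The concatenation rule can be sketched in Python as follows. The two namespace URIs in the table are an assumption: they are the commonly published RDF and RDF Schema namespace URIs, not values taken from this text. The `expand_qname` helper is likewise hypothetical.

```python
# Assumption: the commonly published RDF and RDF Schema namespace URIs.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_qname(qname, namespaces=NAMESPACES):
    """Concatenate the URI bound to the prefix with the local name."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"no namespace known for QName {qname!r}")
    return namespaces[prefix] + local


print(expand_qname("rdf:type"))
```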
Informal descriptions of some of these terms are given in the RDF vocabularies document [RDF-VOCABULARY]. Where formal semantics are defined, they are given in the RDF formal semantics document [RDF-SEMANTICS].
Some of the above vocabulary terms are used for purely syntactic purposes in the RDF/XML serialization, and do not appear in the abstract graph syntax (see section 3.1). Any other use of these names is considered to be an error.
Other names from the rdf: and rdfs: namespaces (i.e. starting with one of the URI strings noted above) should be used only if they are defined by the RDF specification. Processors encountering unrecognized names in these namespaces should issue a warning, then continue to process them as any other vocabulary.
The following RDF core vocabulary terms defined in previous RDF specification documents have been deprecated for future use:
This section defines the RDF graph syntax. The RDF graph is sometimes referred to as the (data) model of RDF (see the RDF Primer [RDF-PRIMER], and RDF Model & Syntax [RDF-MS]). In brief, the RDF graph is a directed graph with labelled edges and partially labelled nodes.
A goal of this section is the precise definition of equality between RDF graphs. This benefits interoperability (two conformant implementations are more likely to be practically interoperable if they have a precise conception of the way in which they are the same). It is required for the specification of the RDF Test Cases [RDF-TESTS], which depend on testing equality of RDF graphs for their execution. It is required by the RDF Model Theory [RDF-SEMANTICS] which assigns the same meaning to any pair of equal RDF graphs.
Note: Many RDF applications and frameworks do not need to implement RDF graph equality. They do need to respect equality when assigning meaning to RDF graphs. RDF recommendations do not define conformance or compliance levels.
The specification of the RDF graph commences with the labels used in the graph, which can be URI references, string literals, or XML literals; equality is defined for each. It then proceeds to describe arcs (triples), a complete graph, and graph equality.
[[[Need to liaise with PatH to ensure graph description is consistent with MT --action jjc]]]
Within an RDF graph, URI reference labels are drawn from the lexical space of the anyURI datatype as defined for XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and to be in Unicode Normal Form C [NFC] in conformance with [CHARMOD].
Precisely, a URI Reference Label within an RDF graph is a Unicode string [UNICODE] that:
The disallowed characters that must be %-escaped are all non-ASCII characters and the excluded characters listed in Section 2.4 of [URIS], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC-2732].
Disallowed characters must be escaped as follows: each byte of the character is converted to %HH, where HH is the hexadecimal notation of the byte value.
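The escaping rule can be illustrated with a short Python sketch. It assumes that the byte values are those of the character's UTF-8 encoding (as in the anyURI definition), and it spells out the disallowed ASCII set from the excluded characters of RFC 2396 section 2.4.3 with '#', '%', '[' and ']' left unescaped. The function name `escape_uri_label` is hypothetical.

```python
# Assumption: excluded ASCII characters per RFC 2396 section 2.4.3
# (control characters, space, delimiters, "unwise" characters),
# minus '#', '%', '[' and ']', which are left unescaped.
DISALLOWED_ASCII = (
    {chr(c) for c in range(0x21)}  # controls 0x00-0x1F and space
    | {chr(0x7F)}                  # DEL
    | set('<>"{}|\\^`')
)


def escape_uri_label(text):
    """%-escape every disallowed character, byte by byte, as %HH."""
    out = []
    for ch in text:
        if ord(ch) > 0x7F or ch in DISALLOWED_ASCII:
            # Assumption: bytes are taken from the UTF-8 encoding.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_uri_label("http://example.org/caf\u00e9 menu#top"))
```

Existing escapes are left alone because the percent sign is not itself escaped, so applying the function twice gives the same result as applying it once.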
Two URI reference labels within RDF are equal if and only if they compare as equal, character by character, as Unicode strings. A URI reference label is not equal to a string literal label or an XML literal label.
See the following test cases, per [RDF-TESTS]:
An RDF literal is either an XML literal or a string literal.
Two RDF literals are equal if and only if they are either both XML literals and equal or both string literals and equal.
A string literal label in an RDF graph is composed of a Unicode string [UNICODE] that is in Normal Form C [NFC], and a language identifier that is either null or as specified below.
Two string literals are equal if both components are equal. The Unicode string components are compared on a character by character basis. The language tag components are equal if both are null or if both are defined and equal as language identifiers.
Allowable language identifiers are the legal values for xml:lang as specified by section 2.12, Language Identification, in [XML], or null. Equality of language identifiers (as specified in [RFC-3066]) is defined by case insensitive character by character comparison.
Note: This direct comparison between language identifiers is appropriate for the purpose of defining equality between RDF graphs, but is linguistically naive. [RFC-3066] suggests more advanced comparison techniques.
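A minimal Python sketch of string literal equality as described above follows. The `StringLiteral` class and `string_literals_equal` function are hypothetical names, and `None` stands in for a null language identifier.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


# eq=False: equality is decided by string_literals_equal below, not by
# a generated field-by-field (case-sensitive) comparison.
@dataclass(frozen=True, eq=False)
class StringLiteral:
    text: str
    lang: Optional[str] = None  # None models a null language identifier

    def __post_init__(self):
        # The Unicode string component must be in Normal Form C.
        if unicodedata.normalize("NFC", self.text) != self.text:
            raise ValueError("string literal text must be in Normal Form C")


def string_literals_equal(a, b):
    """Character-by-character text match plus language identifier match."""
    if a.text != b.text:
        return False
    if a.lang is None or b.lang is None:
        return a.lang is None and b.lang is None
    # Language identifiers compare case-insensitively.
    return a.lang.lower() == b.lang.lower()


print(string_literals_equal(StringLiteral("chat", "en-GB"),
                            StringLiteral("chat", "EN-gb")))
```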
Note: Literals beginning with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
See the following test cases, per [RDF-TESTS]:
[[[Subject to WG disposition of test cases]]]
Within an RDF graph, an XML literal is a Unicode [UNICODE] string paired with a language identifier. The string is well-balanced, self-contained XML element content [XML].
Two XML literals are equal if both components are equal. Comparison of XML literals is described below. The language identifiers are equal if both are null or if both are defined and equal as language identifiers, per [RFC-3066].
[[[Edit following to incorporate xml:lang]]]
An XML literal can be used to form an XML document by enclosing it with <tag> and </tag> and encoding the resulting string in UTF-8. No escaping is applied in this process. This resulting document is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].
Note: If compatibility with XML version 1.1 is desired, then XML literals in RDF graphs must be restricted to those that are fully normalized according to [XML 1.1].
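The enclosing step can be sketched in Python using the standard library parser as a well-formedness check. The function names are hypothetical, and the parser check is only an approximation of the requirement: it confirms that the wrapped string parses as a single XML document.

```python
import xml.etree.ElementTree as ET


def literal_as_document(xml_literal):
    """Enclose the literal in <tag>...</tag> and encode as UTF-8.

    No escaping is applied to the literal text.
    """
    return ("<tag>" + xml_literal + "</tag>").encode("utf-8")


def is_well_formed_literal(xml_literal):
    """Return True if the enclosed literal parses as one XML document."""
    try:
        ET.fromstring(literal_as_document(xml_literal))
    except ET.ParseError:
        return False
    return True


print(is_well_formed_literal("a <b>bold</b> word"))
print(is_well_formed_literal("<b>unclosed"))
```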
Equality for XML literals is not precisely defined by this specification. The description given here is used by the RDF Test Cases [RDF-TESTS], and also constrains any implementation-defined equality.
Two XML literals may be compared by the following steps:
Implementations may specialize this definition of equality (i.e. if two XML literals compare equal according to an implementation then they must compare equal according to this definition, but not conversely).
In particular, implementations may treat XML comments as significant, and may treat namespaces that are in scope but not visibly utilized (as defined by [XC14N]) as significant.
[[[should this para be moved to a longer non-normative appendix which would have the goal of showing the DPH that they can do nearly nothing and still conform with this rather opaque requirement. @@@@ The use of character by character equality between XML literals is discouraged, except in the case where XML literals have already been canonicalized with an appropriate treatment of namespaces. @@@ would a test case showing where naivety is insufficient be helpful, it would contain two character-by-character identical XML literals which had qnames with namespace prefixes bound to different namespaces]]]
See the following test cases, per [RDF-TESTS]:
[[[Subject to WG disposition of test cases]]]
An RDF graph is defined using a set of nodes. Many of the nodes are blank, and some of the nodes are labelled with RDF literals or RDF URI references, i.e. there is a partial labelling function from the set of nodes to the union of the set of RDF literals and RDF URI references.
A tidy set of nodes is one in which no two nodes have equal labels. A tidy set of nodes may have any number of distinct blank nodes.
Two nodes are equal if and only if they are the same node. In particular, two different blank nodes are not equal.
An RDF triple describes an arc in an RDF graph. It contains three components:
The set containing the subject and object nodes of a triple is tidy (per definition in section Nodes).
The subject must not be labelled with an RDF literal.
Two RDF triples are equal if and only if their subjects are equal, their predicates are equal, and their objects are equal.
An RDF graph is a collection of RDF triples.
The nodes of an RDF graph are the set of nodes that are either subject or object of some triple in the graph.
The set of nodes of an RDF graph is tidy (per definition in section Nodes).
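The definitions of nodes, tidiness and triples can be sketched in Python as follows. The `Node` class, the tagged-tuple label encoding and the helper names are hypothetical. As a simplification, labels are compared as plain tuples, so the case-insensitive comparison of language identifiers is not modelled here.

```python
class Node:
    """A graph node. label is None for a blank node.

    Labels are tagged tuples (an assumption of this sketch):
    ("uri", value) or ("literal", text, lang).
    Equality is object identity, so two distinct blank nodes differ.
    """

    def __init__(self, label=None):
        self.label = label


def is_tidy(nodes):
    """True if no two nodes carry equal labels; blank nodes are ignored."""
    labels = [n.label for n in nodes if n.label is not None]
    return len(labels) == len(set(labels))


def graph_nodes(triples):
    """The nodes of a graph: every subject or object of some triple."""
    return {n for (s, _predicate, o) in triples for n in (s, o)}


def is_valid_graph(triples):
    """Check tidiness and that no subject is labelled with a literal."""
    subjects_ok = all(
        s.label is None or s.label[0] != "literal" for (s, _p, _o) in triples
    )
    return subjects_ok and is_tidy(graph_nodes(triples))


doc = Node(("uri", "http://example.org/doc"))
title = Node(("literal", "An example", None))
anon = Node()
graph = {
    (doc, "http://example.org/terms#title", title),
    (doc, "http://example.org/terms#creator", anon),
}
print(is_valid_graph(graph))
```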
Note: The definition of an RDF graph diverges from the definition of a directed graph in a standard text such as [[[missing ref]]] in that: (a) all nodes must be in at least one arc; (b) all the arcs are labelled; (c) some of the nodes are labelled; (d) labels on nodes are required to be distinct; (e) some labels are shared between nodes and arcs.
Two RDF graphs are equal if and only if they are isomorphic. An RDF graph isomorphism is a directed graph isomorphism that respects the labels on both arcs and nodes.
An RDF Graph isomorphism I between two graphs G and G' is a bijection between the nodes of G and the nodes of G', such that:
for all nodes n, s, o in G and all RDF URI references p.
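As a rough illustration of label-respecting isomorphism, the Python sketch below searches for a bijection between the blank nodes of two graphs that maps one triple set onto the other. It is a brute-force sketch under stated assumptions, not an algorithm given by this specification: graphs are sets of (subject, predicate, object) tuples, labelled nodes are represented directly by hashable labels (which relies on the graphs being tidy), and the `Blank` class and `isomorphic` function are hypothetical names.

```python
from itertools import permutations


class Blank:
    """A blank node; equal only to itself (object identity)."""


def blank_nodes(graph):
    """All blank nodes occurring as subject or object in the graph."""
    nodes = {n for (s, _p, o) in graph for n in (s, o)}
    return [n for n in nodes if isinstance(n, Blank)]


def isomorphic(g1, g2):
    """Try every bijection between blank nodes; labelled nodes map to themselves."""
    b1, b2 = blank_nodes(g1), blank_nodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        image = {(mapping.get(s, s), p, mapping.get(o, o)) for (s, p, o) in g1}
        if image == g2:
            return True
    return False


knows = "http://example.org/terms#knows"
alice = ("uri", "http://example.org/people#alice")
x, y = Blank(), Blank()
print(isomorphic({(x, knows, alice)}, {(y, knows, alice)}))
```

The search is factorial in the number of blank nodes, so it is suitable only for small graphs such as test cases.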
@@@@ I note that I have used a system of typed objects with identity and with named components without introducing or defining it. (Notice that an XML Literal xml"foo"-"en" is distinct from a String Literal "foo"-"en"). Personally I think that introducing and defining such a system will confuse more than enlighten. I could be persuaded ...
@@@Suggestion that this subsection to be deleted, in favour of expressing this as a constraint in the serialization section of the RDF/XML syntax doc.
The following elements of RDF vocabulary have syntactic significance only in the XML serialization, and never appear in the RDF abstract graph syntax:
For the processing of character data that can be represented in different ways, RDF processors are required to conform to Early Uniform Normalization, as described by Character Model for the World Wide Web 1.0 [CHARMOD].
How should RDF treat a URI reference with a fragment identifier? Conventional web architecture holds that the meaning of a fragment identifier depends on the MIME type of the resource obtained by dereferencing the URI part. URIs without fragment identifiers are generally presumed to map to some resource for which a Web representation (or several) can be retrieved. But RDF has no concept of a fragment identifier separate from a URI: RDF treats a URI reference as an opaque identifier that denotes some resource [RDF-SEMANTICS]. Further, an RDF resource identifier may denote something that is not web-retrievable, e.g. a car or a unicorn.
These apparently conflicting interpretations can be reconciled if:
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional web axioms. This approach somewhat extends the idea of a "fragment" or "view" beyond the common idea (when handling web documents) that it is a physical part of a containing document.
In view of this, it is reasonable to consider that URIs without fragment identifiers are most helpfully used for indicating web-retrievable resources (when used in RDF), and URIs with fragment identifiers are used for abstract ideas that don't have a direct web representation. This is not a hard-and-fast distinction, as the line between resources having or not having a web-retrievable representation is sometimes hard to draw precisely.
The RDF/XML syntax uses QName syntax [XML-NS] to identify various resources, notably RDF properties. But the RDF graph syntax contains only URI references, and does not recognize QName forms.
Mostly, the handling of QNames is a matter for RDF parsers. But there are some occasions where an RDF writer needs to know the correspondence between QNames and URI references (e.g. when using a typed node production). The mapping is described in [RDF-SYNTAX], sections 3.1.2 or 3.1.4.
The editors acknowledge valuable contributions of the following:
This document is a product of extended deliberations by the RDFcore working group, whose members have included:
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working Group members who contributed to this earlier work are:
[[[For reviewers' reference. This appendix will be removed on final publication.]]]
For source information, see paragraph-numbered original documents and issue list:
[[[For reviewers' reference. This appendix will be removed on final publication.]]]
See: http://lists.w3.org/Archives/Public/www-archive/2001Jun/att-0021/00-part and http://www.w3.org/2000/03/rdf-tracking/.
[[[For reviewers' reference. This appendix will be removed on final publication.]]]
$Log: Overview.htm,v $ Revision 1.17 2002/07/25 13:31:00 graham Folded in jjc changes to section 3 Revision 1.16 2002/07/23 11:27:12 graham Remove sections that will be included in the primer: - RDF in HTML - Boolean values Revision 1.15 2002/07/23 11:01:50 graham Removed "RDF specification" section (was section 3) Removed "RDF vocabulary" section (was section 5) Previous section 3.1 listing RDF vocabulary moved to section 2.5. Drafted abstract Drafted introduction section. Revision 1.14 2002/06/29 10:02:08 graham Add rdf:bagID to syntax-reserved vocabulary Remove note of character normalization in 4.2.1 (covered later) Correct reference sub-section numbers Revision 1.12 2002/06/27 16:53:31 graham Minor editorial changes Regenerate table of contents Revision 1.11 2002/06/27 15:55:09 graham Added graph equality description Revision 1.10 2002/06/26 22:05:33 graham Completed initial cut of all issues Only introduction and abstract to do Revision 1.9 2002/06/26 21:41:14 graham Completed initial coverage of intended semantics Completed additional technical issues section Added proposal for fragment identifier handling Reorganized and cross-reference issues list Revision 1.7 2002/06/26 10:24:01 graham Added text for graph syntax, excerpted from: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002May/att-0089/01-RDF-XML_Syntax_Specification__Revised_.htm Revision 1.6 2002/06/25 17:49:34 graham Filled in section 3 content Included acknowledgements from original RDF documents Revision 1.5 2002/06/24 16:50:03 graham Saved 2002-06-24 working copy Revision 1.4 2002/06/24 16:39:24 graham Completed initial cut of section 2 text: - 2.3.1 Semantics from Primer 7.1 - 2.3.2 social meaning adapted from text by DanBri - 2.3.3-4 from text discussed at face-to-face Some further renaming of sections Revision 1.3 2002/06/24 13:27:16 graham Update current/previous version links Revision 1.2 2002/06/24 13:22:24 graham Transcribe initial issue list to appendix X. 
Rearrange outline with new sections for graph syntax and informal semamntics for RDF vocabulary. Revision 1.1 2002/06/21 14:57:22 graham Update document name Revision 1.3 2002/06/21 14:45:34 graham Futher rearrangement of outline, to accommodate: - list of RDF vocabulary terms - RDF-in-HTML - RDF namespaces - Addressed issues appendix - Note about pure syntax vocabulary (e.f. rdf:Description) Renamed some section titles Revision 1.2 2002/06/21 10:21:23 graham Rearranged outline to accommodate material from the primer on formal semantics Revision 1.1 2002/06/20 20:47:03 graham Initial version of document