Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. This document defines the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that do not fall under the topics of formal semantics, XML serialization syntax or RDF schema and vocabulary definitions (which are each covered by a separate document in this series). These include: discussion of design goals, meaning of RDF documents, key concepts, character normalization and handling of URI references.
This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).
This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to how the changes affect existing implementations and content.
This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information. The normative documentation of RDF falls broadly into the following areas:
This document addresses the last two of these items. The first three are covered by separate documents [RDF-SYNTAX] [RDF-SEMANTICS] [RDF-VOCABULARY].
In section 2, some background to the design goals and rationale of RDF is presented. There is also some discussion of the intended implications of publishing an RDF document (section 2.3).
RDF is based on a graph syntax, which is typically serialized using XML. This graph syntax captures the fundamental structure of RDF, independently of any serialization syntax that may be used. The formal semantics of RDF are defined in terms of the graph syntax. The graph syntax is defined in section 3 of this document.
Section 4 presents a number of technical issues that don't clearly fall into any of the more explicit areas noted above.
RDF uses well-established ideas from various data and knowledge representation communities, with recognizable relationships to Conceptual Graphs, logic-based knowledge representation, frames, and relational databases [Sowa] [CG] [KIF] [Hayes] [Luger] [Gray].
RDF builds on XML, which provides a syntactic framework for representing documents and other information. It has a simple graph-based data model and formal semantics with a rigorously defined notion of entailment, which in turn provides a basis for well founded deductions in RDF data.
The real value of RDF comes not so much from any single application, but from the possibilities for sharing data between applications. The value of information thus increases as it becomes accessible to more and more applications across the entire Internet.
The development of RDF has been motivated by the following uses, among others:
The design of RDF is intended to meet the following goals:
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
NOTE: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] or a textbook on logical semantics (e.g., [HUNTER] [DAVIS]) for more information about what logicians call "model theory".
RDF has a formal semantics which provides a sound basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URIrefs). URIrefs are used for naming all kinds of things in RDF data. The only other kind of label that appears in RDF data is a literal string.
RDF has an XML-based serialization form which, if used appropriately, allows a wide range of "ordinary" XML data to be interpreted as RDF [STRIPEDRDF].
RDF can be used with XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
[[[Add comment here if goal not fully achieved]]]
To allow operation at Internet scale, RDF is an open-world framework that allows anyone to say anything about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making nonsensical or inconsistent assertions, and applications that build upon RDF must find ways to deal with conflicting sources of information. (This is where RDF departs from the XML approach to data representation, which is generally quite prescriptive and aims to present an application with information that is well-formed and complete for the application's needs.)
Through its use of extensible URI-based vocabularies, RDF aims to provide for universal expression of ground facts; i.e. assertions of specific properties about specific named things.
RDF itself does not provide the machinery of inference, but provides the raw data upon which such machinery can operate. Other work is looking for ways to build more expressive expressions on the basic capabilities of the RDF core language.
RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.
This goal is explored further in section 2.3 below.
The RDF specification emphasizes the formal structure and meaning of RDF. But there is also a social dimension that is easily overlooked when dealing with such formal aspects.
[[[These words adapted from Primer 7.1]]]
RDF is a language designed to support the Semantic Web, in much the same way that HTML is the language that supports the original Web. The Semantic Web aims for data to be shared and processed by automated tools as well as by people. To serve this purpose, certain meanings of RDF statements must be defined in a very precise manner; this is provided by the RDF Model Theory [RDF-SEMANTICS].
Model-theoretic semantics assumes that a language refers to a 'world', and describes the minimal conditions that such a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called 'interpretation theory'. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure. The RDF model theory is couched in the language of set theory simply because that is the normal language of mathematics (for example, the model theory assumes that names denote things in a set IR called the 'universe'), but the use of set-theoretic language is not supposed to imply that the things in the universe are set-theoretic in nature.
The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they preserve meaning.
The RDF model theory treats RDF as a simple assertional language, in which each triple makes a distinct assertion, and the meaning of any triple is not changed by adding other triples. Based on the semantics defined in the model theory, it is simple to translate an RDF graph into a logical expression with essentially the same meaning.
[[[Adapted words from DanBri/PatHayes]]]
RDF/XML documents, i.e. encodings of RDF graphs, can be used to make representations of claims or assertions about the 'real' world. RDF graphs may be asserted to be true, and such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provides the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and which distinguish assertions from other uses (e.g. citations, denials or illustrations).
A media type, application/rdf+xml, is being registered for indicating the use of RDF/XML as an assertional representation in this way [RDF-MIME-TYPE].
[[[Adapted words from PatHayes/Jos de Roo]]]
Using RDF, 'received meaning' can be characterized as the social meaning of any logical consequences.
If you publish a graph G and G logically entails G', and we interpret G' using the same social conventions that everyone agrees could be reasonably used to interpret G, then you are asserting that content of G' as well.
Human publishers of RDF content commit themselves to the mechanically-inferred social obligations. The machines doing the inferences aren't expected to know about all these social conventions and obligations.
The social conventions used to interpret a graph may include assumed truths, for which no logical derivation is available, and socially accepted consequences whose rules of deduction are embedded in arbitrary decision-making processes.
Just as semantic web vocabulary gains currency through use, so semantic web deductions gain force through social acceptance. Semantic web deduction operates in a combination of logical and social (non-logical) dimensions.
To support logical entailments, formal RDF meaning is based on a model theory (see section 2.3.1). The notion of truth is crucial: a possible world may correspond to some RDF if and only if the RDF statement is true in that world.
The RDF core language provides a way to make simple formal assertions, with no machinery for formalizing allowable inferences. Inferences are performed by processes, embedded in software implementations, whose validity is not formally demonstrable, and must be assumed or trusted to be socially acceptable. It is expected that semantic web languages layered on RDF will give formal expression to allowable inferences, thus allowing provable deductions by generic software modules to replace individual ad-hoc implementations.
When an RDF graph is asserted in the web, its publisher is saying something about their view of the world. (The mechanism for deciding whether or not a graph is asserted is not defined here, but it is presumed that the publisher's intent will be clear in some way.)
When a user invokes an application, there is also a social and technical context of invocation that determines some set of RDF assertions that will be assumed to be true: the application itself, and any RDF files that are passed to it. Garbage-in, garbage-out applies: if the initial assumed facts are wrong or meaningless, the results will have little value. No specific mechanisms for deciding or evaluating the validity of any such assertions are defined here.
Noting that there is no single human opinion about the truth of some statements, the graph may further contain commentary for human interpreters to indicate the realm of human interpretation that should be applied. This means a graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information, or programmers writing software to perform specialized forms of deduction in the Semantic Web.
RDF uses the following key concepts:
The underlying structure of any RDF expression is a directed labelled graph (or multigraph), which consists of nodes and labelled directed arcs that link pairs of nodes (these notions are defined more formally in section 3). The formal semantics for RDF is defined in terms of this graph syntax. An RDF expression is sometimes called an RDF graph. The graph can conveniently be represented as a set of triples, where each triple contains two node labels and an arc label:
Each arc corresponds to a statement that asserts a relationship between the nodes that it links. The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
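Note: As an informal illustration, a graph can be held as a set of (subject, predicate, object) triples, and its meaning read as the logical AND of its statements. The following Python sketch is not part of the graph syntax; the example.org URIrefs, the names `graph` and `holds`, and the use of a plain set of "facts" to stand in for a world are all assumptions made for illustration.

```python
# Hypothetical example data: each triple is (subject label, predicate label, object label).
graph = {
    ("http://example.org/doc", "http://example.org/terms#creator", "http://example.org/staff#jane"),
    ("http://example.org/doc", "http://example.org/terms#title", "An example document"),
}


def holds(graph, facts):
    """Return True only if every statement of the graph is among the given facts.

    `facts` is an illustrative stand-in for a 'world'. Because the graph means
    the conjunction of its statements, one false statement makes the whole graph false.
    """
    return all(triple in facts for triple in graph)
```

Because the graph is a set, writing the same triple twice does not change it.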
Nodes in an RDF graph are labelled with URIs with optional fragment identifiers (URI references, or URIrefs), literal strings, or nothing at all. Arcs are labelled with URIrefs. (See [URIS], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 3.1.)
The label on a node indicates what that node is meant to represent. The label on an arc names the relationship that is asserted to hold between the nodes connected by that arc. Some URIrefs may indicate web resources, and a node thus labelled is presumed to denote that resource. Other URIrefs may represent abstract ideas or values rather than a retrievable Web resource. RDF thus leverages the universal naming space of URIs [URIS].
RDF has a specific serialization syntax based on XML. There are several ways in which a given RDF graph can be represented in XML: these various forms allow RDF to be represented in ways that are amenable to specific XML applications. In this way, XML application data can easily be designed to be accessible to generic RDF processors [XML-AS-RDF].
Only the XML syntax is normatively specified and recommended for use to exchange information between Internet applications: other syntaxes for RDF graphs are possible, and may be widely used (e.g. [NOTATION3]), but are not covered by this recommendation.
RDF uses URIs to label resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned by the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:
Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespaces [XML-NS] associated with the RDF core vocabulary terms.
NOTE: these namespace URIs are the same as those used in earlier RDF documents [RDF-MS] [RDF-SCHEMA]. The URIs have not been updated because the working group feels its work has been to clarify the earlier work rather than to change it.
Vocabulary terms in the rdf: namespace are listed in section 3.4 [[[check this]]] of the RDF syntax specification [RDF-SYNTAX].
Vocabulary terms defined in the rdfs: namespace are defined [[[where?]]] in the RDF schema vocabulary specification [RDF-VOCABULARY].
This section defines the RDF graph syntax. The RDF graph is sometimes referred to as the (data) model of RDF (see the RDF Primer [RDF-PRIMER], and RDF Model & Syntax [RDF-MS]). In brief, the RDF graph is a directed graph with labelled edges and partially labelled nodes.
A goal of this section is the precise definition of equality between RDF graphs. This benefits interoperability (two conformant implementations are more likely to be practically interoperable if they have a precise conception of the way in which they are the same). It is required for the specification of the RDF Test Cases [RDF-TESTS], which depend on testing equality of RDF graphs for their execution. It is required by the RDF Model Theory [RDF-SEMANTICS] which assigns the same meaning to any pair of equal RDF graphs.
Note: Many RDF applications and frameworks do not need to implement RDF graph equality. They do need to respect equality when assigning meaning to RDF graphs.
The specification of the RDF graph commences with the labels used in the graph, which can be URI references, string literals, or XML literals; equality is defined for each. It then proceeds to describe arcs (triples), a complete graph and graph equality.
[[[This section, particularly how nodes and node labels are handled, is not completely in sync with the current Model Theory WD.]]]
A URI Reference Label within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:
The disallowed characters that must be %-escaped include all non-ASCII characters and the excluded characters listed in Section 2.4 of [URIS], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC-2732].
Disallowed characters must be escaped as follows:
%HH, where HH is the hexadecimal notation of the byte value.
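Note: The escaping rule can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm: it assumes that each disallowed character is first converted to bytes using UTF-8, that hexadecimal digits are written in upper case, and that the excluded US-ASCII characters of [URIS] section 2.4 are the control characters, space, and the characters < > " { } | \ ^ ` (with #, % and the square brackets left unescaped, as stated above). The function names are hypothetical.

```python
# Excluded printable US-ASCII characters (assumed from [URIS] section 2.4),
# omitting '#', '%', '[' and ']', which are not escaped.
EXCLUDED_ASCII = set('<>"{}|\\^` ')


def is_disallowed(ch):
    """True for non-ASCII characters, control characters, and the excluded set."""
    code = ord(ch)
    return code >= 0x7F or code < 0x20 or ch in EXCLUDED_ASCII


def escape_uri_reference(text):
    """Replace each disallowed character by one %HH triplet per byte (UTF-8 assumed)."""
    out = []
    for ch in text:
        if is_disallowed(ch):
            out.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

For example, under these assumptions a space becomes %20 and the two UTF-8 bytes of "é" become %C3%A9, while an existing %-escape or a fragment identifier is passed through unchanged.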
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings. A URI reference label is not equal to a string literal label or an XML literal label.
Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and constrained to be in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).
See the following test cases, per [RDF-TESTS]:
An RDF literal is either an XML literal or a string literal.
Two RDF literals are equal if and only if they are either both XML literals and equal or both string literals and equal.
A string literal label in an RDF graph is composed of a Unicode string [UNICODE] that is in Normal Form C [NFC], and a language identifier that is either null or as specified below.
Two string literals are equal if both components are equal. The Unicode string components are compared on a character by character basis. The language tag components are equal if both are null or if both are defined and equal as language identifiers.
Allowable language identifiers are the legal values for xml:lang as specified by section 2.12, Language Identification, in [XML], or null. Equality of language identifiers (as specified in [RFC-3066]) is defined by case insensitive character by character comparison.
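Note: String literal equality as described above can be sketched in Python. The class and function names are hypothetical, `None` is used to model a null language identifier, and the sketch does not check that the string is in Normal Form C or that the language identifier is legal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, eq=False)
class StringLiteral:
    text: str                       # Unicode string component
    language: Optional[str] = None  # language identifier, or None for null


def language_equal(a, b):
    """Both null, or both defined and equal ignoring case."""
    if a is None or b is None:
        return a is None and b is None
    return a.lower() == b.lower()


def string_literals_equal(x, y):
    """Equal if the strings match character by character and the language identifiers are equal."""
    return x.text == y.text and language_equal(x.language, y.language)
```

The comparison of the string component is case sensitive; only the language identifier is compared without regard to case.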
Note: This direct comparison between language identifiers is appropriate for the purpose of defining equality between RDF graphs, but is linguistically naive. [RFC-3066] suggests more advanced comparison techniques.
Note: Literals beginning with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
See the following test cases, per [RDF-TESTS]:
[[[Subject to WG disposition of test cases]]]
Within an RDF graph, an XML literal is a Unicode [UNICODE] string paired with a language identifier. The string is well-balanced, self-contained XML element content [XML].
An XML literal, with non-null language identifier, can be used to form an XML document by concatenating the five strings:
The resulting Unicode string is then encoded in UTF-8.
When the language identifier is null, the corresponding XML document is formed by enclosing the Unicode string of the XML literal with "<tag>" and "</tag>" and encoding the resulting string in UTF-8.
No escaping is applied in either process. The choice of tag is arbitrary.
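Note: For the case of a null language identifier, the construction of the corresponding XML document can be sketched in Python as follows. The function name is hypothetical, and the tag name "tag" is simply the arbitrary choice used in the text above.

```python
import xml.etree.ElementTree as ET


def xml_literal_document(content, tag="tag"):
    """Enclose the literal's string in <tag>...</tag>, with no escaping, and encode as UTF-8."""
    return ("<" + tag + ">" + content + "</" + tag + ">").encode("utf-8")


# Usage: a well-balanced literal yields a document that an XML parser accepts.
document = xml_literal_document("a <b>bold</b> move")
root = ET.fromstring(document)
```

Because no escaping is applied, a string that is not well-balanced element content produces a document that a parser rejects, which gives a simple check on the literal.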
This resulting XML document corresponding to the XML literal is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].
Note: If compatibility with XML version 1.1 is desired, then XML literals in RDF graphs must be restricted to those that are fully normalized according to [XML 1.1].
The exclusive canonicalization of an XML literal is formed by:
If two XML literals are equal then:
This specification, above, gives necessary conditions for the equality of XML literals. The RDF Test Cases [RDF-TESTS] treat these necessary conditions as also sufficient.
Implementations are free to add additional sufficient conditions for equality. If two XML literals compare equal according to an implementation then they must compare equal according to this definition, but not conversely. In particular, XML comments may be treated as significant, and namespaces that are in scope but not visibly utilized (as defined by [XC14N]) may be treated as significant.
[[[ Is there a need for a longer non-normative appendix on implementation issues for XML literals? This could discuss (a) minimal implementations, for which equality is not needed, and where the set of namespaces and namespace prefixes can be fixed in advance (b) the correct and incorrect use of character by character equality for XML literals. Should there be test cases for issue rdfms-xml-literal-namespaces? ]]]
See the following test cases, per [RDF-TESTS]:
[[[Subject to WG disposition of test cases]]]
An RDF graph is defined using a set of nodes. Many of the nodes are blank, and some of the nodes are labelled with RDF literals or RDF URI references, i.e. there is a partial labelling function from the set of nodes to the union of the set of RDF literals and RDF URI references.
A tidy set of nodes is one in which no two nodes have equal labels. A tidy set of nodes may have any number of distinct blank nodes.
Two nodes are equal if and only if they are the same node. In particular, two different blank nodes are not equal.
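Note: The notions of node equality and tidiness can be illustrated in Python. In this sketch the class `Node`, the function `is_tidy`, and the modelling of a label as a tagged pair such as ("uri", ...) or ("literal", ...) are assumptions made for illustration; label equality is simplified to equality of those pairs.

```python
class Node:
    """A graph node; `label` is None for a blank node.

    No __eq__ is defined, so Python compares nodes by identity:
    two nodes are equal only if they are the same node.
    """

    def __init__(self, label=None):
        self.label = label


def is_tidy(nodes):
    """True if no two nodes carry equal labels; blank nodes are never counted."""
    labels = [node.label for node in nodes if node.label is not None]
    return len(labels) == len(set(labels))
```

A set containing any number of distinct blank nodes is tidy, whereas two distinct nodes that both carry the same URI reference label make the set untidy.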
An RDF triple describes an arc in an RDF graph. It contains three components:
The set containing the subject and object nodes of a triple is tidy (per definition in section Nodes).
The subject must not be labelled with an RDF literal.
Two RDF triples are equal if and only if their subjects are equal, their predicates are equal, and their objects are equal.
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of nodes that are either subject or object of some triple in the graph.
The set of nodes of an RDF graph is tidy (per definition in section Nodes).
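Note: The set of nodes of a graph can be computed directly from its triples, as in the following Python sketch. The names are hypothetical. As a simplification, a labelled node is represented here by its label (a plain string), so that equal labels always give the same node and the node set is tidy by construction; a fuller model would also keep URI references and literals distinct.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])


class BlankNode:
    """An unlabelled node, compared by identity."""


def nodes_of(graph):
    """Nodes that are the subject or the object of some triple in the graph."""
    return {t.subject for t in graph} | {t.object for t in graph}
```

Predicates label arcs, so a URI reference used only as a predicate does not appear in the node set.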
[[[Suggestions of a standard graph theory text which treats digraphs as primary would be welcome.]]]
Note: The definition of an RDF graph diverges from the definition of a directed graph in a standard text such as [[[missing ref]]] in that: (a) all nodes must be in at least one arc; (b) all the arcs are labelled; (c) some of the nodes are labelled; (d) labels on nodes are required to be distinct; (e) some labels are shared between nodes and arcs.
Two RDF graphs are equal if and only if they are isomorphic. An RDF graph isomorphism is a directed graph isomorphism that respects the labels on both arcs and nodes.
An RDF Graph isomorphism I between two graphs G and G' is a bijection between the nodes of G and the nodes of G', such that:
for all nodes n, s, o in G and all RDF URI references p.
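Note: A naive test for graph equality can be sketched in Python by searching for a suitable bijection. The sketch assumes that an isomorphism maps each labelled node to the node with the equal label, maps blank nodes to blank nodes, and carries each triple (s, p, o) of G to a triple (I(s), p, I(o)) of G'. Labelled nodes are represented by their labels (plain strings), and the names `BlankNode` and `isomorphic` are hypothetical.

```python
from itertools import permutations


class BlankNode:
    """An unlabelled node, compared by identity."""


def blank_nodes(graph):
    """Blank nodes occurring as subject or object of some triple."""
    return {n for (s, _, o) in graph for n in (s, o) if isinstance(n, BlankNode)}


def isomorphic(g1, g2):
    """Try every bijection between the blank nodes of g1 and those of g2."""
    if len(g1) != len(g2):
        return False
    b1, b2 = list(blank_nodes(g1)), list(blank_nodes(g2))
    if len(b1) != len(b2):
        return False
    for candidate in permutations(b2):
        mapping = dict(zip(b1, candidate))
        # Labelled nodes map to themselves; blank nodes follow the candidate bijection.
        image = {(mapping.get(s, s), p, mapping.get(o, o)) for (s, p, o) in g1}
        if image == g2:
            return True
    return False
```

The search is factorial in the number of blank nodes, so it is suitable only for small graphs such as test cases; practical implementations would use a more selective matching strategy.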
[[[This subsection normatively depends on CHARMOD, currently a last call working draft. If CHARMOD has not reached the appropriate recommendation status as this document progresses down the recommendation track, this section will be deleted.]]]
For the processing of character data that can be represented in different ways, RDF processors are required to conform to Early Uniform Normalization, as described by Character Model for the World Wide Web 1.0 [CHARMOD].
How should RDF treat a URI reference with a fragment identifier? Conventional web architecture holds that the meaning of a fragment identifier is dependent on the MIME type of a resource that is obtained by dereferencing the URI part. URIs without fragment identifiers are generally presumed to map to some resource for which a Web representation (or several) can be retrieved. But RDF has no concept of a fragment identifier separate from a URI: RDF treats a URI reference as an opaque identifier that denotes some resource [RDF-SEMANTICS]. Further, an RDF resource identifier may denote something that is not web-retrievable; e.g. a car, or a unicorn.
These apparently conflicting interpretations can be reconciled if:
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional web axioms. This approach somewhat extends the idea of a "fragment" or "view" beyond the common idea (when handling web documents) that it is a physical part of a containing document.
In view of this, it is reasonable to consider that URIs without fragment identifiers are most helpfully used for indicating web-retrievable resources (when used in RDF), and URIs with fragment identifiers are used for abstract ideas that don't have a direct web representation. This is not a hard-and-fast distinction, as the line between resources having or not having a web-retrievable representation is sometimes hard to draw precisely.
The RDF/XML syntax uses QName syntax [XML-NS], section 3, to identify various resources, notably RDF properties. But the RDF graph syntax contains only URI references, and does not recognize QName forms.
Mostly, QNames are handled by the mapping between RDF/XML documents and RDF graph syntax. But there are some occasions where an RDF writer needs to know the correspondence between QNames and URI references (e.g. when using a typed node production). The mapping is described in [RDF-SYNTAX], sections 3.1.2 or 3.1.4.
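Note: As an informal illustration only, the correspondence is sketched below on the assumption that the URI reference is obtained by concatenating the namespace name bound to the QName prefix with the local part; [RDF-SYNTAX] remains the authority for the mapping. The function name, the prefix "ex" and the example.org namespace are hypothetical.

```python
def qname_to_uriref(qname, namespaces):
    """Map a QName such as 'ex:creator' to a URI reference.

    `namespaces` maps prefixes to namespace names; an unprefixed name is
    looked up under the empty-string prefix (an assumption of this sketch).
    """
    prefix, separator, local = qname.partition(":")
    if not separator:
        prefix, local = "", qname
    return namespaces[prefix] + local


# Usage with a hypothetical namespace binding.
bindings = {"ex": "http://example.org/terms#"}
uriref = qname_to_uriref("ex:creator", bindings)
```

The reverse direction, from a URI reference to a QName, requires choosing a split point that leaves a legal XML local name, which is why a writer needs to know the correspondence.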
The editors acknowledge valuable contributions of the following:
This document is a product of extended deliberations by the RDFcore working group, whose members have included:
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working Group members who contributed to this earlier work are:
[[[For reviewers' reference. This appendix will be removed on final publication.]]]
$Log: Overview.htm,v $ Revision 1.27 2002/07/29 13:21:36 graham Editors' review - minor changes Revision 1.26 2002/07/29 11:56:26 graham Removed redundant paragraphs from 2.3.4. Revision 1.25 2002/07/29 11:52:13 graham Reworked and contracted section 2.3.3. Revision 1.24 2002/07/29 11:10:05 graham Regenerate table of contents Revision 1.23 2002/07/29 11:01:39 graham Various updates to the abstract graph definition (section 3) Udate XC14N reference to recently published W3C REC Revision 1.22 2002/07/29 09:49:44 graham Changed document title.
Revision 1.21 2002/07/29 08:17:17 graham Complete reference details. Add note about normative dependency on [CHARMOD] (sect 4.1) Revision 1.20 2002/07/26 17:06:56 graham Filled in details for most references Revision 1.19 2002/07/25 17:05:39 graham In section 4.3, reference QName definition section of [XML-NS], and replace mention RDF parsers and writers. Revision 1.18 2002/07/25 17:00:17 graham Various changes in response to review comments: - change title - change section 2 title - add some additional detail to the abstract - fix typo in abstract - change style of citation lists - change link in section 2.3.2 - add cross-reference to section 2.4.1 - expand introduction of 'URIref' in 2.4.2, and add cross-ref - add reference for [XML-AS-RDF] Revision 1.17 2002/07/25 13:31:00 graham Folded in jjc changes to section 3 Revision 1.16 2002/07/23 11:27:12 graham Remove sections that will be included in the primer: - RDF in HTML - Boolean values Revision 1.15 2002/07/23 11:01:50 graham Removed "RDF specification" section (was section 3) Removed "RDF vocabulary" section (was section 5) Previous section 3.1 listing RDF vocabulary moved to section 2.5. Drafted abstract Drafted introduction section. 
Revision 1.14 2002/06/29 10:02:08 graham Add rdf:bagID to syntax-reserved vocabulary Remove note of character normalization in 4.2.1 (covered later) Correct reference sub-section numbers Revision 1.12 2002/06/27 16:53:31 graham Minor editorial changes Regenerate table of contents Revision 1.11 2002/06/27 15:55:09 graham Added graph equality description Revision 1.10 2002/06/26 22:05:33 graham Completed initial cut of all issues Only introduction and abstract to do Revision 1.9 2002/06/26 21:41:14 graham Completed initial coverage of intended semantics Completed additional technical issues section Added proposal for fragment identifier handling Reorganized and cross-reference issues list Revision 1.8 2002/06/26 14:42:08 graham Reorganize vocabulary listing in section 3 Started work on vocabulary intended semantics Added working group members to acknowledgements Reposition vocabulary listing in section 3 Make note of deprecated vocabulary terms Added text for embedding RDF in HTML
Revision 1.7 2002/06/26 10:24:01 graham Added text for graph syntax, excerpted from: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002May/att-0089/01-RDF-XML_Syntax_Specification__Revised_.htm Revision 1.6 2002/06/25 17:49:34 graham Filled in section 3 content Included acknowledgements from original RDF documents Revision 1.5 2002/06/24 16:50:03 graham Saved 2002-06-24 working copy Revision 1.4 2002/06/24 16:39:24 graham Completed initial cut of section 2 text: - 2.3.1 Semantics from Primer 7.1 - 2.3.2 social meaning adapted from text by DanBri - 2.3.3-4 from text discussed at face-to-face Some further renaming of sections Revision 1.3 2002/06/24 13:27:16 graham Update current/previous version links Revision 1.2 2002/06/24 13:22:24 graham Transcribe initial issue list to appendix X. Rearrange outline with new sections for graph syntax and informal semamntics for RDF vocabulary. Revision 1.1 2002/06/21 14:57:22 graham Update document name Revision 1.3 2002/06/21 14:45:34 graham Futher rearrangement of outline, to accommodate: - list of RDF vocabulary terms - RDF-in-HTML - RDF namespaces - Addressed issues appendix - Note about pure syntax vocabulary (e.f. rdf:Description) Renamed some section titles Revision 1.2 2002/06/21 10:21:23 graham Rearranged outline to accommodate material from the primer on formal semantics Revision 1.1 2002/06/20 20:47:03 graham Initial version of document