The World Wide Web was originally built for human consumption, and although everything on it is machine-readable, this data is not machine-understandable. It is very hard to automate anything on the Web, and because of the volume of information the Web contains, it is not possible to manage it manually. The solution proposed here is to use metadata to describe the data contained on the Web. Metadata is "data about data" (for example, a library catalog is metadata, since it describes publications) or specifically in the context of this specification "data describing Web resources". The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application, and many times the same resource will be interpreted in both ways simultaneously.
Resource Description Framework (RDF) is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes facilities to enable automated processing of Web resources. RDF can be used in a variety of application areas; for example: in resource discovery to provide better search engine capabilities, in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library, by intelligent software agents to facilitate knowledge sharing and exchange, in content rating, in describing collections of pages that represent a single logical "document", for describing intellectual property rights of Web pages, and for expressing the privacy preferences of a user as well as the privacy policies of a Web site. RDF with digital signatures will be key to building the "Web of Trust" for electronic commerce, collaboration, and other applications.
This document introduces a model for representing RDF metadata as well as a syntax for encoding and transporting this metadata in a manner that maximizes the interoperability of independently developed Web servers and clients. The syntax presented here uses the Extensible Markup Language [XML]: one of the goals of RDF is to make it possible to specify semantics for data based on XML in a standardized, interoperable manner. RDF and XML are complementary: RDF is a model of metadata and only addresses by reference many of the encoding issues that transportation and file storage require (such as internationalization, character sets, etc.). For these issues, RDF relies on the support of XML. It is also important to understand that this XML syntax is only one possible syntax for RDF and that alternate ways to represent the same RDF data model may emerge.
The broad goal of RDF is to define a mechanism for describing resources that makes no assumptions about a particular application domain, nor defines (a priori) the semantics of any application domain. The definition of the mechanism should be domain neutral, yet the mechanism should be suitable for describing information about any domain.
This specification will be followed by other documents that will complete the framework. Most importantly, to facilitate the definition of metadata, RDF will have a class system much like many object-oriented programming and modeling systems. A collection of classes (typically authored for a specific purpose or domain) is called a schema. Classes are organized in a hierarchy, and offer extensibility through subclass refinement. This way, in order to create a schema slightly different from an existing one it is not necessary to "reinvent the wheel" but one can just provide incremental modifications to the base schema. Through the sharability of schemas RDF will support the reusability of metadata definitions. Due to RDF's incremental extensibility, agents processing metadata will be able to trace the origins of schemata they are unfamiliar with back to known schemata and perform meaningful actions on metadata they weren't originally designed to process. The sharability and extensibility of RDF also allows metadata authors to use multiple inheritance to "mix" definitions, to provide multiple views to their data, leveraging work done by others. In addition, it is possible to create RDF instance data based on multiple schemata from multiple sources (i.e., "interleaving" different types of metadata). Schemas may themselves be written in RDF; a companion document to this specification, [RDFSchema], describes one set of properties and classes for describing RDF schemas.
As a result of many communities coming together and agreeing on basic principles of metadata representation and transport, RDF has drawn influence from several different sources. The main influences have come from the Web standardization community itself in the form of HTML metadata and PICS, the library community, the structured document community in the form of SGML and more importantly XML, and also the knowledge representation (KR) community. There are also other areas of technology that contributed to the RDF design; these include object-oriented programming and modeling languages, as well as databases. While RDF draws from the KR community, readers familiar with that field are cautioned that RDF does not specify a mechanism for reasoning. RDF can be characterized as a simple frame system. A reasoning mechanism could be built on top of this frame system.
The foundation of RDF is a model for representing named properties and property values. The RDF model draws on well-established principles from various data representation communities. RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources and an RDF model can therefore resemble an entity-relationship diagram. (More precisely, RDF Schemas — which are themselves instances of RDF data models — are ER diagrams.) In object-oriented design terminology, resources correspond to objects and properties correspond to instance variables.
The RDF data model is a syntax-neutral way of representing RDF expressions. The data model representation is used to evaluate equivalence in meaning. Two RDF expressions are equivalent if and only if their data model representations are the same. This definition of equivalence permits some syntactic variation in expression without altering the meaning. (See Section 6. for additional discussion of string comparison issues.)
The basic data model consists of three object types:
Resources | All things being described by RDF expressions are called resources. A resource may be an entire Web page, such as the HTML document "http://www.w3.org/Overview.html". A resource may be a part of a Web page, e.g. a specific HTML or XML element within the document source. A resource may also be a whole collection of pages, e.g. an entire Web site. A resource may also be an object that is not directly accessible via the Web, e.g. a printed book. Resources are always named by URIs plus optional anchor ids (see [URI]). Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable. |
Properties | A property is a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning, defines its permitted values, the types of resources it can describe, and its relationship with other properties. This document does not address how the characteristics of properties are expressed; for such information, refer to the RDF Schema specification. |
Statements | A specific resource together with a named property plus the value of that property for that resource is an RDF statement. These three individual parts of a statement are called, respectively, the subject, the predicate, and the object. The object of a statement (i.e., the property value) can be another resource or it can be a literal; i.e., a resource (specified by a URI) or a simple string or other primitive datatype defined by XML. In RDF terms, a literal may have content that is XML markup but is not further evaluated by the RDF processor. There are some syntactic restrictions on how markup in literals may be expressed; see Section 2.2.1. |
Resources are identified by a resource identifier. A resource identifier is a URI plus an optional anchor id (see Section 2.2.1.). For the purposes of this section, properties will be referred to by a simple name.
Consider as a simple example the sentence:
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
This sentence has the following parts:
Subject (Resource) | http://www.w3.org/Home/Lassila |
Predicate (Property) | Creator |
Object (literal) | "Ora Lassila" |
In this document we will diagram an RDF statement pictorially using directed labeled graphs (also called "nodes and arcs diagrams"). In these diagrams, the nodes (drawn as ovals) represent resources and arcs represent named properties. Nodes that represent string literals will be drawn as rectangles. The sentence above would thus be diagrammed as:
Figure 1: Simple node and arc diagram
Note: The direction of the arrow is important. The arc always starts at the subject and points to the object of the statement. The simple diagram above may also be read "http://www.w3.org/Home/Lassila has creator Ora Lassila", or in general "<subject> HAS <predicate> <object>".
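The three parts of the statement above can be sketched in code. This is an illustration only, not part of the RDF model: the `Statement` tuple and the `read_aloud` helper are names assumed for this example, and resources and literals are both held as plain strings.

```python
from collections import namedtuple

# A statement has three parts: subject, predicate, and object.
Statement = namedtuple("Statement", ["subject", "predicate", "object"])

def read_aloud(stmt):
    """Render a statement using the '<subject> HAS <predicate> <object>' reading."""
    return f"{stmt.subject} has {stmt.predicate.lower()} {stmt.object}"

stmt = Statement(
    subject="http://www.w3.org/Home/Lassila",  # resource, named by a URI
    predicate="Creator",                       # property, referred to by a simple name
    object="Ora Lassila",                      # literal value
)

print(read_aloud(stmt))
```

The arc in the diagram corresponds to the tuple order: it starts at `stmt.subject` and points to `stmt.object`.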
Now, consider the case that we want to say something more about the characteristics of the creator of this resource. In prose, such a sentence would be:
The individual whose name is Ora Lassila, email <lassila@w3.org>, is the creator of http://www.w3.org/Home/Lassila.
The intention of this sentence is to make the value of the Creator property a structured entity. In RDF such an entity is represented as another resource. The sentence above does not give a name to that resource; it is anonymous, so in the diagram below we represent it with an empty oval:
Figure 2: Property with structured value
Note: corresponding to the reading in the previous note, this diagram could be read "http://www.w3.org/Home/Lassila has creator something and something has name Ora Lassila and email lassila@w3.org".
The structured entity of the previous example can also be assigned a unique identifier. The choice of identifier is made by the application database designer. To continue the example, imagine that an employee id is used as the unique identifier for a "person" resource. The URIs that serve as the unique keys for each employee (as defined by the organization) might then be something like http://www.w3.org/staffId/85740. Now we can write the two sentences:
The individual referred to by employee id 85740 is named Ora Lassila and has the email address lassila@w3.org. The resource http://www.w3.org/Home/Lassila was created by this individual.
The RDF model for these sentences is:
Figure 3: Structured value with identifier
Note that this diagram is identical to the previous one with the addition of the URI for the previously anonymous resource. From the point of view of a second application querying this model, there is no distinction between the statements made in a single sentence and the statements made in separate sentences. Some applications will need to be able to make such a distinction, however, and RDF supports this; see Section 4, Statements about Statements, for further details.
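The following sketch, under stated assumptions, shows why a querying application sees no difference between the anonymous and the identified form of the structured value. The `_:anon` labels, the property names `Name` and `Email`, and both helper functions are assumptions made for this example; RDF itself does not prescribe how an implementation labels an anonymous resource.

```python
import itertools

PAGE = "http://www.w3.org/Home/Lassila"
_counter = itertools.count(1)

def anonymous_node():
    """Return a fresh internal label for a resource that has no URI."""
    return f"_:anon{next(_counter)}"

def describe_creator(creator):
    """Build the three statements for a structured Creator value."""
    return {
        (PAGE, "Creator", creator),
        (creator, "Name", "Ora Lassila"),
        (creator, "Email", "lassila@w3.org"),
    }

def creator_names(statements, page):
    """Follow Creator, then Name, and collect the names found."""
    creators = {o for s, p, o in statements if s == page and p == "Creator"}
    return {o for s, p, o in statements if s in creators and p == "Name"}

anonymous_model = describe_creator(anonymous_node())
identified_model = describe_creator("http://www.w3.org/staffId/85740")

# Both models answer the same query in the same way.
print(creator_names(anonymous_model, PAGE))
print(creator_names(identified_model, PAGE))
```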
[...]
[...]
[...]
When we write a sentence in natural language we use words that are meant to convey a certain meaning. That meaning is crucial to understanding the statements and, in the case of applications of RDF, is crucial to establishing that the correct processing occurs as intended. It is crucial that both the writer and the reader of a statement understand the same meaning for the terms used, such as Creator, approvedBy, Copyright, etc. or confusion will result. In a medium of global scale such as the World Wide Web it is not sufficient to rely on shared cultural understanding of concepts such as "creatorship"; it pays to be as precise as possible.
Meaning in RDF is expressed through reference to a schema. You can think of a schema as a kind of dictionary. A schema defines the terms that will be used in RDF statements and gives specific meanings to them. A variety of schema forms can be used with RDF, including a specific form defined in a separate document [RDFSchema] that has some specific characteristics to help with automating tasks using RDF.
A schema is the place where definitions and restrictions of usage for properties are documented. In order to avoid confusion between independent -- and possibly conflicting -- definitions of the same term, RDF uses the XML namespace facility. Namespaces are simply a way to tie a specific use of a word in context to the dictionary (schema) where the intended definition is to be found. In RDF, each predicate used in a statement must be identified with exactly one namespace, or schema. However, a Description element may contain statements with predicates from many schemas. Examples of RDF Descriptions that use more than one schema appear in Section 7.
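As a rough sketch of how a namespace ties a predicate to one schema, the code below resolves a prefixed name to a full identifier by concatenating the schema URI and the local name. The schema URIs, the prefixes `s` and `v`, and the `expand` helper are all hypothetical and chosen only for illustration.

```python
def expand(qualified_name, namespaces):
    """Resolve 'prefix:local' to a full property identifier."""
    prefix, sep, local = qualified_name.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"predicate {qualified_name!r} has no declared namespace")
    return namespaces[prefix] + local

# Two independent schemas that both happen to define a term called Creator.
namespaces = {
    "s": "http://example.org/schema/",        # hypothetical schema URI
    "v": "http://example.org/other-schema/",  # hypothetical schema URI
}

print(expand("s:Creator", namespaces))
print(expand("v:Creator", namespaces))
```

Because each predicate resolves to exactly one schema, the two uses of Creator remain distinct even when both appear in the same Description.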
Often the value of a property is something that has additional contextual information that is considered "part of" that value. In other words, there is a need to qualify property values. Examples of such qualification include naming a unit of measure, a particular restricted vocabulary, or some other annotation. For some uses it is appropriate to use the property value without the qualifiers. For example, in the statement "the price of that pencil is 75 U.S. cents" it is often sufficient to say simply "the price of that pencil is 75".
In the RDF model a qualified property value is simply another instance of a structured value. The object of the original statement is this structured value and the qualifiers are further properties of this common resource. The principal value being qualified is given as the value of the value property of this common resource. See Section 7.3. Non-Binary Relations for an example of the use of the value property.
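The pencil example can be sketched as follows. Only the `value` property comes from the text above; the pencil URI, the `_:price` label, the `price` and `units` property names, and both helpers are assumptions made for this illustration.

```python
def qualified_value(node, value, **qualifiers):
    """Return statements giving `node` a principal value plus qualifiers."""
    statements = [(node, "value", value)]
    statements += [(node, name, q) for name, q in qualifiers.items()]
    return statements

def principal_value(statements, subject, predicate):
    """Look up a property and unwrap the `value` of a structured object."""
    for s, p, o in statements:
        if s == subject and p == predicate:
            for s2, p2, o2 in statements:
                if s2 == o and p2 == "value":
                    return o2
            return o  # plain, unqualified value
    return None

PENCIL = "http://example.org/pencil"  # hypothetical URI
PRICE_NODE = "_:price"                # hypothetical label for the structured value

model = [(PENCIL, "price", PRICE_NODE)]
model += qualified_value(PRICE_NODE, "75", units="U.S. cents")

# An application that does not need the qualifier reads just the principal value.
print(principal_value(model, PENCIL, "price"))
```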
Frequently it is necessary to refer to a collection of resources; for example, to say that a work was created by more than one person, or to list the students in a course, or the software modules in a package. RDF containers are used to hold such lists of resources or literals.
RDF defines three types of container objects:
Note: The definitions of Bag and Sequence explicitly permit duplicate values. RDF does not define a core concept of Set, which would be a Bag with no duplicates, because the RDF core does not mandate an enforcement mechanism in the event of violations of such constraints. Future work layered on the RDF core may define such facilities.
To represent a collection of resources, RDF uses an additional resource that identifies the specific collection (an instance of a collection, in object modeling terminology). This resource must be declared to be an instance of one of the container object types defined above. The type property, defined below, is used to make this declaration. The membership relation between this container resource and the resources that belong in the collection is defined by a set of properties defined expressly for this purpose. These membership properties are named simply "_1", "_2", "_3", etc. Container resources may have other properties in addition to the membership properties and the type property. Any such additional statements describe the container; see Section 3.3, Distributive Referents, for discussion of statements about each of the members themselves.
A common use of containers is as the value of a property. When used in this way, the statement still has a single statement object regardless of the number of members in the container; the container resource itself is the object of the statement.
For example, to represent the sentence
The students in course 6.001 are Amy, Tim, John, Mary, and Sue.
the RDF model is
Figure 4: Simple Bag container
Bag containers are not equivalent to repeated properties of the same type; see Section 3.5. for a discussion of the difference. Authors will need to decide on a case-by-case basis which one (repeated property statement or Bag) is more appropriate to use.
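A minimal sketch of the Bag for course 6.001 is given below. The course URI, the `_:students` label, the `students` property name, and the `make_container` helper are assumptions; the student names stand in as plain strings for whatever resources or literals a real description would use.

```python
def make_container(container, kind, members):
    """Return statements declaring `container` as `kind` with numbered members."""
    statements = [(container, "type", kind)]
    for position, member in enumerate(members, start=1):
        # Membership properties are named "_1", "_2", "_3", etc.
        statements.append((container, f"_{position}", member))
    return statements

COURSE = "http://example.org/courses/6.001"  # hypothetical URI
BAG = "_:students"                           # hypothetical container label
students = ["Amy", "Tim", "John", "Mary", "Sue"]

# The course has a single statement whose object is the container itself.
model = [(COURSE, "students", BAG)] + make_container(BAG, "Bag", students)

for statement in model:
    print(statement)
```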
The sentence
The source code for X11 may be found at ftp.x.org, ftp.cs.purdue.edu, or ftp.eu.net.
is modeled in RDF as
Figure 5: Simple Alternative container
Alternative containers are frequently used in conjunction with language tagging. A work whose title has been translated into several languages might have its Title property pointing to an Alternative container holding each of the language variants.
[...]
Container structures give rise to an issue about statements: when a statement is made referring to a collection, what "thing" is the statement describing? Or in other words, to what object is the statement referring? Is the statement describing the container itself or is the statement describing the members of the container? The object being described (in the XML syntax indicated by the about attribute) is in RDF called the referent.
[...]
The referent of the Description is the container (the Bag), not its members. One would sometimes like to write a statement about each of the contained objects individually, instead of the container itself. In order to express that "Ora Lassila" is the creator of each of the pages, a different kind of referent is called for, one that distributes over the members of the container. This referent in RDF is expressed using the aboutEach attribute:
[3a] idAboutAttr ::= idAttr | aboutAttr | aboutEachAttr
[26] aboutEachAttr ::= 'aboutEach="' URI-reference '"'
[...]
We will call the new referent type a distributive referent. Distributive referents allow us to "share structure" in an RDF Description. For example, when writing several Descriptions that all have a number of common statement parts (predicates and objects), the common parts can be shared among all the Descriptions, possibly resulting in space savings and more maintainable metadata. The value of an aboutEach attribute must be a container. Using a distributive referent on a container is the same as making all the statements about each of the members separately.
No explicit graph representation of distributive referents is defined. Instead, in terms of the statements made, distributive referents are expanded into the individual statements about the individual container members (internally, implementations are free to retain information about the distributive referents - in order to save space, for example - as long as any querying functions work as if all of the statements were made individually).
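The expansion described above might be sketched as follows. The page URIs, the `_:pages` label, and both helper functions are hypothetical; an implementation is free to store distributive referents differently as long as queries behave as if the individual statements had been made.

```python
def members_of(statements, container):
    """Return the members of a container in membership-property order."""
    numbered = [
        (int(p[1:]), o)
        for s, p, o in statements
        if s == container and p.startswith("_") and p[1:].isdigit()
    ]
    return [o for _, o in sorted(numbered)]

def expand_about_each(statements, container, predicate, obj):
    """Expand one distributive statement into one statement per member."""
    return [(member, predicate, obj) for member in members_of(statements, container)]

PAGES = "_:pages"  # hypothetical container label
model = [
    (PAGES, "type", "Bag"),
    (PAGES, "_1", "http://example.org/a.html"),  # hypothetical page
    (PAGES, "_2", "http://example.org/b.html"),  # hypothetical page
]

expanded = expand_about_each(model, PAGES, "Creator", "Ora Lassila")
for statement in expanded:
    print(statement)
```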
One very frequent use of metadata is to make statements about "all pages at my Web site", or "all pages in this branch of my Web site". In many cases it is impractical or even undesirable to try to list each such resource explicitly and identify it as a member of a container. RDF therefore has a second distributive referent type. This second distributive referent type is a shorthand syntax that represents an instance of a Bag whose members are by definition all resources whose resource identifiers begin with a specified string:
[26a] aboutEachAttr ::= 'aboutEach="' URI-reference '"' | 'aboutEachPrefix="' string '"'
The aboutEachPrefix attribute declares that there is a Bag whose members are all the resources whose fully resolved resource identifiers begin with the character string given as the value of the attribute. The statements in a Description that has the aboutEachPrefix attribute apply individually to each of the members of this Bag.
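A sketch of this prefix-based referent is shown below. It assumes the processor has a finite set of known resource identifiers to test against; that set, the example URIs, the Copyright value, and the helper name are all assumptions made for illustration.

```python
def expand_about_each_prefix(known_resources, prefix, predicate, obj):
    """Apply one statement to every known resource whose identifier starts with `prefix`."""
    return [
        (resource, predicate, obj)
        for resource in sorted(known_resources)
        if resource.startswith(prefix)
    ]

# Hypothetical, fully resolved resource identifiers known to the processor.
known = {
    "http://example.org/doc/page1",
    "http://example.org/doc/page2",
    "http://example.org/other/page3",
}

expanded = expand_about_each_prefix(
    known, "http://example.org/doc", "Copyright", "Example Org"
)
for statement in expanded:
    print(statement)
```

The match is a plain character-string comparison, so in this sketch the prefix `http://example.org/doc` would also select an identifier such as `http://example.org/docs/index`.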
A resource may have multiple statements with the same predicate (i.e., using the same property). This is not the same as having a single statement whose object is a container containing multiple members. The choice of which to use in any particular circumstance is in part made by the person who designs the schema and in part made by the person who writes the specific RDF statements.
Consider as an example the relationship between a writer and her publications. We might have the sentence
Sue has written "Anthology of Time", "Zoological Reasoning", "Gravitational Reflections".
That is, there are three resources each of which was written independently by the same writer.
Figure 6: Repeated property
In this example there is no stated relationship between the publications other than that they were written by the same person.
On the other hand, the sentence
The committee of Fred, Wilma, and Dino approved the resolution.
says that the three committee members as a whole voted in a certain manner; it does not necessarily state that each committee member voted in favor of the resolution. It would be incorrect to model this sentence as three separate approvedBy statements, one for each committee member, as this would state the vote of each individual member. Rather, it is better to model this as a single approvedBy statement whose object is a Bag containing the committee members' identities:
Figure 7: Using Bag to indicate a collective opinion
The choice of which representation to use, Bag or repeated property, is made by the person creating the metadata after considering the schema. If, for example, in the publications example above we wished to say that those were the complete set of publications then the schema might include a property called publications for that purpose. The value of the publications property would be a Bag listing all of Sue's works.
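The difference between the two representations can be made concrete with a small sketch. The URIs, the `_:committee` label, the `wrote` property name, and the `objects` helper are hypothetical; `approvedBy` is the property named in the text.

```python
def objects(statements, subject, predicate):
    """Return every object of `predicate` statements about `subject`."""
    return [o for s, p, o in statements if s == subject and p == predicate]

SUE = "http://example.org/people/Sue"         # hypothetical URI
RESOLUTION = "http://example.org/resolution"  # hypothetical URI
COMMITTEE = "_:committee"                     # hypothetical container label

# Repeated property: three independent statements about Sue.
repeated = [
    (SUE, "wrote", "Anthology of Time"),
    (SUE, "wrote", "Zoological Reasoning"),
    (SUE, "wrote", "Gravitational Reflections"),
]

# Bag: a single approvedBy statement whose object is the committee as a whole.
collective = [
    (RESOLUTION, "approvedBy", COMMITTEE),
    (COMMITTEE, "type", "Bag"),
    (COMMITTEE, "_1", "Fred"),
    (COMMITTEE, "_2", "Wilma"),
    (COMMITTEE, "_3", "Dino"),
]

print(len(objects(repeated, SUE, "wrote")))          # one object per publication
print(objects(collective, RESOLUTION, "approvedBy")) # only the container
```

In the second model no statement says that Fred, Wilma, or Dino individually approved anything; only the Bag is the object of approvedBy.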
In addition to making statements about Web resources, RDF can be used for making statements about other RDF statements; we will refer to these as higher-order statements. In order to make a statement about another statement, we actually have to build a model of the original statement; this model is a new resource to which we can attach additional properties.
Statements are made about resources. A model of a statement is the resource we need in order to be able to make new statements (higher-order statements) about the modeled statement.
For example, let us consider the sentence
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
RDF would regard this sentence as a fact. If, instead, we write the sentence
Ralph Swick says that Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.
we have said nothing about the resource http://www.w3.org/Home/Lassila; instead, we have expressed a fact about a statement Ralph has made. In order to express this fact to RDF, we have to model the original statement as a resource with four properties. This process is formally called reification in the Knowledge Representation community. A model of a statement is called a reified statement.
To model statements RDF defines the following properties:
A new resource with the above four properties represents the original statement and can both be used as the object of other statements and have additional statements made about it. The resource with these four properties is not a replacement for the original statement; it is a model of the statement. A statement and its corresponding reified statement exist independently in an RDF graph and either may be present without the other. The RDF graph is said to contain the fact given in the statement if and only if the statement is present in the graph, irrespective of whether the corresponding reified statement is present.
To model the example above, we could attach another property to the reified statement (say, "attributedTo") with an appropriate value (in this case, "Ralph Swick").
[...]
Figure 8 represents this in graph form. Syntactically this is rather verbose; in Section 4.2. we present a shorthand for making statements about statements.
Figure 8: Representation of a reified statement
Reification is also needed to represent explicitly in the model the statement grouping implied by Description elements. The RDF graph model does not need a special construct for Descriptions; since Descriptions really are collections of statements, a Bag container is used to indicate that a set of statements came from the same (syntactic) Description. Each statement within a Description is reified and each of the reified statements is a member of the Bag representing that Description. As an example [... ] the graph shown in Figure 9.
Figure 9: Using Bag to represent statement grouping
Note the new attribute bagID.
[...]
BagID and ID should not be confused. ID specifies the identification of an in-line resource whose properties are further detailed in the Description. BagID specifies the identification of the container resource whose members are the reified statements about another resource. A Description may have both an ID attribute and a bagID attribute.
[...]
This specification shows three representations of the data model: as 3-tuples (triples), as a graph, and in XML. These representations have equivalent meaning. The mapping between the representations used in this specification is not intended to constrain in any way the internal representation used by implementations.
The RDF data model is defined formally as follows:
We can view a set of statements (members of Statements) as a directed labeled graph: each resource and literal is a vertex; a triple {p, s, o} is an arc from s to o, labeled by p. This is illustrated in Figure 11.
Figure 11: Simple statement graph template
This can be read either
o is the value of p for s
or (left to right)
s has a property p with a value o
or even
the p of s is o
For example, the sentence
Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila
would be represented graphically as follows:
Figure 12: Simple statement graph
and the corresponding triple (member of Statements) would be
{creator, [http://www.w3.org/Home/Lassila], "Ora Lassila"}
The notation [I] denotes the resource identified by the URI I and quotation marks denote a literal.
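The correspondence between triples and arcs can be sketched as below, keeping this section's {p, s, o} ordering. The helper names are assumptions, and the sketch holds resources and literals alike as plain strings rather than distinguishing [I] from quoted literals.

```python
from collections import defaultdict

def to_graph(triples):
    """Map each subject vertex to its outgoing (label, target) arcs."""
    arcs = defaultdict(list)
    for p, s, o in triples:
        arcs[s].append((p, o))  # arc from s to o, labeled by p
    return dict(arcs)

def read(triple):
    """Render a triple using the 'the p of s is o' reading."""
    p, s, o = triple
    return f"the {p} of {s} is {o}"

triple = ("creator", "http://www.w3.org/Home/Lassila", "Ora Lassila")

print(to_graph([triple]))
print(read(triple))
```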
Using the triples, we can explain how statements are reified (as introduced in Section 4). Given a statement
{creator, [http://www.w3.org/Home/Lassila], "Ora Lassila"}
we can express the reification of this as a new resource X as follows:
{type, [X], [RDF:Statement]}
{predicate, [X], [creator]}
{subject, [X], [http://www.w3.org/Home/Lassila]}
{object, [X], "Ora Lassila"}
From the standpoint of an RDF processor, facts (that is, statements) are triples that are members of Statements. Therefore, the original statement remains a fact despite it being reified since the triple representing the original statement remains in Statements. We have merely added four more triples.
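The four reification triples can be generated mechanically, as in the sketch below. It keeps this section's {predicate, subject, object} ordering; the `reify` helper, the `_:X` label, and the use of plain strings for both resources and literals are assumptions. The `attributedTo` property is the one suggested in Section 4.

```python
def reify(statement, node):
    """Return the four triples that model `statement` as the resource `node`."""
    predicate, subject, obj = statement
    return [
        ("type", node, "RDF:Statement"),
        ("predicate", node, predicate),
        ("subject", node, subject),
        ("object", node, obj),
    ]

original = ("creator", "http://www.w3.org/Home/Lassila", "Ora Lassila")
X = "_:X"  # hypothetical label for the new resource

statements = {original}
statements.update(reify(original, X))

# A higher-order statement can now be attached to the reified statement.
statements.add(("attributedTo", X, "Ralph Swick"))

print(original in statements)  # the original fact is still present
print(len(statements))
```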
The property named "type" is defined to provide primitive typing. The formal definition of type is:
|
Furthermore, the formal specification of reification is:
|
The resource r in the definition above is called the reified statement. When a resource represents a reified statement (that is, it has an RDF:type property with a value of RDF:Statement), that resource must have exactly one RDF:subject property, one RDF:object property, and one RDF:predicate property.
As described in Section 3, it is frequently necessary to represent a collection of resources or literals; for example to state that a property has an ordered sequence of values. RDF defines three kinds of collections: ordered lists, called Sequences, unordered lists, called Bags, and lists that represent alternatives for the (single) value of a property, called Alternatives.
Formally, these three collection types are defined by:
|
To represent a collection c, create a triple {RDF:type, c, t} where t is one of the three collection types RDF:Seq, RDF:Bag, or RDF:Alt. The remaining triples {RDF:_1, c, r1}, ..., {RDF:_n, c, rn}, ... point to each of the members rn of the collection. For a single collection resource there may be at most one triple whose predicate is any given element of Ord and the elements of Ord must be used in sequence starting with RDF:_1. For resources that are instances of the RDF:Alt collection type, there must be exactly one triple whose predicate is RDF:_1 and that is the default value for the Alternatives resource (that is, there must always be at least one alternative).
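The constraints in the preceding paragraph can be checked with a short routine such as the one sketched here. The function name, the wording of the messages, and the `_:sites` label are assumptions; triples use this section's {predicate, subject, object} ordering, and the host names are taken from the X11 example in Section 3.

```python
import re

ORD = re.compile(r"RDF:_([1-9][0-9]*)$")
KINDS = {"RDF:Seq", "RDF:Bag", "RDF:Alt"}

def check_collection(triples, c):
    """Return a list of constraint violations for collection resource `c`."""
    problems = []
    kinds = [o for p, s, o in triples if s == c and p == "RDF:type" and o in KINDS]
    if not kinds:
        problems.append("no collection type declared")
    ordinals = []
    for p, s, o in triples:
        match = ORD.match(p)
        if s == c and match:
            ordinals.append(int(match.group(1)))
    distinct = sorted(set(ordinals))
    if len(ordinals) != len(distinct):
        problems.append("an ordinal property is used more than once")
    if distinct != list(range(1, len(distinct) + 1)):
        problems.append("ordinals are not used in sequence from RDF:_1")
    if "RDF:Alt" in kinds and 1 not in ordinals:
        problems.append("RDF:Alt has no RDF:_1 default value")
    return problems

SITES = "_:sites"  # hypothetical collection label
good = [
    ("RDF:type", SITES, "RDF:Alt"),
    ("RDF:_1", SITES, "ftp.x.org"),
    ("RDF:_2", SITES, "ftp.cs.purdue.edu"),
    ("RDF:_3", SITES, "ftp.eu.net"),
]
bad = [
    ("RDF:type", SITES, "RDF:Alt"),
    ("RDF:_2", SITES, "ftp.cs.purdue.edu"),
]

print(check_collection(good, SITES))
print(check_collection(bad, SITES))
```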
[...]
A single resource can be the value of more than one property; that is, it can be the object of more than one statement and therefore pointed to by more than one arc. For example, a single Web page might be shared between several documents and might then be referenced more than once in a "sitemap". Or two different (ordered) sequences of the same resources may be given.
Consider the case of specifying the collected works of an author, sorted once by publication date and sorted again alphabetically by subject:
Figure 13: Sharing values between two sequences
[...]
The RDF data model intrinsically only supports binary relations; that is, a statement specifies a relation between two resources. In the following examples we show the recommended way to represent higher arity relations in RDF using just binary relations. The recommended technique is to use an intermediate resource with additional properties of this resource giving the remaining relations. As an example, consider the subject of one of John Smith's recent articles -- library science. We could use the Dewey Decimal Code for library science to categorize that article. Dewey Decimal codes are far from the only subject categorization scheme, so to hold the classification system relation we identify an additional resource that is used as the value of the subject property and annotate this resource with an additional property that identifies the categorization scheme that was used. As specified in Section 2.3., the RDF core includes a value property to denote the principal value of the main relation. The resulting graph might look like:
Figure 14: A ternary relation
[...]
A common use of this higher-arity capability is when dealing with units of measure. A person's weight is not just a number such as "200", it also includes the unit of measure used. In this case we might be using either pounds or kilograms. We could use a relationship with an additional arc to record the fact that John Smith is a rather strapping gentleman:
Figure 15: Unit of measure as a ternary relation
[...]
[...]
[...]
[...]
[...]
This specification is the work of the W3C RDF Model and Syntax Working Group. This Working Group has been most ably chaired by Eric Miller of the Online Computer Library Center and Bob Schloss of IBM. We thank Eric and Bob for their tireless efforts in keeping the group on track and we especially thank OCLC, IBM, and Nokia for supporting them and us in this endeavor.
The members of the Working Group who helped design this specification, debate proposals, provide words, proofread numerous drafts and ultimately reach consensus are: Ron Daniel (DATAFUSION), Renato Iannella (DSTC), Tsuyoshi SAKATA (DVL), Murray Maloney (Grif), Bob Schloss (IBM), Naohiko URAMOTO (IBM), Bill Roberts (KnowledgeCite), Arthur van Hoff (Marimba), Charles Frankston (Microsoft), Andrew Layman (Microsoft), Chris McConnell (Microsoft), Jean Paoli (Microsoft), R.V. Guha (Netscape), Ora Lassila (Nokia), Ralph LeVan (OCLC), Eric Miller (OCLC), Charles Wicksteed (Reuters), Misha Wolf (Reuters), Wei Song (SISU), Lauren Wood (SoftQuad), Tim Bray (Textuality), Paul Resnick (University of Michigan), Tim Berners-Lee (W3C), Dan Connolly (W3C), Jim Miller (W3C, emeritus), Ralph Swick (W3C). Dan Brickley (UK Bristol) joined the RDF Schema activity and brought us lots of sage advice in the final stages of this work. Martin Dürst (W3C) reviewed several working drafts and made a number of suggestions for improvement on behalf of the W3C Internationalization Working Group. Janne Saarela (W3C) performed a priceless service by creating a 'clean room' implementation from our working drafts.
This document is the collective work of the Working Group. The editors are indebted to the Working Group for helping to create and polish this specification.
The following terms are used in this specification with varying degrees of intuitive meaning and precise meaning. The summary definitions here are for guidance only; they are non-normative. Where appropriate, the location in the document of the precise definition is given also.
[...]
[...]
[...]
[...]
[...]