An RDF Glossary

Martyn Horner

This is draft 1: MH's first attempts, scan of alternatives and contradictory definitions.

Please suggest further alternative views.


To the working group

The point of this document is to provide a consensus-supported vocabulary for new readers of the tutorial and introductory texts in our document set (and possibly for all readers of W3C documents). This may be incorporated as part of the final document set or offered as an appendix. It is therefore intended to help non-experts.

This `small' project was started during the first month of the Working Group's life and abandoned because of complete lack of consensus in some very fundamental areas such as `resource' and `entity'. The WG's ability to find a contrary viewpoint to any case is not helpful to the reader new to the subject. I am trying here to provide a key to at least the appearance of agreement.

I have two objectives for the final glossary:

  1. to reduce forward references so that a new reader can work through this glossary to build an understanding of the basic terms, and
  2. to separate model terms from serialization terms (in the spirit of the face-to-face meeting's feeling that the model is the core concept), and to separate the definitions of the RDF model from its associations with the world (the entity, resource, URI thread).

I have listed what I think are the important terms with, in most cases, my attempt at a humane definition and then a selection of sources. Occasionally, I have added a comment of my own.

To promote consensus I propose the following process. Draft 1 of this document will contain my proposed text and all the direct quotations from other sources that I can lay my hands on. My selection of sources is intended to be diverse, authoritative and a mixture of the focussed and the discursive. This is by no means a model for the final document! I intend to remove the sources: this layout just shows where the conflicts may lie. I will put this up for the group to consider and ask for email comments. Members should feel free to suggest more alternatives (!). Comments meant to change existing alternatives will spawn new alternatives. The point is to mix in all the opposing views in one document so that every idea gets its chance.

I will then try and resolve differences and issue draft 2 to ask for final resolution of conflicts. If the group cannot arrive at a resolution then I suggest there is no consensus and I will humbly abandon the project in whole or those parts not agreed on. I don't think I should choose definitions just because I like them if there is substantial disagreement: this would not serve any purpose for the reader.

The final draft will contain just those parts agreed by the Group (with no remaining dissenting comments).

While not wishing to produce a redundant document, I feel that a `tutorial' glossary is useful to help a reader form an internally consistent language before receiving the precise definitions of the specs. It is therefore important not to give the user redolent but false images which may cause him/her to resist the finer descriptions when they are encountered. Definitions here should be `natural' and easily assimilated.

Sources

[RDFT&C]
Graham Klyne's working document before this
Editor:
Graham Klyne
Contributors:
Graham Klyne
Brian McBride
Bill dehOra
Dan Brickley
Pat Hayes
[RDFM&S:introduction]
The RDF Model and Syntax Specification: the initial definitions and discussion.
[RDFM&S:Section5]
The RDF Model and Syntax Specification: section 5 (Formal Model).
[RDFM&S:glossary]
The RDF Model and Syntax Specification: the glossary section.
[RFC2616]
Hypertext Transfer Protocol -- HTTP/1.1. Defines some basic elements but with a network retrieval slant.
[RFC2396]
Uniform Resource Identifiers (URI): Generic Syntax. Defines everything with a view to strong uniformity across the Web: possibly over-simplistic.
[Accessibility]
Web Accessibility Initiative. Huge glossary.
[WebChar]
Web Characterization Terminology & Definitions Sheet. Referenced by [Accessibility] and others.
[Jena]
An Introduction to RDF and the Jena API. Small practical glossary. By our own Brian McBride.
[N3]
N3 primer. Small glossary. `These are not formal definitions - just phrases to help you get the hang of what these things mean.' By Tim BL.
[ScAm]
Tim BL (et al)'s article in Scientific American
[MH]
My additional comments

The text

An RDF glossary

Editor:
Martyn Horner (Profium Sarl)

It is hoped that reading this document before any others in the collection of specifications and discussions which present RDF will allow the reader to build a `bigger picture' of how this particular collection of words and concepts is used.

RDF places specific requirements on some of its core terms and concepts but it is intended to reflect the real world and the Semantic Web in particular. The most technical terms must therefore be understood in the broadest of contexts, that of the World Wide Web.

Almost every word defined here will be redefined more rigorously and more precisely in later documents and those definitions are the `normative' ones. This short dictionary is meant to show how these terms will fit together and forewarn the reader that, while some words will have their natural meanings, others will be used in specialized ways.

The world

Entity
The world (and the world of information that it encloses) contains a vast number of `entities' - things we talk about and think about. Many have names and words in human languages, some have no name but can be referred to in passing.
[RDFT&C] Anything which exists or has existed. Note that RFC2396 uses this term in a more restricted sense, to mean some data represents some aspect of a Web Resource.
[MH] Actually [RFC2396] doesn't attempt to define `entity'.
[MH] [RDFM&S:introduction] uses `entity' as an undefined primitive.
[RFC2616 betraying its need to talk about transport mechanisms] The information transferred as the payload of a request or response.
[MH] Of course `entity' has a meaning in the lexicography of XML which suggests that it cannot be used safely to mean something quite as general as `something'. Perhaps we need…`Thing' as DAML+OIL have it.
Resource
The universe in which RDF operates is seen as a potentially huge collection of `resources'.

Resources are the identifiable items in the world, the contact points between you and the world of data. They are `entities' as we need to refer to them, fixed for a short time while we talk about them.

A typical resource would be a unit of data on the Web such as a page or a significant segment of a page. Equally another person, an organization or anything else that you would wish to point at out there in this universe can be referred to as a `resource'. The significant characteristic is the identifiable nature of resources, that they have for whatever period of time an identity which makes them distinguishable.

[RDFT&C] May refer to an RDF resource or a Web Resource. Some resources may be both. In discussion of RDF, this term is often used to mean RDF Resource.
[RDFM&S:introduction] A resource may be an entire Web page; such as the HTML document "http://www.w3.org/Overview.html" for example. A resource may be a part of a Web page; e.g. a specific HTML or XML element within the document source. A resource may also be a whole collection of pages; e.g. an entire Web site. A resource may also be an object that is not directly accessible via the Web; e.g. a printed book. Resources are always named by URIs plus optional anchor ids (see [URI]). Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable.
[RDFM&S:glossary] An abstract object that represents either a physical object such as a person or a book or a conceptual object such as a color or the class of things that have colors. Web pages are usually considered to be physical objects, but the distinction between physical and conceptual or abstract objects is not important to RDF. A resource can also be a component of a larger object; for example, a resource can represent a specific person's left hand or a specific paragraph out of a document. As used in this specification, the term resource refers to the whole of an object if the URI does not contain a fragment (anchor) id or to the specific subunit named by the fragment or anchor id.
[Jena] Some entity. It could be a web resource such as web page, or it could be a concrete physical thing such as a tree or a car. It could be an abstract idea such as chess or football. Resources are named by URIs.
[N3] That identified by a Universal Resource Identifier (without a "#"). If the URI starts "http:", then the resource is some form of generic document.
Web Resource
Resources which have their identity by nature of their accessibility on the World Wide Web are sometimes distinguished as `Web Resources'. To make this identification, we may have to choose one aspect of this entity's contact with the Web - for an organization: a particular Web page, for a person: an email account, etc.
[RDFT&C] Anything that is identified by a URI
[Dan Connolly (email)] Nope; that rules out real numbers...
[Accessibility] anything that has identity on the Web. A Web resource is identified by a URI.
[WebChar] A resource, identified by a URI, that is a member of the Web Core (The collection of resources residing on the Internet that can be accessed using any implemented version of HTTP as part of the protocol stack (or its equivalent), either directly or via an intermediary. Notes: By the term "or its equivalent" we consider any version of HTTP that is currently implemented as well as any new standards which may replace HTTP (HTTP-NG, for example). Also, we include any protocol stack including HTTP at any level, for example HTTP running over SSL.).
[RFC2616] A network data object or service that can be identified by a URI, as defined in section 3.2. Resources may be available in multiple representations (e.g. multiple languages, data formats, size, and resolutions) or vary in other ways.
[RFC2396 (in context of defining URI)] A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources. The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content---the entities to which it currently corresponds---changes over time, provided that the conceptual mapping is not changed in the process
URI, URL and URN
To identify a resource through a fixed presence on the Web, the standard technique is to use a URI - Uniform Resource Identifier. This is a sequence of characters, a text string, which designates a resource uniquely.

A resource's URI is as different from another resource's URI as one resource is from another.

The actual content of a URI carries no explicit description of the resource although parts of the string describe how that resource may be contacted, downloaded or otherwise viewed. The most obvious examples are `http:...' Web pages which can be fetched with a browser and email addresses which allow unique contact with an individual.

URIs are the physical identifications of resources and connections to them. They are not the resources themselves nor are they the entities identified as resources. Entities have biographies of their own. The role they play in the information world is as resources. We use URIs to connect with them through that information world.

[RFC2396 - obviously definitive] A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.
[WebChar] a compact string of characters for identifying an abstract or physical resource.
[N3] The way of identifying anything (including Classes, Properties or individual things of any sort). Not everything has a URI, as you can talk about something by just using its properties. But using a URI allows other documents and systems to easily reuse your information.
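
As a minimal sketch of the point that a URI is only a structured string, the following Python fragment uses the standard library's urllib.parse to split two URIs into their parts. The mailto address is a hypothetical example, and nothing here contacts either resource.

```python
from urllib.parse import urlsplit

# Illustrative URIs only; the mailto address is a hypothetical example.
page = urlsplit("http://www.w3.org/Overview.html")
mailbox = urlsplit("mailto:someone@example.org")

# The scheme hints at how the resource might be contacted;
# the rest of the string identifies it within that scheme.
print(page.scheme, page.netloc, page.path)
print(mailbox.scheme, mailbox.path)
```

The parts say how a resource might be reached, not what it is: any description of the resource has to come from elsewhere.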
Reference, link
Because a URI forms a connection to a resource, we can make a reference to it by using its URI. This is what sits behind a link on a Web page - not a Web resource but its URI which acts as a more-or-less reliable connection to it.

The construction of URIs carries some recognition of the ways resources group together in Web sites or communities. A reference may exploit this by locating a resource only within a pre-determined location. Such a reference is referred to as `relative'. This is strictly a matter of conciseness. Behind each use of a relative reference there must be a procedure to produce an absolute reference before meaning can be attached to the reference.

Relative references, until they are resolved into absolute URIs, can be used as a local map round a locality on the Web.
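
The procedure for turning a relative reference into an absolute one can be sketched with Python's standard urljoin. The base URI and the three relative references are hypothetical, chosen only to show the common cases (a sibling, a parent directory, a fragment).

```python
from urllib.parse import urljoin

# Hypothetical base: the absolute URI of the document containing the references.
base = "http://www.example.org/glossary/terms.html"

# Each relative reference is meaningful only once resolved against the base.
relative_refs = ["resource.html", "../specs/model.html", "#uri"]
resolved = [urljoin(base, ref) for ref in relative_refs]

for ref, uri in zip(relative_refs, resolved):
    print(ref, "->", uri)
```

The same relative references resolved against a different base would name different resources, which is why resolution must happen before any meaning is attached.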

Metadata
`Data about data'

Any collection of data specifically supplying information about resources. This naturally takes the form of describing the relationships between resources and their characteristics in well-defined terms.

Metadata takes form as RDF.

[RDFM&S:introduction] Metadata is "data about data" (for example, a library catalog is metadata, since it describes publications) or specifically in the context of this specification "data describing Web resources". The distinction between "data" and "metadata" is not an absolute one; it is a distinction created primarily by a particular application, and many times the same resource will be interpreted in both ways simultaneously.
[RDFT&C] Data describing Web resources.
[Accessibility] Data about data on the Web, including but not limited to authorship, classification, endorsement, policy, distribution terms, IPR, and so on. A significant use for the Semantic Web.

[quoting Barron's Dict. of Computer & Internet Terms] Data that describes data. Data dictionaries and repositories are examples of metadata. The term may also refer to any file or database that holds information about another database's structure, attributes, processing or changes.

RDF
`Resource Description Framework': piecing together a definition from what we now understand, RDF is a way of expressing descriptions of resources and specifically, since it belongs to the information world, of Web resources.
[RDFM&S:introduction] a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web.

The broad goal of RDF is to define a mechanism for describing resources that makes no assumptions about a particular application domain, nor defines (a priori) the semantics of any application domain.

RDF Resource
This term is sometimes used to emphasize the use of a resource from within such a framework but the distinction is not necessary. All resources can potentially be used in metadata.
[RDFM&S:introduction] `1.There is a set called Resources.'
[RDFT&C] Note that an RDF resource is not necessarily a web resource, though any web resource can be an RDF resource.

Consider: http://foo.com/#a and http://foo.com/#b may name distinct RDF resources, but if used to access web resources they both refer to the common web resource http://foo.com/. This distinction between "Web resource" and "RDF Resource" is not a desired outcome, but an interpretation of different uses of the term "resource" in different documents.
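
The two readings in this example can be shown side by side. This is only a sketch using Python's standard urldefrag; it assumes that `accessing the web resource' means retrieving the URI with its fragment removed.

```python
from urllib.parse import urldefrag

a = "http://foo.com/#a"
b = "http://foo.com/#b"

# As RDF resource identifiers, the two strings differ.
distinct_in_rdf = a != b

# For retrieval the fragment is stripped, and both lead to the same document.
same_web_resource = urldefrag(a).url == urldefrag(b).url

print(distinct_in_rdf, same_web_resource)
```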

RDF Resource Identifier, Resource Identifier
[RDFT&C] A URI plus optional anchor ID. [RDFM&S] RDF Resource Identifiers are understood to name RDF Resources.
[MH] This seems needlessly circular.

RDF Model

Property, value
In RDF, a relationship of one resource to another or a characteristic of a resource is expressed as a `property' of that resource.

Specifically, the word `property' is used for the `relationship' or `characteristic' and the word `value' is used for the target of the relationship or how this characteristic is expressed.

Thus if resource `A' (the one we are talking about) has the relationship `A is-the-father-of B' then `is-the-father-of' is the property here and `B' (a resource) is the value of this property of `A'. Equally `C is green' suggests a property `has-colour' and the value here `green'.

Properties (in order to be useful) must be identifiable and individual: they are thus declared to be resources (and generally have published URIs). They have all the characteristics of resources. Not all resources are properties, however.

[RDFM&S:introduction] a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning, defines its permitted values, the types of resources it can describe, and its relationship with other properties. This document does not address how the characteristics of properties are expressed; for such information, refer to the RDF Schema specification.
[RDFM&S:Section5] There is a subset of Resources called Properties.
[RDFM&S:glossary] A specific attribute with defined meaning that may be used to describe other resources. A property plus the value of that property for a specific resource is a statement about that resource. A property may define its permitted values as well as the types of resources that may be described with this property.
[Jena] A property is an attribute of a resource. For example DC.title is a property, as is RDF.type.
[N3] A sort of relationship between two things; a binary relation.
Literal
When expressing a relationship between resources, a piece of metadata will contain two references to resources. To express a characteristic, it will contain a value which is a `literal'. This is defined to be a piece of text and (until we tackle the problem of expressing metadata as text) there are no restrictions on what this text should be.

To carry information however, this text is understood to be a legitimate value for the accompanying property.

[RDFM&S:introduction] a simple string or other primitive data-type defined by XML. In RDF terms, a literal may have content that is XML markup but is not further evaluated by the RDF processor
[RDFM&S:Section5] (any well-formed XML)
[RDFM&S:glossary] The most primitive value type represented in RDF, typically a string of characters. The content of a literal is not interpreted by RDF itself and may contain additional XML markup. Literals are distinguished from Resources in that the RDF model does not permit literals to be the subject of a statement
[Jena] A string of characters which can be the value of a property.
[MH] It would be a forward reference to talk about the literal being interpreted (or not) as XML.
Referent, subject
For each piece of metadata, there is one resource which is being described. This is the `referent' or `subject' of this piece of metadata.
[RDFM&S:introduction] The object being described (in the XML syntax indicated by the about attribute) is in RDF called the referent.
[RDFT&C] The entity or concept that an RDF Resource describes. [RDFM&S]
[MH] This `definition' (of `referent') is made in [RDFM&S:introduction] but the word is hardly ever used… by anyone: is it helpful? I think not. (`Subject' is essential, of course.)
RDF Statement, Statement
A single piece of metadata asserting one `fact' about one resource is called a `statement'. `Statement' is a word with many connotations; so specifically this is an `RDF statement'.
[RDFM&S:Section5] There is a set called Statements, each element of which is a triple of the form
{pred, sub, obj}
Where pred is a property (member of Properties), sub is a resource (member of Resources), and obj is either a resource or a literal (member of Literals).
[RDFM&S:glossary] An expression following a specified grammar that names a specific resource, a specific property (attribute), and gives the value of that property for that resource. More specifically here, an RDF statement is a statement using the RDF/XML grammar specified in this document.
[Jena] An arc in an RDF graph, normally interpreted as a fact.
[N3] A subject, predicate and object which assert meaning defined by the particular predicate used.
Triple
To fix the structure of an RDF statement, it is usually referred to as a `triple' having always three parts: a referent resource, a property and a property value. These are also referred to as the statement's subject, predicate and object.
[RDFM&S:glossary] A representation of a statement used by RDF, consisting of just the property, the resource identifier, and the property value in that order.
[Jena] A structure containing a subject, a predicate and an object. Another term for a statement.
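
A triple can be pictured as a three-field record. The sketch below is not any library's API: the Triple class, the EX namespace and the URIs standing in for A, B and C are all hypothetical, and using a plain string for both resources and literals is a simplification.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property, itself a resource
    object: str     # the value: a resource or a literal

# Hypothetical namespace standing in for A, B, C and the two properties.
EX = "http://example.org/terms#"

father = Triple(EX + "A", EX + "is-the-father-of", EX + "B")  # value is a resource
colour = Triple(EX + "C", EX + "has-colour", "green")         # value is a literal

print(father.predicate)
print(colour.object)
```

A fuller model would need to distinguish a literal from a resource whose URI happens to be the same text.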
Anonymous resource
Within a set of RDF statements, reference may be made to a resource without explicitly connecting with it via a URI.

This only makes sense if such a resource plays at least two roles within a connected set of statements (say as an object in one triple and a subject of another). It carries its identity between two references without any requirement to declare its address (or its accessibility) in the Web.

The anonymity of such a resource becomes apparent chiefly when the triples involved are expressed as text and some means must be found to refer to it without a URI.
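
A small sketch may make the `two roles' point concrete. The function name, the address/city vocabulary and the data are hypothetical; the `_:' prefix follows the labelling convention used by N-Triples for resources without a URI.

```python
import itertools

_counter = itertools.count(1)

def new_blank_node():
    """Return a document-local label for a resource that has no URI."""
    return f"_:b{next(_counter)}"

# Hypothetical vocabulary and data: an organization has an address, and that
# address (which has no URI of its own) has a city.
EX = "http://example.org/terms#"
address = new_blank_node()
triples = [
    ("http://www.profium.com/", EX + "address", address),  # object of one triple
    (address, EX + "city", "Sophia-Antipolis"),            # subject of another
]

for triple in triples:
    print(triple)
```

The label links the two triples but means nothing outside this set of statements; another document could reuse the same label for something else.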

RDF Graph
A collection of triples is often called a Graph.

A single triple is often expressed `graphically' as two figures connected by an `arc', a curved line which shows the relationship. A collection of these connected figures represents the mathematical concept of `graph'.

[RDFT&C] A set of RDF Statements.
[MH] Note that nobody seems to want to include this in a glossary
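
Following the [RDFT&C] reading of a graph as a set of statements, a graph can be sketched as a Python set of tuples. The URIs and the arcs_from helper are hypothetical, introduced only for illustration.

```python
# Hypothetical URIs; a graph is modelled here simply as a set of triples.
EX = "http://example.org/terms#"
graph = {
    (EX + "A", EX + "is-the-father-of", EX + "B"),
    (EX + "C", EX + "has-colour", "green"),
}

# Adding the same triple again changes nothing: a set holds each triple once.
graph.add((EX + "A", EX + "is-the-father-of", EX + "B"))

def arcs_from(graph, node):
    """Return (property, value) pairs for every arc leaving the given node."""
    return sorted((p, o) for s, p, o in graph if s == node)

print(len(graph))
print(arcs_from(graph, EX + "A"))
```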
RDF Document
A natural unit of RDF, usually identified with a physical document.

The concept of a document is important since it defines the limits of validity of certain local devices in the expression of RDF such as anonymous resources.

[MH] Note that nobody seems to want to include this in a glossary
Model
[RDFT&C] [See RDFM&S section 5]. This term is used in three distinct ways:
  1. The RDF Model, meaning the underlying structure and interpretation of RDF data
  2. An RDF Model, meaning an instance of a collection of RDF statements
  3. Logical Model, being a formal logicians' term with quite specific meaning. (see http://www-rci.rutgers.edu/~cfs/305_html/Deduction/FormalSystemDefs.html).

(This term has caused some confusion, since it has a quite specific meaning to logicians, which is not the same as some would regard as its "natural" meaning.)

[MH] Not so much a vocabulary item as a place-holder for other concepts. Over-used: stress this. For example: Document Object Model.
Container
In (TBD: certain classes of) RDF, resources may be grouped together in `containers' to express their roles as members of collections sharing a semantic relationship. Thus the `object' of a triple may be a `container' which implies that the represented statement applies (in some way) to the members of the collection or to the collection as a whole.

The distinction may be apparent in the interpretation of the RDF or may be inherent in the meaning of the property part of the triple.

Each container is labelled and defined as being one of:

bag
The members of the collection all have an equal role
seq
The members play their roles sequentially
alt
Only one member is indicated to have this role - but the container represents the possible alternatives. The choice of a particular member is made as part of the interpretation of the RDF. For example in a test of the truth of this statement, it suffices to prove that it is true for one member of an `alt' container.
[MH] Not defined elsewhere
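
The truth test described for `alt' can be sketched in a few lines. The function name and the mirror data are hypothetical; only the `alt' case is shown because it is the only one for which the text above states a rule.

```python
def holds_for_alt(members, is_true):
    """An `alt' container: the statement holds if it is true for one member."""
    return any(is_true(m) for m in members)

# Hypothetical data: alternative download locations, one of which is reachable.
mirrors = [
    "http://a.example.org/",
    "http://b.example.org/",
    "http://c.example.org/",
]
reachable = {"http://b.example.org/"}

print(holds_for_alt(mirrors, lambda m: m in reachable))
```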
Vocabulary
In its specialized use for RDF, this refers to the use of a set of properties as a fixed reference with fixed meanings recognized by other users of RDF.
[MH] Not defined elsewhere
RDF Schema
An RDF schema is a document expressing the semantic information behind a vocabulary (a collection of property resources) and fixing certain rules for the use of the properties in RDF documents.

To exploit this semantic information, an RDF document will be associated with a schema and this schema will be shared by all documents which require a common set of semantics.

Typically a schema is defined for a particular community or industry which requires a consistently interpreted standard for exchanged RDF.

One purpose of a schema is to define what sort of literal is a legitimate value for a particular property although it does not explain how to interpret the literal.

[Accessibility] An RDF schema denotes resources which constitute the particular unchanging versions of an RDF vocabulary at any point in time. It is used to provide semantic information (such as organization and relationship) about the interpretation of the statements in an RDF data model. It does not include the values associated with the attributes.
Ontology
A more general word to describe a complete description of the semantic relationships between resources described by RDF within a certain field of endeavour. An `ontology' contains a model of the categories of entities in this field, their possible relationships as well as the relationships of internal properties.

Once a subject's ontology has been devised and agreed upon, semantic information can be exchanged between interested parties reliably and rigorously.

[Accessibility] An ontology in RDF and Artificial Intelligence infers a document or file that formally defines the relations among terms. Ontologies establish a joint terminology between members of a community of interest. These members can be human or automated agents. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules.
[ScAm] a document or file that formally defines the relations among terms

RDF serializations

Serialization
RDF documents need to be `serialized' in some form before they can be exchanged over wires and through the air.

Typically, because so much of the Web is oriented towards text transmission, this will be as characters and, because so much of the world of computers is biased towards its Western origins, this will be in the most restricted set of characters.

As a side effect, RDF may appear readable to various degrees by ordinary humans.

The major form of serialization is in XML form. This is an example of a triple in XML form:


<rdf:Description rdf:about="http://www.profium.com/">
    <a:location>Sophia-Antipolis</a:location>
</rdf:Description>

However, this is not the only form. More readable forms exist such as N3 or N-Triples.

Every serialization should support the same interpretation as any other, and any program that translates to or from a serialization style should preserve the same semantic information. Note that how anonymous resources are expressed is theoretically immaterial since they have no relation to anything outside of their document.
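
To illustrate that two serializations can carry the same triple, here is a rough sketch that reads the XML example with Python's standard ElementTree and writes it out in N-Triples style. Several things are assumptions: the example is wrapped in an rdf:RDF element so that it parses on its own, the element name is written rdf:Description as the RDF/XML syntax spells it, and the namespace bound to `a' is hypothetical because the example does not declare one. The function handles only this simple shape and does no escaping of literals.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
A_NS = "http://example.org/a#"  # hypothetical: the example does not bind `a'

doc = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:a="{A_NS}">
  <rdf:Description rdf:about="http://www.profium.com/">
    <a:location>Sophia-Antipolis</a:location>
  </rdf:Description>
</rdf:RDF>"""

def to_ntriples(xml_text):
    """Handle only Description elements with literal-valued property children."""
    lines = []
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as {namespace}local.
            predicate = prop.tag.replace("{", "").replace("}", "")
            lines.append(f'<{subject}> <{predicate}> "{prop.text}" .')
    return lines

for line in to_ntriples(doc):
    print(line)
```

Both forms name the same subject, property and value; only the spelling differs.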

Reification

Reification (of a statement)
An RDF statement is an entity in our information universe. As such one can hope to be able to assert facts about it. To do such a thing, one needs to treat the statement as a resource: this is called reification of that statement.
[RDFT&C] [See RDFM&S section 5] A resource that stands for the statement together with the four statements that describe the statement. More than one reification may exist for a given statement. (There is some debate whether multiple reifications of a statement are necessarily equivalent.)
Reified statement
When a statement is reified in RDF, four statements are created in its place which give the four fundamental properties of the original: that it is a statement and that its subject, predicate and object are what they are.
[RDFT&C] [See RDFM&S section 5] A resource that stands for a statement in a Reification. This resource has four properties describing the statement, and maybe others.
[RDFM&S:introduction] A new resource with the above four properties represents the original statement and can both be used as the object of other statements and have additional statements made about it. The resource with these four properties is not a replacement for the original statement, it is a model of the statement. A statement and its corresponding Reified statement exist independently in an RDF graph and either may be present without the other. The RDF graph is said to contain the fact given in the statement if and only if the statement is present in the graph, irrespective of whether the corresponding reified statement is present
[RDFM&S:Section5] Reification of a triple {pred, sub, obj} of [the set of] Statements is an element r of [the set of] Resources representing the reified triple and the elements s1, s2, s3, and s4 of [the set of] Statements such that
       s1: {RDF:predicate, r, pred} 
       s2: {RDF:subject, r, sub}
       s3: {RDF:object, r, obj} 
       s4: {RDF:type, r, [RDF:Statement]}
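
The four statements can be generated mechanically. The sketch below keeps the {pred, sub, obj} ordering of the formal model; the reify function, the EX namespace and the label chosen for r are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(triple, r):
    """Return the four statements s1..s4 that describe `triple` via resource r.

    Triples are written (pred, sub, obj), the order used by the formal model.
    """
    pred, sub, obj = triple
    return [
        (RDF + "predicate", r, pred),
        (RDF + "subject", r, sub),
        (RDF + "object", r, obj),
        (RDF + "type", r, RDF + "Statement"),
    ]

# Hypothetical property URI and a document-local label for r.
EX = "http://example.org/terms#"
original = (EX + "location", "http://www.profium.com/", "Sophia-Antipolis")
statements = reify(original, "_:stmt1")

for s in statements:
    print(s)
```

Note that the original triple is not among the four statements returned: reifying a statement describes it without asserting it.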
Reification of an RDF Graph
An entire RDF graph (the collection of triples representing a set of RDF statements) can be reified as a collection of reified statements.
[RDFT&C] A (bag/collection?) containing the reifications of the statements in an RDF Graph