W3C

RDF Model Theory

W3C Working Draft 14 December 2001

This version:
@@@@@
Latest version:
http://www.w3.org/TR/rdf-mt/
Previous version:
http://www.w3.org/TR/2001/WD-rdf-mt-20010925/
Editor:
Patrick Hayes, IHMC, University of West Florida < phayes@ai.uwf.edu>

Abstract

This is a specification of a model-theoretic semantics for RDF and RDFS, and some basic results on entailment. It does not cover reification or special meanings associated with the use of RDF containers. This document was written with the intention of providing a precise semantic theory for RDF and RDFS, and to sharpen the notions of consequence and inference in RDF. It reflects the current understanding of the RDF Core working group at the time of writing. In some particulars this differs from the account given in Resource Description Framework (RDF) Model and Syntax Specification, and these exceptions are noted.

This version of the specification has some substantial changes from earlier versions, as well as correcting several errors.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This work is part of the W3C Semantic Web Activity. It has been produced by the RDF Core Working Group which is chartered to address a list of issues raised since RDF 1.0 was issued. This document reflects the Working Group's recent deliberation of some of these issues, but the Working Group has not yet decided whether or how to integrate this document with the RDF 1.0 specification ultimately.

This document is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current public W3C Working Drafts can be found as part of the W3C Technical Reports and Publications.

Please address feedback on this document to www-rdf-comments@w3.org, a mailing list with public archive). The editors and the Working Group plan to address feedback in future revisons of this document.

Table of Contents//revise

0. Introduction
  0.1 Model-theoretic semantics
  0.2 Graph Syntax
1. Interpretations
  1.1 Technical note
  1.2 Vocabularies (urirefs, resources and literals)
  1.3 Interpretations of ground graphs
  1.4 Example
  1.5.Unlabeled nodes as existential assertions
  1.6 Comparison with logic
2. Simple Entailment
  2.1 Skolemization
  2.2 Merging RDF graphs
3. RDF interpretations
4. RDF-entailment and RDF closures
5. RDFS interpretations
6. RDFS-entailment and RDFS closures
  6.1 A note on rdfs:Literal
7. RDF Containers
Appendix A. Summary
Appendix B. Acknowledgements
References
Change Log

0. Introduction

0.1 Model-theoretic semantics

A model-theoretic semantics for a language assumes that the language refers to a 'world', and describes the minimal conditions that a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called 'interpretation theory'. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure. Model theory tries to be metaphysically and ontologically neutral. It is typically couched in the language of set theory simply because that is the normal language of mathematics -for example, this semantics assumes that names denote things in a set IR called the 'universe' - but the use of set-theoretic language here is not supposed to imply that the things in the universe are set-theoretic in nature.

The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they are valid, ie they preserve meaning.

This document describes a model theory for RDF(S) which treats the language as simple assertional language, in which each triple makes a distinct assertion and the meaning of any triple is not changed by adding other triples. This imposes a fairly strict monotonic discipline on the language, so that it cannot express closed world assumptions, local default preferences, etc., and several other commonly-used non-monotonic constructs. There are several aspects of meaning in RDF which are ignored by this semantics. It treats URIs as simple names, ignoring aspects of meaning encoded in particular URI forms [RFC 2396]. It does not provide any analysis of time-varying data or of changes to URI denotations. It gives only a rudimentary treatment of literals, ignoring datatyping. It provides only a simple account of RDF containers; and it ignores reification. Some of these may be covered by future extensions of the model theory.

0.2 Graph syntax

Any semantic theory must be attached to a syntax. Of the several syntactic forms for RDF, we have chosen the RDF graph as described in [RDFMS] as the primary syntax, largely for its simplicity. We understand linear RDF notations such as N-Triples and rdf/xml [RDF/XML] as lexical notations for specifying RDF graphs. (There are graphs that cannot be described by these notations, however.) Two RDF documents, in whatever lexical form, are syntactically equivalent if and only if they map to the same RDF graph. The model theory assigns interpretations directly to the graph; we will refer to this as the 'graph syntax' to avoid ambiguity, since the bare term 'syntax' is often assumed to refer to a lexicalization.

An RDF graph can be defined in terms of labeled nodes and arcs (see Appendix A), but we will use an equivalent but more convenient definition, in which a graph is defined to be a set of triples of the form <S, P, O>, where P is a URI reference (in the sense of [RFC 2396]), which we will call auriref, S is either a uriref or a blank node, and O is either a uriref, a blank node, or a literal. Blank nodes are considered to be drawn from some set of 'anonymous' entities which have no 'label' and are unique to the graph. The nodes of the graph corresponding to a set of triples can then be defined as the urirefs and blank nodes in the S and O positions of the triples of the set, together with the set of all distinct occurrences of literals in the set. (Exact descriptions of the correspondences between this conception of an RDF graph and others are given in Appendix A.)

(This way of describing RDF graphs simplifies the exposition of the model theory in several ways, particularly by not requiring us to distinguish between graph nodes and their labels. It has the elegant consequence that the result of merging several graphs is simply the union of the set of triples comprising each of the graphs separately. Notice that disjoint graphs do not have any blank nodes in common, by definition, and that each separate occurrence of a literal is considered a separate node (in contrast to urirefs); we will therefore distinguish between literals and literal nodes.)

To indicate blank nodes in the triples of a graph we will use the nodeID convention used in the N-triples syntax described in [RDFTestCases]. (However, we will use letters or short letter sequences to stand in place of urirefs, in the interests of brevity.) Note that while these node identifiers (formerly called bNodes) identify blank nodes in the surface syntax, these expressions are not considered to be the label of the graph node they identify. In particular, two N-triples documents which differ only by re-namingtheir node identifiers will be understood to describe the same RDF graph. This means that one may not, in general, obtain the correct Ntriples description of a merged graph simply by forming the merge of the corresponding Ntriples documents which describe the original graphs, since the same nodeID may have been used in more than one of the documents. To merge Ntriples documents it is necessary to check if the same nodeID is used in two documents, and to replace it with a distinct nodeID in one of them, before merging the documents. Node identifiers of blank nodes play a role in an ntripleDoc analogous to that played by bound variables in a logical expression. They are local to the document and serve only to indicate a 'connection' between other expressions.

Other RDF lexicalizations may use other means of indicating the graph structure; for our purposes, the important syntactic property of RDF graphs is that each distinct item in an RDF graph is treated as a distinct referring entity in the graph syntax.

An RDF graph will be said to be ground if it has no unlabeled nodes. The vocabulary of a graph is the set of urirefs that it contains.

1. Interpretations

1.1 Technical Note

We assume that there is no restriction on the domains and ranges of properties; in particular, a property may be applied to itself. When classes are introduced in RDFS, we will assume that they can contain themselves. This might seem to violate one of the axioms of standard (Zermelo-Fraenkel) set theory, the axiom of foundation, which forbids infinitely descending chains of membership. However, the semantic model given here distinguishes properties and classes as objects from their extensions - the sets of object-value pairs which satisfy the property, or things that are 'in' the class - thereby allowing the extension of a property or class to contain the property or class itself without violating the axiom of foundation. In particular, this use of a class extension mapping allows classes to contain themselves. For example, it is quite OK for (the extension of) a 'universal' class to contain the class itself as a member, a convention that is often adopted at the top of a classification hierarchy. (If an extension contained itself then the axiom would be violated, but that case never arises.) The technique is described more fully in [Hayes&Menzel], which gives a semantics for an infinitary extension of full first-order logic.

Notice that the question of whether or not a class contains itself as a member is quite different from the question of whether or not it is a subclass of itself.

In what follows, the fact that two sets are given different names should not be taken to imply that they are disjoint; we will explicitly state any disjointness or containment conditions as they arise. In the same spirit, the fact that one set is stated to be a subset of another should not be interpreted as saying that these sets cannot be identical, unless this is stated explicitly.

1.2 Vocabularies (urirefs, resources and literals)

RDF uses two kinds of referring expression, urirefs and literals. We make very simple and basic assumptions about these. Urirefs are treated as logical constants, i.e. as names which denote resources; no assumptions are made about the nature of resources. In this version of the model theory, literals are treated as simple names which refer to literal values, and no particular assumptions are made about the role of datatype specifications. This is not intended to be a definitive treatment of literals, and will be replaced by a more thorough treatment in later versions of the model theory.

An interpretation assigns meanings to symbols in a particular vocabulary of urirefs. Some interpretations may assign special meanings to the symbols in a particular namespace, which we will call a reserved vocabulary.We will use two reserved vocabularies in this document, defined using the Qname syntax with the prefixes rdf: and rdfs: as follows:

Prefix rdf: namespace URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#

Prefix rdfs: namespace URI: http://www.w3.org/2000/01/rdf-schema#

Interpretations which share the special meaning of a particular reserved vocabulary will be named for that vocabulary, so that we will speak of 'rdf-interpretations' and 'rdfs-interpretations', etc.. An interpretation with no reserved vocabulary will be called a simple interpretation, or simply an interpretation.

We do not take any position here on the way that node labels may be composed from other expressions, e.g. from relative URIs or Qnames; the model theory simply assumes that such lexical issues have been resolved in some way that is globally coherent, so that a single uriref can be taken to have the same meaning wherever it occurs. Similarly, the model theory given here has no special provision for tracking temporal changes; it assumes, implicitly, that urirefs have the same meaning whenever they occur. (If one wishes to apply RDF in a context where the referents of urirefs (or names more generally) may change with time, then the current theory could be regarded as defining a 'snapshot' of the meaning of a changing representation.)

Notice that a vocabulary consists entirely of urirefs; literals are assigned values by some external mechanism.The current model theory tries to be as noncommittal as possible about the meaning of literals. The theory makes no assumptions about the exact nature of literals (other than that they can be distinguished from urirefs) or of literal values.In particular it is agnostic on the question of whether or not literal values are considered to be resources. We simply assume a global set LV of literal values and a global mapping XL from literal nodes to LV. All interpretations will be required to conform to XL on literal nodes. This leaves open the question of the exact nature of literals, their language-sensitivity and so on, while acknowledging their special status as expressions with a 'fixed' interpretation.(At the time of writing, various alternative techniques for handling literal meanings and relating them to datyping schemes are under discussion, which would use different methods for specifying the XL mapping. The particular scheme chosen will be described in a later version of this document.)

1.3 Interpretations of ground graphs

All interpretations will be relative to a set of urirefs, called the vocabulary of the interpretation, so that one has to speak, strictly, of an interpretation of an RDF vocabulary, rather than of RDF itself. (For a lexicalized version of the language, we can think of the vocabulary of an interpretation, more traditionally, as being a subset of the URI-indicating expressions used by that lexicalization.)

A simple interpretation I on the vocabulary V is defined by:

1. A non-empty set IR of resources, called the domain or universe of I.

2. A mapping IEXT from IR into the powerset of IRx(IR union LV) (i.e. the set of sets of pairs <x,y> with x in IR and y in IR or LV)

3. A mapping IS from V into IR

IEXT(x) is a set of pairs, i.e. a binary relational extension, called the extension of x. This trick of distinguishing a relation as an object from its relational extension allows a property to occur in its own extension, as noted earlier. Note that no particular relationship is assumed between IR and LV.

It is convenient to define IP to be the subset of IR with a nonempty extension; intuitively, IP is the set of properties.

The denotation of a ground RDF graph in I is then given recursively by the following rules, which extend the interpretation mapping I from labels to graphs. These rules (and extensions of them given later) work by defining the denotation of any piece of RDF syntax E in terms of the denotations of the immediate syntactic constitutents of E, hence allowing the denotation of any piece of RDF to be determined by a kind of syntactic recursion.

if E is a literal node then I(E) = XL(E)
if E is a uriref then I(E) = IS(E)
if E is an asserted triple <s, p, o> then I(E) = true if <I(s),I(o)> is in IEXT(I(p)), otherwise I(E)= false.
if E is a ground RDF graph then I(E) = false if I(E') = false for some asserted triple E' in E, otherwise I(E) =true.

 

The use of the phrase "asserted triple" is a deliberate weasel-worded artifact, to allow an RDF graph or document to contain triples which are being used for some non-assertional purpose. Strict conformity to the RDF 1.0 specification [RDFMS] assumes that all triples in a document are asserted triples, but making the distinction allows RDF parsers and inference engines to conform to the RDF syntax and to respect the RDF model theory without necessarily being fully committed to it. RDF as presently defined provides no syntactic means to distinguish asserted from nonasserted triples, however, so the distinction can be safely ignored in the remainder of the document, which assumes that all triples in a graph are asserted.(To apply the subsequent results to RDF containing unasserted triples, it would be necessary to restrict the definitions to the sets of asserted triples in the graphs.)

1.4 Example

For example, for the vocabulary {a, b, c} (where these symbols should be understood to stand for urirefs) the following is a small interpretation, where we use integers to indicate the 'things' in the universe. (Note; this use of integers is not meant to imply that RDF interpretations should be interpreted as being about arithmetic, but more to emphasize that the exact nature of the things in the universe is simply left unspecified.)

IR= {1, 2}; IP = {1}

IEXT: 1->{<1,2>,<2,1>}

IS: a->1, b->1, c->2

A drawing of the domains and mappings described in the text
Figure 1: An example of an interpretation. Note, this is not a picture of an RDF graph.

This interpretation makes these triples true:

  a b c

  c a a

  c b a

(for example, I(<a b c>) = true if <I(a),I(c)> is in IEXT(I(b)), i.e. if <1,2> is in IEXT(1), which is {<1,2>,<2,1>} and so does contain <1,2> and so I(<a b c>) is true)

and these triples false:

  a c b

  a b b

  c a c

(for example, I(<a c b>) = true if <I(a),I(b)>, i.e.<1,2>, is in IEXT(I(c)); but I(c)=2 and IEXT is not defined on 2, so the condition fails and I(<a c b>) = false.)

Since the property extensions do not contain any literal values, all triples containing a literal would be false in this interpretation.

To emphasize the basic idea of model theory: this is only one possible interpretation of this vocabulary; there are (infinitely) many others. For example, if we modified this interpretation by attaching the property extension to 2 instead of 1, only one of the above six triples would be true.

1.5. Unlabeled nodes as existential assertions

We could treat unlabeled nodes exactly like urirefs, semantically speaking, by extending the IS mapping to include them as well as urirefs. That would amount to adopting the view that an unlabeled node is equivalent to a node with an unknown label. However, it seems to be simpler, and more in conformance with [RDFMS], to treat unlabeled nodes as simply indicating the existence of a thing without assuming that it has a fictitious name. (This decision can be defended on both philosophical and pragmatic grounds.See http://www.w3.org/2000/03/rdf-tracking/#rdfms-identity-anon-resources for a summary and pointers to the extended discussions.)

This will require some definitions, as the theory so far provides no meaning for unlabeled nodes. Suppose I is an interpretation and A is a mapping from some set of unlabeled nodes to the universe IR of I, and define I+A to be an extended interpretation which is like I except that it uses A to give the interpretation of unlabeled nodes. Define anon(E) to be the set of unlabeled nodes in E. Then we can extend the above rules to include the two new cases that are introduced when unlabeled nodes occur in the graph:

If E is an unlabeled node then [I+A](E) = A(E)
If E is an RDF graph then I(E) = true if [I+A'](E) = true for some mapping A' from anon(E) to IR, otherwise I(E)= false.

 

Notice that we have not changed the definition of an interpretation. The same interpretation that provides a truth-value for ground graphs also assigns truth-values to graphs with unlabeled nodes, even though it provides no interpretation for the unlabeled nodes themselves. Notice also that the unlabeled nodes themselves are perfectly well-defined entities with a robust notion of identity; they differ from other nodes only in not being assigned a direct model-theoretic interpretation, which means that they have no 'global' meaning (i.e. outside the graph in which they occur).

This effectively treats all unlabeled nodes as having the same meaning as existentially quantified variables in the RDF graph in which they occur. Notice however that that since two blank nodes cannot have the same label, there is no need to specify the scope of the quantifier within a graph, and no need to use any explicit quantifier syntax.( If we were to apply the semantics directly to N-triples syntax, we would need to indicate the quantifier scope, since in this lexicalization syntax the same node identifier may occur several times corresponding to a single blank node in the graph. The above rule amounts to the convention that would place the quantifiers just outside, i.e. at the outer edge of, the N-triple document corresponding to the graph.)

For example, with this convention, the graph defined by the following triples is false in the interpretation I shown above in figure 1:

  _:x a b

  c b _:x

since if A' maps the unlabeled node to 1 then the first triple is false in [I+A'], and if it maps it to 2 then the second triple is false.(Note that each of these triples, when taken as a single graph, is true in I, but their conjunction is not. Notice also that if a different node ID were used in the two triples, indicating that the RDF graph had two blank nodes instead of one, then A' could map one node to 2 and the other to 1, and the resulting graph would be true under the interpretation I.)

Notice that if the vocabulary of an RDF graph E contains urirefs that are not in the vocabulary of an interpretation I - that is, if I simply does not give a semantic value to some name that is used in E - then the RDF truth-conditions will always yield the value 'false' for some triple in E.

1.6 Comparison with formal logic

With this semantics, it is simple to translate an RDF graph into a logical expression with essentially the same meaning, as several other authors have noted previously. [Marchiori&Saarela],[Fikes&McGuinness].

Each node labeled with a uriref translates to a logical constant which is its label; each unlabeled node to a distinct variable, and each literal to itself. An arc labeled with p from a node n1 to a node n2 maps to an atomic assertion that the relation p holds true between the expressions s and o gotten by translating n1 and n2 respectively (written as (p s o) in KIF syntax); and finally, an RDF graph translates to the existential closure of the conjunction of the translations of all the arcs in the graph.This requires us to introduce bound variables to correspond to the blank nodes of the graph, similarly to the use of nodeIDs in the N-Triples syntax.

For example, the graph defined by the N-triple document in the above example translates to the logical expression (written in the extended KIF syntax defined in [Hayes&Menzel])

(exists (?y)(and (a ?y b)(b c ?y)))

This translation maps the model theory exactly. Notice however that the resulting expression may contain the same symbol in both relation and object positions (e.g. 'b' in this example), which is considered syntactically illegal in many versions of logic.To map to a more conventional logical syntax one can use a 'dummy' ternary relation symbol to assert that a binary relation holds between two arguments. For example,[Fikes&McGuinness] translate the RDF triple

  s p o

into the KIF expression

  (PropertyValue p s o)

The above example would then map to

(exists (?y)(and (PropertyValue a ?y b)(PropertyValue b c ?y)))

Under this translation, to obtain the appropriate KIF interpretation one has to interpret (PropertyValue a b c) to mean ((IEXT a) b c), and also somehow ensure that literals are interpreted appropriately in the logical semantics. This may require some external semantic constraint to be imposed on the normal logical meaning.

2. Simple entailment between RDF graphs.

Following conventional terminology, we say that I satisfies E if I(E)=true, and that a set S of expressions (simply) entails E if every interpretation which satisfies every member of S also satisfies E. If {E} entails E' we will say that E entails E'. (In later sections these notions will be adapted to classes of interpretations with particular reserved vocabularies, but throughout this section entailment should be interpreted as simple RDF entailment.)

Any process or technique which constructs a graph E from some other graphs S is said to be (simply) valid if S always entails E, otherwise invalid. (The chief utility of a model-theoretic semantics is that it provides a way to make this notion of validity precise. Note that being an invalid process does not mean that the conclusion is false, and being valid does not guarantee truth. However, validity represents the best guarantee that any assertional language can offer: if given true inputs, it will never draw a false conclusion from them.)

In this section we give a few basic results about simple entailment and valid inference processes. (Proofs, all of which are straightforward, are omitted. Note, these results apply only to simple entailment.) Since the language is so simple, simple entailment can be recognized by relatively simple syntactic comparisons. The two basic forms of simply valid proof step in RDF are, in logical terms, the inference from (P and Q) to P, and the inference from (foo baz) to (exists (?x) (foo ?x)) .

Conjunction Lemma.If E is ground, then I satisfies E iff it satisfies every triple in E.

I.e. a ground graph is equivalent in meaning to the logical conjunction of its component triples.

To give a syntactic characterization of entailment we will need to define some relationships between RDF graphs. If E is an RDF graph, say that E' is a subgraph of E when every node and arc in E' are also in E (with the same labels). This corresponds to taking a subset of the triples constituting the graph. Obviously any subgraph of a tidy graph is tidy.

The following is an immediate consequence of the conjunction lemma:

Subgraph Lemma. If E and E' are ground, then E entails E' iff E' is a subgraph of E.

The following is proven by a (simple version) of the technique used to prove Herbrand's theorem in first-order logic, hence the name:

Herbrand Lemma. Any RDF graph has a satisfying interpretation.

This means that there is no such thing as an inconsistency or a contradiction in RDF, which is not surprising since the language does not contain negation. In fact, a stronger result, from which many of the subsequent results can be derived, is true:

Strong Herbrand Lemma. Any RDF graph has an interpretation which satisfies the graph and does not satisfy any ground triple not in the graph.

This emphasizes the extent to which ground RDF triples are independent of each other; contrast this situation with a logic in which implications can be expressed. We will see later that the strong Herbrand lemma fails in almost any extension to 'basic' RDF.

2.1 Skolemization

Skolemization is a syntactic transformation routinely used in automatic inference systems in which existential variables are replaced by 'new' functions applied to universal variables. While not itself strictly a valid operation, skolemization adds no new content to an expression, in the sense that a skolemized expression has the same entailments as the original expression provided they do not contain the new skolem functions.

In RDF, skolemization simplifies to the special case where an existential variable is replaced by a 'new' constant name, i.e. a uriref which is guaranteed to not occur anywhere else. The following lemma shows that skolemization has the same properties in RDF as it has in conventional logics.

Say that E' is an instance of E, with respect to a set V of urirefs, when E' is like E except that some of the unlabeled nodes in E have been labeled with members of V. Obviously if E is ground then it is its only instance. Notice that an instance may be untidy; a tidy instance is gotten by tidying - that is, identifying nodes with the same uriref label - an instance. A tidy ground instance of E with respect to any set which is disjoint from vocab(E) is called a skolemization of E.

Skolemization Lemma. Suppose sk(E) is a skolemization of E with respect to V. Then sk(E) entails E; and if sk(E) entails F and vocab(F) is disjoint from V, then E entails F .

2.2 Merging RDF graphs

Suppose S is a set of RDF graphs, then their merge is the union of the sets of triples in all the graphs in S. Notice that one does not, in general, obtain the merge of a set of graphs by concatenating their corresponding N-triples documents and constructing the graph described by the merged document, since if some of the documents use the same node identifiers, the merged document will describe a graph in which some of the nodes have been 'accidentally' merged.

Merging lemma. The merge of a set S is entailed by S, and entails every member of S.

Notice that unlabeled nodes are not identified with other nodes in the merge, and indeed this reflects a basic principle of RDF graph inference: nodes with the same uriref must be identified, but unlabeled nodes must not be identified with other nodes or re-labeled with urirefs, in order to ensure that the resulting graph is entailed by what one starts with. This is made precise in the following two lemmas (which follow directly from the strong Herbrand lemma) :

Anonymity lemma 1. Suppose E' is like E except that at least one unlabeled node in E is labeled with a uriref in E'. Then E does not entail E'.

Anonymity lemma 2. Suppose that E' is like E except that two distinct unlabeled nodes in E have been identified in E'. Then E does not entail E'.

This means that there is no valid RDF inference process which can produce an RDF graph in which a single unlabeled node occurs in triples originating from several different graphs. (Of course, such a graph can be constructed, but it will not be entailed by the original documents. It must reflect the addition of new information about the identity of two unlabeled nodes. Such information can be represented in many other ways, e.g. in DAML+OIL, but it cannot be represented in RDF itself.)

The main result for simple RDF inference is:

Interpolation Lemma. S entails E iff a subgraph of the merge of S is an instance of E.

The interpolation lemma completely characterizes simple RDF entailment in syntactic terms. To tell whether a set of RDF graphs entails another, find a subgraph of their merge and replace urirefs by unlabeled nodes to get the second. Of course, there is no need to actually construct the merge. If working backwards from the consequent E (the graph that may be entailed by the others), the most efficient technique would be to treat unlabeled nodes as variables in a process of subgraph-matching, allowing them to bind to 'matching' uriref labels in the antecedent graph(s) in S, i.e. those which may entail the consequent graph. The interpolation lemma shows that this process is valid, and is also complete if the subgraph-matching algorithm is. The existence of complete subgraph-checking algorithms also shows that RDF is decidable, i.e. there is a terminating algorithm which will determine for any finite set S and any expression E, whether or not S entails E.

Notice however that such a variable-binding process would only be appropriate when applied to the conclusion of a proposed entailment. This corresponds to using the document as a goal or a query, in contrast to asserting it, i.e. claiming it to be true. If an RDF document is asserted, then it would be invalid to bind new values to any of its unlabeled nodes, since (by the anonymity lemmas) the resulting graph would not be entailed by the assertion.

It might be thought that the operation of changing a bound variable would be an example of an inference which was valid but not covered by the interpolation lemma, e.g. the inference of

_:x foo baz

from

_:y foo baz

Recall however that by our conventions, these two expressions describe the same RDF graph, and any graph is both a subgraph and an instance of itself.

3. RDF Interpretations

We now consider interpretations which impose some extra semantic constraints on the following (rather small) reserved vocabulary, which we will call rdfV:

RDF reserved vocabulary
  rdf:type   rdf:Property

 

(As noted earlier, this version of the model theory assigns no special meaning to the part of the RDF vocabulary concerned with reification. The RDF container vocabulary is considered briefly in section 7.)

An rdf-interpretation of a vocabulary V is an interpretation I on (V union rdfV) which satisfies the following extra conditions:

IP contains    I(rdf:type)
IEXT(I(rdf:type)) contains <x, I(rdf:Property)> iff  x is in IP

This forces every rdf interpretation to contain a thing which can be interpreted as the 'type' of properties. (The second condition could be regarded as defining IP to be the set of resources in the universe of the interpretation which have the value I(rdf:Property) of the property I(rdf:type). This way of construing subsets of the universe will be central in interpretations of RDFS.)

For example, the following rdf-interpretation extends the simple interpretation in figure 1:

IR = {1, 2, T }; IP = {1, T}

IEXT: 1->{<1,2>,<2,1>}, T->{<1,P>,<T,P>}

IS: a->1, b->1, c->2, rdf:type->T, rdf:Property->P

A drawing of the domains and mappings described in the text
Figure 2: An example of an rdf-interpretation.

Note, this is not the smallest rdf-interpretation which extends the earlier example, since we could have made I(rdf:Property) be 2 and IEXT(T) be {<1,2>,<T,2>}, and managed without having P in the universe. In general, a given entity in an interpretation may play several 'roles' at the same time, as long as this can be done without violating any of the required semantic conditions.

4. Rdf-entailment and rdf closures

Following the definitions in section 2, we will say that S rdf-entails E when every rdf interpretation which satisfies every member of S also satisfies E. This is an example of vocabulary entailment , i.e. entailment relative to a set of interpretations which satisfy extra semantic conditions on a reserved vocabulary.

vocabulary entailment is more powerful than simple entailment, in the sense that a given set of assumptions entails more consequences. In general, as the reserved vocabulary is increased and extra semantic conditions imposed, the class of satisfying interpretations is restricted, and hence the corresponding notion of entailment increases in power. For example, if S simply entails E then it also rdf-entails E, since every rdf-interpretation is also a simple interpretation; but S may rdf-entail E even though it does not simply entail it. Intuitively, a conclusion may follow from some of the extra assumptions incorporated in the semantic conditions imposed on the reserved vocabulary. (Another way of expressing this is that any restriction on interpretations decreases the number of possible ways that an interpretation might be a counterexample to E's following from S.) Simple entailment is therefore the weakest form of RDF entailment, which holds for any reserved vocabulary; it could be characterized as entailment which depends only on the basic triples syntax of RDF graphs, without making any further assumptions about the meaning of any urirefs. Simple entailment is the vocabulary entailment of the empty namespace.

It is easy to see that the lemmas in section 2 do not hold for rdf-entailment. For example, the triple

rdf:type rdf:type rdf:Property

is true in every rdf-interpretation, and hence rdf-entailed by the empty set, which immediately contradicts the interpolation lemma for rdf-entailment. Rather than develop a separate theory of the syntactic conditions for recognising entailment for each reserved vocabulary, however, we will give a general technique for reducing these broader notions of entailment to simple entailment, by defining the closure of an RDF graph relative to a set of semantic conditions. The basic idea is to rewrite the semantic conditions as a set of syntactic inference rules, and define the closure to be the result of applying those rules to exhaustion. The resulting graphs will contain RDF triples which explicitly state all the special meanings embodied in the extra semantic conditions, in effect axiomatizing them in RDF itself.

Therdf-closure of an RDF graph E is the graph gotten by adding triples to E according to the following (very simple) rules:

1. Add the following triple (which is true in any rdf-interpretation):

rdf:type rdf:type rdf:Property

2. Apply the following rule recursively to generate all legal RDF triples (i.e. until none of the rules apply or the graph is unchanged.) Here xxx and yyy stand for any uriref, bNode or literal expression, aaa for any uriref.

if E contains then add
rdf1 xxx aaa yyy . aaa rdf:type rdf:Property .

(This rule will generate the triple mentioned in 1 in two steps from any RDF triple; nevertheless, we mention the triple explicitly since it is required to be in the closure of even an empty set of graphs.)

The following lemma is the basic result on rdf-entailment, and illustrates a general pattern of how to characterize vocabulary entailment syntactically. It is analogous to the completeness theorem in classical logic (though much simpler to prove).

RDF entailment lemma. S rdf-entails E iff the rdf-closure of the merge of S simply entails E.

(Note, we do not mean to suggest that actually generating the full closure would be the best process to use in order to determine namespace-entailment. In more complex cases it may be more efficient to use a process of backchaining on the closure rules, for example.)

5. RDFS interpretations

RDF Schema [RDFSchema] extends RDF to include a larger reserved vocabulary rdfsV with more complex semantic constraints:

RDFS reserved vocabulary
rdf:type rdf:Property rdfs:domain rdfs:range rdfs:Resource rdfs:Literal rdfs:Class rdfs:subClassOf rdfs:subPropertyOf

(rdfs:seeAlso, rdfs:isDefinedBy, rdfs:comment and rdfs:label are omitted, as the model theory places no constraints on their meanings.)

Although not strictly necessary, it is convenient to state the RDFS semantics in terms of a new semantic construct, a 'class', i.e. a resource which represents a set of things in the universe which have the same value of the rdf:type property. We will define a mapping ICEXT (for the Class Extension in I) from classes to their extensions, in terms of the relational extension of rdf:type, as follows:

ICEXT(x) = {y | <y,x> is in IEXT(I(rdf:type)) }

An rdfs-interpretation of V is a simple interpretation of (V union rdfsV) which satisfies the following semantic conditions. The first two of these are simply definitions of IC and ICEXT, which could be used to eliminate these concepts from the rest of the conditions; as noted earlier, the conditions on IR and IP could also be regarded as definitions. Literals are treated differently since the set of all literal values is defined globally; for comparison, note that IR is required to be only a subset of the set of all resources.

Note, these conditions for rdfs:domain and rdfs:range reflect our current understanding of multiple domain and range restrictions rather than the wording in [RDFSchema] . They correspond more closely to the meanings assumed by DAML+OIL[DAML] under which multiple domain and range assertions are understood conjunctively.

x is in ICEXT(y) iff <x,y> is in IEXT(I(rdf:type))

IC = ICEXT(I(rdfs:Class))

ICEXT(I(rdfs:Resource)) = IR

ICEXT(I(rdf:Property)) = IP

ICEXT(I(rdfs:Literal)) is a subset of LV

IC contains:

I(rdfs:Resource), I(rdf:Property), I(rdfs:Class), I(rdfs:Literal)

IP contains:

I(rdf:type), I(rdfs:domain), I(rdfs:range), I(rdfs:subPropertyOf), I(rdfs:subClassOf)

IEXT(I(rdfs:domain)) contains

<I(rdfs:domain), I(rdf:Property)>, <I(rdfs:range), I(rdf:Property)>, <I(rdf:type), I(rdfs:Resource)>

IEXT(I(rdfs:range)) contains

<I(rdfs:domain), I(rdfs:Class)>, <I(rdfs:range), I(rdfs:Class)>,

<I(rdf:type), I(rdfs:Class)>

If <x,y> is in IEXT(I(rdfs:range)) and <u,v> is in IEXT(x) then v is in ICEXT(y)

If <x,y> is in IEXT(I(rdfs:domain)) and <u,v> is in IEXT(x) then u is in ICEXT(y)

If <x,y> is in IEXT(I(rdfs:subClassOf)) then ICEXT(x) is a subset of ICEXT(y)

If <x,y> is in IEXT(I(rdfs:subPropertyOf)) then IEXT(x) is a subset of IEXT(y)

 

Since these clearly include the conditions on an rdf-interpretation, any rdfs-interpretation is also an rdf-interpretation of the same vocabulary.

(We will not attempt to give a pictorial diagram of an rdfs-interpretation.)

6. RDFS-entailment and RDFS closures

As before, we say that S rdfs-entails E when every rdfs-interpretation which satisfies all of S also satisfies E. We will approach rdfs-entailment similarly to rdf-entailment, by defining rdfs-closures. These require more complex rules to reflect the consequences of the semantic constraints on the rdfs reserved vocabulary. For example, the fact that the subset relationship is transitive means that two subClassOf assertions may entail a third, different, triple.

The rdfs-closure of an RDF graph E is the graph gotten by adding triples to E according to the following rules:

1. Add the following triples, which are true in any rdfs-interpretation. These assign classes, domains and ranges to the properties in the rdfs vocabulary. (There are several other triples which are true in every rdfs-interpretation, but they will be generated from these by other rules.)

rdfs:Resource rdf:type rdfs:Class
rdfs:Literal rdf:type rdfs:Class
rdfs:Class rdf:type rdfs:Class
rdf:Property rdf:type rdfs:Class


rdf:type rdf:type rdf:Property
rdf:type rdfs:domain rdfs:Resource
rdf:type rdfs:range rdfs:Class


rdfs:domain rdf:type rdf:Property
rdfs:domain rdfs:domain rdf:Property
rdfs:domain rdfs:range rdfs:Class


rdfs:range rdf:type rdf:Property
rdfs:range rdfs:domain rdf:Property
rdfs:range rdfs:range rdfs:Class


rdfs:subPropertyOf rdf:type rdf:Property
rdfs:subPropertyOf rdfs:domain rdf:Property
rdfs:subPropertyOf rdfs:range rdf:Property


rdfs:subClassOf rdf:type rdf:Property
rdfs:subClassOf rdfs:domain rdfs:Class
rdfs:subClassOf rdfs:range rdfs:Class

2. Apply the following rules recursively to generate all legal RDF triples (i.e. until none of the rules apply or the graph is unchanged.) Here, xxx, yyy and zzz stand for any uriref, bNode or literal, aaa for any uriref, and uuu for any uriref or bNode (but not a literal).

  If E contains: then add:
rdf1

xxx aaa yyy

aaa rdf:type rdf:Property
rdfs2

xxx aaa yyy

aaa rdfs:domain zzz

xxx rdf:type zzz
rdfs3

xxx aaa uuu

aaa rdfs:range zzz

uuu rdf:type zzz
rdfs4a xxx aaa yyy xxx rdf:type rdfs:Resource
rdfs4b xxx aaa uuu uuu rdf:type rdfs:Resource
rdfs5

aaa rdfs:subPropertyOf bbb

bbb rdfs:subPropertyOf ccc

aaa rdfs:subPropertyOf ccc
rdfs6

xxx aaa yyy

aaa rdfs:subPropertyOf bbb

xxx bbb yyy
rdfs7

xxx rdf:type rdfs:Class

xxx rdfs:subClassOf rdfs:Resource
rdfs8

xxx rdfs:subClassOf yyy

yyy rdfs:subClassOf zzz

xxx rdfs:subClassOf zzz
rdfs9

xxx rdfs:subClassOf yyy

aaa rdf:type xxx

aaa rdf:type yyy

Unlike the simpler rdf closure rules, the outputs of some of these rules may trigger others. For example, these rules will generate the complete transitive closures of all subclass and subproperty heirarchies, together with all of the resulting type information about everything which can be inferred to be a member of any of the classes, and will propagate all assertions in the graph up the subproperty heirarchy, re-asserting them for all super-properties.They will generate all assertions of the form

xxx rdf:type rdfs:Resource

for every xxx in V, and of the form

xxx rdfs:subClassOf rdfs:Resource

for every class name; and several more 'universal' facts, such as

rdf:Property rdf:type rdfs:Class

rdf:Property rdfs:subClassOf rdfs:Resource

However, it is easy to see that the rules will indeed terminate on any finite RDF graph, since there are only finitely many triples that can be formed from a given finite vocabulary.

A similar result applies here as in the case of rdf-entailment, though it takes longer to prove:

RDFS Entailment Lemma. S rdfs-entails E iff the rdfs-closure of the merge of S simply entails E.

6.1 A note on rdfs:Literal

While rdfs:Literal is required to be a subset of literals, in conformance with [RDFMS], there are severe restrictions on what can be said about literals in RDF, since RDF does not allow properties to be asserted of literals.The closure rules for rdfs-entailment have explicit exceptions which reflect this syntactic restriction. Similarly, while properties may be asserted of the the class rdfs:Literal, none of these can be validly transferred to literals themselves.

The reader should be careful not to confuse literals with literal values. For example, a triple of the form

xxx rdf:type rdfs:Literal

is legal if xxx is a uriref (or bNode), but should not be interpreted as asserting that xxx is a literal (which is why there are no closure rules to generate such triples.) What it says is that the uriref xxx denotes a literal value, ie has the same meaning as some literal in any interpretation. There is however no way in current RDF to specify exactly what literal such a uriref might be equal to.

7. RDF containers

Basic properties of RDF containers can be stated in terms of the semantics already described. An RDF container is a resource which has an rdf:type value rdf:Bag, rdf:Seq or rdf:Alt, and to which a countable set of membership properties rdf:_1, rdf:_2, etc.can be applied. In RDFS, these are all subclasses of the class rdfs:Container, which is the domain of every membership property, all of which in turn are members of the class rdfs:ContainerMembershipProperty.

As we have not restricted V to be finite, we can define a restricted vocabulary consisting of rdf:Bag, rdf:Seq or rdf:Alt, and all RDF container property names. However, there would be little point in doing so, since there are no semantic constraints on this vocabulary in RDF; the names may be used freely in any triple, and the resulting triples have no extra meaning in RDF over and above that assigned to them by the basic interpretation rules, so entailment with respect to this vocabulary would be identical to simple entailment. If the full container vocabulary were added to the reserved RDFS vocabulary, the only extra semantic constraints would be the three container subclass relationships, the domain restrictions of container membership properties to the container class, and the assignment of types to the container property names. However, this last constraint would mean that the corresponding notion of graph closure for this infinite vocabulary would yield infinite graphs, since every closure would be required to contain every triple of the form

rdf:_n rdf:type rdfs:ContainerMembershipProperty

It would probably therefore be more useful to define a notion of an (rather than the) RDF container vocabulary, which may contain some subset of container property names. The same semantic conditions would apply, but the relevant notions of entailment would only require triples using the relevant subset to be true, and the resulting notion of graph closure would yield a finite graph when given a finite graph as input.

As mentioned earlier, uses of containers in practice may well go beyond the rather basic meanings sanctioned by this model theory. For example, with this understanding of containers, a triple with a container as a subject does not entail any assertion with a member of the container as a subject, and the 'distributive' interpretation of rdf:Alt is not reflected in any entailment conditions.

Appendix A: Technical summary

1. Precise definitions of graph terminology.

To avoid any possible misunderstanding, we give here a painfully detailed description of the sense of RDF graph shown in the figures of [RDFMS] . (This notion of 'graph' is what is often called a multigraph; it differs slightly from the usual mathematical definition of graph, which does not permit more than one arc between a pair of nodes.)

An RDF graph is a structure <N,E,V,L,s,o, label> where

N is a set, called nodes

E is a set, called edges

V is a set of urirefs

L is a set of literals

s, o are mappings from E into N

label is a partial mapping from (N union E) to (V union L)

which satisfies the following conditions:

1. The restriction of label to E is a total mapping into V (every edge is labeled with a uriref)

2. If x=s(y) for some y in E, then label(x) is not in L (literals do not occur in subject position)

3. For x, y in N, if label(x)=label(y) in V, then x=y (graph is tidy on uriref nodes)

4. if label(e1)=label(e2) and s(e1)=s(e2) and o(e1)=o(e2) then e1=e2 (triples are not duplicated).

(Note, these conditions could all be relaxed without requiring major changes to the model theory; they reflect decisions made about the design of the RDF language.)

This definition is related to the 'set of triples' definition used in the text as follows.

For any x in N, define item(x) to be label(x) if label(x) is defined, otherwise x. That is, item(x) is the label on the node if there is one, but applied to blank nodes it is the node itself.

The set of triples corresponding to an RDF graph is then the set {<item(s(E)), label(E), item(o(E))>}for all E in the graph.

To obtain an N-Triples document describing the graph, define NTitem(x) to be a a textual form as follows: if label(x) is a uriref, then NTitem(x) has the form <label(x)>; it it is a literal, then NTitem(x) has the form "label(x)"; and if it is a blank node, then NTitem(x) is a nodeID expression unique to that node, ie distinct from the NTitem of any other blank node in the graph.Then the Ntriples document is the result of concatenating the corresponding lines of text each of the form:

NTitem(s(E)) NTitem(E) NTitem(o(E)) .<line>

RDF graph corresponding to a set of triples can be defined, mathematically, by setting N to be the set of urirefs, blank nodes and occurrences of literals in the set of triples; E to be the set of triples; defining s and o in the obvious way: s(<S,P,O>)=S; o(<S,P,O>)=O; and defining label by label(x)=x on all urirefs, and label(x) to be the literal occurring at x when x is a literal occurrence. (To be extremely finicky, one could define a literal occurrence to be a pair consisting of the literal and the triple in which it occurs, and then define label(<l,t>) to be l. This rather delicate distinction between literals and occurrences of literals is needed to support some of the proposals currently under consideration for literal datatyping. We include it here as a proof of concept; however, the final version of the model theory may not need it, in which case the exposition will be somewhat simplified, and literals treated like urirefs in being given a single value in any interpretation. Readers should not, therefore, base any important decisions on this at present.)

A more constructive way to define the RDF graph corresponding to a set of triples is as follows, in terms of an operation of 'merging' two nodes of a graph. Consider each triple as an isolated graph with two nodes linked by one arc; form the disconnected graph made up of these isolated graphs; merge all nodes with the same uriref or with the same nodeID; then delete all nodeIDs. The resulting graph is an RDF graph in the sense of [RDFMS] .

2. Summary of model theory

RDF/RDFS model theory summary

0. Domains and mappings of interpretation I

vocab(I): set of urirefs ; LV: (global) set of literal values ; IR: set of resources (universe); IP: subset of IR (properties) ; IC: subset of IR (classes).

XL: literals -> LV

IS: vocab(I) -> IR

IEXT: IP -> subsets of (IR x (IR union LV))

ICEXT: IC -> subsets of IR

1. Basic equations

E is:

I(E) is:

a literal node

XL(E)

a (node labeled with a) uriref

IS(E)

an asserted triple <s p o>

true if <I(s), I(o)> is in IEXT(I(p)), otherwise false

any other triple

not defined

a ground RDF graph

false if I(E') =false for any asserted triple E' in E, otherwise true

an unlabeled node (blank node)

not defined ; but [I+A](E) = A(E)

an RDF graph

true if [I+A'](E) = true for some A': anon(E) -> IR, otherwise false.

2. Class extensions

E is:

I(E) is in IC; ICEXT(I(E)) is:

rdfs:Resource

IR     (The universe of the interpretation)

rdf:Property

IP     (Properties; subset of IR, domain of IEXT)

rdfs:Class

IC     (Classes; subset of IR, domain of ICEXT)

rdfs:Literal

a subset of LV    (Literal values)

3. Property extensions

E is:

I(E) is in IP; <x,y> is in IEXT(I(E)) iff:

rdf:type

x is in ICEXT(y)

E is:

I(E) is in IP; if <x,y> is in IEXT(I(E)) then:

rdfs:domain

if <u,v> is in IEXT(x) then u is in ICEXT(y)

rdfs:range

if <u,v> is in IEXT(x) then v is in ICEXT(y)

rdfs:subClassOf

ICEXT(x) is a subset of ICEXT(y)

rdfs:subPropertyOf

IEXT(x) is a subset of IEXT(y)

4. Domain and Range
IEXT(I(rdfs:domain)) contains:

<I(rdfs:domain), I(rdf:Property)>

<I(rdfs:range), I(rdf:Property)>

<I(rdf:type), I(rdfs:Resource)>

IEXT(I(rdfs:range)) contains:

<I(rdfs:domain), I(rdfs:Class)>

<I(rdfs:range), I(rdfs:Class)>

<I(rdf:type), I(rdfs:Class)>

 

Appendix B: Acknowledgments

This document has benefited from inputs from many members of the RDF Core Working Group. Particular thanks to Dan Connolly for clarifying the relationship between RDF and RDFS and suggesting the 'triples' definition of RDF graph, Ora Lassila for the idea of using graph syntax, Sergey Melnick for suggesting a translation into logic, and Jos deRoo and Graham Klyne for finding errors in earlier drafts.

The use of an explicit extension mapping to allow self-application without violating the axiom of foundation was suggested by Chris Menzel. Peter Patel-Schneider found several major errors in an earlier draft, and suggested several important technical improvements.

References

Normative

[RDF/XML]
Dave Beckett (editor), Refactoring RDF/XML Syntax, 6 September 2001 (W3C Working Draft).
[RDFMS]
Ora Lassila, Ralph R. Swick (editors), Resource Description Framework (RDF) Model and Syntax Specification, 22 February 1999.
[RDFSchema]
Dan Brickley, R.V. Guha (editors), Resource Description Framework (RDF) Schema Specification 1.0, 27 March 2000 (W3C Candidate Recommendation).
[RDFTestCases]
Art Barstow, Dave Beckett (editors), RDF Test Cases , 12 September 2001 (W3C Working Draft).
[RFC 2396]
T. Berners-Lee, Fielding and Masinter, RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax, August 1998.

Non-Normative

[KIF]
Michael R. Genesereth, Knowledge Interchange Format, 1998 (draft American National Standard).
[Hayes&Menzel]
Patrick Hayes, Christopher Menzel, A Semantics for the Knowledge Interchange Format , 6 August 2001 (Proceedings of 2001 Workshop on the IEEE Standard Upper Ontology)
[DAML]
Frank van Harmelen, Peter F. Patel-Schneider, Ian Horrocks (editors), Reference Description of the DAML+OIL (March 2001) ontology markup language
[Marchiori&Saarela]
Massimo Marchioi, Janne Saarela, Query + Metadata + Logic = Metalog, 1998
[Fikes&McGuinness]
Richard Fikes, Deborah L. McGuinness, An Axiomatic Semantics for RDF, RDF Schema, and DAML+OIL, KSL Technical Report KSL-01-01, 2001

Change Log from previous versions

Changes from Working Draft 23 September 2001

Overall presentation re-organized, and many small changes to wording, extra explanatory sentences added, typos and errors corrected. In particular, faulty definitions of RDF graph corrected, and missing domain/range information restored to rdfs closure rules.

'rdf entailment' changed to 'simple entailment'. Reserved vocabularies introduced, concept of vocabulary entailment introduced, rdf-entailment more clearly distinguished from rdfs-entailment.

Treatment of literals and rdfs:Literal changed slightly (XL applied to literal nodes rather than literals;RDF graph syntax not required to merge literal nodes; interpretations not required to contain all literal values.)

References to rdfs:ConstraintResource and rdfs:ConstraintProperty removed.(Subsequent to resolution of issue rdfms-constraint-properties-resource. )