[Contains comments by GK, highlighted as red italics thus; document text referenced by comments is also highlighted as red italics.]

RDF Model Theory

W3C Working Draft 14 December 2001

0. Introduction

0.1 Model-theoretic semantics

[...]

The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they are valid, i.e. they preserve meaning.

[The term "valid" here may be confusing, as it has specific meanings in logic and XML; maybe just say "to provide a way to determine when they preserve meaning"?]

[Later: I've just noticed your definition of "valid" applied to a process, in section 2, which is not one I've come across in my limited reading of logic textbooks.]

[...]

 

0.2 Graph syntax

Any semantic theory must be attached to a syntax. Of the several syntactic forms for RDF, we have chosen the RDF graph as described in [RDFMS] as the primary syntax, largely for its simplicity.

[I think that to say "as described in [RDFMS]" may be confusing, since the definition there has been shown to be inadequate. Maybe: "as introduced in [RDFMS], and clarified below".]

... We understand linear RDF notations such as N-Triples and rdf/xml [RDF/XML] as lexical notations for specifying RDF graphs. (There are graphs that cannot be described by these notations, however.)

[Suggest: "There are well-formed graphs ..."]

... Two RDF documents, in whatever lexical form, are syntactically equivalent if and only if they map to the same RDF graph. The model theory assigns interpretations directly to the graph; we will refer to this as the 'graph syntax' to avoid ambiguity, since the bare term 'syntax' is often assumed to refer to a lexicalization.

An RDF graph can be defined in terms of labeled nodes and arcs (see Appendix A), but we will use an equivalent but more convenient definition, in which a graph is defined to be a set of triples of the form <S, P, O>, where P is a URI reference (in the sense of [RFC 2396]), which we will call auriref, S is either a uriref or a blank node, and O is either a uriref, a blank node, or a literal. Blank nodes are considered to be drawn from some set of 'anonymous' entities which have no 'label' and are unique to the graph. The nodes of the graph corresponding to a set of triples can then be defined as the urirefs and blank nodes in the S and O positions of the triples of the set, together with the set of all distinct occurrences of literals in the set. (Exact descriptions of the correspondences between this conception of an RDF graph and others are given in Appendix A.)

[1 - missing space in "a uriref"]

[2 - "S is either a uriref or a blank node", etc., don't seem to clearly distinguish between a node and its label. I think this may be a point of confusion for people not used to formal language theory, and it's probably worth making the distinction as clear as possible.]

[3 - I think this paragraph effectively defines some important terms for the rest of the document: uriref, blank node, literals. I think it would greatly aid readability if these definitions were called out more clearly, say in the form of a definition list. I found particularly when reading the start of section 1.2 that I was wondering where I should look for the definitions.]

 

(This way of describing RDF graphs simplifies the exposition of the model theory in several ways, particularly by not requiring us to distinguish between graph nodes and their labels. It has the elegant consequence that the result of merging several graphs is simply the union of the set of triples comprising each of the graphs separately. Notice that disjoint graphs do not have any blank nodes in common, by definition, and that each separate occurrence of a literal is considered a separate node (in contrast to urirefs); we will therefore distinguish between literals and literal nodes.)

[On reading this, I thought "is this really true of blank nodes and literals?". I guess it may be, in a technical sense, but it seems confusing to me. I think that some more elementary words about the difference between a node and the text that is used to label it would be helpful; e.g. multiple occurrences of the same uriref string are presumed to indicate a single node, but multiple literals indicate distinct nodes, and something similar for blank nodes ... No, that's not right; I think I almost understand this stuff but I still get confused thinking about it, which is why I think it needs to be explained carefully. Try again:

A graph consists of nodes and directed arcs:

A tidy graph is one in which no more than one node is labelled with the same URI.

Now, I think what you say about merging graphs isn't strictly true, unless we also assert that nodes in different graphs with the same uriref label are really the same node. The definition of a tidy graph doesn't do this, I think. It seems to me that the starting point of this approach is that nodes have their own identity irrespective of how they are labelled (this is what avoids the label-scope issues) (*); so something extra must be said about how the graph labelling is constrained for the purpose of defining semantics.

(*) You later say "the important syntactic property of RDF graphs is that each distinct item in an RDF graph is treated as a distinct referring entity...", which is what I mean here when I say that nodes have their own identity.

Now, I think one can proceed to define the equivalent, and more convenient, triple form...]

 

To indicate blank nodes in the triples of a graph we will use the nodeID convention used in the N-triples syntax described in [RDFTestCases]. (However, we will use letters or short letter sequences to stand in place of urirefs, in the interests of brevity.) Note that while these node identifiers (formerly called bNodes) identify blank nodes in the surface syntax, these expressions are not considered to be the label of the graph node they identify. In particular, two N-triples documents which differ only by re-namingtheir node identifiers will be understood to describe the same RDF graph. This means that one may not, in general, obtain the correct Ntriples description of a merged graph simply by forming the merge of the corresponding Ntriples documents which describe the original graphs, since the same nodeID may have been used in more than one of the documents. To merge Ntriples documents it is necessary to check if the same nodeID is used in two documents, and to replace it with a distinct nodeID in one of them, before merging the documents. Node identifiers of blank nodes play a role in an ntripleDoc analogous to that played by bound variables in a logical expression. They are local to the document and serve only to indicate a 'connection' between other expressions.

[Missing space in "renaming their"]

[I'm troubled by this description, though I think I understand the intent. The term "same RDF graph" troubles me, because I think the same graph must consist of the same nodes. Consider

s p "string" .

and

s p "string" .

Are they the same? The definitions given above suggest that the graph:

s p "string" .

s p "string" .

has two literal nodes and two property arcs (each instance of a literal indicating a distinct node). To me, this graph would naturally be formed by merging two smaller graphs noted above.

Why is it important whether or not two graphs are the same? I think what matters for our purposes is whether they are semantically equivalent; i.e. incur mutual entailment, and that it doesn't matter whether the nodes are same or different. I think the design goal of the graph syntax here is to keep the overall development as simple as possible - I don't see the "same graph" idea helping in that respect.]

 

Other RDF lexicalizations may use other means of indicating the graph structure; for our purposes, the important syntactic property of RDF graphs is that each distinct item in an RDF graph is treated as a distinct referring entity in the graph syntax.

An RDF graph will be said to be ground if it has no unlabeled nodes. The vocabulary of a graph is the set of urirefs that it contains.

[Would it be more terminologically consistent to say "blank nodes" here?]

[Do literals not figure somehow in the "vocabulary" of a graph? Maybe not, since this appears to be a definition of "vocabulary", but it does rather beg the question of what role literals play. What would you call the set of all node and arc labels (urirefs and literal strings) in a graph?]
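The triple-set view of a graph described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names BNode, Literal, blank_nodes, is_ground and vocabulary are hypothetical, urirefs are stood in for by plain strings (as the draft does with short letter sequences), and the draft's treatment of each literal occurrence as a separate node is not modelled.

```python
from dataclasses import dataclass


class BNode:
    """A blank node: it carries no label, so its identity is object identity."""


@dataclass(frozen=True)
class Literal:
    """A literal, wrapped so it cannot be confused with a uriref string."""
    text: str


def blank_nodes(graph):
    """Blank nodes occurring in subject or object position."""
    return {t for s, _, o in graph for t in (s, o) if isinstance(t, BNode)}


def is_ground(graph):
    """A graph is ground if it has no blank (unlabeled) nodes."""
    return not blank_nodes(graph)


def vocabulary(graph):
    """The set of urirefs in the graph; literals and blank nodes are excluded."""
    return {t for triple in graph for t in triple if isinstance(t, str)}


# A graph is simply a set of (S, P, O) triples.
x = BNode()
g = {("a", "p", x), (x, "q", Literal("string"))}
ground = {("a", "p", "b")}
```

Here vocabulary(g) contains only "a", "p" and "q", which matches the draft's statement that the vocabulary consists entirely of urirefs.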

 

1. Interpretations

1.1 Technical Note

[Suggest: "Technical notes"]

[From a logician's point of view, I can appreciate that the first paragraph may be most important and most interesting, hence first in the section. But from a lesser mortal's PoV, I think the final paragraph may be more important for understanding what follows, and suggest it be moved to the start of this section.]

 

We assume that there is no restriction on the domains and ranges of properties; in particular, a property may be applied to itself. When classes are introduced in RDFS, we will assume that they can contain themselves. This might seem to violate one of the axioms of standard (Zermelo-Fraenkel) set theory, the axiom of foundation, which forbids infinitely descending chains of membership. However, the semantic model given here distinguishes properties and classes as objects from their extensions - the sets of object-value pairs which satisfy the property, or things that are 'in' the class - thereby allowing the extension of a property or class to contain the property or class itself without violating the axiom of foundation. In particular, this use of a class extension mapping allows classes to contain themselves. For example, it is quite OK for (the extension of) a 'universal' class to contain the class itself as a member, a convention that is often adopted at the top of a classification hierarchy. (If an extension contained itself then the axiom would be violated, but that case never arises.) The technique is described more fully in [Hayes&Menzel], which gives a semantics for an infinitary extension of full first-order logic.

Notice that the question of whether or not a class contains itself as a member is quite different from the question of whether or not it is a subclass of itself.

In what follows, the fact that two sets are given different names should not be taken to imply that they are disjoint; we will explicitly state any disjointness or containment conditions as they arise. In the same spirit, the fact that one set is stated to be a subset of another should not be interpreted as saying that these sets cannot be identical, unless this is stated explicitly.

 

1.2 Vocabularies (urirefs, resources and literals)

[The section name here seems to conflict with the earlier definition of 'vocabulary'.]

[...]

An interpretation assigns meanings to symbols in a particular vocabulary of urirefs.

[This may be true of the MT as developed here, but I wonder if it would be useful to acknowledge that URIs are intended to have global definition. Or maybe: "An interpretation assigns meanings to graphs that use a particular vocabulary of urirefs."

I think this point is stated more clearly in the introduction to section 1.3; maybe it's redundant here?]

[...]

 

We do not take any position here on the way that node labels may be composed from other expressions, e.g. from relative URIs or Qnames; the model theory simply assumes that such lexical issues have been resolved in some way that is globally coherent, so that a single uriref can be taken to have the same meaning wherever it occurs. Similarly, the model theory given here has no special provision for tracking temporal changes; it assumes, implicitly, that urirefs have the same meaning whenever they occur. (If one wishes to apply RDF in a context where the referents of urirefs (or names more generally) may change with time, then the current theory could be regarded as defining a 'snapshot' of the meaning of a changing representation.)

[I think this paragraph deals with two distinct ideas which would be usefully separated into separate paragraphs (suggest paragraph break before "Similarly ...").]

[I wonder if the comment about the document's lexical conventions for urirefs should appear sooner, either in section 0 or 1.1.]

[Concerning temporal changes, John Sowa suggests some ways of dealing with these in his Knowledge Representation book that I think could be applied to the RDF model theory as it stands. But I suppose it's appropriate to say that the MT doesn't attempt to treat the issue itself.]

 

Notice that a vocabulary consists entirely of urirefs; literals are assigned values by some external mechanism.

[Maybe this point should be made earlier; e.g. end of section 0.2?]

[...]

 

1.3 Interpretations of ground graphs

All interpretations will be relative to a set of urirefs, called the vocabulary of the interpretation, so that one has to speak, strictly, of an interpretation of an RDF vocabulary, rather than of RDF itself. (For a lexicalized version of the language, we can think of the vocabulary of an interpretation, more traditionally, as being a subset of the URI-indicating expressions used by that lexicalization.)

["interpretation of" suggests that it's the vocabulary that is being interpreted; maybe "interpretation for"? Or, as you use below, "interpretation on"?]

 

A simple interpretation I on the vocabulary V is defined by:

1. A non-empty set IR of resources, called the domain or universe of I.

2. A mapping IEXT from IR into the powerset of IRx(IR union LV) (i.e. the set of sets of pairs <x,y> with x in IR and y in IR or LV)

3. A mapping IS from V into IR

IEXT(x) is a set of pairs, i.e. a binary relational extension, called the extension of x. This trick of distinguishing a relation as an object from its relational extension allows a property to occur in its own extension, as noted earlier. Note that no particular relationship is assumed between IR and LV.

[For less mathematical readers, it might help to point out that IEXT effectively defines the interpretation of properties; e.g. "IEXT(x) is a set of pairs corresponding to the arguments for which property x is true, ...".]

 

It is convenient to define IP to be the subset of IR with a nonempty extension; intuitively, IP is the set of properties.

The denotation of a ground RDF graph in I is then given recursively by the following rules, which extend the interpretation mapping I from labels to graphs. These rules (and extensions of them given later) work by defining the denotation of any piece of RDF syntax E in terms of the denotations of the immediate syntactic constituents of E, hence allowing the denotation of any piece of RDF to be determined by a kind of syntactic recursion.

[I think it would help if this "syntactic recursion" could be related to a formal syntax for a graph, maybe included in section 0.2; e.g.

<graph> ::= SET OF ( <asserted triple> | <unasserted triple> )

<asserted triple> ::= <triple>

<unasserted triple> ::= <triple>

<triple> ::= <s> <p> <o>

<s> ::= <uriref>

<p> ::= <uriref>

<o> ::= <uriref> | <literal>

The interpretation below could then be defined for E being <literal>, <uriref>, <asserted triple>, <unasserted triple> and <graph>. I think some statement about the interpretation of a non-asserted triple is required for unambiguity, given that the idea has been introduced; that interpretation would be simply 'true', which I think is pretty much equivalent to what you say below: "... it would be necessary to restrict the definitions to the sets of asserted triples in the graphs"]

[...]

 

if E is a literal node then I(E) = XL(E)
if E is a uriref then I(E) = IS(E)
if E is an asserted triple <s, p, o> then I(E) = true if <I(s),I(o)> is in IEXT(I(p)), otherwise I(E) = false.
if E is a ground RDF graph then I(E) = false if I(E') = false for some asserted triple E' in E, otherwise I(E) = true.

 

The use of the phrase "asserted triple" is a deliberate weasel-worded artifact, to allow an RDF graph or document to contain triples which are being used for some non-assertional purpose. Strict conformity to the RDF 1.0 specification [RDFMS] assumes that all triples in a document are asserted triples, but making the distinction allows RDF parsers and inference engines to conform to the RDF syntax and to respect the RDF model theory without necessarily being fully committed to it. RDF as presently defined provides no syntactic means to distinguish asserted from nonasserted triples, however, so the distinction can be safely ignored in the remainder of the document, which assumes that all triples in a graph are asserted. (To apply the subsequent results to RDF containing unasserted triples, it would be necessary to restrict the definitions to the sets of asserted triples in the graphs.)
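The definition of a simple interpretation and the four denotation rules for ground graphs can be sketched as follows. This is a minimal illustration, not a reference implementation: the class and function names are hypothetical, the universe is a small finite set of integers chosen for the example, all triples are taken to be asserted, and XL (the external literal mapping) is assumed here to return the literal's own text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass
class Interpretation:
    IR: set        # non-empty universe of resources
    IEXT: dict     # resource -> set of <x, y> pairs (the extension)
    IS: dict       # uriref -> resource, defined on the vocabulary V
    XL: Callable   # externally supplied mapping from literals into LV


def denote(I, term):
    """I(E) for a literal node or a uriref."""
    if isinstance(term, Literal):
        return I.XL(term)
    return I.IS[term]


def triple_true(I, triple):
    """I(<s, p, o>) = true iff <I(s), I(o)> is in IEXT(I(p))."""
    s, p, o = triple
    return (denote(I, s), denote(I, o)) in I.IEXT.get(denote(I, p), set())


def graph_true(I, graph):
    """A ground graph is false iff some triple in it is false."""
    return all(triple_true(I, t) for t in graph)


def properties(I):
    """IP: the members of IR with a non-empty extension."""
    return {x for x in I.IR if I.IEXT.get(x)}


# Example: resource 1 is a property whose extension relates 1 to 2
# and 2 to the literal value "string".
I = Interpretation(
    IR={1, 2},
    IEXT={1: {(1, 2), (2, "string")}},
    IS={"a": 1, "p": 1, "b": 2},
    XL=lambda lit: lit.text,
)
g_true = {("a", "p", "b"), ("b", "p", Literal("string"))}
g_false = {("a", "p", "b"), ("b", "p", "a")}
```

In the example, "a" and "p" denote the same resource, which illustrates that a property can itself appear as the subject of a triple without any difficulty for the definitions.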

1.4 Example

[...]

1.5. Unlabeled nodes as existential assertions

We could treat unlabeled nodes exactly like urirefs, semantically speaking, by extending the IS mapping to include them as well as urirefs. That would amount to adopting the view that an unlabeled node is equivalent to a node with an unknown label. However, it seems to be simpler, and more in conformance with [RDFMS], to treat unlabeled nodes as simply indicating the existence of a thing without assuming that it has a fictitious name. (This decision can be defended on both philosophical and pragmatic grounds. See http://www.w3.org/2000/03/rdf-tracking/#rdfms-identity-anon-resources for a summary and pointers to the extended discussions.)

[As a commentary in work-in-progress the above is fine, but for final publication I'd suggest simplifying it to something like: "Unlabeled nodes are treated as simply indicating the existence of a thing without saying anything about how that thing is or might be named."]

 

This will require some definitions, as the theory so far provides no meaning for unlabeled nodes. Suppose I is an interpretation and A is a mapping from some set of unlabeled nodes to the universe IR of I, and define I+A to be an extended interpretation which is like I except that it uses A to give the interpretation of unlabeled nodes. Define anon(E) to be the set of unlabeled nodes in E. Then we can extend the above rules to include the two new cases that are introduced when unlabeled nodes occur in the graph:

If E is an unlabeled node then [I+A](E) = A(E)
If E is an RDF graph then I(E) = true if [I+A'](E) = true for some mapping A' from anon(E) to IR, otherwise I(E) = false.

 

Notice that we have not changed the definition of an interpretation. The same interpretation that provides a truth-value for ground graphs also assigns truth-values to graphs with unlabeled nodes, even though it provides no interpretation for the unlabeled nodes themselves. Notice also that the unlabeled nodes themselves are perfectly well-defined entities with a robust notion of identity; they differ from other nodes only in not being assigned a direct model-theoretic interpretation, which means that they have no 'global' meaning (i.e. outside the graph in which they occur).

[I was a bit confused by this paragraph. In some sense, the definition manifestly has changed to include the table above. On review, I note that this is the definition of a denotation under a given interpretation. My confusion is not helped by phrases like "the interpretation assigns a truth value...".

I suggest expanding the first sentence a little: "Notice that we have not changed the definition of an interpretation: it still consists of the same values IR, IEXT and IS. What has been added are rules for denotation under the interpretation, so that ...". I'd also suggest saying "it provides no denotation for the unlabeled nodes" rather than "it provides no interpretation for the unlabeled nodes".]
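The two extra rules for unlabeled nodes can be illustrated by a brute-force search over candidate mappings A'. The sketch below makes several assumptions that are not in the draft: IR is finite (so every mapping can be tried), blank nodes are written as strings beginning with "_:" following the nodeID convention, literals are left out for brevity, and all function names are hypothetical.

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def anon(graph):
    """anon(E): the blank nodes occurring in the graph."""
    return sorted({t for s, _, o in graph for t in (s, o) if is_blank(t)})


def denote(IS, A, term):
    """[I+A](E): use A for blank nodes and IS for urirefs."""
    return A[term] if is_blank(term) else IS[term]


def satisfies(IR, IEXT, IS, graph):
    """True iff some mapping A' from anon(graph) into IR makes every triple true."""
    blanks = anon(graph)
    for values in product(list(IR), repeat=len(blanks)):
        A = dict(zip(blanks, values))
        if all(
            (denote(IS, A, s), denote(IS, A, o)) in IEXT.get(denote(IS, A, p), set())
            for s, p, o in graph
        ):
            return True
    return False


# The interpretation itself (IR, IEXT, IS) is unchanged by the new rules.
IR = {1, 2}
IEXT = {1: {(2, 2)}}
IS = {"a": 1, "p": 1, "b": 2}
```

For a ground graph the search tries only the empty mapping, so the function reduces to the ground-graph rules. The interpretation still consists of IR, IEXT and IS only; the mapping A' is searched for, not supplied.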

[...]

 

1.6 Comparison with formal logic

[This section is useful and interesting, but not part of the normative definition. I suggest that for final publication it be moved to an appendix. Also, I note the text here assumes the fixed interpretation of literals, and may need to be reviewed.]

With this semantics, it is simple to translate an RDF graph into a logical expression with essentially the same meaning, as several other authors have noted previously [Marchiori&Saarela], [Fikes&McGuinness].

[...]

 

2. Simple entailment between RDF graphs.

[It took me a while to realize how central entailment is to delivering the value of formal semantics; I'm wondering if a couple of words might not help new readers on the way here; e.g.

"""

Entailment is a key idea that binds abstract formal semantics to real-world applications. If A entails B, the "meaning" of B is somehow contained in, or subsumed by, A. If A entails B and B entails A, then A and B both "mean" the same thing. Through the notions of satisfaction and entailment, formal semantics gives a rigorous definition to the notion of "meaning", and in particular a (sometimes) computable way to determine whether or not meaning is preserved by some transformation on a representation of knowledge.

"""]

 

Following conventional terminology, we say that I satisfies E if I(E)=true, and that a set S of expressions (simply) entails E if every interpretation which satisfies every member of S also satisfies E. If {E} entails E' we will say that E entails E'. (In later sections these notions will be adapted to classes of interpretations with particular reserved vocabularies, but throughout this section entailment should be interpreted as simple RDF entailment.)

[What does the distinction between {E} and E above actually indicate? I think the point would be easier to follow by saying: "If the singleton set {E} entails E' we will say that E entails E'."]

 

Conjunction Lemma. If E is ground, then I satisfies E iff it satisfies every triple in E.

I.e. a ground graph is equivalent in meaning to the logical conjunction of its component triples.

To give a syntactic characterization of entailment we will need to define some relationships between RDF graphs. If E is an RDF graph, say that E' is a subgraph of E when every node and arc in E' are also in E (with the same labels). This corresponds to taking a subset of the triples constituting the graph. Obviously any subgraph of a tidy graph is tidy.

[I'm thinking that care may be needed to ensure that the treatment of literals doesn't break this (e.g. per P/P++).]

[...]

 

The following is proven by (a simple version of) the technique used to prove Herbrand's theorem in first-order logic, hence the name:

Herbrand Lemma. Any RDF graph has a satisfying interpretation.

This means that there is no such thing as an inconsistency or a contradiction in RDF, which is not surprising since the language does not contain negation.

[[[Aside: If I understand correctly, Herbrand's theorem allows satisfaction of an expression to be evaluated (sometimes) in terms of constants that appear in it; i.e. if it is not possible to satisfy an expression under any interpretation, it can be falsified using an expression containing only constants that already appear in the expression (or, if none, some arbitrary constant). Am I on track here?]]]
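One way to see why every graph has a satisfying interpretation is to build the interpretation out of the graph's own terms. The sketch below shows one standard construction of this kind; it is not claimed to be the draft's own proof, which is not reproduced here. The assumptions are that blank nodes are "_:"-prefixed strings, literals are omitted, and the function names and the placeholder resource are hypothetical.

```python
def is_blank(term):
    return term.startswith("_:")


def herbrand(graph):
    """Build (IR, IEXT, IS, A) from the terms of the graph itself."""
    # The universe is the set of terms; a hypothetical placeholder keeps it
    # non-empty when the graph has no triples.
    IR = {t for triple in graph for t in triple} or {"placeholder"}
    IS = {t: t for t in IR if not is_blank(t)}   # each uriref denotes itself
    A = {t: t for t in IR if is_blank(t)}        # each blank node maps to itself
    IEXT = {}
    for s, p, o in graph:
        IEXT.setdefault(p, set()).add((s, o))    # extensions read off the triples
    return IR, IEXT, IS, A


def holds(graph, IEXT, IS, A):
    """Check every triple under the extended interpretation I+A."""
    def d(term):
        return A[term] if is_blank(term) else IS[term]
    return all((d(s), d(o)) in IEXT.get(d(p), set()) for s, p, o in graph)


g = {("_:x", "prop", "obj"), ("obj", "prop", "_:x")}
IR, IEXT, IS, A = herbrand(g)
```

Because the extensions are read directly off the triples, each triple is true by construction, and no triple can ever contradict another in the absence of negation.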

[...]

 

2.1 Skolemization

[Should this be moved to an appendix on final publication? (Similar rationale as section 1.6)]

[...]

Skolemization Lemma. Suppose sk(E) is a skolemization of E with respect to V. Then sk(E) entails E; and if sk(E) entails F and vocab(F) is disjoint from V, then E entails F.

[Would it also help to point out that if RDF is used in a "non-assertional" mode, such as a query "does K entail Q?", then it can be seen from the Skolemization lemma that Skolemization is clearly not usable in such cases, as there is no licensed entailment of the form Sk(E)?]

2.2 Merging RDF graphs

Suppose S is a set of RDF graphs; then their merge is the union of the sets of triples in all the graphs in S. Notice that one does not, in general, obtain the merge of a set of graphs by concatenating their corresponding N-triples documents and constructing the graph described by the merged document, since if some of the documents use the same node identifiers, the merged document will describe a graph in which some of the nodes have been 'accidentally' merged.

Merging lemma. The merge of a set S is entailed by S, and entails every member of S.

[Is this necessarily true under a non-fixed interpretation of literals?]
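The difference between a naive union of N-triples documents and a proper merge can be sketched as below. The representation is an assumption made for illustration: each document is a set of triples of strings, blank nodes are written with a "_:" prefix, and the renaming scheme (a per-document "g<i>n" prefix) is hypothetical; any scheme that keeps node identifiers from different documents distinct would serve.

```python
def is_blank(term):
    return term.startswith("_:")


def blank_nodes(doc):
    return {t for s, _, o in doc for t in (s, o) if is_blank(t)}


def rename_apart(doc, index):
    """Give every node identifier in one document a prefix unique to that document."""
    def rename(term):
        return f"_:g{index}n{term[2:]}" if is_blank(term) else term
    return {(rename(s), p, rename(o)) for s, p, o in doc}


def merge(docs):
    """Union of the triples after standardising node identifiers apart."""
    merged = set()
    for index, doc in enumerate(docs):
        merged |= rename_apart(doc, index)
    return merged


# Both documents happen to use the node identifier _:x.
doc1 = {("_:x", "p", "a")}
doc2 = {("_:x", "q", "b")}
naive = doc1 | doc2          # the two blank nodes are accidentally identified
merged = merge([doc1, doc2])
```

In the naive union a single blank node appears in both triples, which asserts more than either document did on its own; after renaming, the merge keeps the two blank nodes distinct.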

 

Notice that unlabeled nodes are not identified with other nodes in the merge, and indeed this reflects a basic principle of RDF graph inference: nodes with the same uriref must be identified, but unlabeled nodes must not be identified with other nodes or re-labeled with urirefs, in order to ensure that the resulting graph is entailed by what one starts with. This is made precise in the following two lemmas (which follow directly from the strong Herbrand lemma):

Anonymity lemma 1. Suppose E' is like E except that at least one unlabeled node in E is labeled with a uriref in E'. Then E does not entail E'.

Anonymity lemma 2. Suppose that E' is like E except that two distinct unlabeled nodes in E have been identified in E'. Then E does not entail E'.

[I'm uneasy that you say these follow from the strong Herbrand Lemma. It seems to me they would still (generally) hold in situations where the Strong Herbrand lemma does not apply.

In fact, I'm now not sure I believe the Strong Herbrand lemma: consider the graph:

_:x prop obj .

any satisfying interpretation must have IEXT(prop) containing <s,obj>, for some s. Then:

s prop obj .

is a satisfied ground triple not in the original graph.]

[...]

The main result for simple RDF inference is:

Interpolation Lemma. S entails E iff a subgraph of the merge of S is an instance of E.

[The graph-relationship concept "is an instance of" has not previously been defined, I think. The meaning is clear enough from what follows, but I think it would be good to explicitly call out this idea somewhere.]

[Since this is a "main result", would a simple proof be in order?; e.g.

S entails merge(S), by merging lemma

Let MS' be a subgraph of merge(S) that is an instance of E, then

merge(S) entails MS', by subgraph lemma

MS' entails E, by ??existence lemma??

Hmmm... Shouldn't there be another lemma in section 2 to capture the inference from (foo baz) to (exists (?x) (foo ?x))? Along the lines of:

Existence lemma. If E' is the same as E, except that zero, one or more node labels in E are omitted in E', then E entails E'. The proof would follow pretty directly by construction of a suitable [I+A], and definitions in section 1.5.

I note that this also covers self-entailment.]

 

It might be thought that the operation of changing a bound variable would be an example of an inference which was valid but not covered by the interpolation lemma, e.g. the inference of

_:x foo baz

from

_:y foo baz

Recall however that by our conventions, these two expressions describe the same RDF graph, and any graph is both a subgraph and an instance of itself.

[re. earlier comments in section 0.2: I now see why the concept of "same graph" is important, but I still think more care is needed with the definition. Can the idea be relaxed to some idea of similar graphs, with a lemma that similar graphs entail each other?]
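The interpolation lemma suggests a direct, if naive, test for simple entailment: look for a way of replacing the blank nodes of E so that the result is a subset of the merge of S. The sketch below assumes that an "instance" of E is obtained by replacing blank nodes of E with urirefs or blank nodes; the excerpt does not define the term, so that reading is an assumption. It also assumes the graphs in S have already been merged, omits literals, and uses hypothetical function names.

```python
from itertools import product


def is_blank(term):
    return term.startswith("_:")


def simply_entails(merged, e):
    """True iff some instance of e is a subgraph (subset of triples) of merged."""
    blanks = sorted({t for s, _, o in e for t in (s, o) if is_blank(t)})
    targets = sorted({t for s, _, o in merged for t in (s, o)})
    # Try every assignment of the blank nodes of e to nodes of the merged graph.
    for choice in product(targets, repeat=len(blanks)):
        mapping = dict(zip(blanks, choice))
        instance = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in e}
        if instance <= merged:
            return True
    return False


renamed = simply_entails({("_:y", "foo", "baz")}, {("_:x", "foo", "baz")})
generalised = simply_entails({("a", "foo", "baz")}, {("_:x", "foo", "baz")})
labelled = simply_entails({("_:x", "foo", "baz")}, {("a", "foo", "baz")})
identified = simply_entails({("_:x", "p", "_:y")}, {("_:z", "p", "_:z")})
```

The first two checks succeed, covering the change of node identifier discussed above and the inference from a named node to an existential one. The last two fail, in line with the two anonymity lemmas. The search is exponential in the number of blank nodes in E, so it is meant only to make the definition concrete.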

 

3. RDF Interpretations

[This section looks OK to me]

[...]

 

4. Rdf-entailment and rdf closures

Following the definitions in section 2, we will say that S rdf-entails E when every rdf interpretation which satisfies every member of S also satisfies E. This is an example of vocabulary entailment, i.e. entailment relative to a set of interpretations which satisfy extra semantic conditions on a reserved vocabulary.

[I struggled to understand the point of this, until I noticed the key difference from (simple) entailment was that it talks about "rdf interpretation"s. For simple consistency, I suggest it be hyphenated ("rdf-interpretation") as in the preceding section. I'd also suggest highlighting the difference, as in:

"""

We will say that S rdf-entails E when every rdf-interpretation that satisfies every member of S also satisfies E. This follows the definition of simple entailment in section 2, but is in terms of rdf-interpretations rather than simple interpretations. This is an example of vocabulary entailment, i.e. entailment relative to a set of interpretations which satisfy extra semantic conditions on a reserved vocabulary.

"""]

 

vocabulary entailment is more powerful than simple entailment, in the sense that a given set of assumptions entails more consequences.

[Capitalize "vocabulary".]

[Would "premises" be a more appropriate word than "assumptions" here?]

... In general, as the reserved vocabulary is increased and extra semantic conditions imposed, the class of satisfying interpretations is restricted, and hence the corresponding notion of entailment increases in power.

[Er... "increases in deductive power", maybe?]

... For example, if S simply entails E then it also rdf-entails E, since every rdf-interpretation is also a simple interpretation; but S may rdf-entail E even though it does not simply entail it. Intuitively, a conclusion may follow from some of the extra assumptions incorporated in the semantic conditions imposed on the reserved vocabulary. (Another way of expressing this is that any restriction on interpretations decreases the number of possible ways that an interpretation might be a counterexample to E's following from S.) Simple entailment is therefore the weakest form of RDF entailment, which holds for any reserved vocabulary; it could be characterized as entailment which depends only on the basic triples syntax of RDF graphs, without making any further assumptions about the meaning of any urirefs. Simple entailment is the vocabulary entailment of the empty namespace.

It is easy to see that the lemmas in section 2 do not hold for rdf-entailment. For example, the triple

rdf:type rdf:type rdf:Property

[Hmm... I note that rdf-interpretation is defined on vocabulary (V union rdfV), which sort-of suggests that rdfV is not covered by a simple interpretation. Rather than define the reserved vocabulary as something that is added to the "underlying" vocabulary, maybe one could say that, say, an rdf-interpretation is defined over some vocabulary V, which contains the reserved vocabulary, and satisfies the conditions noted.]

... is true in every rdf-interpretation, and hence rdf-entailed by the empty set, which immediately contradicts the interpolation lemma for rdf-entailment. Rather than develop a separate theory of the syntactic conditions for recognising entailment for each reserved vocabulary, however, we will give a general technique for reducing these broader notions of entailment to simple entailment, by defining the closure of an RDF graph relative to a set of semantic conditions. The basic idea is to rewrite the semantic conditions as a set of syntactic inference rules, and define the closure to be the result of applying those rules to exhaustion. The resulting graphs will contain RDF triples which explicitly state all the special meanings embodied in the extra semantic conditions, in effect axiomatizing them in RDF itself.

[I think I have seen the phrase "deductive closure" to mean what you here call "closure". I find "deductive closure" to be more descriptive (if correct ;-)]

[...]

 

5. RDFS interpretations

[...]

Note, these conditions for rdfs:domain and rdfs:range reflect our current understanding of multiple domain and range restrictions rather than the wording in [RDFSchema]. They correspond more closely to the meanings assumed by DAML+OIL [DAML] under which multiple domain and range assertions are understood conjunctively.

[I think this paragraph should be removed for final publication]

[...]

[I've not closely reviewed the table of interpretation conditions on this reading]

[...]

6. RDFS-entailment and RDFS closures

[...]

2. Apply the following rules recursively to generate all legal RDF triples (i.e. until none of the rules apply or the graph is unchanged.) Here, xxx, yyy and zzz stand for any uriref, bNode or literal, aaa for any uriref, and uuu for any uriref or bNode (but not a literal).

[Are you intending to anticipate the possibility of literals-as-subjects?: xxx appears in many subject positions in the table. Also, condition rdfs4a should probably use 'uuu' in place of 'xxx', as LV is not a subset of IR. Also for rdfs7, rdfs8?]

Rule     If E contains:                then add:
rdf1     xxx aaa yyy                   aaa rdf:type rdf:Property
rdfs2    xxx aaa yyy                   xxx rdf:type zzz
         aaa rdfs:domain zzz
rdfs3    xxx aaa uuu                   uuu rdf:type zzz
         aaa rdfs:range zzz
rdfs4a   xxx aaa yyy                   xxx rdf:type rdfs:Resource
rdfs4b   xxx aaa uuu                   uuu rdf:type rdfs:Resource
rdfs5    aaa rdfs:subPropertyOf bbb    aaa rdfs:subPropertyOf ccc
         bbb rdfs:subPropertyOf ccc
rdfs6    xxx aaa yyy                   xxx bbb yyy
         aaa rdfs:subPropertyOf bbb
rdfs7    xxx rdf:type rdfs:Class       xxx rdfs:subClassOf rdfs:Resource
rdfs8    xxx rdfs:subClassOf yyy       xxx rdfs:subClassOf zzz
         yyy rdfs:subClassOf zzz
rdfs9    xxx rdfs:subClassOf yyy       aaa rdf:type yyy
         aaa rdf:type xxx

Unlike the simpler rdf closure rules, the outputs of some of these rules may trigger others. For example, these rules will generate the complete transitive closures of all subclass and subproperty heirarchies,

[Typo in "hierarchies"]
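
A minimal sketch, in Python, of applying the step 2 rules to exhaustion. It is an illustration only, not part of the draft or of the review comments. Several things are assumptions made for the sketch: triples are Python tuples of strings, a literal is any string starting with a double quote, a bNode is any string starting with "_:", and the function names are hypothetical. The rule legend does not define bbb and ccc, so they are treated like aaa (urirefs). Step 1 of the closure procedure, which is elided above, is not covered.

```python
RDF_TYPE = "rdf:type"
PROPERTY = "rdf:Property"
RESOURCE = "rdfs:Resource"
CLASS = "rdfs:Class"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"


def is_literal(term):
    # Assumed encoding for this sketch: literals start with a double quote.
    return term.startswith('"')


def is_uriref(term):
    # Assumed encoding for this sketch: bNodes start with "_:".
    return not is_literal(term) and not term.startswith("_:")


def apply_rules_once(graph):
    """Return every triple that one pass of the rule table produces."""
    new = set()
    for (s, p, o) in graph:
        new.add((p, RDF_TYPE, PROPERTY))              # rdf1
        new.add((s, RDF_TYPE, RESOURCE))              # rdfs4a
        if not is_literal(o):
            new.add((o, RDF_TYPE, RESOURCE))          # rdfs4b (uuu is not a literal)
        if p == RDF_TYPE and o == CLASS:
            new.add((s, SUBCLASS, RESOURCE))          # rdfs7
        for (s2, p2, o2) in graph:
            if s2 == p and p2 == DOMAIN:
                new.add((s, RDF_TYPE, o2))            # rdfs2
            if s2 == p and p2 == RANGE and not is_literal(o):
                new.add((o, RDF_TYPE, o2))            # rdfs3
            if s2 == p and p2 == SUBPROP and is_uriref(o2):
                new.add((s, o2, o))                   # rdfs6
            if p == SUBPROP and p2 == SUBPROP and o == s2:
                new.add((s, SUBPROP, o2))             # rdfs5
            if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                new.add((s, SUBCLASS, o2))            # rdfs8
            if p == SUBCLASS and p2 == RDF_TYPE and o2 == s and is_uriref(s2):
                new.add((s2, RDF_TYPE, o))            # rdfs9 (aaa is a uriref, as the table is written)
    return new


def rdfs_rule_closure(graph):
    """Apply the rules until a pass adds nothing new."""
    closed = set(graph)
    while True:
        added = apply_rules_once(closed) - closed
        if not added:
            return closed
        closed |= added


example = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
closure = rdfs_rule_closure(example)
```

Because the set of terms is finite and no rule invents new terms, the loop terminates. The sketch guards rdfs3 and rdfs4b against literal subjects, following the legend's restriction on uuu.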

[...]

6.1 A note on rdfs:Literal

[...]

7. RDF containers

[...]

As mentioned earlier, uses of containers in practice may well go beyond the rather basic meanings sanctioned by this model theory. For example, with this understanding of containers, a triple with a container as a subject does not entail any assertion with a member of the container as a subject, and the 'distributive' interpretation of rdf:Alt is not reflected in any entailment conditions.

[Er, where earlier? Did I miss something?]

Appendix A: Technical summary

1. Precise definitions of graph terminology.

[Is this intended to be normative? If so, maybe it should be in the main document body, or a big, bold pointer placed in the corresponding body text to this definition.]

[...]

This definition is related to the 'set of triples' definition used in the text as follows.

For any x in N, define item(x) to be label(x) if label(x) is defined, otherwise x. That is, item(x) is the label on the node if there is one, but applied to blank nodes it is the node itself.

The set of triples corresponding to an RDF graph is then the set {<item(s(E)), label(E), item(o(E))>} for all E in the graph.

[This seems rather messy to me, and seems to reprise uneasiness I felt about some aspects of section 0.2. If a blank node can appear in a triple, then why not let *any* node appear in a triple? I don't see the value of creating this "semi-lexicalized" form of triple members used to define a graph, and I think it potentially confuses the more important matters.]

[By contrast, the full-lexicalization described below seems fine to me.]

To obtain an N-Triples document describing the graph, define NTitem(x) to be a textual form as follows: if label(x) is a uriref, then NTitem(x) has the form <label(x)>; if it is a literal, then NTitem(x) has the form "label(x)"; and if it is a blank node, then NTitem(x) is a nodeID expression unique to that node, ie distinct from the NTitem of any other blank node in the graph. Then the N-Triples document is the result of concatenating the corresponding lines of text each of the form:

NTitem(s(E)) NTitem(E) NTitem(o(E)) .<line>

[You might want to say what <line> is here; I assume you mean a line-separator sequence such as CR, LF.]

[Insert "The" (below) ...?]

RDF graph corresponding to a set of triples can be defined, mathematically, by setting N to be the set of urirefs, blank nodes and occurrences of literals in the set of triples; E to be the set of triples; defining s and o in the obvious way: s(<S,P,O>)=S; o(<S,P,O>)=O; and defining label by label(x)=x on all urirefs, and label(x) to be the literal occurring at x when x is a literal occurrence. (To be extremely finicky, one could define a literal occurrence to be a pair consisting of the literal and the triple in which it occurs, and then define label(<l,t>) to be l. This rather delicate distinction between literals and occurrences of literals is needed to support some of the proposals currently under consideration for literal datatyping. We include it here as a proof of concept; however, the final version of the model theory may not need it, in which case the exposition will be somewhat simplified, and literals treated like urirefs in being given a single value in any interpretation. Readers should not, therefore, base any important decisions on this at present.)

[Again, this seems rather messy - I think the "delicate distinction" doesn't arise if triple-members are nodes, not some lexical value used to label them. Hence the following seems much easier...]

A more constructive way to define the RDF graph corresponding to a set of triples is as follows, in terms of an operation of 'merging' two nodes of a graph. Consider each triple as an isolated graph with two nodes linked by one arc; form the disconnected graph made up of these isolated graphs; merge all nodes with the same uriref or with the same nodeID; then delete all nodeIDs. The resulting graph is an RDF graph in the sense of [RDFMS].
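
The merging construction can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the draft or of the review comments: the Node class, the (kind, text) encoding of subjects and objects, and the function name build_graph are all hypothetical. Each literal occurrence is kept as its own node, matching the treatment of literal occurrences described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity-based equality: two nodes are equal only if they are the same node
class Node:
    kind: str              # "uriref", "blank" or "literal" (assumed encoding)
    label: Optional[str]   # uriref or literal text; None for a blank node


def build_graph(triples):
    """Build (nodes, arcs) from triples of the form ((kind, text), predicate, (kind, text))."""
    merged = {}            # (kind, text) -> the single merged Node for that uriref or nodeID
    nodes, arcs = [], []

    def node_for(kind, text):
        if kind == "literal":
            node = Node(kind, text)        # every literal occurrence is a distinct node
            nodes.append(node)
            return node
        key = (kind, text)
        if key not in merged:
            # Merging: reuse one node per uriref or nodeID.
            # Deleting nodeIDs: a blank node keeps no label.
            merged[key] = Node(kind, text if kind == "uriref" else None)
            nodes.append(merged[key])
        return merged[key]

    for subject, predicate, obj in triples:
        arcs.append((node_for(*subject), predicate, node_for(*obj)))
    return nodes, arcs


nodes, arcs = build_graph([
    (("uriref", "ex:a"), "ex:p", ("blank", "_:x")),
    (("blank", "_:x"), "ex:q", ("literal", "v")),
    (("uriref", "ex:a"), "ex:r", ("literal", "v")),
])
```

In this example the two arcs that mention nodeID _:x share one unlabeled node, while the two occurrences of the literal "v" remain separate nodes.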

2. Summary of model theory

RDF/RDFS model theory summary

0. Domains and mappings of interpretation I

vocab(I): set of urirefs ; LV: (global) set of literal values ; IR: set of resources (universe); IP: subset of IR (properties) ; IC: subset of IR (classes).

XL: literals -> LV

IS: vocab(I) -> IR

IEXT: IP -> subsets of (IR x (IR union LV))

ICEXT: IC -> subsets of IR

1. Basic equations

E is:                              I(E) is:
a literal node                     XL(E)
a (node labeled with a) uriref     IS(E)
an asserted triple <s p o>         true if <I(s), I(o)> is in IEXT(I(p)), otherwise false
any other triple                   not defined
a ground RDF graph                 false if I(E') = false for any asserted triple E' in E, otherwise true
an unlabeled node (blank node)     not defined; but [I+A](E) = A(E)
an RDF graph                       true if [I+A'](E) = true for some A': anon(E) -> IR, otherwise false.
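
The basic equations can be exercised on a small finite interpretation. The following Python sketch is an illustration only, not part of the draft or of the review comments. It assumes that every triple in the graph is asserted, that every uriref used is in vocab(I), and that a property outside the domain of IEXT has an empty extension; the (kind, text) term encoding and the names denote and graph_true are hypothetical.

```python
from itertools import product


def denote(term, IS, XL, A):
    """I(E) for a node: XL for literals, IS for urirefs, A for blank nodes."""
    kind, text = term
    if kind == "literal":
        return XL(text)
    if kind == "blank":
        return A[text]
    return IS[text]


def graph_true(graph, IR, IS, XL, IEXT):
    """True if some mapping A': anon(E) -> IR makes every triple of the graph true."""
    blanks = sorted({t[1] for (s, _, o) in graph for t in (s, o) if t[0] == "blank"})
    # With no blank nodes, product() yields one empty assignment,
    # which gives the ground-graph case.
    for values in product(list(IR), repeat=len(blanks)):
        A = dict(zip(blanks, values))
        if all(
            (denote(s, IS, XL, A), denote(o, IS, XL, A)) in IEXT.get(IS[p], set())
            for (s, p, o) in graph
        ):
            return True
    return False


# A toy interpretation: three resources, one of which is a property.
IR = {"a", "b", "P"}
IS = {"ex:a": "a", "ex:b": "b", "ex:p": "P"}
IEXT = {"P": {("a", "b")}}
XL = lambda text: text  # assumed: each literal denotes its own text

some_object = [(("uriref", "ex:a"), "ex:p", ("blank", "x"))]
holds = graph_true(some_object, IR, IS, XL, IEXT)
```

The search over all assignments is exponential in the number of blank nodes, so this is only practical for very small examples.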

2. Class extensions

E is:              I(E) is in IC; ICEXT(I(E)) is:
rdfs:Resource      IR     (The universe of the interpretation)
rdf:Property       IP     (Properties; subset of IR, domain of IEXT)
rdfs:Class         IC     (Classes; subset of IR, domain of ICEXT)
rdfs:Literal       a subset of LV    (Literal values)

3. Property extensions

E is:                  I(E) is in IP; <x,y> is in IEXT(I(E)) iff:
rdf:type               x is in ICEXT(y)

E is:                  I(E) is in IP; if <x,y> is in IEXT(I(E)) then:
rdfs:domain            if <u,v> is in IEXT(x) then u is in ICEXT(y)
rdfs:range             if <u,v> is in IEXT(x) then v is in ICEXT(y)
rdfs:subClassOf        ICEXT(x) is a subset of ICEXT(y)
rdfs:subPropertyOf     IEXT(x) is a subset of IEXT(y)

4. Domain and Range
IEXT(I(rdfs:domain)) contains:

<I(rdfs:domain), I(rdf:Property)>

<I(rdfs:range), I(rdf:Property)>

<I(rdf:type), I(rdfs:Resource)>

IEXT(I(rdfs:range)) contains:

<I(rdfs:domain), I(rdfs:Class)>

<I(rdfs:range), I(rdfs:Class)>

<I(rdf:type), I(rdfs:Class)>

[Having 'not defined'  for the denotation of a non-asserted triple seems rather tentative. The denotation of a graph comes out the same if you just say 'true'. This could cover things like the "syntactic triples" of DAML. I think anything more subtle than that would, in any case, require some revision or enhancement to the basic model theory, so why try and anticipate?]

[End of comments]