updated draft of model theory

RDF Model Theory StrawDog proposal (draft 7/27/01)(typos fixed in 
sections 1 thru 5, sections 6,7,8 added.)

Pat Hayes, IHMC

In what follows, 'N-triples' refers to Dave Beckett's version 5, 
relevant parts of which are:
ntripleDoc  ::=line*
line        ::=ws* (comment | blankline | triple) eoln
triple      ::=subject ws+ predicate ws+ object ws* '.' ws*
subject     ::=uriref | anonNode
predicate   ::=uriref
object      ::=uriref | anonNode | qLiteral

1. Introduction

The basic idea is to re-state the M&S notions of what a set of RDF 
triples means in mathematical terms, making as few extra or 
gratuitous assumptions as possible. By keeping things as general as 
possible, any restrictions that someone might want to impose can be 
captured by further restrictions on the model theory.

We assume that:
a. there is no restriction on what a resource can be
b. there is no restriction on the domains and ranges of properties; 
in particular, a property may be applied to itself.
c. all nodes in an RDF graph play fundamentally the same role in the 
language no matter how they are labelled (for comments on anonymous 
nodes, see later.)


2. Interpretations

First, we assume a global set  LV of literal values and a fixed 
mapping XL from the set of qLiterals to LV. All interpretations will 
be required to conform to XL on qLiteral expressions.
<comment> This leaves open the question of the exact nature of 
literals, their language-sensitivity and so on, while acknowledging 
their special status as expressions with a 'fixed' interpretation. 
</comment>

All interpretations will be relative to a set of RDF nodes, called 
the vocabulary of the interpretation, so that one has to speak, 
strictly, of an interpretation of an RDF 'nodespace' or of an RDF 
graph, rather than of RDF itself.

This is stated in terms of RDF nodes and graphs to follow the M&S. 
For N-triples, we can identify the vocabulary of I with a set of 
expressions that would be mapped to nodes by a parser, so it would be 
a subset of (uriref |anonNode)
<comment> Restricting an interpretation to a particular namespace is 
normal practice in model theory, since an interpretation mapping is 
expected to interpret only a given set of 'names'. It is possible to 
adopt a 'global' approach where any interpretation assigns a meaning 
to every possible expression, but this makes the subsequent 
development more awkward. For example, there are no 'new names' in a 
global model theory, since all names have already been given a fixed 
meaning by every interpretation. </comment>

An interpretation I (of vocab(I)) is defined by:
1. A nonempty set R of resources, called the domain or universe of I.
<comment> Whatever resources are, this a set of those. </comment>
2. A non-empty subset P of R corresponding to properties
<comment> Note that  P could be R itself.</comment>
3. A mapping IEXT: P -> Rx(R union LV)   (ie the set of pairs <x,y> 
with x in R and y in R or LV)
<comment> IEXT(x) is the extension of x. This allows a property to be 
an object and so occur in its own extension, a neat trick I learned 
from Chris Menzel. </comment>
4. A mapping IS:  vocab(I) -> R

The denotation of an RDF expression E in I is given by the following 
rules (note, other rules follow later):

>> if E is a <uriref>  or <anonNode> then I(E) = IS(E)
>> if E is a <qLiteral> then I(E) = LV(E)
>> if E is an asserted triple with the form  s p o
      then I(E) = true iff <I(s),I(o)> is in  IEXT(I(p)), otherwise I(E)= false.
>> if E is a set of triples then I(E) = false just in case I(E') = 
>>false for some asserted triple E' in E, otherwise I(E) = true.

(Note that if E contains no asserted triples then I(E)= true. This is 
the usual convention: an empty conjunction is considered to be true 
since it contains nothing that would cause it to be false. This makes 
empty assertions have vacuous content, as one would expect.)

Comments and blanklines have no semantics, of course.

The use of  the phrase "asserted triple" is a deliberate 
weasel-worded artifact, to allow an RDF graph/document to contain 
triples which are being used for some non-assertional purpose (such 
as expressing a query, implementing part of some expression of 
another language, or forming part of a pattern intended to be matched 
against some asserted RDF in another document, or just being 
breezy:-). We may wish to say that strict conformity to the RDF M&S 
assumes that all triples in a document are asserted triples, but 
making the distinction allows RDF parsers and engines such as CWM and 
Euler to conform to the RDF syntax and to respect the RDF model 
theory without necessarily being fully committed to it. Also, I 
expect that future extensions or versions of RDF will require such 
non-asserted triples, and this allows the model theory to be extended 
in those future directions.

(We will give examples of interpretations in an abbreviated format 
simply by listing R, IEXT and IS in the form R/IEXT/IS. P is the 
domain of IEXT. For example, for the vocabulary {a b c} the following 
is a small interpretation:
{1 2}/1->{<1,1>,<2,2>}/a->1,b->1,c->2
which makes the set
a b a
c a c
true, and all the triples
a c b
a b c
c c c
false)

3. Anonymous nodes as existential assertions

This notion of interpretation treats anonymous nodes exactly like 
URIs, semantically speaking. It amounts to adopting the view that an 
anonymous node is equivalent to a node with an unknown URI. However, 
we could adopt an extra clause which would treat an anonymous node 
like an existentially quantified variable. To do so requires that we 
decide on the 'scope' of the quantification. The easiest convention 
to state would be that the scope was the triple, but that seems 
unworkable since the same anonymous node may occur in several triples 
in a given document. To make the quantification have the scope of the 
entire document requires introducing the notion of 'document' into 
the syntax, however. I am not sure how to do this in the RDF graph 
model, but in N-triples I will take it that the appropriate syntax 
class is <ntripleDoc>.

This will require some definitions. If I is an interpretation and A 
is a mapping from some set of anonNodes to the domain of I, then 
define I[A] to be the interpretation which is like I except that 
I[A](x)=A(x)  wherever A is defined, ie I[A] uses the A-mapping to 
assign values wherever it can, and otherwise is identical to I. If E 
is an <ntripleDoc>, define anon(E) to be the set of anonymous nodes 
in E, and set(E) to be the set of asserted triples in E. Then we have 
the extra interpretation rule:

>> If E is an <ntripleDoc> then I(E) = true if I[A](set(E))=true for 
>>some A defined on anon(E), otherwise I(E)= false .

This effectively treats all anonymous nodes as existentially 
quantified at the boundary of the document containing them. This 
requires us to make a conceptual distinction between a document 
(containing anonNodes) and the set of triples in the document; the 
anonymous nodes are free in the set, but bound by an implicit 
existential quantifier in the document. (Notice that if the document 
E contains no anonNodes, then I(E)=I(set(E)) ) The distinction 
becomes important when we combine documents, since the result of 
unifying two documents may not have the same meaning as combining the 
two sets of triples into a single document. (The difference will be 
significant only if the two documents share an anonymous node. If 
this is guaranteed to be impossible, then we do not need to make the 
distinction; still, however, it will not be true in general that any 
document is equal to the union of the documents formed by dividing it 
up into subsets.) We will use the neutral term 'RDF expression' to 
refer to a triple, a set of triples, or a document.

It is convenient to define the vocabulary of an RDF expression to be 
the set of 'names' in it which are 'free'. The vocabulary of a triple 
is the set of <uriref>s and <anonNode>s in the triple; the vocabulary 
of a set of triples is the union of the vocabularies of the triples 
in the set; and the vocabulary of a document E is the set of 
<urirefs> in the vocabulary of set(E). The anonNodes of a document 
are bound and are not part of the vocabulary. (The correspondence to 
the earlier usage of this word is that the vocabulary of an 
interpretation of an expression must contain the vocabulary of the 
expression in order for the interpretation to give a well-defined 
value to the expression.)

4. Terminology

We say that I satisfies E if I(E)=true, and that E entails E' iff 
every interpretation which satisfies E also satisfies E'.

More subtle relationships require some definitions. (These are 
standard mathematical definitions for arbitrary mappings, given here 
for completeness.) If M is a subset of the vocabulary of I, define 
I\M to be the restriction of I to M, ie I\M(x)=I(x) on M. We will 
also say that I is an extension of I\M. If I and I' have the same 
restriction to the intersection of their vocabularies, then say they 
are compatible, and define the union I+I' of I and I' to be the 
interpretation whose vocabulary is the union of the vocabularies of I 
and of I', and which agrees with them on their vocabularies, ie 
(I+I')(x)=I(x) or =I'(x) wherever the latter are defined. Notice that 
if vocab(I) contains no anonNodes, then I[A] = I+A. Obviously, I 
satisfies E iff I\vocabulary(E) satisfies E.

5. Skolemisation

If we replace an anonNode by a uriref, the result is not entailed by 
the original document. In fact, in general, if E' contains any 
urirefs not contained in E then E cannot entail E'.  For example, the 
document containing the single triple
<foo> <baz> _:xxx
obviously bears some relation to the triple
<foo> <baz> <bar>
but it does not entail it, since the interpretation
{1,2}/1->{<1,1>}/{foo->1,baz->1,bar->2}
satisfies the first document but not the second. (Notice this 
interpretation assigns a value to something - <bar> - outside the 
vocabulary of the first document, which is how the entailment fails.)
However, the second triple entails the first document, since any 
interpretation which satisfies <foo> <bar> <baz> can be used to make 
the first document true by restricting it to the smaller vocabulary 
and assigning the interpretation of <bar> to the anonNode to provide 
a suitable I[A] mapping. More significantly, any RDF expression which 
is entailed by the second triple and does not contain <bar> is also 
entailed by the first document.

To see why, suppose E'' is such an expression and I is an 
interpretation on {<foo>,<bar>} which satisfies the first document. 
Then there must be a mapping A from _:xxx to the universe of I such 
that I+A satisfies the first triple. Define I' on the second triple 
by I'(<bar>)=A(_:xxx), then I' satisfies the second triple, so it 
must satisfy E''; but E'' does not contain <bar>, so it must be 
satisfied by I'\{<foo>,<bar>}, which is I; so, I satisfies E''; but I 
was arbitrary, so, E'' is entailed by the first document. QED.

This is in fact a general result. We can characterise skolemisation 
as follows. Consider a document E and another sk(E) got from E by 
replacing all its anonNodes by urirefs drawn from a 'new' vocabulary 
V which does not intersect vocab(E). Then sk(E) entails E, and if E' 
is any expression with no vocabulary in V and sk(E) entails E', then 
E entails E'. (The proof follows exactly the special case given in 
the previous paragraph, but takes longer to state because it has to 
refer to anon(E).) Notice that since sk(E) contains no anonNodes, 
I(sk(E))=I(set(sk(E))) for any I, so there is no need to distinguish 
a skolemised document from its set of triples.
The key to proper skolemisation is the use of skolem names which are 
guaranteed to never occur in any other document, thereby ensuring 
that any other expression in any other document which is entailed by 
the skolem form is also entailed by the original existential form. 
(The reverse follows trivially.) This is the sense in which anonNodes 
used in asserted triples can be replaced by urirefs without any 
significant 'loss of content', if the document is considered to be an 
assertion.

6. Entailment

In this section we give a few basic results about entailment in RDF. 
(Note, these results do not take account of extra semantic 
restrictions imposed by rdfs, and will be reconsidered when those are 
introduced.) Since the language is so simple, entailment can be 
recognized by relatively simple syntactic comparisons. The two basic 
forms of valid proof step in RDF are, in logical terms, the inference 
of P from P and Q, and the inference of (exists (?x) (foo ?x)) from 
(foo baz).  Proofs are given in an appendix.

Lemma 1. (subset lemma) Suppose E and E' are sets of triples (not 
documents!). Then E entails E' iff E  contains E'.

The basic insight here is that the triples in a set (ie with no 
existential quantification) can be assigned interpretations 
essentially independently of one another simply by adjusting the 
extensions of the properties; there is no 'connection' between the 
truth of one triple and the truth of another. This in turn is a 
reflection of the fact that one cannot express the material 
conditional in RDF, because RDF has no disjunction.

To describe implication relationships between documents we need to be 
precise about instantiating 'variables'. If E is an expression and v 
is a (syntactic) mapping from a set of anonNodes to urirefs, then let 
v(E) be the expression obtained by replacing every anonNode x in E by 
v(x) when v(x) is defined. v(E) will be called an instance of E. An 
expression containing no anonNodes will be called a ground RDF 
expression.

Lemma 2. (instance lemma) Suppose E is a document and v(E) an 
instance of E. Then v(E) entails E .

Lemma 3 (RDF interpolation lemma) Suppose E and E' are documents. 
Then E entails E' iff there is a set F of triples such that set(E) 
contains F and F is an instance of set(E').

What this means, in effect, is that pure RDF proofs need only be one 
step 'deep'.

7. Publishing content: assertion versus query.

The model theory characterizes truth-preserving relations between 
expressions (documents) but it does not specify what exactly is being 
'said' when a document is published. <comment> This is what one would 
expect, since publications are tantamount to speech acts rather than 
expressions. The same expression can, notoriously, be used to do 
various different things: it can be asserted, questioned, doubted, 
assented to, etc. </comment>

The most obvious assumption is that to publish an RDF expression is 
to assert that it is true, thereby in effect offering a warranty for 
anyone else to draw valid conclusions from it. Let us call this a 
descriptive or asserting publication. It can be characterized as 
making a public claim about the appropriate uses of the expression; 
in effect, it says: this can be correctly used to make inferences 
from. However, the model theory would apply just as well to a 
different kind of publication, where the intended meaning is not that 
it is appropriate to draw conclusions from the expression, but that 
inference is intended to go in the other direction, so that the 
publication says, in effect: this should be used to make inferences 
to; or, in other words, can anyone prove this? If we represent E 
entails E' by writing (E |=> E'), then assertional publication of E 
might be represented as (E |=> ??) , putting E at the blunt end of 
the arrow, while the other case - which we can call a querying 
publication -  might be written as putting E at the sharp end of the 
entailment, (?? |=> E);  where in each case the world in general (or 
the community to which the publication is addressed) is being invited 
to fill in the ?? blanks.

This picture is almost certainly too simplistic as it stands, since 
one would presume that the intention of publishing a query is not 
merely to advertise a potential conclusion, but would include an 
implicit request to be told about any assumptions out there that 
would entail it. This could be handled by a general assumption about 
various kinds of publication acts and their associated protocols; but 
in any case, RDF has no way to make this distinction at present. I 
mention it only to try to clarify the distinctions that have arisen 
in discussions.

We could propose a related language to RDF which might be called 
Resource Querying Format, which is identical to RDF except that RQF 
expressions are understood to be queries rather than assertions. 
<joke> The syntax of RQF could be identical to RDF except that an RQF 
document must end with the string ",eh?". </joke>
The processing appropriate to RQF would be somewhat different from 
that for RDF. In particular, anonNodes in RQF documents would have to 
be treated as genuine variables which can be bound to values at run 
time. A unification process which binds an RQF variable to an RDF 
uriref or anonNode would be a central part of the machinery for 
linking RDF assertions to the RQF queries which they entail.

8. Shared meanings and relative entailment.

Some rdf and rdfs triples are considered to have a fixed meaning, and 
these meanings can be captured by requiring that all interpretations 
satisfy some additional equations or rules, which might include some 
constraints on the universe. For example, the RDF machinery for 
reification requires that the universe contain enough things to be 
the reifications of the reified expressions, and that the denotations 
of the urirefs rdf:Statement, rdf:subject, rdf:predicate and 
rdf:object are all given fixed values which produce a 'mirror' of the 
syntax in the interpretation.  These extra constraints wil be 
introduced as the various rdf and rdfs constructs are mentioned in 
later sections.

More generally, the successful operation of the web seems to require 
that some meanings are 'shared' between those who publish content on 
the web and those who read it. For example, the file-locating 
protocols that utilize URI syntax to enable web browsing software to 
locate web pages consitutute a commonly accepted core of 'shared 
meaning' which is assumed to be available to the entire community 
without needing to be stated explicitly in assertional form. Exactly 
what counts as 'shared meaning' seems to be a matter for further 
discussion, but we can represent the idea here in a simple way by 
characterizing such sharing as a set of interpretations (for part of 
the shared vocabulary) which are 'accepted as reasonable' by all 
members of a community, so that they consider semantic notions such 
as entailment to be restricted only to interpretations which extend 
one or more of those shared interpretations.

To make this precise we need to define a notion of 'relative 
entailment'.  Suppose COM is a set of interpretations. Then E entails 
E' relative to COM just in case every interpretation which is 
compatible with an interpretation in COM and satisfies E, also 
satisfies E'. What this does, in effect, is to rule out 
interpretations that are inconsistent with everything in COM as 
possible counterexamples to a claim of valid entailment, thereby 
making it 'easier' for one expression to entail another. Intuitively, 
E does not have to explain so much in order to make E' follow from 
it, since the content expressed by the restriction to compatibility 
with a member of COM can be taken for granted in the communication.

Clearly, if C is a subset of D then validity relative to C implies 
validity relative to D. Ordinary entailment is then the special case 
where COM is empty, i.e. the strongest kind of entailment. (Note that 
COM is a set of interpretations, not an expression, so the question 
of how the shared content is expressed is here left open.) As an 
example, consider the set of URIs that a browser could utilize to 
locate a web page, consider the interpretation ADDR which maps each 
such uriref to the web page it locates according to the hypertext 
transfer protocol (and to the string "404error" if there is no such 
web page, say) and define HTTP-COM to be {ADDR}. Then entailment 
relative to HTTP-COM does not permit arbitrary re-interpretations of 
uri's starting with 'http://'.

We will call such a 'shared' set of interpretations an interpretation 
core, or simply a core. The exact analysis of what constitutes the 
interpretation core for a community is beyond the scope of this 
document, but we will use the idea when extending the model theory to 
rdfs.

This idea can be extended to describe assumptions shared between 
communities. If one community's core is S and the other is using 
entailment relative to T, then entailment relative to the 
intersection of S and T represents their 'common core' of agreed 
content. In the worst case this will be empty, and they will only 
have logical validity in common. (Actually, the worst case in 
practice is likely to be represented by the interpretation core of 
rdfs, see later sections.)

The lemmas in section 6 above need to be stated more carefully when 
applied to relative entailment, since the conclusions of relative 
entailments can be derived in part from the core, in ways that are 
not easy to characterize.  For example, it may be that the core 
interpretations impose a condition that requires two urirefs to 
denote the same resource, so that anything said using one would also 
be true if said using the other. Such an entailment would be 
relatively valid but might have no syntactic trace in the expressions 
themselves. Right now (7/27/01) I am not quite sure how to approach 
studying this aspect of relative validity.

9. rdfs interpretations
(to be written)

10. reification and containers
(to be written)

11. relative and absolute URIs.
(to be written)

-------------------------------
Appendix: proofs
Lemma 1. Suppose E and E' are sets of triples. Then E entails E' iff 
E  contains E'.
Proof.
If is trivial. For only if, suppose that  (a b c) is a triple in E' 
which is not in E, and let I be an interpretation satisfying E. We 
will construct an interpretation I' which satisfies E but not E'. I' 
is like I except that I'EXT(I(b)) does not contain <I(a), I(c)>. 
Clearly I' does not satisfy E'. However, I(e)=I'(e) for all triples 
except (a b c), and I satisfies E, so I' satisfies E.
QED.

Lemma 2. Suppose E is a document. Then v(E) entails E.
Proof (trivial).

Lemma 3. (Proof to be written.)

---------------------------------------------------------------------
(650)859 6569 w
(650)494 3973 h (until September)
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

Received on Friday, 27 July 2001 17:49:21 UTC