- From: pat hayes <phayes@ai.uwf.edu>
- Date: Thu, 26 Jul 2001 11:48:10 -0700
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: rdf core <w3c-rdfcore-wg@w3.org>
- Message-Id: <v04210100b78611e98346@[130.107.66.237]>
(This modifies the model theory which Brian McBride circulated on
Mon, 23 Jul 2001 08:02:45 +0100 in a few small but important ways.
Sorry it is late.)
-----------------------------------------------
RDF Model Theory StrawDog proposal
Pat Hayes, IHMC
In what follows, 'N-triples' refers to Dave Beckett's version 5,
relevant parts of which are:
ntripleDoc ::=line*
line ::=ws* (comment | blankline | triple) eoln
triple ::=subject ws+ predicate ws+ object ws* '.' ws*
subject ::=uriref | anonNode
predicate ::=uriref
object ::=uriref | anonNode | qLiteral
1. Introduction
The basic idea is to re-state the M&S notions of what a set of RDF
triples means in mathematical terms, making as few extra or
gratuitous assumptions as possible. By keeping things as general as
possible, any restrictions that someone might want to impose can be
captured by further restrictions on the model theory.
We assume that:
a. there is no restriction on what a resource can be
b. there is no restriction on the domains and ranges of properties;
in particular, a property may be applied to itself.
c. all nodes in an RDF graph play fundamentally the same role in the
language no matter how they are labelled (for comments on anonymous
nodes, see later.)
2. Interpretations
First, we assume a global set LV of literal values and a fixed
mapping XL from the set of qLiterals to LV. All interpretations will
be required to conform to XL on qLiteral expressions.
<comment> This leaves open the question of the exact nature of
literals, their language-sensitivity and so on, while acknowledging
their special status as expressions with a 'fixed' interpretation.
</comment>
All interpretations will be relative to a set of RDF nodes, called
the vocabulary of the interpretation, so that one has to speak,
strictly, of an interpretation of an RDF 'nodespace' or of an RDF
graph, rather than of RDF itself.
This is stated in terms of RDF nodes and graphs to follow the M&S.
For N-triples, we can identify N with a set of expressions that would
be mapped to nodes by a parser, so N would be a subset of (uriref
|anonNode)
<comment> Restricting an interpretation to a particular namespace is
normal practice in model theory, since an interpretation mapping is
expected to interpret only a given set of 'names'. It is possible to
adopt a 'global' approach where any interpretation assigns a meaning
to every possible expression, but this makes the subsequent
development more awkward. For example, there are no 'new names' in a
global model theory, since all names have already been given a fixed
meaning by every interpretation. </comment>
An interpretation I (of vocab(I)) is defined by:
1. A nonempty set R of resources, called the domain or universe of I.
<comment> Whatever resources are, this a set of those. </comment>
2. A nonempty subset P of R corresponding to properties
<comment> Note that P could be R itself.</comment>
3. A mapping IEXT: P -> Rx(R union LV) (ie the set of pairs <x,y>
with x in R and y in R or LV)
<comment> IEXT(x) is the extension of x. This allows a property to be
an object and so occur in its own extension, a neat trick I learned
from Chris Menzel. </comment>
4. A mapping IS: vocab(I) -> R
The denotation of an RDF expression E in I is given by the following rules:
>> if E is a <uriref> or <anonNode> then I(E) = IS(E)
>> if E is a <qLiteral> then I(E) = LV(E)
>> if E is an asserted triple with the form s p o
then I(E) = true iff <I(s),I(o)> is in IEXT(I(p)), otherwise I(E)= false.
>> if E is a set of triples then I(E) = false just in case I(E') =
>>false for some asserted triple E' in E, otherwise I(E) = true.
(Note that if E contains no asserted triples then I(E)= true. This is
the usual convention: an empty conjunction is considered to be true
since it contains nothing that would cause it to be false. This makes
empty assertions have vacuous content, as one would expect.)
Comments and blanklines have no semantics, of course.
The use of the phrase "asserted triple" is a deliberate
weasel-worded artifact, to allow an RDF graph/document to contain
triples which are being used for some non-assertional purpose (such
as expressing a query, implementing part of some expression of
another language, or forming part of a pattern intended to be matched
against some asserted RDF in another document, or just being
breezy:-). We may wish to say that strict conformity to the RDF M&S
assumes that all triples in a document are asserted triples, but
making the distinction allows RDF parsers and engines such as CWM and
Euler to conform to the RDF syntax and to respect the RDF model
theory without necessarily being fully committed to it. Also, I
expect that future extensions or versions of RDF will require such
non-asserted triples, and this allows the model theory to be extended
in those future directions.
(We will give examples of interpretations in an abbreviated format
simply by listing R, IEXT and IS in the form R/IEXT/IS. P is the
domain of IEXT. For example, for the vocabulary {a b c} the following
is a small interpretation:
{1 2}/1->{<1,1>,<2,2>}/a->1,b->1,c->2
which makes the set
a b a
c a c
true, and all the triples
a c b
a b c
c c c
false)
3. Anonymous nodes as existential assertions
This notion of interpretation treats anonymous nodes exactly like
URIs, semantically speaking. It amounts to adopting the view that an
anonymous node is equivalent to a node with an unknown URI. However,
we could adopt an extra clause which would treat an anonymous node
like an existentially quantified variable. To do so requires that we
decide on the 'scope' of the quantification. The easiest convention
to state would be that the scope was the triple, but that seems
unworkable since the same anonymous node may occur in several triples
in a given document. To make the quantification have the scope of the
entire document requires introducing the notion of 'document' into
the syntax, however. I am not sure how to do this in the RDF graph
model, but in N-triples I will take it that the appropriate syntax
class is <ntripleDoc>.
This will require some definitions. If I is an interpretation and A
is a mapping from some set of anonNodes to the domain of I, then
define I[A] to be the interpretation which is like I except that
I[A](x)=A(x) wherever A is defined, ie I[A] uses the A-mapping to
assign values wherever it can, and otherwise is identical to I. If E
is an <ntripleDoc>, define anon(E) to be the set of anonymous nodes
in E, and set(E) to be the set of asserted triples in E. Then we have
the extra interpretation rule:
>> If E is an <ntripleDoc> then I(E) = true if I[A](set(E))=true for
>>some A defined on anon(E), otherwise I(E)= false .
This effectively treats all anonymous nodes as existentially
quantified at the boundary of the document containing them. This
requires us to make a conceptual distinction between a document
(containing anonNodes) and the set of triples in the document; the
anonymous nodes are free in the set, but bound by an implicit
existential quantifier in the document.(Notice that if the document E
contains no anonNodes, then I(E)=I(set(E)) ) The distinction becomes
important when we combine documents, since the result of unifying two
documents may not have the same meaning as combining the two sets of
triples into a single document.(The difference will be significant
only if the two documents share an anonymous node.If this is
gauranteed to be impossible, then we do not need to make the
distinction; still, however, it will not be true in general that any
document is equal to the union of the documents formed by dividing it
up into subsets.) We will use the neutral term 'RDF expression' to
refer to a triple, a set of triples, or a document.
It is convenient to define the vocabulary of an RDF expression to be
the set of 'names' in it which are 'free'. The vocabulary of a triple
is the set of <uriref>s and <anonNode>s in the triple; the vocabulary
of a set of triples is the union of the vocabularies of the triples
in the set; and the vocabulary of a document E is the set of
<urirefs> in the vocabulary of set(E). The anonNodes of a document
are bound and are not part of the vocabulary. (The correspondence to
the earlier usage of this word is that the vocabulary of an
interpretation of an expression must contain the vocabulary of the
expression in order for the interpretation to give a well-defined
value to the expression.)
4. Terminology
We say that I satisfies E if I(E)=true, and that E entails E' iff
every interpretation which satisfies E also satisfies E'.
More subtle relationships require some definitions. (These are
standard mathematical definitions for arbitrary mappings, given here
for completeness.) If M is a subset of the vocabulary of I, define
I\M to be the restriction of I to M, ie I\M(x)=I(x) on M. We will
also say that I is an extension of I\M. If I and I' have the same
restriction to the intersection of their vocabularies, then define
the union I+I' of I and I' to be the interpretation whose vocabulary
is the union of the vocabularies of I and of I', and which agrees
with them on their vocabularies, ie (I+I')(x)=I(x) or =I'(x) wherever
the latter are defined. Notice that if vocab(I) contains no
anonNodes, then I[A] = I+A. Obviously, I satisfies E iff
I\vocabulary(E) satisfies E.
5. Skolemisation
If we replace an anonNode by a uriref, the result is not entailed by
the original document. In fact, in general, if E' contains any
urirefs not contained in E then E cannot entail E'. For example, the
document containing the single triple
<foo> <baz> _:xxx
obviously bears some relation to the triple
<foo> <baz> <bar>
but it does not entail it, since the interpretation
{1,2}/1->{<1,1>}/{foo->1,baz->1,bar->2}
satisfies the first document but not the second. (Notice this
interpretation assigns a value to something - <bar> - outside the
vocabulary of the first document, which is how the entailment fails.)
However, the second triple entails the first document, since any
interpretation which satisfies <foo> <bar> <baz> can be used to make
the first document true by restricting it to the smaller vocabulary
and assigning the interpretation of <bar> to the anonNode to provide
a suitable I[A] mapping. More significantly, any RDF expression which
is entailed by the second triple and does not contain <bar> is also
entailed by the first document.
To see why, suppose E'' is such an expression and I is an
interpretation on {<foo>,<bar>} which satisfies the first document.
Then there must be a mapping A from _:xxx to the universe of I such
that I+A satisfies the first triple. Define I' on the second triple
by I'(<bar>)=A(_:xxx), then I' satisfies the second triple, so it
must satisfy E''; but E'' does not contain <bar>, so it must be
satisfied by I'\{<foo>,<bar>}, which is I; so, I satisfies E''; but I
was arbitrary, so, E'' is entailed by the first document. QED.
This is in fact a general result. We can characterise skolemisation
as follows. Consider a document E and another sk(E) got from E by
replacing all its anonNodes by urirefs drawn from a 'new' vocabulary
V which does not intersect vocab(E). Then sk(E) entails E, and if E'
is any expression with no vocabulary in V and sk(E) entails E', then
E entails E'. (The proof follows exactly the special case given in
the previous paragraph, but takes longer to state because it has to
refer to anon(E).) Notice that since sk(E) contains no anonNodes,
I(sk(E))=I(set(sk(E))) for any I, so there is no need to distinguish
a skolemised document from its set of triples.
The key to proper skolemisation is the use of skolem names which are
guaranteed to never occur in any other document, thereby ensuring
that any other expression in any other document which is entailed by
the skolem form is also entailed by the original existential form.
(The reverse follows trivially.) This is the sense in which anonNodes
used in asserted triples can be replaced by urirefs without any
significant 'loss of content', if the document is considered to be an
assertion.
----------------------------------
<more to come>
---------------------------------------------------------------------
(650)859 6569 w
(650)494 3973 h (until September)
phayes@ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
Received on Thursday, 26 July 2001 14:48:19 UTC