RDF Model Theory Proposal
(postF2F) DRAFT
Draft 8/15/01
Pat Hayes, IHMC
0. N-triples and RDF graphs.
In what follows, ‘ the M&S’
refers to the document “Resource Description Framework (RDF) Model and Syntax Specification”
( http://www.w3.org/TR/REC-rdf-syntax
), ‘RDFS spec’ refers to the document “Resource Description
Framework (RDF) Schema Specification 1.0” (http://www.w3.org/TR/2000/CR-rdf-schema-20000327
), and 'N-triples' refers to Dave
Beckett's version 5, relevant parts of which are:
ntripleDoc ::=line*
line ::=ws* (comment | blankline
| triple) eoln
triple ::=subject ws+
predicate ws+ object ws* '.' ws*
subject ::=uriref | anonNode
predicate ::=uriref
object ::=uriref |
anonNode | qLiteral
We understand N-triples and
RDF-XML as lexical notations for describing an RDF graph as defined in the
M&S. An RDF graph is a
labelled directed graph in which nodes are labelled with URIs, qLiterals, or
anonNodes; arcs are labelled with URIs; and each node has a unique label. <comment> Notice that in this definition, even anonymous nodes are thought of as having
unique labels. This is purely a formal convenience. In particular, ononymous
node labels are not URIs and are not considered to be public; their role as
node identifiers is limited to a single graph. (They are analogous to bound
variables in conventional logical syntax. This issue is discussed further in
later sections.) </comment>
The model theory assigns interpretations directly to the graph,
which is taken as being the primary RDF syntax, in the sense that two RDF
documents, in whatever lexical form, are syntactically equivalent if and only
if they map to the same RDF graph. We will refer to this as the 'graph syntax'
to avoid ambiguity, since the bare term 'syntax' is often assumed to refer to a
lexicalization.
A graph which is like an RDF graph except that it violates the
last condition, ie has two or more nodes with the same label, will be called an
untidy graph. Untidy graphs may arise
as a result of merging operations or be intermediate states in various
operations on graphs, but we will assume that all RDF graphs are tidy unless
otherwise noted. In graph syntax, an untidy graph is considered to be
syntactically ill-formed.
An RDF graph will be
said to be ground if it contains no anonymous
nodes. The vocabulary of a graph is the set of URIs
that it contains.
A graph can be viewed as a set of triples corresponding to its
node-arc-node edges; the correspondence between an ntripleDoc and a tidy RDF
graph is that the graph has one node for each uriref, anonNode or qLiteral in
the document, and one edge for each Ntriple triple, directed from the node
labelled with the subject to the node labelled with the object and with the arc
labelled with the predicate.
Notice that this requires that all occurrences of a particular uriref or anonNode label in an
N-triples document be mapped to a single node in the corresponding graph. Other RDF lexicalisations may use other means of indicating
the graph structure; for our purposes, the important syntactic properties of
RDF graphs is that distinct nodes in an RDF graph are treated as distinct
referring entities in the graph syntax.
1. Interpretations
The basic idea is to re-state the M&S notions of the meaning
of an RDF graph in mathematical terms, making as few extraneous or gratuitous
assumptions as possible. By keeping things general, any restrictions that
someone might want to impose can be captured by further restrictions on the
model theory.
We assume that there is no restriction on what a resource can be,
and that there is no restriction on the domains and ranges of properties; in
particular, a property may be applied to itself. (When classes are introduced
in RDFS, we will assume that they can contain themselves.)
First, we assume a global set LV of literal values and a fixed
mapping XL from the set of qLiterals to LV. All interpretations will be
required to conform to XL on Literal nodes.
This leaves open the question of the exact nature of literals,
their language-sensitivity and so on, while acknowledging their special status
as expressions with a 'fixed' interpretation.
All interpretations will be relative to a set of URIs, called the vocabulary of the interpretation, so that one has to speak, strictly, of an
interpretation of an RDF 'nodespace' or of an RDF graph, rather than of RDF
itself.
<comment> ( An earlier version of the model theory included
anonymous nodes in the vocabulary of a set of triples, but it seems clearer
to omit them when considering
graphs. ) </comment>
For a lexicalized version of the language, we can think of the
vocabulary of an interpretation, more traditionally, to be a subset of the
URI-indicating expressions used by that lexicalization; eg for N-triples, we can consider the
vocabulary of I to be a set of <urirefs>.
An interpretation I (of
vocab(I)) is defined by:
1. A nonempty set IR of resources, called the domain or universe
of I.
<comment> Whatever resources are, this a set of those. IR
may intersect LV, or it may not.
We leave the question open. </comment>
2. A subset IP of IR
corresponding to properties
<comment> Note that
IP could be IR
itself.</comment>
3. A mapping IEXT from
IP into the powerset
of IRx(IR union LV) (ie the set of
sets of pairs <x,y> with x in IR and y in IR or LV)
<comment> IEXT(x) is a set of pairs, ie a binary relational
extension, called the extension of x. This trick of distinguishing a relation as an object from its
relational extension allows a property to occur in its own extension, a neat
trick I learned from Chris Menzel. </comment>
<comment> The restriction of subject denotations to IR
follows the BNF for N-triples. We could change this if people feel it would be more appropriate to allow
subjects to be qLiterals. </comment>
4. A mapping IS:
vocab(I) -> IR
<comment> When considering RDFS we will require interpretations to have extra structure.
</comment>
The denotation I(E) of a ground RDF graph E
in I is given by the following rules:
>> if E is a <qLiteral> then I(E) = XL(E)
>> if E is a <uriref> then I(E) = IS(E)
>> if E is an asserted triple with the form <s p o >
then I(E) = true iff <Is),I(o)> is in IEXT(I(p)), otherwise I(E)= false.
>> if E is a ground RDF graph then I(E) = false just in case
I(E') = false for some asserted triple E' in E, otherwise I(E) = true.
Comments and blanklines have no semantics, of course.
The use of the phrase
"asserted triple" is a deliberate weasel-worded artifact, to allow an
RDF graph/document to contain triples which are being used for some
non-assertional purpose (such as expressing a query, implementing part of some
expression of another language, or forming part of a pattern intended to be
matched against some asserted RDF in another document, or just being breezy:-).
We may wish to say that strict conformity to the RDF 1,0 M&S assumes that
all triples in a document are asserted triples, but making the distinction
allows RDF parsers and engines such as CWM and Euler to conform to the RDF
syntax and to respect the RDF model theory without necessarily being fully
committed to it. Also, I expect that future extensions or versions of RDF will
require such non-asserted triples, and this allows the model theory to be
extended in those future directions.
For example, for the vocabulary {a b c} the following is a small
interpretation, where we use integers to indicate the ‘things’ in
the universe:
IR= {1, 2}; IP = {1}
IEXT: 1->{<1,2>,<2,1>}
IS: a->1,b->1,c->2
which makes these triples true:
a b c
c a a
c b a
and these triples false:
a c b
a b b
c a c
For example, I(<a b c>) = true if <I(a),I(c)> is in
IEXT(I(b)), ie if <1,2> is in IEXT(1), which is {<1,2>,<2,1>}
and so does contain <1,2> and so I(<a bc>) is true.
I(<a c b>) = true if <I(a),I(c)>, ie <1,2> is in
IEXT(I(c)); but I(c)=2 and IEXT is not defined on 2, so the condition fails and
I(<a c b>) = false.
Exercise: Calculate the truthvalues of
the triples
c a c
b b b
c b a
in this interpretation.
More advanced exercise: Write six
distinct triples in this vocabulary and define another interpretation which
makes three of them true and the other three false.
2. Anonymous nodes as
existential assertions
We could treat anonymous nodes exactly like URIs, semantically
speaking, by extending the IS mapping to include them as well as URIs. That
would amount to adopting the view that an anonymous node is equivalent to a
node with an unknown URI. However,
it seems to be more in conformance with the M&S to treat anonymous nodes as
existentially bound variables. This will require some definitions, as the
theory so far provides no meaning for anonymous nodes.
Suppose I is an interpretation and A is a mapping from some set of
anonymous nodes to the domain of I, and define I+A to be an extended
interpretation which is like I except that it uses A to give the interpretation
of anonymous nodes. Define anon(E)
to be the set of anonymous nodes in E.
Then we can extend the above rules to include the two new cases that are
introduced when anonymous nodes occur in the graph:
>> If E is an anonymous node than [I+A](E) = A(E)
>> If E is an RDF graph then I(E) = true if [I+B](E) = true
for some B defined on anon(E), otherwise I(E)= false.
This effectively treats all anonymous nodes as existentially
quantified in the RDF graph in which they occur. Notice that since two nodes
cannot have the same label , there is no need to specify the 'scope' of the
quantifier within a graph. (However, it is local to the graph.) If we were to apply the semantics directly
to N-triples syntax, we would need to indicate the quantifier scope, since in
this lexicalisation syntax the same anonNode identifier may occur several times. The above rule amounts to the
N-triple convention that would place the quantifiers just outside, ie at the outer edge of, the
<ntripleDoc>ument corresponding to the graph.
Notice that we have not changed the definition of an
interpretation. The same interpertation that provides a truth-value for ground
graphs also assigns truth-values to graphs with anonymous nodes, even though it
provides no interpretation for the anonymous nodes themselves. Notice also that the anonymous nodes themselves
are, qua nodes, perfectly well-defined entities with a robust notion of
identity; they differ from other nodes only in not being assigned a
model-theoretic interpretation.
Exercise: Consider the RDF graph defined by the following triples:
_:x a b
c b _:x
_:x a c
Is this graph true or false in the interpretation given in the
previous exercise? How about the graphs formed from just the first triple, and
from just the last two triples?
2a. Comparison with formal logic
( Readers unfamiliar with formal logic can skip this section. )
With this semantics, an RDF graph can be translated directly into
a logical expression with the same meaning.
Each node labelled with a URI translates to a constant symbol
which is its label ; each anonymous node to a distinct variable. Each arc (triple)
labelled with <s p o> maps
to an atomic assertion of the form p(s, o); a graph translates to the the
existential closure of the conjunction of all the arcs in the graph.
For example, the graph defined by the triples
_:x a b
c b _:x
_:x a c
translates to the logical expression (written in KIF syntax)
(exists (?y)(and (a ?y b)(b c ?y)(a c ?y)))
Notice that the resulting expression may contain the same symbol
in both a relation and object position (eg ‘b’ in this example). This
is considered syntactically illegal in many versions of logic, but is
acceptable in the recent version of KIF (Hayes & Menzel 2001). To map to a
more conventional logical syntax, translate the triple <s p o> into the
form holds(p, s,o), where ‘holds’ is a three-place relation used to
assert that a binary relation holds between two arguments.
3. Terminology and some basic results
We say that I satisfies E if I(E)=true, and that a set S of expressions entails E if every interpretation which satisfies every member of S also satisfies E'. If {E} entails E' we will say that E
entails E'. Any process or technique which constructs a graph E from some other
graph(s) S is valid iff S always entails E, otherwise invalid.
<comment> Being an invalid process does not mean that the
conclusion is false, and being valid does not guarantee truth. However, it
represents the best guarantee that
any assertional inference system can offer: if given true inputs, it will never
draw a false conclusion from them. </comment>
In this section we give a few basic results about entailment in
RDF. (Note, these results do not take account of extra semantic restrictions
imposed by RDFS and some uses of containers; they apply to 'pure' RDF
entailment only.) Since the language is so simple, entailment can be recognized
by relatively simple syntactic comparisons. The two basic forms of valid proof
step in RDF are, in logical terms, the inference of P from (P and Q), and the inference of (exists
(?x) (foo ?x)) from (foo baz).
Proofs are given in an appendix.
Conjunction Lemma. If E is
ground, then I satisfies E iff it satisfies every triple in E. ***
I.e. a ground graph is equivalent in meaning to the logical
conjunction of its component triples.
To give a syntactic characterization of entailment we will need to
define some relationships between RDF graphs. If E is an RDF graph, say
that E' is a subgraph of E when every node and arc in E' are also in E. This
corresponds to taking a subset of
the triples constituting the graph. Obviously any subgraph of a tidy graph is tidy.
The following is an immediate consequence of the conjunction
lemma:
Subgraph Lemma. If E and E' are ground, then
E entails E' iff E' is a subgraph of E.
***
Herbrand Lemma. Any RDF graph has a satisfying interpretation. ***
This means that there is no such thing as an inconsistency or a
contradiction in RDF, which is not surprising since the language does not
contain negation.
Say that E' is an instance of E (with respect to a set V of URIs) when E' is like E except
that some of the anonymous nodes in E have been labelled with members of
V. Obviously if E is ground then
it is its only instance. Notice that an instance may be untidy. .A tidy ground instance with respect to any set which is disjoint
from vocab(E) is a skolemisation of E.
Skolemisation Lemma. Suppose sk(E) is a skolemisation of E
with respect to V. Then sk(E) entails E; and if sk(E) entails F and vocab(F) is
disjoint from V, then E entails F .
***
Suppose S is a set of
RDF graphs, then their merge is the graph
obtained by taking the union of the sets of all their triples and tidying the
resulting set, i.e. identifying
nodes labelled with the same URI.
Merging lemma. The merge of a
set S is entailed by S, and entails every member of S. ***
Notice that anonymous nodes are not identified with other nodes in
the merge, and indeed this reflects a basic principle of RDF graph inference:
nodes with the same URI must be identified
(ie the graph must be tidy), but anonymous nodes must not be identified with other nodes or relabelled with URIs, in order
to ensure that the resulting graph is entailed by what one starts with. Note that one does not, in general,
obtain the merge of a set of graphs by concatenating their corresponding N-triples
documents and constructing the graph described by the merged document, since if
some of the documents use the same anonNode identifier, the merged document
will describe a graph in which some of the anonymous nodes have been
merged.
<comment> At the time of writing (8/8/01) I am not certain
whether an analogous warning is needed for RDF-XML documents. </comment>
Anonymity lemma 1. Suppose E' is
like E except that at least one anonymous node in E is labelled with a URI in
E'. Then E does not entail E'. ***
Anonymity lemma 2. Suppose that
E' is like E except that two distinct anonymous nodes in E have been identified
in E'. Then E does not entail E'. ***
This means that there is no valid RDF inference process which can
produce an RDF graph in which a single anonymous node occurs in triples originating
from several different graphs. (Of course, such a graph can be constructed; but
it will not be entailed by the
original documents. It must reflect the addition of new
information about the identity of two anonymous nodes. Such information can be
represented in many other ways, eg in DAML+OIL, but it cannot be represented in RDF itself.)
The main result is:
Interpolation Lemma.. S entails E
iff a subgraph of the merge of S
is an instance of E. ***
The interpolation lemma completely characterizes RDF entailment in
syntactic terms. To tell whether a
set of RDF graphs entails another, find a subgraph of their merge and replace
URIs by anonymous nodes to get the second. Of course, there is no need to actually
construct the merge. If working backwards from the consequent E (the graph that
may be entailed by the others), the most efficient technique would be to
treat anonymous nodes as variables
in a process of subgraph-matching, allowing them to bind to'matching' URI
labels in the antecedent graph(s) in S, ie those which may entail the
consequent graph. The interpolation lemma shows that this process is valid, and
is also complete if the subgraph-matching algorithm is. The existence of complete subgraph-checking
algorithms also shows that RDF is decideable, ie there is a terminating
algorithm which will determine for any finite set S and any expression E, whether or not S
entails E.
Notice however that such a
variable-binding process would only be appropriate when applied to the
conclusion of a proposed entailment. If an RDF document is asserted, then it
would be invalid to bind new values to any of its anonymous nodes: see section
5.
4. RDFS interpretations
RDF Schema (RDFS) introduces a
special RDF vocabulary, with some
fixed constraints on its interpretation; in particujlar, the notion of a
'class', so we will need to assume that the universe of interpretations
contains classes as elements. To avoid possible complexities which could arise
from sets containing themselves, etc.,
we will use the same technique used earlier to relate properties to
their extensions, and require every interpretation to include a mapping
from a subset IC of the domain
(consisting of classes) to subsets of the domain which are the 'extensions' of
the classes. This new mapping will
be called ICEXT (for the Class Extension in I) , and since literals can be
contained in classes, we will assume that ICEXT maps IC to subsets of (IR union
LV) . An expression denoting a class will be interpreted via ICEXT when
determining membership in the class.
An rdfs interpretation is then
an RDF interpretation plus one extra mapping, forming a fifth condition on an
interpretation:
5. A subset IC of IR,
containing classes
6. A mapping ICEXT
from IC to the powerset of
(IR union LV) , ie the set of subsets of elements of IR or XL.
<comment> This use of a
class extension mapping allows classes to contain themselves as members. For
example, it is quite OK for a 'universal' class to contain itself as a member,
a convention that is often adopted
at the top of a classification heirarchy. This convention is often thought unacceptable since it seems
to violate one of the axioms of standard (Zermelo-Fraenkel) set theory, the
axiom of foundation (which forbids infinitely descending chains of
membership). The semantic model
given here distinguishes between
the class as an object and its class extension (the set of things that are in
the class), thereby allowing the extension of a class to contain the class itself without
violating the axiom of foundation.
Notice that the question of
whether or not a class contains itself as a member is quite different from the
question of whether or not it is a subclass of itself. </comment>
An rdfs interpretation I satisfies the following conditions, in
addition to those for an rdf interpretation given earlier. (The complete model
theory is summarised in a table in an appendix.)
>> ICEXT(I(rdfs:Resource))
= IR
>> ICEXT(I(rdf:Property))
= IP
>> ICEXT(I(rdfs:Class))
= IC
>>
ICEXT(I(rdfs:Literal)) = LV
<comment> This does not
require that IP and IC be disjoint. However, it would be consistent to assume
that they were. </comment>
>> <x,y> is in
IEXT(I(rdf:type)) iff x is in ICEXT(y)
>> <x,y> is in
IEXT(I(rdfs:subClassOf)) iff ICEXT(x) is a proper subset of ICEXT(y)
>> <x,y> is in
IEXT(rdfs:subPropertyOf)) iff IEXT(x) is a proper subset of IEXT(y)
<comment> The subsets
referred to in subPropertyOf are sets of pairs. </comment>
<comment> The M&S
seems to imply that the proper condition on subClass and subProperty is that
the extensions are *proper* subsets, ie the identity case is disallowed, and
hence 'subclass loops' are illegal. If we decide to change this to allow 'subclass loops', as assumed by
DAML+OIL, then "proper" should be deleted from these conditions.
</comment>
>>
I(rdfs:ConstraintResource) is in IC
>> ICEXT(I(rdfs:ConstraintProperty))
is the intersection of ICEXT(I(rdfs:ConstraintResource)) and IP
<comment> Maybe this
should actually be a subset of the intersection. I am not sure what the wording in the RDFS1.0
spec implies. </comment>
>> I(rdf:range) and
I(rdf:domain) are in
ICEXT(I(rdfs:ConstraintProperty))
>> If <x,y> is in
IEXT(I(rdf:range)) and <u,v>
is in IEXT(x) then v is in ICEXT(y)
>> If <x,y> is in
IEXT(I(rdf:domain)) and
<u,v> is in IEXT(x) then u is in ICEXT(y)
<comment> This reflects
our current understanding of multiple domain and range restrictions rather than
the wording in the M&S. </comment>
(The model theory imposes no
conditions on the interpretations of rdfs:seeAlso, rdfs:isDefinedBy,
rdfs:comment or rdfs:label.)
8. Reification
RDF uses rdfs to define two
important extensions to the basic language: reification (which allows RDF to
make assertions about RDF statements) and the introduction of classes of
containers. These impose extra
conditions on interpretations, requiring their domains to include particular
classes of structures.
We will say that an
interpretation I reifies a set of nodes V if it is capable of describing all RDF triples
that can be constructed from those nodes. (The RDF M&S does not provide any
vocabulary for describing graphs.) When considering lexical forms of the
language, it would be natural to think of V as a set of lexical labels which
identify nodes, eg a subset of( <uriref> union <qLiteral> union
<anonNode>) in N-triples. For lexicalisations such as XML-RDF which do
not provide lexical identifiers corresponding to all anonymous nodes, V must be
allowed to contain arbitrary numbers of ‘blank’ node tokens as well
as URIs and literals.)
In order to be reified, the
interpretation must contain in its universe entities corresponding to all the
nodes (or names) in V and all the
triples that can be constructed from them. Given V, therefore, we say that I reifies V just when there is a mapping
REIF which we will call a reification mapping, from (V union V x V x V) into IR which satisfies the following conditions:
>> <x,y> is in
IEXT(I(rdf:subject)) iff for some a, b, and c in V, x= REIF(<a b c>) and y= REIF(a)
>> <x,y> is in
IEXT(I(rdf:predicate)) iff for some a, b and c in V, x=REIF(<a b c>) and
y= REIF(b)
>> <x,y> is in
IEXT(I(rdf:object)) iff for some a, b and c in V, x=REIF(<a b c>) and y=
REIF(c)
>> x is in ICEXT(I(rdf:Statement)) iff for
some a, b and c in V, x=REIF(<a b c>)
This is deliberately stated (in
the spirit of metaphysical agnosticism appropriate to model theory) so as to not assume anything about the real
nature of the entities in the universe that play the role of reified
statements, and follows the M&S (section 4.1) which distinguishes between
RDF assertions and their ‘modelling’ in the semantic domain. However,
it would be quite consistent to assume that nodes and triples were themselves
actual semantic objects in the universe, and take REIF to be the identity
mapping. The conditions above
could then be stated more directly:
>> (V union VxVxV ) is a
subset of IR
>> <x,y> is in
IEXT(I(rdf:subject)) iff for some a, b, and c in V, x=<a b c>and y= a
>> <x,y> is in
IEXT(I(rdf:predicate)) iff for some a, b and c in V, x=<a b c>and y= b
>> <x,y> is in
IEXT(I(rdf:object)) iff for some a, b and c in V, x=<a b c>and y= c
>> x is in ICEXT(I(rdf:Statement)) iff for
some a, b and c in V, x=<a b c>
We will call this simple
reification
and take it as the normal interpretation of reification in the model theory. (Nothing
would be substantially altered, however, by the use of an explicit reification
mapping, so the other alternative is available if anyone wishes to insist that
the syntactic and semantic domains should not overlap, and impose a conceptual
distinction between RDF expressions and the things that model them.)
Reification is fairly simple,
in this account. However, notice that having x (or REIF(x) ) in the domain does
not in itself provide any access to the meaning of x in the interpretation; in
particular, it provides no way to assert x. This is illustrated by fact that
none of the above equations mention I(x) when x is in IR. The language provides
no way to *use* (as opposed to mention) a reified triple.
Adding such an ability (sometimes
called ‘downward reflection’ or ‘dequoting’) would
considerably increase the expressive power of RDF; it would in effect provide
the language with a form of truth predicate. On the other hand, such devices need
to be treated with considerable care in order to avoid self-referential
paradoxes.
9. Containers
RDFS provides a general class
of ‘containers’ and three particular kinds of container: bags,
sequences and ‘alternatives’. Alternatives pose special problems
which require re-defining part of the basic RDF model theory, and will be
described in the next section. We note that also that the ‘distributive
reference’ mechanisms which use rdf:aboutEach, described in M&S
section 3.3, are part of the XML-RDF serialization rather than the RDF graph
syntax, so are not considered here. (It would be relatively easy to include
them, however.)
Bags and sequences are very
similar except that bags are considered to be unordered. The natural way to
describe this mathematically is to think of bags as sequences with an
equivalence relation defined on them, and identify a bag with an equivalence
class of sequences. This amounts to saying that bag identity can be checked by
allowing permutations, but otherwise a bag is much like a sequence: in
particular, the members of a particular bag do in fact occur in a particular
order in the bag, even though that order is not considered germane to questions
of bag identity. RDFS provides essentially the same machinery for accessing the
elements of bags and of sequences, for example. For all these reasons, we will
take an abstract notion of sequence, called here a series, to be the ‘basic’
container type for RDF/Ss. A series
(over S) is defined to be a partial function from the integers to S. If s is a
series then s(n) is the n-th element of s; if s(n) is undefined then we will
write s(n) = ??. <comment> ( ‘??’ can be taken to be a value,
but it must be a special value which is not in the universe of any
interpretation.) </comment>
We will assume that there is a
distinguished class of things called containers with subclasses called bags and
seqs, and that containers are series on (IR union LV). Again, this allows such apparent
oddities as containers which contain themselves, or two containers each of
which contains the other. The set of containers-series will be called IS.
There is a special class of
container membership properties denoted by ‘rdf:_1’, ‘rdf:_2’,
etc., each of which selects a member from a position in a container. In a
familiar abuse of notation, we will write ‘rdf:_n’ to refer to
these, where ‘n’ is a parameter ranging over the natural numbers/numerals.
>> ICEXT(I(rdfs:Container))
= IS, a subset of IR
>> ICEXT(I(rdf:Bag)) = a
subset of IS disjoint from ICEXT(I(rdf:Seq))
>> ICEXT(I(rdf:Seq)) = a
subset of IS disjoint from ICEXT(I(rdf:Bag))
>> ICEXT(I(rdfs:ContainerMembershipProperty))
= CMIP, a subset of IP
>> I(rdf:_n) is in CMIP
>> <x,y> is in
IEXT(I(rdf:_n)) iff x is in IS
and x(n)=y
Although the model theory does
not need to assume this, it seems natural to assume that the series
corresponding to RDF sequences are finite, ie that there is some integer
length(s) such that for any n >
length(s), s(n) = ??. What is less clear is whether they should be allowed to
have ‘gaps’, ie whether s(n) = ?? for any n <= length(s). The
M&S is not entirely clear on this question, so it is left open in the model
theory. (The issue surfaces when
an RDF graph gives only partial information about a container, e.g. suppose a
sequence is asserted to have a first element which is the string “one”
and a third element which is the string “three”, but no other
information is provided. Should it follow that the container in question has a
second element, even though its value has not been specified?)
The model theory as written
also provides no formal account of what makes a bag really different from a
sequence, other than insisting that they are different. One way to add such an
account would be to impose a restriction on the subset ICEXT(I(rdf:Bag)), so
that it is not permitted to contain two series which are permutations of each
other. This restriction would not alter the truth-values of any piece of RDF
syntax, however, so it is omitted from the formal model theory.
On what may be a related point,
the RDF M&S provides a number of examples of what might be called ‘intended
uses’ for containers, particularly bags, which seem to involve extra
assumptions about the meanings of expressions involving bags, such as the
observation that a triple whose first element is a bag could be considered to
be as assertion about the bag, or about the elements of the bag, depending on
the property being applied. As stated here, the model theory only supports
interpretations in which containers are ‘opaque’ objects, so that
assertions involving containers are about those containers, rather than being
understood distributively to be about the members of the container. Treating
containers as ‘transparent’ requires re-working the basic model
theory somewhat, as described in the next section; it has quite extensive
consequences, requiring the notions of satisfiability and entailment to be
re-considered.
10. Alt
<not yet written />
11. RDFS entailment
<not yet written/>
Appendix 1: Summary of model
theory
RDF:
An interpretation I (of vocab(I)) is defined by:
1. A nonempty set IR of resources, called the domain or
universe of I.
2. A subset IP of IR corresponding to properties
3. A mapping IEXT from IP into the powerset of
IRx(IR union LV)
4. A mapping IS: vocab(I) -> IR
Suppose A is a mapping from
some set of anonNodes to the domain of I. anon(E) is the set of anonymous nodes
in the graph E.
>> if E is a
<qLiteral> then I(E) = XL(E)
>> if E is a
<uriref> then I(E) = IS(E)
>> if E is an asserted
triple <s p o >
then I(E) = true iff <I(s),I(o)> is
in IEXT(I(p)), otherwise I(E)=
false.
>> if E is a ground RDF
graph then I(E) = false just in case I(E') = false for some asserted triple E'
in E, otherwise I(E) = true.
>> if E is an anonymous
node than [I+A](E) = A(E)
>> if E is an RDF graph
then I(E) = true if [I+B](E) = true for some B defined on anon(E), otherwise
I(E)= false.
RDFS:
5. A subset IC of IR,
containing classes
6. A mapping ICEXT
from IC to the powerset of
(IR union LV)
>>
ICEXT(I(rdfs:Resource)) = IR
>> ICEXT(I(rdf:Property))
= IP
>> ICEXT(I(rdfs:Class))
= IC
>>
ICEXT(I(rdfs:Literal)) = LV
>> <x,y> is in
IEXT(I(rdf:type)) iff x is in ICEXT(y)
>> <x,y> is in
IEXT(I(rdfs:subClassOf)) iff ICEXT(x) is a proper subset of ICEXT(y)
>> <x,y> is in
IEXT(rdfs:subPropertyOf)) iff IEXT(x) is a proper subset of IEXT(y)
>>
I(rdfs:ConstraintResource) is in IC
>>
ICEXT(I(rdfs:ConstraintProperty)) is the intersection of IP and
ICEXT(I(rdfs:ConstraintResource))
>> I(rdf:range) and
I(rdf:domain) are in
ICEXT(I(rdfs:ConstraintProperty))
>> If <x,y> is in
IEXT(I(rdf:range)) and <u,v>
is in IEXT(x) then v is in ICEXT(y)
>> If <x,y> is in
IEXT(I(rdf:domain)) and
<u,v> is in IEXT(x) then u is in ICEXT(y)
REIFICATION (simple form):
For some vocabulary V, (V union
VxVxV ) is a subset of IR
>> <x,y> is in
IEXT(I(rdf:subject)) iff for some a, b, and c in V, x=<a b c>and y= a
>> <x,y> is in
IEXT(I(rdf:predicate)) iff for some a, b and c in V, x=<a b c>and y= b
>> <x,y> is in
IEXT(I(rdf:object)) iff for some a, b and c in V, x=<a b c>and y= c
>> x is in ICEXT(I(rdf:Statement)) iff for
some a, b and c in V, x=<a b c>
CONTAINERS
IS is a set of series over (IR union
LV). IS is a subset of IR.
>> ICEXT(I(rdfs:Container))
= IS
>> ICEXT(I(rdf:Bag)) = a subset of IS disjoint from
ICEXT(I(rdf:Seq))
>> ICEXT(I(rdf:Seq)) = a subset of IS disjoint from ICEXT(I(rdf:Bag))
>> ICEXT(I(rdfs:ContainerMembershipProperty))
= CMIP, a subset of IP
>> I(rdf:_n) is in CMIP
>> <x,y> is in
IEXT(I(rdf:_n)) iff x is in IS and
x(n)=y
RDF/RDFS model theory summary |
|
0. Domains and mappings of interpretation I |
|
vocab(I) set
of URIs ; LV set of literal values (global) ; IR set of resources (universe);
IP subset of
IR (properties) ; IC:
subset of IR (classes). |
|
XL: literals
-> LV; IS: vocab(I) -> IR; IEXT: IP -> subsets of (IR x (IR
union LV)); ICEXT: IC
-> subsets of (IR union LV) |
|
1. Basic equations |
|
E is |
I(E) is |
A
literal (qLiteral) |
XL(E) |
A uri (uriref) |
IS(E) |
An asserted
triple <s p o> |
true if
<I(s), I(o)> is in IEXT(I(p)), otherwise false |
Any other
triple |
not defined |
A ground RDF
graph |
false if
I(E’) =false for any asserted triple E’ in E, otherwise
true |
An anonymous
node (anonNode) |
not
defined; but [I+A](E) =
A(E) |
An RDF graph |
true if
[I+A](E) = true for some A: anon(E) -> IR, otherwise
false. |
2. Class extensions |
|
E is |
I(E) is in IC; ICEXT(I(E)) is |
rdfs:Resource |
IR (The universe of
the interpretation) |
rdf:Property |
IP (Properties; subset of IR, domain of
IEXT) |
rdfs:Class |
IC (Classes; subset of IR, domain of
ICEXT) |
rdfs:Literal |
LV (Literal values) |
rdfs:ConstraintResource |
CR (subset of IR) |
rdfs:ConstraintProperty |
CP = IP
intersect CR |
3. Property extensions |
|
E is |
I(E) is in IP; <x,y> is in IEXT(I(E)) iff |
rdf:type |
x is in
ICEXT(y) |
rdfs:subClassOf |
ICEXT(x) is a
proper subset of ICEXT(y) |
rdfs:subPropertyOf |
IEXT(x) is a
proper subset of IEXT(y) |
E is |
I(E) is in CP; <x,y> is in IEXT(I(E)) iff |
rdfs:range |
if
<u,v> is in IEXT(x) then v
is in ICEXT(y) |
rdfs:domain |
if
<u,v> is in IEXT(x) then u
is in ICEXT(y) |
4. Reification (Simple form) |
|
V is a set of
nodes (URIs and/or anonymous nodes and/or literals) (V union
(VxVxV) ) is a subset of IR |
|
E is |
I(E) is in IC; x is
in ICEXT(I(E)) iff |
rdf:Statement |
x= <a b c> where a, b, c are in V |
E is |
I(E) is in IP; <x,y>
is in IEXT(I(E)) iff |
rdf:subject |
x= <a b
c> and y=a where a, b, c are in V |
rdf:predicate |
x= <a b
c> and y=b where a, b, c are in V |
rdf:object |
x= <a b
c> and y=c where a, b, c are in V |
5. Containers |
|
IS is a set
of series over (IR union LV) IS is a subset
of IR |
|
rdfs:Container |
IS |
rdf:Bag |
a subset of
IS disjoint from ICEXT(I(rdf:Seq)) |
rdf:Seq |
a subset of
IS disjoint from ICEXT(I(rdf:Bag)) |
E is |
I(E) is in IC; ICEXT(I(E))
is |
rdfs:ContainerMembershipProperty |
CMIP, a subset of IP |
E is |
I(E) is in CMIP; <x,y> is in IEXT(I(E)) iff |
rdf:_n |
x is in IS
and x(n)=y |
(The model theory imposes no
conditions on the interpretations of rdfs:seeAlso, rdfs:isDefinedBy, rdfs:comment
or rdfs:label.)
Appendix 2: proofs of lemmas.
<not yet written/>
(Herbrand Lemma) Any graph has
a satisfying interpretation.
Proof. If E is a set of
triples, let S, P and O be the sets of urirefs and anonNodes occuring in E in
respectively the subject, property and object position in a triple. (These sets
need not be disjoint). Then define an interpretation I as follows:
IR= S union P union O
IP = P
IEXT(x) = { <y, z> |
<y x z> is in E and z is in
IR} union {<y, LV(z)> | <y x z> is in E and z is in XL}
IS(x) = x on URIs in IR
A(x) = x on anonNodes in IR
Obviously I+A, and hence I,
satisfies E.
QED.