RDF Model Theory Proposal (postF2F) DRAFT

 

Draft 8/15/01

 

Pat Hayes, IHMC

 

0. N-triples and RDF graphs.

 

In what follows, ‘ the M&S’ refers to the document “Resource Description Framework  (RDF) Model and Syntax Specification”  ( http://www.w3.org/TR/REC-rdf-syntax ), ‘RDFS spec’ refers to the document “Resource Description Framework (RDF) Schema Specification 1.0” (http://www.w3.org/TR/2000/CR-rdf-schema-20000327 ), and  'N-triples' refers to Dave Beckett's version 5, relevant parts of which are:

 

ntripleDoc  ::=line*

line        ::=ws* (comment | blankline | triple) eoln

triple      ::=subject ws+ predicate ws+ object ws* '.' ws*

subject     ::=uriref | anonNode

predicate   ::=uriref

object      ::=uriref | anonNode | qLiteral

 

We understand N-triples and RDF-XML as lexical notations for describing an RDF graph as defined in the M&S.  An RDF graph is a labelled directed graph in which nodes are labelled with URIs, qLiterals, or anonNodes; arcs are labelled with URIs; and each node has a unique label.  <comment> Notice that in this definition, even anonymous nodes are thought of as having unique labels. This is purely a formal convenience. In particular, ononymous node labels are not URIs and are not considered to be public; their role as node identifiers is limited to a single graph. (They are analogous to bound variables in conventional logical syntax. This issue is discussed further in later sections.)  </comment>

 

The model theory assigns interpretations directly to the graph, which is taken as being the primary RDF syntax, in the sense that two RDF documents, in whatever lexical form, are syntactically equivalent if and only if they map to the same RDF graph. We will refer to this as the 'graph syntax' to avoid ambiguity, since the bare term 'syntax' is often assumed to refer to a lexicalization.

 

A graph which is like an RDF graph except that it violates the last condition, ie has two or more nodes with the same label, will be called an untidy graph. Untidy graphs may arise as a result of merging operations or be intermediate states in various operations on graphs, but we will assume that all RDF graphs are tidy unless otherwise noted. In graph syntax, an untidy graph is considered to be syntactically ill-formed.

 

An RDF graph  will be said to be ground if it contains no anonymous nodes. The vocabulary of a graph is the set of URIs that it contains.

 

A graph can be viewed as a set of triples corresponding to its node-arc-node edges; the correspondence between an ntripleDoc and a tidy RDF graph is that the graph has one node for each uriref, anonNode or qLiteral in the document, and one edge for each Ntriple triple, directed from the node labelled with the subject to the node labelled with the object and with the arc labelled with the predicate.  Notice that this requires that all occurrences of a particular  uriref or anonNode label in an N-triples document be mapped to a single node in the corresponding graph.  Other RDF lexicalisations may use other means of indicating the graph structure; for our purposes, the important syntactic properties of RDF graphs is that distinct nodes in an RDF graph are treated as distinct referring entities in the graph syntax.

 

 

1. Interpretations

 

The basic idea is to re-state the M&S notions of the meaning of an RDF graph in mathematical terms, making as few extraneous or gratuitous assumptions as possible. By keeping things general, any restrictions that someone might want to impose can be captured by further restrictions on the model theory.

 

We assume that there is no restriction on what a resource can be, and that there is no restriction on the domains and ranges of properties; in particular, a property may be applied to itself. (When classes are introduced in RDFS, we will assume that they can contain themselves.)

 

First, we assume a global set LV of literal values and a fixed mapping XL from the set of qLiterals to LV. All interpretations will be required to conform to XL on Literal nodes.

This leaves open the question of the exact nature of literals, their language-sensitivity and so on, while acknowledging their special status as expressions with a 'fixed' interpretation.

 

All interpretations will be relative to a set of URIs, called the vocabulary of the interpretation, so that one has to speak, strictly, of an interpretation of an RDF 'nodespace' or of an RDF graph, rather than of RDF itself.

 

<comment> ( An earlier version of the model theory included anonymous nodes in the vocabulary of a set of triples, but it seems clearer to  omit them when considering graphs. ) </comment>

 

For a lexicalized version of the language, we can think of the vocabulary of an interpretation, more traditionally, to be a subset of the URI-indicating expressions used by that lexicalization; eg for  N-triples, we can consider the vocabulary of I to be a set of <urirefs>.

 

An interpretation I (of vocab(I)) is defined by:

 

1. A nonempty set IR of resources, called the domain or universe of I.

<comment> Whatever resources are, this a set of those. IR may  intersect LV, or it may not. We leave the question open. </comment>

2. A subset IP  of IR corresponding to properties

<comment> Note that  IP  could be IR itself.</comment>

3. A mapping IEXT from  IP  into the powerset of  IRx(IR union LV) (ie the set of sets of pairs <x,y> with x in IR and y in IR or LV)

<comment> IEXT(x) is a set of pairs, ie a binary relational extension, called the extension of x. This  trick of distinguishing a relation as an object from its relational extension allows a property to occur in its own extension, a neat trick I learned from Chris Menzel. </comment>

<comment> The restriction of subject denotations to IR follows the BNF for N-triples. We could change this if people feel  it would be more appropriate to allow subjects to be qLiterals. </comment>

4. A mapping IS:  vocab(I) -> IR

 

<comment> When considering RDFS  we will require interpretations to have extra structure. </comment>

 

 

The denotation I(E) of a ground RDF graph E in I is given by the following rules:

 

>> if E is a <qLiteral> then I(E) = XL(E)

>> if E is a <uriref>  then I(E) = IS(E)

>> if E is an asserted triple with the form  <s p o >

     then I(E) = true iff <Is),I(o)> is in  IEXT(I(p)), otherwise I(E)= false.

>> if E is a ground RDF graph then I(E) = false just in case I(E') = false for some asserted triple E' in E, otherwise I(E) = true.

 

Comments and blanklines have no semantics, of course.

 

The use of  the phrase "asserted triple" is a deliberate weasel-worded artifact, to allow an RDF graph/document to contain triples which are being used for some non-assertional purpose (such as expressing a query, implementing part of some expression of another language, or forming part of a pattern intended to be matched against some asserted RDF in another document, or just being breezy:-). We may wish to say that strict conformity to the RDF 1,0 M&S assumes that all triples in a document are asserted triples, but making the distinction allows RDF parsers and engines such as CWM and Euler to conform to the RDF syntax and to respect the RDF model theory without necessarily being fully committed to it. Also, I expect that future extensions or versions of RDF will require such non-asserted triples, and this allows the model theory to be extended in those future directions.

 

For example, for the vocabulary {a b c} the following is a small interpretation, where we use integers to indicate the ‘things’ in the universe:

IR= {1, 2}; IP = {1}

IEXT: 1->{<1,2>,<2,1>}

IS: a->1,b->1,c->2

which makes these triples true:

a b c

c a a

c b a

and these triples false:

a c b

a b b

c a c      

 

For example, I(<a b c>) = true if <I(a),I(c)> is in IEXT(I(b)), ie if <1,2> is in IEXT(1), which is {<1,2>,<2,1>} and so does contain <1,2> and so I(<a bc>) is true.

I(<a c b>) = true if <I(a),I(c)>, ie <1,2> is in IEXT(I(c)); but I(c)=2 and IEXT is not defined on 2, so the condition fails and I(<a c b>) = false.

Exercise: Calculate the truthvalues of the triples

c a c

b b b

c b a

in this interpretation.

More advanced exercise: Write six distinct triples in this vocabulary and define another interpretation which makes three of them true and the other three false.

 

2.  Anonymous nodes as existential assertions

 

We could treat anonymous nodes exactly like URIs, semantically speaking, by extending the IS mapping to include them as well as URIs. That would amount to adopting the view that an anonymous node is equivalent to a node with an unknown URI.  However, it seems to be more in conformance with the M&S to treat anonymous nodes as existentially bound variables. This will require some definitions, as the theory so far provides no meaning for anonymous nodes.

 

Suppose I is an interpretation and A is a mapping from some set of anonymous nodes to the domain of I, and define I+A to be an extended interpretation which is like I except that it uses A to give the interpretation of anonymous nodes.  Define anon(E) to be the set of anonymous nodes in E.  Then we can extend the above rules to include the two new cases that are introduced when anonymous nodes occur in the graph:

 

>> If E is an anonymous node than [I+A](E) = A(E)

>> If E is an RDF graph then I(E) = true if [I+B](E) = true for some B defined on anon(E), otherwise I(E)= false. 

 

This effectively treats all anonymous nodes as existentially quantified in the RDF graph in which they occur. Notice that since two nodes cannot have the same label , there is no need to specify the 'scope' of the quantifier  within a  graph. (However, it is local to the graph.) If we were to apply the semantics directly to N-triples syntax, we would need to indicate the quantifier scope, since in this lexicalisation syntax the same anonNode identifier may occur several  times. The above rule amounts to the N-triple convention that would place the quantifiers  just outside, ie at the outer edge of, the <ntripleDoc>ument corresponding to the graph.

 

Notice that we have not changed the definition of an interpretation. The same interpertation that provides a truth-value for ground graphs also assigns truth-values to graphs with anonymous nodes, even though it provides no interpretation for the anonymous nodes themselves.  Notice also that the anonymous nodes themselves are, qua nodes, perfectly well-defined entities with a robust notion of identity; they differ from other nodes only in not being assigned a model-theoretic interpretation.

 

Exercise:  Consider the RDF graph defined by the following triples:

_:x a b

c b _:x

_:x a c

Is this graph true or false in the interpretation given in the previous exercise? How about the graphs formed from just the first triple, and from just the last two triples?

 

2a. Comparison with formal logic

( Readers unfamiliar with formal logic can skip this section. )

 

With this semantics, an RDF graph can be translated directly into a logical expression with the same meaning.

 

Each node labelled with a URI translates to a constant symbol which is its label ; each anonymous node to a distinct variable. Each arc (triple) labelled with  <s p o> maps to an atomic assertion of the form p(s, o); a graph translates to the the existential closure of the conjunction of all the arcs in the graph.

 

For example, the graph defined by the triples

_:x a b

c b _:x

_:x a c

translates to the logical expression (written in KIF syntax)

 

(exists (?y)(and (a ?y b)(b c ?y)(a c ?y)))

 

Notice that the resulting expression may contain the same symbol in both a relation and object position (eg ‘b’ in this example). This is considered syntactically illegal in many versions of logic, but is acceptable in the recent version of KIF (Hayes & Menzel 2001). To map to a more conventional logical syntax, translate the triple <s p o> into the form holds(p, s,o), where ‘holds’ is a three-place relation used to assert that a binary relation holds between two arguments.

 

3. Terminology and some basic results

 

We say that I satisfies E if I(E)=true, and that a set S of expressions entails E if every interpretation which satisfies every member of S  also satisfies E'.  If {E} entails E' we will say that E entails E'. Any process or technique which constructs a graph E from some other graph(s) S is valid  iff S always entails E, otherwise invalid.

 

<comment> Being an invalid process does not mean that the conclusion is false, and being valid does not guarantee truth. However, it represents  the best guarantee that any assertional inference system can offer: if given true inputs, it will never draw a false conclusion from them. </comment>

 

In this section we give a few basic results about entailment in RDF. (Note, these results do not take account of extra semantic restrictions imposed by RDFS and some uses of containers; they apply to 'pure' RDF entailment only.) Since the language is so simple, entailment can be recognized by relatively simple syntactic comparisons. The two basic forms of valid proof step in RDF are, in logical terms, the inference of P from  (P and Q), and the inference of (exists (?x) (foo ?x)) from (foo baz).  Proofs are given in an appendix.

 

Conjunction Lemma. If E is ground, then I satisfies E iff it satisfies every triple in E. ***

I.e. a ground graph is equivalent in meaning to the logical conjunction of its component triples.

 

To give a syntactic characterization of entailment we will need to define some relationships between RDF graphs. If E is an RDF graph, say that  E' is a subgraph of E when every node and arc in E' are also in E. This corresponds to taking a  subset of the triples constituting the graph. Obviously any subgraph of a tidy graph is tidy.

The following is an immediate consequence of the conjunction lemma:

 

Subgraph Lemma. If E and E' are ground, then E entails E' iff E' is a subgraph of E.  ***

 

Herbrand Lemma.  Any RDF graph has a satisfying interpretation. ***

 

This means that there is no such thing as an inconsistency or a contradiction in RDF, which is not surprising since the language does not contain negation.

 

Say that E' is an instance of E (with respect to a set V of URIs) when E' is like E except that some of the anonymous nodes in E have been labelled with members of V.  Obviously if E is ground then it is its only instance. Notice that an instance may be untidy. .A tidy ground instance with respect to any set which is disjoint from  vocab(E) is a skolemisation of E.

 

Skolemisation Lemma.  Suppose sk(E) is a skolemisation of E with respect to V. Then sk(E) entails E; and if sk(E) entails F and vocab(F) is disjoint from V, then E entails F .  ***

 

Suppose S is a set of  RDF graphs, then their merge is the graph obtained by taking the union of the sets of all their triples and tidying the resulting set, i.e. identifying  nodes labelled with the same URI. 

 

Merging lemma. The merge of a set S is entailed by S, and entails every member of S. ***

 

Notice that anonymous nodes are not identified with other nodes in the merge, and indeed this reflects a basic principle of RDF graph inference: nodes with the same URI must be identified (ie the graph must be tidy), but anonymous nodes must not be identified with other nodes or relabelled with URIs, in order to ensure that the resulting graph is entailed by what one starts with.  Note that one does not, in general, obtain the merge of a set of graphs by concatenating their corresponding N-triples documents and constructing the graph described by the merged document, since if some of the documents use the same anonNode identifier, the merged document will describe a graph in which some of the anonymous nodes have been merged. 

<comment> At the time of writing (8/8/01) I am not certain whether an analogous warning is needed for RDF-XML  documents. </comment>

 

Anonymity lemma 1. Suppose E' is like E except that at least one anonymous node in E is labelled with a URI in E'. Then E does not entail E'.  ***

 

Anonymity lemma 2. Suppose that E' is like E except that two distinct anonymous nodes in E have been identified in E'. Then E does not entail E'. ***

 

This means that there is no valid RDF inference process which can produce an RDF graph in which a single anonymous node occurs in triples originating from several different graphs. (Of course, such a graph can be constructed; but it will  not be entailed by the original  documents.  It must reflect the addition of new information about the identity of two anonymous nodes. Such information can be represented in many other ways, eg in DAML+OIL,  but it cannot be represented in RDF itself.)

 

The main result is:

 

Interpolation Lemma.. S entails E iff a subgraph of the merge of S  is an instance of E. ***

 

The interpolation lemma completely characterizes RDF entailment in syntactic terms. To tell  whether a set of RDF graphs entails another, find a subgraph of their merge and replace URIs by anonymous nodes to get the second. Of course, there is no need to actually construct the merge. If working backwards from the consequent E (the graph that may be entailed by the others), the most efficient technique would be to treat  anonymous nodes as variables in a process of subgraph-matching, allowing them to bind to'matching' URI labels in the antecedent graph(s) in S, ie those which may entail the consequent graph. The interpolation lemma shows that this process is valid, and is also complete if the subgraph-matching algorithm is.  The existence of complete subgraph-checking algorithms also shows that RDF is decideable, ie there is a terminating algorithm which will determine for any finite set S and any expression E, whether or not S entails E.

 

Notice however that such a variable-binding process would only be appropriate when applied to the conclusion of a proposed entailment. If an RDF document is asserted, then it would be invalid to bind new values to any of its anonymous nodes: see section 5.

 

4. RDFS interpretations

 

RDF Schema (RDFS) introduces a special  RDF vocabulary, with some fixed constraints on its interpretation; in particujlar, the notion of a 'class', so we will need to assume that the universe of interpretations contains classes as elements. To avoid possible complexities which could arise from sets containing themselves, etc.,  we will use the same technique used earlier to relate properties to their extensions, and require every interpretation to include a mapping from  a subset IC of the domain (consisting of classes) to subsets of the domain which are the 'extensions' of the classes.  This new mapping will be called ICEXT (for the Class Extension in I) , and since literals can be contained in classes, we will assume that ICEXT maps IC to subsets of (IR union LV) . An expression denoting a class will be interpreted via ICEXT when determining membership in the class.

 

An rdfs interpretation is then an RDF interpretation plus one extra mapping, forming a fifth condition on an interpretation:

 

5. A subset IC of IR, containing classes

6.  A mapping ICEXT  from IC to  the powerset of (IR union LV) , ie the set of subsets of elements of IR or  XL.

 

<comment> This use of a class extension mapping allows classes to contain themselves as members. For example, it is quite OK for a 'universal' class to contain itself as a member, a convention that is often adopted  at the top of a classification heirarchy.  This convention is often thought unacceptable since it seems to violate one of the axioms of standard (Zermelo-Fraenkel) set theory, the axiom of foundation (which forbids infinitely descending chains of membership).  The semantic model given here  distinguishes between the class as an object and its class extension (the set of things that are in the class), thereby allowing the extension of a class to contain the class itself without violating the axiom of foundation.

Notice that the question of whether or not a class contains itself as a member is quite different from the question of whether or not it is a subclass of itself. </comment>

 

An rdfs interpretation I  satisfies the following conditions, in addition to those for an rdf interpretation given earlier. (The complete model theory is summarised in a table in an appendix.)

 

>> ICEXT(I(rdfs:Resource)) =  IR

>> ICEXT(I(rdf:Property)) = IP

>> ICEXT(I(rdfs:Class)) =  IC

>> ICEXT(I(rdfs:Literal))  = LV

 

<comment> This does not require that IP and IC be disjoint. However, it would be consistent to assume that they were.  </comment>

 

>> <x,y> is in IEXT(I(rdf:type)) iff x is in ICEXT(y)

>> <x,y> is in IEXT(I(rdfs:subClassOf)) iff ICEXT(x) is a  proper subset of ICEXT(y)

>> <x,y> is in IEXT(rdfs:subPropertyOf)) iff IEXT(x) is a proper subset of IEXT(y)

 

<comment> The subsets referred to in subPropertyOf are sets of pairs. </comment>

<comment> The M&S seems to imply that the proper condition on subClass and subProperty is that the extensions are *proper* subsets, ie the identity case is disallowed, and hence 'subclass loops' are illegal. If we decide to change this to allow  'subclass loops', as assumed by DAML+OIL, then "proper" should be deleted from these conditions. </comment>

 

>> I(rdfs:ConstraintResource) is in IC

>> ICEXT(I(rdfs:ConstraintProperty)) is the intersection of ICEXT(I(rdfs:ConstraintResource)) and IP                                                                  

 

<comment> Maybe this should actually be a subset of the intersection. I am not sure what the wording in the RDFS1.0 spec implies. </comment>

 

>> I(rdf:range) and I(rdf:domain)  are in ICEXT(I(rdfs:ConstraintProperty))

>> If <x,y> is in IEXT(I(rdf:range)) and  <u,v> is in IEXT(x) then v is in ICEXT(y)

>> If <x,y> is in IEXT(I(rdf:domain)) and  <u,v> is in IEXT(x) then u is in ICEXT(y)

 

<comment> This reflects our current understanding of multiple domain and range restrictions rather than the wording in the M&S. </comment>

 

(The model theory imposes no conditions on the interpretations of rdfs:seeAlso, rdfs:isDefinedBy, rdfs:comment or rdfs:label.)

 

8.  Reification

 

RDF uses rdfs to define two important extensions to the basic language: reification (which allows RDF to make assertions about RDF statements) and the introduction of classes of containers.  These impose extra conditions on interpretations, requiring their domains to include particular classes of structures.

 

We will say that an interpretation I reifies a set of nodes V if it is capable of describing all RDF triples that can be constructed from those nodes. (The RDF M&S does not provide any vocabulary for describing graphs.) When considering lexical forms of the language, it would be natural to think of V as a set of lexical labels which identify nodes, eg a subset of( <uriref> union <qLiteral> union <anonNode>) in N-triples. For lexicalisations such as XML-RDF which do not provide lexical identifiers corresponding to all anonymous nodes, V must be allowed to contain arbitrary numbers of ‘blank’ node tokens as well as URIs and literals.)

 

In order to be reified, the interpretation must contain in its universe entities corresponding to all the nodes (or names) in V and all  the triples that can be constructed from them. Given V, therefore, we say that I reifies V just when there is a mapping REIF which we will call a reification mapping, from  (V union V x V x V)  into IR which satisfies  the following conditions:

 

>> <x,y> is in IEXT(I(rdf:subject)) iff for some a, b, and c in V,  x= REIF(<a b c>) and y= REIF(a)

>> <x,y> is in IEXT(I(rdf:predicate)) iff for some a, b and c in V, x=REIF(<a b c>) and y= REIF(b)

>> <x,y> is in IEXT(I(rdf:object)) iff for some a, b and c in V, x=REIF(<a b c>) and y= REIF(c)

>>  x is in ICEXT(I(rdf:Statement)) iff for some a, b and c in V, x=REIF(<a b c>)

 

This is deliberately stated (in the spirit of metaphysical agnosticism appropriate to  model theory) so as to not assume anything about the real nature of the entities in the universe that play the role of reified statements, and follows the M&S (section 4.1) which distinguishes between RDF assertions and their ‘modelling’ in the semantic domain. However, it would be quite consistent to assume that nodes and triples were themselves actual semantic objects in the universe, and take REIF to be the identity mapping.  The conditions above could then be stated more directly:

 

>> (V union VxVxV ) is a subset of IR

>> <x,y> is in IEXT(I(rdf:subject)) iff for some a, b, and c in V,  x=<a b c>and y= a

>> <x,y> is in IEXT(I(rdf:predicate)) iff for some a, b and c in V, x=<a b c>and y= b

>> <x,y> is in IEXT(I(rdf:object)) iff for some a, b and c in V, x=<a b c>and y= c

>>  x is in ICEXT(I(rdf:Statement)) iff for some a, b and c in V, x=<a b c>

 

We will call this simple reification and take it as the normal interpretation of reification in the model theory. (Nothing would be substantially altered, however, by the use of an explicit reification mapping, so the other alternative is available if anyone wishes to insist that the syntactic and semantic domains should not overlap, and impose a conceptual distinction between RDF expressions and the things that model them.) 

 

Reification is fairly simple, in this account. However, notice that having x (or REIF(x) ) in the domain does not in itself provide any access to the meaning of x in the interpretation; in particular, it provides no way to assert x. This is illustrated by fact that none of the above equations mention I(x) when x is in IR. The language provides no way to *use* (as opposed to mention) a reified triple. 

 

Adding such an ability (sometimes called ‘downward reflection’ or ‘dequoting’) would considerably increase the expressive power of RDF; it would in effect provide the language with a form of truth predicate. On the other hand, such devices need to be treated with considerable care in order to avoid self-referential paradoxes.

 

9. Containers

 

RDFS provides a general class of ‘containers’ and three particular kinds of container: bags, sequences and ‘alternatives’. Alternatives pose special problems which require re-defining part of the basic RDF model theory, and will be described in the next section. We note that also that the ‘distributive reference’ mechanisms which use rdf:aboutEach, described in M&S section 3.3, are part of the XML-RDF serialization rather than the RDF graph syntax, so are not considered here. (It would be relatively easy to include them, however.)

 

Bags and sequences are very similar except that bags are considered to be unordered. The natural way to describe this mathematically is to think of bags as sequences with an equivalence relation defined on them, and identify a bag with an equivalence class of sequences. This amounts to saying that bag identity can be checked by allowing permutations, but otherwise a bag is much like a sequence: in particular, the members of a particular bag do in fact occur in a particular order in the bag, even though that order is not considered germane to questions of bag identity. RDFS provides essentially the same machinery for accessing the elements of bags and of sequences, for example. For all these reasons, we will take an abstract notion of sequence, called here a series, to be the ‘basic’ container type for RDF/Ss.  A series (over S) is defined to be a partial function from the integers to S. If s is a series then s(n) is the n-th element of s; if s(n) is undefined then we will write s(n) = ??. <comment> ( ‘??’ can be taken to be a value, but it must be a special value which is not in the universe of any interpretation.) </comment>

 

We will assume that there is a distinguished class of things called containers with subclasses called bags and seqs, and that containers are series on (IR union LV).  Again, this allows such apparent oddities as containers which contain themselves, or two containers each of which contains the other. The set of containers-series will be called IS.

 

There is a special class of container membership properties denoted by ‘rdf:_1’, ‘rdf:_2’, etc., each of which selects a member from a position in a container. In a familiar abuse of notation, we will write ‘rdf:_n’ to refer to these, where ‘n’ is a parameter ranging over the natural numbers/numerals.

 

>> ICEXT(I(rdfs:Container)) =             IS,  a subset of IR

>> ICEXT(I(rdf:Bag)) =                         a subset of IS disjoint from ICEXT(I(rdf:Seq))

>> ICEXT(I(rdf:Seq)) =                        a subset of IS disjoint from ICEXT(I(rdf:Bag))

>> ICEXT(I(rdfs:ContainerMembershipProperty)) =    CMIP, a subset of IP

>> I(rdf:_n)  is in CMIP

>> <x,y> is in IEXT(I(rdf:_n))      iff     x is in IS and x(n)=y

 

Although the model theory does not need to assume this, it seems natural to assume that the series corresponding to RDF sequences are finite, ie that there is some integer length(s) such that for any  n > length(s), s(n) = ??. What is less clear is whether they should be allowed to have ‘gaps’, ie whether s(n) = ?? for any n <= length(s). The M&S is not entirely clear on this question, so it is left open in the model theory.  (The issue surfaces when an RDF graph gives only partial information about a container, e.g. suppose a sequence is asserted to have a first element which is the string “one” and a third element which is the string “three”, but no other information is provided. Should it follow that the container in question has a second element, even though its value has not been specified?)

 

The model theory as written also provides no formal account of what makes a bag really different from a sequence, other than insisting that they are different. One way to add such an account would be to impose a restriction on the subset ICEXT(I(rdf:Bag)), so that it is not permitted to contain two series which are permutations of each other. This restriction would not alter the truth-values of any piece of RDF syntax, however, so it is omitted from the formal model theory.

 

On what may be a related point, the RDF M&S provides a number of examples of what might be called ‘intended uses’ for containers, particularly bags, which seem to involve extra assumptions about the meanings of expressions involving bags, such as the observation that a triple whose first element is a bag could be considered to be as assertion about the bag, or about the elements of the bag, depending on the property being applied. As stated here, the model theory only supports interpretations in which containers are ‘opaque’ objects, so that assertions involving containers are about those containers, rather than being understood distributively to be about the members of the container. Treating containers as ‘transparent’ requires re-working the basic model theory somewhat, as described in the next section; it has quite extensive consequences, requiring the notions of satisfiability and entailment to be re-considered.

 

10. Alt

<not yet written />

 

11.  RDFS entailment

<not yet written/>

 

 


Appendix 1: Summary of model theory

RDF:

An interpretation I (of vocab(I)) is defined by:

1.  A nonempty set IR of resources, called the domain or universe of I.

2. A subset IP  of IR corresponding to properties

3. A mapping IEXT from  IP  into the powerset of  IRx(IR union LV)

4. A mapping IS:  vocab(I) -> IR

Suppose A is a mapping from some set of anonNodes to the domain of I. anon(E) is the set of anonymous nodes in the graph E.

>> if E is a <qLiteral> then I(E) = XL(E)

>> if E is a <uriref>  then I(E) = IS(E)

>> if E is an asserted triple <s p o >

     then I(E) = true iff <I(s),I(o)> is in  IEXT(I(p)), otherwise I(E)= false.

>> if E is a ground RDF graph then I(E) = false just in case I(E') = false for some asserted triple E' in E, otherwise I(E) = true.

>> if E is an anonymous node than [I+A](E) = A(E)

>> if E is an RDF graph then I(E) = true if [I+B](E) = true for some B defined on anon(E), otherwise I(E)= false. 

 

RDFS:

5. A subset IC of IR, containing classes

6.  A mapping ICEXT  from IC to  the powerset of (IR union LV)

>> ICEXT(I(rdfs:Resource)) =  IR

>> ICEXT(I(rdf:Property)) = IP

>> ICEXT(I(rdfs:Class)) =  IC

>> ICEXT(I(rdfs:Literal))  = LV

>> <x,y> is in IEXT(I(rdf:type)) iff x is in ICEXT(y)

>> <x,y> is in IEXT(I(rdfs:subClassOf)) iff ICEXT(x) is a  proper subset of ICEXT(y)

>> <x,y> is in IEXT(rdfs:subPropertyOf)) iff IEXT(x) is a proper subset of IEXT(y)

>> I(rdfs:ConstraintResource) is in IC

>> ICEXT(I(rdfs:ConstraintProperty)) is the intersection of IP and                                                                  ICEXT(I(rdfs:ConstraintResource))

>> I(rdf:range) and I(rdf:domain)  are in ICEXT(I(rdfs:ConstraintProperty))

>> If <x,y> is in IEXT(I(rdf:range)) and  <u,v> is in IEXT(x) then v is in ICEXT(y)

>> If <x,y> is in IEXT(I(rdf:domain)) and  <u,v> is in IEXT(x) then u is in ICEXT(y)

 

REIFICATION (simple form):

For some vocabulary V, (V union VxVxV ) is a subset of IR

>> <x,y> is in IEXT(I(rdf:subject)) iff for some a, b, and c in V,  x=<a b c>and y= a

>> <x,y> is in IEXT(I(rdf:predicate)) iff for some a, b and c in V, x=<a b c>and y= b

>> <x,y> is in IEXT(I(rdf:object)) iff for some a, b and c in V, x=<a b c>and y= c

>>  x is in ICEXT(I(rdf:Statement)) iff for some a, b and c in V, x=<a b c>

 

CONTAINERS

IS is a set of series over (IR union LV). IS is a subset of IR.

>> ICEXT(I(rdfs:Container)) = IS

>> ICEXT(I(rdf:Bag)) =  a subset of IS disjoint from ICEXT(I(rdf:Seq))

>> ICEXT(I(rdf:Seq)) = a subset of IS disjoint from ICEXT(I(rdf:Bag))

>> ICEXT(I(rdfs:ContainerMembershipProperty)) =  CMIP, a subset of IP

>> I(rdf:_n)  is in CMIP

>> <x,y> is in IEXT(I(rdf:_n)) iff  x is in IS and x(n)=y


 

RDF/RDFS model theory summary

0. Domains and mappings of interpretation I

vocab(I) set of URIs ; LV set of literal values (global) ; IR set of resources (universe);

IP subset of IR (properties) ;   IC: subset of IR (classes).

XL: literals -> LV;  IS: vocab(I) -> IR;  IEXT: IP -> subsets of (IR x (IR union LV));

ICEXT: IC -> subsets of (IR union LV)

1. Basic equations

E is

I(E) is

A literal  (qLiteral)

XL(E)

A uri   (uriref)

IS(E)

An asserted triple <s p o>

true if <I(s), I(o)> is in IEXT(I(p)), otherwise false

Any other triple

not defined

A ground RDF graph

false if I(E’) =false for any asserted triple E’ in E,

otherwise true

An anonymous node (anonNode)

not defined;  but [I+A](E) = A(E)

An RDF graph

true if [I+A](E) = true for some A: anon(E) -> IR,

otherwise false.

2. Class extensions

E is

I(E) is in IC; ICEXT(I(E)) is

rdfs:Resource

IR     (The universe of the interpretation)

rdf:Property

IP     (Properties; subset of IR, domain of IEXT)

rdfs:Class

IC     (Classes; subset of IR, domain of ICEXT)

rdfs:Literal

LV    (Literal values)

rdfs:ConstraintResource

CR    (subset of IR)

rdfs:ConstraintProperty

CP = IP intersect CR

3. Property extensions

E is

I(E) is in IP; <x,y> is in IEXT(I(E))  iff

rdf:type

x is in ICEXT(y)

rdfs:subClassOf

ICEXT(x) is a proper subset of ICEXT(y)

rdfs:subPropertyOf

IEXT(x) is a proper subset of IEXT(y)

E is

I(E) is in CP; <x,y> is in IEXT(I(E)) iff

rdfs:range

if <u,v> is in IEXT(x) then  v is in ICEXT(y)

rdfs:domain

if <u,v> is in IEXT(x) then  u is in ICEXT(y)

4. Reification (Simple form)

V is a set of nodes (URIs and/or anonymous nodes and/or literals)

(V union (VxVxV) )  is a subset of IR

E is

I(E) is in IC;  x is in ICEXT(I(E)) iff

rdf:Statement

 x= <a b c>  where a, b, c are in V

E is

I(E) is in IP;  <x,y> is in IEXT(I(E))  iff

rdf:subject

x= <a b c> and y=a where a, b, c are in V

rdf:predicate

x= <a b c> and y=b where a, b, c are in V

rdf:object

x= <a b c> and y=c where a, b, c are in V

5. Containers

IS is a set of series over (IR union LV)

IS is a subset of IR

rdfs:Container

IS

rdf:Bag

a subset of IS disjoint from ICEXT(I(rdf:Seq))

rdf:Seq

a subset of IS disjoint from ICEXT(I(rdf:Bag))

E is

I(E) is in IC;  ICEXT(I(E)) is

rdfs:ContainerMembershipProperty

CMIP, a subset of IP

E is

I(E) is in CMIP; <x,y> is in IEXT(I(E)) iff

rdf:_n

x is in IS and x(n)=y

 

(The model theory imposes no conditions on the interpretations of rdfs:seeAlso, rdfs:isDefinedBy, rdfs:comment or rdfs:label.)

 

 


Appendix 2: proofs of lemmas.

<not yet written/>

 

(Herbrand Lemma) Any graph has a satisfying interpretation. 

Proof. If E is a set of triples, let S, P and O be the sets of urirefs and anonNodes occuring in E in respectively the subject, property and object position in a triple. (These sets need not be disjoint). Then define an interpretation I as follows:

IR= S union P union O

IP = P

IEXT(x) = { <y, z> | <y x z> is in E  and z is in IR} union {<y, LV(z)> | <y x z> is in E and z is in XL}

IS(x) = x on URIs in IR

A(x) = x on anonNodes in IR

Obviously I+A, and hence I, satisfies E.

QED.