Commenting on a snapshot, taken at about dusk UK time 11/Nov/2002, of C:\data\HP\dynamic\public\2002\20021030-reviewingRDFCoreDocs\mt\RDF_Semantics_latest.html
Critical issues which must be addressed at this publication are strong orange and are enumerated here. Suggested replacement text is in normal orange.
There are no critical issues at this time.
Issues which should be addressed before last call are in strong purple - with suggested replacement text in normal purple.
Stylistic issues are in strong green. These are suggestions to the editor, which the editor may accept or reject at his discretion without further discussion. Suggested replacement text in normal green.
Occasionaly I may write something nice in strong blue.
And sometimes pass comment in strong whatever this colour is.
Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This is a specification of a precise semantics for RDF and RDFS, with some entailment results. It is intended to be readable by a general technical audience.
This work is part of the W3C Semantic Web Activity. It has been produced by the RDF Core Working Group which is chartered to address a list of issues raised since RDF 1.0 was issued.
This document is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current public W3C Working Drafts can be found as part of the W3C Technical Reports and Publications.
0. Introduction
0.1 Model-theoretic semantics
0.2 Graph Syntax
0.3 Definitions
1. Interpretations
1.1 Technical notes
1.2 Urirefs, resources and literals
1.3 Interpretations
1.4 Denotations of ground graphs
1.5 Unlabeled nodes as existential
assertions
2. Simple Entailment between RDF graphs
2.1 Criteria for non-entailment
3. Interpreting the RDF(S) vocabulary
3.1 RDF interpretations
3.2 Reification and containers
3.2.1 Reification
3.2.2 RDF Containers
3.3 RDFS Interpretations
3.3.1 A note on rdfs:Literal
4. Vocabulary entailment and closure
rules
4.1. Rdf-entailment and rdf
closures(informative)
4.2. Rdfs-entailment and rdfs
closures(informative)
Appendix A. Summary of model theory
Appendix B. Lbase translation(informative)
Appendix C. Proofs of lemmas
Appendix D. Acknowledgements
References
Change Log
[[[Editor's Note: further work on the overlap with [RDF_CONCEPTS] is needed. Some of the following discussion may be replaced by content in other documents.]]]
RDF is intended to be used to convey meanings. It is hard to give a precise characterization of what this means, and the semantics given here restricts itself to a formal notion of meaning which could be characterized as the part that is common to all other accounts of meaning. Exactly what is considered to be the 'meaning' of an assertion in RDF in some broad sense may depend on many factors, including social conventions, comments in natural language or links to other content-bearing documents. Most of this meaning will be inaccessible to machine processing and is mentioned here only to emphasize that the formal semantics described here is not intended to provide an analysis of 'meaning' in this broad sense. However, users should operate under the basic assumption that any such meaning is preserved by any formal inference processes which preserve truth in the formal sense, so that any terms in a formally sanctioned conclusion from a set of RDF graphs can be interpreted as carrying the same informal meanings that they had in the original graphs. We note in passing that this condition raises many complex questions about how such informal meanings derived from several sources should be combined, and that these questions are also not addressed by the formal semantics. See [RDF_CONCEPTS] for further discussion. It has been argued that certain sources of RDF assertions should be taken as more authoritative or more reliable than others, in particular that assertions made by the 'owner' of a uriref should be considered to be definitive in determining the meaning of those urirefs. The semantics given here takes no position on issues like this.
We use a basic technique for specifying the formal meaning of a formal language called model-theoretic semantics. This assumes that the language refers to a 'world', and describes the minimal conditions that a world must satisfy in order to assign an appropriate meaning for every expression in the language. A particular world is called an interpretation, so that model theory might be better called 'interpretation theory'. The idea is to provide an abstract, mathematical account of the properties that any such interpretation must have, making as few assumptions as possible about its actual nature or intrinsic structure. Model theory tries to be metaphysically and ontologically neutral. It is typically couched in the language of set theory simply because that is the normal language of mathematics - for example, this semantics assumes that names denote things in a set IR called the 'universe' - but the use of set-theoretic language here is not supposed to imply that the things in the universe are set-theoretic in nature. Model theory is usually most relevant to implementation via the notion of entailment, described later, which makes it possible to define valid inference rules.The chief utility of a formal semantic theory is not to provide any deep analysis of the nature of the things being described by the language or to suggest any particular processing model, but rather to provide a technical way to determine when inference processes are valid, i.e. when they preserve truth.
In this document we give two versions of the same semantic theory: directly, and also (in an informative appendix) an 'axiomatic semantics' in the form of a translation from RDF and RDFS into another formal language, Lbase [LBASE] which has a predefined model-theoretic semantics. The translation technique offers some advantages for machine processing and may be found easier to read by some readers, so is described here as a convenience. We believe that both of these descriptions, and also the closure rules described in section 4, are all in exact correspondence, but only the directly described model theory should be taken as normative.
There are several aspects of meaning in RDF which are ignored by this semantics; in particular, it treats URI references as simple names, ignoring aspects of meaning encoded in particular URI forms [RFC 2396] and does not provide any analysis of time-varying data or of changes to URI references. It does not provide any analysis of indexical uses of URI references, for example to mean 'this document'. It does not assign any particular meaning to some parts of the RDF and RDFS namespaces, and in some cases, notably the reification and container vocabularies, it assigns less meaning than one might expect. These cases are noted in the text and the limitations discussed in more detail. The semantics treats RDF and RDFS as simple assertional languages, in which each triple expresses a simple proposition. This imposes a fairly strict monotonic discipline on the language, so that it cannot express closed world assumptions, local default preferences, and several other commonly-used non-monotonic constructs.
Particular uses of RDF, including as a basis for more expressive languages such as DAML and OWL, may impose further semantic conditions in addition to those described here, and such extra semantic conditions can also be imposed on the meanings of terms in particular RDF namespaces. We refer to these as semantic extensions. Semantic extensions to RDF are constrained in this recommendation using the language of RFC 2119. An example semantic extension is RDF Schema, the semantics of which are defined in later parts of this document. All such extensions MUST conform to the semantic conditions in this document. In more operational terms, any entailment which is valid according to the semantics described here MUST continue to be valid in any extended semantics imposed on an RDF namespace. Any name for entailment in a semantic extension MUST be indicated by the use of a namespace does it have to be a namespace? entailment term, as introduced in section 3 below.
Any semantic theory must be attached to a syntax. This semantics is defined as a mapping on the abstract syntax of RDF [RDF_CONCEPTS]. We use the following terminology defined there: uriref, defined as RDF URI Reference; literal, plain literal, typed literal, blank node and triple.
The convention that relates a set of triples to a picture of an RDF graph can be stated as follows. Draw one oval for each blank node and uriref, and one rectangle for each literal, which occur in either the S or O position in any triple in the set, and write each uriref or literal as the label of its shape. Then for each triple <S,P,O>, draw an arrowed line from the shape produced from S to the shape produced from O, and label it with P. Technically, this is a picture of a mathematical structure which can be described as a partially labeled directed pseudograph with unique node labels, but we will simply refer to a set of triples as an RDF graph.
In this document we will use the N-triples syntax
described in [RDFTestCases] to describe RDF
graphs. This notation uses a nodeID convention to indicate blank nodes in the
triples of a graph. Note that while
node
identifiers such as _:xxx
serve to identify blank nodes in
the surface syntax, these expressions are not considered to be the
label of the graph node they identify; they are not names, and do not occur
in the actual graph. In particular, two N-triples
documents which differ only by re-naming their node identifiers will be
understood to describe identical RDF graphs.
The N-triples syntax requires that urirefs be given in full, enclosed in angle brackets. In the interests of brevity, we use the imaginary URI scheme do you really mean to use a scheme here? Does it exist? is it not better to stick to the qname form? In the interests of brevity we use a shorter notation for urirefs based on XML QNames. Names of the form prefix:name should be understood to have the 'prefix:' replaced by a string of characters associated with the prefix. The following prefixes and character strings are used in this document: 'ex:' to provide illustrative examples. To obtain a more realistic view of the normal appearance of the N-triples syntax, the reader should imagine this replaced with something like 'http://example.org/rdf/mt/artificialExample/'. We will also make extensive use of the Qname prefixes rdf: and rdfs: defined as follows:
Prefix rdf: namespace URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#
Prefix rdfs: namespace URI: http://www.w3.org/2000/01/rdf-schema#
Prefix ex: string http://example.org/
Since Qname syntax is not legal in the N-triples syntax, and in the interests of brevity and readability, we will use the convention whereby a Qname is used without surrounding angle brackets to indicate the corresponding uriref enclosed in angle brackets, eg the triple
<ex:a> rdf:type rdfs:Property .
should be read as an abbreviation for the N-triples syntax
<ex:a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2000/01/rdf-schema#Property> .
Several definitions will be important in what follows.
A subgraph of an RDF graph is simply a subset of the triples in the graph. Each triple in a graph is considered to be a subgraph.
Consider a set S of graphs which share no blank nodes; call this a separated set of graphs. The graph consisting of all the triples in all the graphs in S is another graph, which we will call the merge of S. Each of the original graphs is a subgraph of the merged graph. Notice that when forming a merged graph, two occurrences of a given uriref or literal as nodes in two different graphs become a single node in the union graph (since by definition they are the same uriref or literal), but blank nodes are not 'merged' in this way; and arcs are of course never merged. That's not true is it? If two graphs each have an arc with the same uriref at the blunt end and the same uriref at the sharp end, there will only be one arc, not two, in the merged graph. If the set S is not separated then we will define the merge of S to be the merge of a set obtained by replacing blank nodes in some members of S by distinct blank nodes to obtain a separated set S' of graphs which are isomorphic to those in S.
Notice that one does not, in general, obtain the merge of a set of graphs by concatenating their corresponding N-triples documents and constructing the graph described by the merged document, since if some of the documents use the same node identifiers, the merged document will describe a graph in which some of the blank nodes have been 'accidentally' merged. To merge Ntriples documents it is necessary to check if the same nodeID is used in two or more documents, and to replace it with a distinct nodeID in each of them, before merging the documents. Similar cautions apply to merging graphs described by RDF/XML documents which contain nodeIDs.. s/.././
An RDF graph will be said to be ground if it has no blank nodes.
We will refer to a set of urirefs as a vocabulary. The vocabulary of a graph is the set of urirefs that it contains (either as nodes, on arcs or in typed literals). A name is a uriref or a typed literal. I'm surprised only typed literals here, but no doubt will learn why later. A name is from a vocabulary V if it is in V or is a typed literal containing a uriref in V. The names of a graph are all the names which occur in the graph. This is the set of expressions that need to be assigned a meaning by an interpretation. We do not think of plain literals as names because their interpretation is fixed by the RDF semantic rules. Are you telepathic? When giving examples, we will sometimes use a string of characters with no intervening colon to indicate 'some name'. I think you said that already here.
An instance of an RDF graph is, intuitively, a similar graph in which some blank nodes may have been replaced by urirefs or literals. However, it is technically convenient to also allow blank nodes to be replaced by other blank nodes, so we need to state this rather more precisely. Say that one triple is an instance of another if it can be obtained by substituting zero or more urirefs, literals or blank nodes for blank nodes in the original; and that a graph is an instance of another just when every triple in the first graph is an instance of a triple in the second graph, and every triple in the second graph has an instance in the first graph. Note that any graph is an instance of itself.
This allows blank nodes in the second graph to be replaced by names in the instance (which might cause some nodes to be identified that were previously distinct) but it also allows them to be replaced by other blank nodes. In particular, this means that the two graphs:
<ex:a> <ex:b> _:xxx .
<ex:a> <ex:b> _:yyy .
and
<ex:a> <ex:b> _:zzz .
with, respectively, three nodes and two arcs, and two nodes and one arc, are instances of each other. Similarly,
_:xxx <ex:b> _:xxx .
is an instance of
_:xxx <ex:b> _:yyy .
A proper instance of a graph is an instance in which at least one blank node has been replaced by a name. The above examples are not proper instances.
We do not impose any logical restrictions on the domains and ranges of properties; in particular, a property may be applied to itself. When classes are introduced in RDFS, we will allow them to contain themselves. This might seem to violate one of the axioms of standard (Zermelo-Fraenkel) set theory, the axiom of foundation, which forbids infinitely descending chains of membership. However, the semantic model given here distinguishes properties and classes considered as objects from their extensions - the sets of object-value pairs which satisfy the property, or things that are 'in' instance of or members of the class - thereby allowing the extension of a property or class to contain the property or class itself without violating the axiom of foundation. In particular, this use of a class extension mapping allows classes to contain themselves. For example, it is quite OK for (the extension of) a 'universal' class to contain the class itself as a member, a convention that is often adopted at the top of a classification hierarchy. (If an extension contained itself then the axiom would be violated, but that case never arises.) The technique is described more fully in Hayes&Menzel.
In this respect, RDFS differs from many conventional ontology frameworks such as UML which assume a more structured system of 'layers', or draw a distinction between data and meta-data. However, while RDFS does not assume the existence of such structure, it does not prohibit it. RDF allows such loops, but it does not mandate their use for all parts of a user vocabulary.If this aspect of RDFS is found worrying, then it is possible to restrict oneself to a subset of RDF graphs which do not contain any such 'loops' of class membership or property application, and still retain much of the expressive power of RDFS for many practical purposes.
The use of the explicit extension mapping also makes it possible for two properties to have exactly the same values, or two classes to contain the same instances, and still be considered distinct. This means that RDFS classes can be considered to be rather more than simple sets; they can be thought of as 'classifications' or 'concepts' which have a robust notion of identity which goes beyond a simple extensional correspondence. This property of the model theory has significant consequences in more expressive languages built on top of RDF, such as OWL, which are capable of expressing identity between properties and classes directly. This 'intensional' nature of classes and properties is sometimes claimed to be a useful property of a descriptive language, but a full discussion of this issue is beyond the scope of this document.
Notice that the question of whether or not a class contains itself as a member is quite different from the question of whether or not it is a subclass of itself. All classes are subclasses of themselves.
As report by Arjon is http://lists.w3.org/Archives/Public/www-rdf-comments/2002OctDec/0097.html this rule, nor the equivalent one for subproperties is reflected in the closure rules below.
RDF uses two kinds of referring expression, urirefs and literals. We make very simple and basic assumptions about these. Urirefs are treated as logical constants, i.e. as names which denote things (the things are called 'resources', following RFC 2396, but no assumptions are made here about the nature of resources.) The meaning of a literal is principally determined by its character string: it either refers to the value mapped from the string by the associated datatype, or if no datatype is provided then it refers to the Unicode string itself. unless there is a lang code, right? We do not take any position here on the way that urirefs may be composed from other expressions, Do you need to say this - the uri part of an RDF URI reference is always absolute - there are no relative URI's allowed in the abstract syntax. e.g. from relative URIs or Qnames; the semantics simply assumes that such lexical issues have been resolved in some way that is globally coherent, so that a single uriref can be taken to have the same meaning wherever it occurs. Similarly, the semantics has no special provision for tracking temporal changes. It assumes, implicitly, that urirefs have the same meaning whenever they occur. To provide an adequate semantics which would be sensitive to temporal changes is a research problem which is beyond the scope of this document.
We do not make any assumptions about the relationship between the denotation of a uriref and a document or network resource which can be obtained by using that uriref in an HTTP transfer protocol. It has been argued that urirefs in the form of HTTP URIs should be required to denote the document that results from such a retrieval. Such a requirement could be added as an extra semantic condition, but this condition is not assumed in this document.
The basic intuition of model-theoretic semantics is that asserting a sentence makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the sentence true. In other words, an assertion amounts to stating a constraint on the possible ways the world might be. Notice that there is no presumption here that any assertion contains enough information to specify a single unique interpretation. It is usually impossible to assert enough in any language to completely constrain the interpretations to a single possible world, so there is no such thing as 'the' unique RDF interpretation. In general, the larger an RDF graph is - the more it says about the world - then the smaller the set of interpretations that an assertion of the graph allows to be true - there are fewer ways the world could be, while making the asserted graph true of it.
The following definition of an interpretation is couched in mathematical language, but what it amounts to intuitively is that an interpretation provides just enough information about a possible way the world might be - a 'possible world' - in order to fix the truth-value (true or false) of any ground RDF triple. It does this by specifying for each uriref, what it is supposed to be a name of; and also, if it is used to indicate a property, what values that property has for each thing in the universe; and if it used to indicate a datatype, we assume that the datatype defines a mapping between lexical forms and datatype values. This is just enough information to fix the truth-value of any ground triple, and hence any ground RDF graph.(We will show how to determine the truthvalues of non-ground graphs in the following section.) Notice that if we left any of this information out, it would be possible for some well-formed triple to be left without a determinate value; and also that any other information - such as the exact nature of the things in the universe - would, regardless of its intrinsic interest, be irrelevant to the actual truth-values of any triple.
All interpretations will be relative to a set of urirefs, called the vocabulary of the interpretation; so that one should speak, strictly, of an interpretation of an RDF vocabulary, rather than of RDF itself. Some interpretations may assign special meanings to the symbols in a particular namespace, which we will call a reserved vocabulary. Interpretations which share the special meaning of a particular reserved vocabulary will be named for that vocabulary, so that we will speak of 'rdf-interpretations' , 'rdfs-interpretations', etc.. An interpretation with no reserved vocabulary will be called a simple interpretation, or simply an interpretation. A simple interpretation can be viewed as having an empty reserved vocabulary.
RDF uses several forms of literal. The chief semantic characteristic of literals is that their meaning is determined by the form of the string they contain. In the case of typed literals, however, the full specification of the meaning depends on being able to access the datatype information which is external to RDF itself; for this reason we postpone a full discussion of the meaning of typed literals until later sections, where we introduce a special notion of datatype interpretation. For now, we will assume that each interpretation defines a mapping IL from typed literals to their interpretations, and will impose stronger conditions on IL as the notion of 'interpretation' is extended in later sections. Simple what term did we decide to use for these? "plain literals"? literals, without embedded datatypes, are always interpreted as referring to themselves: either a character string or a pair consisting of two character strings, the second of which is a language tag.
The set of all possible values of all literals is assumed to be a set called LV which is common to all RDF interpretations. Since the set of datatypes is not restricted by RDF syntax, it is impossible to give a sharp definition of LV, but it is required to contain all literal strings and also all pairs consisting of a literal string and a language tag.
A simple interpretation I of a vocabulary V is defined by:
1. A non-empty set IR of resources, called the domain or universe of I, which is a superset of LV.
2. A distinguished subset IP of IR, the set of properties.
3. A mapping IEXT from IP into the powerset of IR x IR i.e. the set of sets of pairs <x,y> with x and y in IR .
4. A mapping IS from V into IR
5. A mapping IL from typed literals into IR.
IEXT(x) is a set of pairs which identify the arguments for which the property is true, i.e. a binary relational extension, called the extension of x. This trick of distinguishing a relation as an object from its relational extension allows a property to occur in its own extension, as noted earlier.
The assumption the grammar? s/the/that/ IR is a superset of LV amounts to saying that literal values are thought of as real entities that 'exist'. This assumption may seem controversial, since it amounts to saying that literal values are resources. We note however that this does not imply that literals should be identified with urirefs. There is a technical reason why the range of IL is IR rather than being restricted to LV. When we consider interpretations which take account of datatype information, it is syntactically possible for a typed literal to be internally inconsistent, and we will require such badly typed literals to denote a non-literal value.
In the next sections we give the exact rules for how an interpretation of a vocabulary determines the truth-values of any RDF graph, by a recursive definition of the denotation - the semantic "value" - of any RDF expression in terms of those of its immediate subexpressions. RDF has two kinds of denotation: names denote things in the universe, and sets of triples denote truthvalues.
The denotation of a ground RDF graph in I is given recursively by the following rules, which extend the interpretation mapping I from labels to ground graphs. These rules (and extensions of them given later) work by defining the denotation of any piece of RDF syntax E in terms of the denotations of the immediate syntactic constitutents of E, hence allowing the denotation of any piece of RDF to be determined by a kind of syntactic recursion.
if E is a plain literal then I(E) = E |
if E is a typed literal than I(E) = IL(E) |
if E is a uriref then I(E) = IS(E) |
if E is a triple s p o . then I(E) = true if <I(s),I(o)> is in IEXT(I(p)), otherwise I(E)= false. |
if E is a ground RDF graph then I(E) = false if I(E') = false for some triple E' in E, otherwise I(E) =true. |
Notice that if the vocabulary of an RDF graph contains urirefs that are not in the vocabulary of an interpretation I - that is, if I simply does not give a semantic value to some name that is used in the graph - then these truth-conditions will always yield the value false for some triple in the graph, and hence for the graph itself. Turned around, this means that any assertion of a graph implicitly asserts that all the names in the graph actually refer to something in the world. Note that the final condition implies that an empty graph (an empty set of triples) is trivially true.
As an illustrative example, the following is a small
interpretation for the artificial vocabulary {ex:a, ex:b, ex:c
}.
We use integers to indicate the 'things' in the universe. This is
not meant to imply that RDF interpretations should be interpreted as being
about arithmetic, but more to emphasize that the exact nature of the things
in the universe is irrelevant.(In this and subsequent examples we use the
greater-than and less-than symbols in several ways: following mathematical
usage to indicate abstract pairs and n-tuples; following Ntriples syntax to
enclose urirefs, and also as arrowheads when indicating mappings. We
apologize for any confusion.)
IR = LV union{1, 2};
IEXT: 1->{<1,2>,<2,1>}
IS: ex:a
->1, ex:b
->1,
ex:c
->2
IL: any typed literal -> 2
Figure 1: An example of an interpretation. Note, this is not a picture
of an RDF graph.
This interpretation makes these triples true:
<ex:a> <ex:b> <ex:c> .
<ex:c> <ex:a> <ex:a> .
<ex:c> <ex:b> <ex:a> .
<ex:a> <ex:b> "whatever"^^ex:b .
For example, I(<ex:a> <ex:b> <ex:c> .
) =
true if <I(ex:a
),I(ex:c
)> is in
IEXT(I(<ex:b>
)), i.e. if <1,2> is in IEXT(1), which
is {<1,2>,<2,1>} and so does contain <1,2> and so
I(<ex:a <ex:b> ex:c>
) is true.
The truth of the fourth literal is a consequence of the rather idiosyncratic interpretation chosen here for typed literals; this kind of oddity will be ruled out when we consider datatyped intepretations.
It makes these triples false:
<ex:a> <ex:c> <ex:b> .
<ex:a> <ex:b> <ex:b> .
<ex:c> <ex:a> <ex:c> .
<ex:a> <ex:b> "whatever" .
For example, I(<ex:a> <ex:c> <ex:b> .
) =
true if <I(ex:a
),I(<ex:b>
)>,
i.e.<1,1>, is in IEXT(I(ex:c
)); but I(ex:c
)=2
and IEXT is not defined on 2, so the condition fails and I(<ex:a>
<ex:c> <ex:b> .
) = false.
It makes all literals containing a plain literal false, since the property extension does not have any pairs containing a character string.
To emphasize; this is only one possible interpretation of this vocabulary; there are (infinitely) many others. For example, if we modified this interpretation by attaching the property extension to 2 instead of 1, none of the above six triples would be true.
Blank nodes are treated as simply indicating the existence of a thing, without using, or saying anything about, the name of that thing. (This is not the same as assuming that the blank node indicates an 'unknown' uriref; for example, it does not assume that there is any uriref which refers to the thing. See http://www.w3.org/2000/03/rdf-tracking/#rdfms-identity-anon-resources for a summary and pointers to further discussions on this issue. The discussion of skolemization in the proof appendix is also relevant.)
We now show how an interpretation can specify the truth-value of a graph containing blank nodes. This will require some definitions, as the theory so far provides no meaning for unlabeled nodes. Suppose I is an interpretation and A is a mapping from some set of unlabeled nodes to the universe IR of I, and define I+A to be an extended interpretation which is like I except that it uses A to give the interpretation of unlabeled nodes. Define anon(E) to be the set of unlabeled nodes in E. Then we can extend the above rules to include the two new cases that are introduced when unlabeled nodes occur in the graph:
If E is an unlabeled node then I+A(E) = A(E) |
If E is an RDF graph then I(E) = true if I+A'(E) = true for some mapping A' from anon(E) to IR, otherwise I(E)= false. |
Notice that we have not changed the definition of an interpretation; it still consists of the same values IR, IP, IEXT, IS and IL. We have simply extended the rules for defining denotations under an interpretation, so that the same interpretation that provides a truth-value for ground graphs also assigns truth-values to graphs with unlabeled nodes, even though it provides no denotation for the unlabeled nodes themselves. Notice also that the unlabeled nodes themselves are perfectly on the grounds of deep suspicion of all claims to perfection, d/perfectly/ well-defined entities; they differ from other nodes only in not being assigned a denotation by an interpretation, reflecting the intuition that they have no 'global' meaning (i.e. outside the graph in which they occur).
This effectively treats all blank nodes as having the same meaning as existentially quantified variables in the RDF graph in which they occur. Notice however that that since two unlabeled nodes cannot have the same label, there is no need to specify the scope of the quantifier within a graph, and no need to use any explicit quantifier syntax.( If we were to apply the semantics directly to N-triples syntax, we would need to indicate the quantifier scope, since in this lexicalization syntax the same node identifier may occur several times corresponding to a single blank node in the graph. The above rule amounts to the convention that would place the quantifiers just outside, or at the outer edge of, the N-triple document corresponding to the graph.)
For example, with this convention, the graph defined by the following triples is false in the interpretation shown in figure 1:
_:xxx <ex:a> <ex:b> .
<ex:c> <ex:b> _:xxx .
since if A' maps the unlabeled node to 1 then the first triple is false in I+A', and if it maps it to 2 then the second triple is false.
Note that each of these triples, if thought of as a single graph, would be true in I, but the whole graph is not; and that if a different nodeID were used in the two triples, indicating that the RDF graph had two blank nodes instead of one, then A' could map one node to 2 and the other to 1, and the resulting graph would be true under the interpretation I.
Following conventional terminology, we say that I satisfies E if I(E)=true, and that a set S of expressions (simply) entails E if every interpretation which satisfies every member of S also satisfies E. In later sections these notions will be adapted to classes of interpretations with particular reserved vocabularies, but throughout this section 'entailment' should be interpreted as meaning simple RDF entailment.
Entailment is the key idea which connects model-theoretic semantics to real-world applications. As noted earlier, making an assertion amounts to claiming that the my cpu is a bit overloaded at this point. Do you mean *the* world as in the real world, do you mean *a* world, or do you mean some other specific world but I haven't figured out which one it is. world is an interpretation which assigns the value true to the assertion. If A entails B, then any interpretation that makes A true also makes B true, so that an assertion of A already contains the same "meaning" as an assertion of B; we could say that the meaning of B is somehow contained in, or subsumed by, that of A. If A and B entail each other, then they both "mean" the same thing, in the sense that asserting either of them makes the same claim about the world. The interest of this observation arises most vividly when A and B are different expressions, since then the relation of entailment is exactly the appropriate semantic licence to justify an application inferring or generating one of them from the other. Through the notions of satisfaction, entailment and validity, formal semantics gives a rigorous definition to a notion of "meaning" that can be related directly to computable methods of determining whether or not meaning is preserved by some transformation on a representation of knowledge.
Any process or technique which constructs a graph E from some other graph(s) S is said to be (simply) valid if S entails E, otherwise invalid. Note that being an invalid process does not mean that the conclusion is false, and being valid does not guarantee truth. However, validity represents the best guarantee that any assertional language can offer: if given true inputs, it will never draw a false conclusion from them.
In this section we give a few basic results about simple entailment and valid inference. Simple entailment can be recognized by relatively simple syntactic comparisons. The two basic forms of simply valid proof step in RDF are, in logical terms, the inference from (P and Q) to P, and the inference from (foo baz) to (exists (?x) foo(?x)).
I can't parse that. I guess I don't know the background logical stuff well enough. {P, Q} entails P I understand. But I don't know how to read (foo baz). No doubt I'll get to it.
Note, these results apply only to simple entailment, not to the more subtle notions of entailment introduced in later sections. Proofs, all of which are straightforward, are given in appendix B, which also describes some other properties of entailment which may be of interest.
Subgraph Lemma. A graph entails all its subgraphs .
Instance Lemma. A graph is entailed by any of its instances.
The relationship between merging and entailment is simple, and obvious from the definitions:
Merging lemma. The merge of a set S of RDF graphs is entailed by S, and entails every member of S.
This means that a set of graphs can be treated as equivalent to its merge, i.e. a single graph, as far as the model theory is concerned. In what follows, therefore, we will often not bother to distinguish between a set of graphs and a single graph. This can be used to simplify the terminology somewhat: for example, we can paraphrase the definition of S entails E, above, by saying that S entails E when every interpretation which satisfies S also satisfies E.
The main result for simple RDF inference is:
The interpolation lemma completely characterizes simple RDF entailment in syntactic terms. To tell whether a set of RDF graphs entails another, find a subgraph of their merge and replace names by unlabeled nodes to get the second. Of course, there is no need to actually construct the merge. If working backwards from the consequent E (the graph that may be entailed by the others), the most efficient technique would be to treat unlabeled nodes as variables in a process of subgraph-matching, allowing them to bind to 'matching' names in the antecedent graph(s) in S, i.e. those which may entail the consequent graph. The interpolation lemma shows that this process is valid, and is also complete if the subgraph-matching algorithm is. The existence of complete subgraph-checking algorithms also shows that RDF entailment is decidable, i.e. there is a terminating algorithm which will determine for any finite set S and any graph E, whether or not S entails E.
Notice however that such a variable-binding process would only be appropriate when applied to the conclusion of a proposed entailment. This corresponds to using the document as a goal or a query, in contrast to asserting it, i.e. claiming it to be true. If an RDF document is asserted, then it would be invalid to bind new values to any of its unlabeled nodes, since the resulting graph would not be entailed by the assertion, as explained in the next section.
It might be thought that the operation of changing a bound variable would be an example of an inference which was valid but not covered by the interpolation lemma, e.g. the inference of
_:x <ex:a> <ex:b> .
from
_:y <ex:a> <ex:b> .
Recall however that by our conventions, these two expressions describe identical RDF graphs.
Finally, the following is a trival consequence of the definition of entailment, but it may be worth stating explicitly since many implemented systems fail to satisfy it:
Monotonicity Lemma. Suppose S is a subgraph or subset of S' and S entails E. Then S' entails E.
Notice that unlabeled nodes are not identified with other nodes in a merge, and indeed this reflects a basic principle of RDF graph inference: in contrast to names, which have a global identity which carries across all graphs, blank nodes should not be identified with other nodes or re-labeled with urirefs, in order to ensure that the resulting graph is entailed by what one starts with. To state this condition precisely, we need to first exclude a counterexample. It is possible for a graph to contain two triples one of which is an instance of the other, for example:
<ex:a> <ex:b> _:xxx .
<ex:a> <ex:b> <ex:c> .
Such an internally redundant graph is equivalent to one of its
own instances, since replacing the blank node by <ex:c>
would result in a single-triple graph which is a subgraph of the original. To
rule out such cases of internal redundancy, we
will say that an RDF graph is lean if none of its triples is a proper
instance of any other. Then the above principle is made precise in the
following two lemmas concerning criteria for non-entailment:
This means that there is no valid RDF inference process which can produce an RDF graph in which a single unlabeled node occurs in triples originating from several different graphs. (Of course, such a graph can be constructed, but it will not be entailed by the original documents. An assertion of such a graph would reflect the addition of new information about the identity of two unlabeled nodes.)
We emphasise again that these results apply only to simple entailment, not to the namespace entailment relationships defined in rest of the document.
So far we have considered only the model theory of what might be called the logical form of the RDF graph itself, without imposing any special interpretations on any reserved vocabulary. In the rest of the document we will extend the model theory to describe the semantic conditions reflecting the intended meanings of the rdf: and rdfs: namespaces. Some of these interpretations diverge in some ways from the meanings described in RDFMS and RDFSchema , and we will note these differences as they arise. Some stylistic care to ensure reader does not think this reference applies to the current schema spec. ... described in the RDF Model and Syntax specification [RDFMS] and the first RDF Schema candidate recommendation [RDF-SCHEMA-1], ...
Although we will do this in stages, the same general technique is used throughout. First we describe a reserved vocabulary, i.e. a set of urirefs which will be given a special meaning; then we give the extra conditions on an interpretation which capture those meanings; then we restrict the notions of satisfiability and entailment to apply to these interpretations only. This essentially imposes an a priori restriction on the world being described that it satisfy the extra conditions. The new semantic conditions are automatically assumed to be true; an interpretation which would violate them is simply not allowed to count as a possible world. We call these restricted notions of satisfiability and entailement vocabulary specific satisfiability and vocabulary specific entailment respectively.
Since there are now many distinct notions of interpretation, entailment and satisfiability, we use the conventional Qname namespace prefixes to identify the various distinctions, eg an rdf-interpretation is an interpretation satisfying the rdf semantic conditions, rdf-entailment means entailment relative to such interpretations, and so on. I'm not sure we need the namespace prefix idea hear at all. Since there are now many distinct notions of interpretation, entailment and satisfiability, we will name them, e.g. an rdf-interpretation is an interpretation satisfying the rdf semantic conditions, rdf-entailment means ...We call this general idea vocabulary entailment , i.e. entailment relative to a set of interpretations which satisfy extra semantic conditions on a reserved vocabulary. Vocabulary specific entailment is more powerful than simple entailment, in the sense that a given set of premises entails more consequences. In general, as the reserved vocabulary is increased and extra semantic conditions imposed, the class of satisfying interpretations is restricted, and hence the corresponding notion of entailment becomes more powerful. For example, if S simply entails E then it also rdf-entails E, since every rdf-interpretation is also a simple interpretation; but S may rdf-entail E even though it does not simply entail it. Intuitively, a conclusion may follow from some of the extra assumptions incorporated in the semantic conditions imposed on the reserved vocabulary.
Another way of expressing this is that any restriction on interpretations decreases the number of possible ways that an interpretation might be a counterexample to E's following from S. I can't parse this. I keep thinking about it to confirm I believe it, but it makes my head hurt. I understood, I think, the para above. If this para is meant to make this idea easier to understand, it doesn't work for me. This could be an editorial comment - please delete. Or if its thought to be worth keeping, I'd like some help to figure out what it means.
Simple entailment is the vocabulary entailment of the empty vocabulary. It is therefore the weakest form of RDF entailment, which holds for any reserved vocabulary; it is the entailment which depends only on the basic logical form of RDF graphs, without making any further assumptions about the meaning of any urirefs.
We will consider syntactic criteria for recognizing vocabulary entailment in the next section.
I think one of the things that makes me nervous about vocab entailment is the notion that folks can make new vocabs from old, e.g. PRISM reuses DC, or at least some of it. There is potentially a complicated structure of vocabularies. I'm not saying there is anything wrong I can put my finger on here, I'm just abit twitchy. Could be the sherry.
You've done it again. I've just written down a worry and the next section deals with it.
Consider the following (rather small) reserved vocabulary, which we will call rdfRV:
RDF reserved vocabulary |
rdf:type rdf:Property rdf:nil
rdf:List |
IP contains I(rdf:type ) |
x is in IP iff <x,
I(rdf:Property )> is in
IEXT(I(rdf:type )) |
<I(rdf:nil ),
I(rdf:List )> is in
IEXT(I(rdf:type )) |
This forces every rdf interpretation to contain a thing which can be
interpreted as the 'type' of properties. The second condition could be
regarded as defining IP to be the set of resources in the universe
of the interpretation which have the value I(rdf:Property
) of
the property I(rdf:type
). This way of construing subsets of the
universe will be central in interpretations of RDFS. The
third condition says that the empty list object is classified as being a
list: this is the only formal condition the RDF semantics places on the
collection vocabulary, described later.
For example, the following rdf-interpretation extends the simple interpretation in figure 1:
IR = {1, 2, T , P}; IP = {1, T}
IEXT: 1->{<1,2>,<2,1>}, T->{<1,P>,<T,P>}
IS: ex:a
-> 1, <ex:b>
->1,
ex:c
-> 2, rdf:type
->T,
rdf:Property
->P, rdf:nil
->1, rdf:List
->P
Figure 2: An example of an rdf-interpretation.
This is not the smallest rdf-interpretation which extends the earlier
example, since we could have made I(rdf:Property
) be 2 and
IEXT(T) be {<1,2>,<T,2>}, and managed without having P in the
universe. In general, a given entity in an interpretation may play several
'roles' at the same time, as long as this can be done without violating any
of the required semantic conditions. The above interpretation identifies
properties with lists, for example; of course, other interpretations might
not make such an identification.
It is important to note that every rdf-interpretation is also a simple interpretation.The 'extra' structure does not prevent it acting in the simpler role.
RDF provides vocabularies which are intended for use in describing
containers and bounded collections, and a reification vocabulary to enable an
RDF graph to describe, as well as exhibit, triples. Although these
vocabularies have reasonably clear informally intended conventional meanings,
we do not impose any further formal semantic
conditions on them, so the notions of rdf-entailment and rdf-interpretation
apply to them without further change. They are discussed here in order to
explain both the intuitive meanings intended, and also to note the intuitive
consequences which are not supported by the formal model theory.
Constraints are imposed on the meanings of these vocabularies in semantic
extensions. When the RDFS vocabulary is added, this part of the RDF
vocabulary will have nontrivial consequences; but until then, it can be added
to the RDF vocabulary without needing any further semantic conditions.We will refer to the complete set of all rdf urirefs,
consisting of the RDF reserved vocabulary, all of the reification, container
and collection vocabularies and the uriref rdf:value
, as the
RDF vocabulary, rdfV.
The lack of a formal semantics for these vocabularies does not reflect any technical semantic problems, but rather is a design decision to make it easier to implement RDF reasoning engines which can check formal RDF entailment. Since no extra formal semantic conditions are imposed on them, they are not considered to be restricted vocabularies in RDF. In RDFS, however, the entire RDF vocabulary is considered to be a restricted vocabulary. I don't think we have met the term "restricted vocabulary" yet. Confirmed.
The RDF reification vocabulary consists of a class name and three property names.
RDF reification vocabulary |
rdf:Statement rdf:subject rdf:predicate rdf:object |
Semantic extensions MAY limit the interpretation of these so that a triple of the form
aaa rdf:type rdf:Statement .
is true in I just when I(aaa) is a token of an RDF triple in some RDF document, and the three properties, when applied to such a denoted triple, have the same values as the respective components of that triple.
This may be illustrated by considering the following two RDF graphs, the first of which consists of a single triple.
<ex:a> <ex:b> <ex:c> .
and
_:xxx rdf:type rdf:Statement .
_:xxx rdf:subject <ex:a> .
_:xxx rdf:predicate <ex:b> .
_:xxx rdf:object <ex:c> .
The second graph is called a reification of the triple in the first
graph, and the node which is intended to refer to the first triple - the
blank node in the second graph - is called, rather confusingly, a reified
triple not in m&s. its a
reified statement. . (This can be a blank node or a
uriref.) In the intended interpretation of the
reification vocabulary, the second graph would be made true in an
interpretation I by interpreting the reified triple to refer to a token of
the triple in the first graph in some concrete RDF document, considering that
token to be valid RDF syntax, and then using I to interpret the syntactic
triple which the token instantiates, so that the subject, predicate
and object of that triple are interpreted in the same way in the reification
as in the reified triple. Formally, <x,y> is in
IEXT(I(rdf:subject
)) just when x is a token of an RDF triple of
the form
aaa bbb ccc .
and y is I(aaa); similarly for predicate and object. Notice that the value
of the rdf:subject
property is not the subject uriref itself but
its interpretation, and so this condition involves a two-stage interpretation
process: we have to interpret the reified node - the subject of the triples
in the reification - to refer to another triple, then treat that triple as
RDF syntax and apply the interpretation mapping again to get to the referent
of its subject. This requires triple tokens to exist as first-class entities
in the universe IR of an interpretation. In sum: the
meaning of the reification is that a document exists containing a triple
token which means whatever the first graph means.
We emphasize that the semantic extension described here requires the
reified triple that the reification describes - I(_:xxx
) in the
above example - to be a particular token or instance of a
triple in a (real or notional) RDF document, rather than an 'abstract' triple
considered as a grammatical form. There could be
several such entities which have the same subject, predicate and object
properties. Although a graph is defined as a set of triples, several such
tokens with the same triple structure might occur in different documents.
Thus, it would be meaningful to claim that the blank node in the second graph
above does not refer to the triple in the first graph, but to some other
triple with the same structure. This particular interpretation of reification
was chosen on the basis of use cases where properties such as dates of
composition or provenance information have been applied to the reified
triple, which are meaningful only when thought of as referring to a
particular instance or token of a triple.
Although RDF applications may use reification to refer to triple tokens in RDF documents, the connection between the document and its reification must be maintained by some means external to RDF. RDF syntax provides no means to 'connect' an RDF triple to its reification. Since an assertion of a reification of a triple does not implicitly assert the triple itself, this means that there are no entailment relationships which hold between a triple and a reification of it. Thus the reification vocabulary has no effective semantic constraints on it, other than those that apply to an RDF interpretation. The chief facts that are worthy of note about RDF reification, in fact, are examples of non-entailments.
A reification of a triple does not entail the triple, and is not entailed by it. (The reason for first is clear, since the reification only asserts that the triple token exists, not that it is true. The second non-entailment is a consequence of the fact that asserting a triple does not automatically assert that any triple tokens exist in the universe being described by the triple. For example, the triple might be part of an ontology describing animals, which could be satisfied by an interpretation in which the universe contained only animals.)
Since the relation between triples and reifications of triples in any RDF graph or graphs need not be one-to-one, asserting a property about some entity described by a reification need not entail that the same property holds of another such entity, even if it has the same components. For example,
_:xxx rdf:type rdf:Statement .
_:xxx rdf:subject <ex:subject> .
_:xxx rdf:predicate <ex:predicate> .
_:xxx rdf:object <ex:object> .
_:yyy rdf:type rdf:Statement .
_:yyy rdf:subject <ex:subject> .
_:yyy rdf:predicate <ex:predicate> .
_:yyy rdf:object <ex:object> .
_:xxx <ex:property> <ex:foo> .
does not entail
_:yyy <ex:property> <ex:foo> .
The non entailment stuff looks good. Is it possible to simplify/eliminate some of the earlier bits and still keep the latter?
RDF provides vocabularies for describing three classes of containers. A container is an entity whose 'members' are thought of as the values of properties, each of which relates a particular 'position' in the container to the entity, if there is one, which is 'at' that position. (The rdfs vocabulary, described below, adds a generic membership property which holds regardless of position, and classes containing all the containers and all the membership properties.)
RDF Container Vocabulary |
rdf:Seq rdf:Bag rdf:Alt rdf:_1 rdf:_2 ... |
One should understand this RDF vocabulary as describing containers, rather than as a vocabulary for constructing them, as would typically be supplied by a programming language. On this view, the actual containers are entities in the semantic universe, and RDF graphs which use the vocabulary simply provide very basic information about these entities, enabling an RDF graph to characterize the container type and give partial information about the members of a container. Since the RDF container vocabulary is so limited, many 'natural' assumptions concerning RDF containers are not formally sanctioned by the RDF model theory. This should not be taken as meaning that these assumptions are false, but only that RDF does not formally entail that they must be true.
There are no special semantic conditions on the container vocabulary: the
only 'structure' which RDF presumes its containers to have is what can be
inferred from the use of this vocabulary and the semantic conditions on the
rest of the RDF vocabulary. Since the membership properties rdf:_1,
rdf:_2
, ... are implicitly ordered by their very names, that order can
be thought of as an ordering of the elements of the container. This implicit
ordering of members of a container applies to all three kinds of container,
even though bags are normally thought of as unordered. RDF does not support
any entailments which could arise from re-ordering the elements of an
rdf:Bag. For example,
_:xxx rdf:type rdf:Bag .
_:xxx rdf:_1 <ex:a> .
_:xxx rdf:_2 <ex:b> .
does not entail
_:xxx rdf:_1 <ex:b> .
_:xxx rdf:_2 <ex:a> .
Notice that if this conclusion were valid, then the result of conjoining it to the original graph would also be a valid entailment, which would assert that both elements were in both positions. (This is a consequence of the fact that RDF is a purely assertional language.)
There is no assumption that a property of a container applies to any of the elements of the container, or that if a property applies to a container then the property applies to any of the members of the container, or vice versa. There is no requirement that the three container classes are disjoint, so that for example something can be asserted to be both an rdf:Bag and an rdf:Seq. There is no assumption that containers are gap-free, so that for example
_:xxx rdf:type rdf:Seq.
_:xxx rdf:_1 <ex:a> .
_:xxx rdf:_3 <ex:c> .
does not entail
_:xxx rdf:_2 _:yyy .
That jars with M&S which says they are contiguous. My, potentially flawed understanding, is that we decided that containers are contiguous as per m&s, we just allow partial descriptions of them. Would the above entailment cause problems? i.e.
_:xxx rdf:type rdf:Seq .
_:xxx rdf:_nnn _:yyy .
entails
_:xxx rdf:_mmm _:zzz .
when nnn > 1 and mmm = nnn-1
There is no way in RDF to 'close' a container, i.e. to assert that it contains only a fixed number of members. This is a reflection of the fact that it is always consistent to add a triple to a graph asserting a membership property of any container. And finally, there is no built-in assumption that an RDF container has only finitely many members.
The informal purpose of the three container types is to allow applications
to encode the various intentions or expectations about different kinds of
containers. Sequences are thought of as totally ordered, bags as unordered
(that is, equivalent under re-orderings) and rdf:Alt containers are intended
to convey a series of alternative values of a property, which an application
can choose from. However, these informal interpretations are not reflected in
any RDF entailments. In particular, a triple with a rdf:Alt
as a
subject or object should not be thought of as an encoding of a logical
disjunction.
RDF provides a vocabulary for describing collections, ie.'list structures' in terms of head-tail links. Collections differ from containers in allowing branching structure and in having an explicit terminator, allowing applications to determine the exact set of items in the collection.
RDF Collection Vocabulary |
rdf:List rdf:first rdf:rest rdf:nil |
As with containers, no special semantic conditions are imposed on this vocabulary other than the type of nil being List. It is intended for use typically in a context where a 'well-formed' container is described using blank nodes to connect a sequence of items, each described by three triples of the form
_:c1 rdf:type rdf:List .
_:c1 rdf:first aaa .
_:c1 rdf:rest _:c2
where the final item is indicated by the use of rdf:nil
as
the value of the property rdf:rest
. In a familiar convention,
rdf:nil
can be thought of as the empty collection. Clearly,
any such graph amounts to an assertion that the collection, and all its
sub-collections, exist, and since the members of the collection can be
determined by inspection, this is often sufficient to enable applications to
determine what is meant. Note however that the semantics does not require any
collections to exist other than those mentioned explicitly in a graph (and
the empty collection). For example, the existence of a collection containing
two items does not automatically guarantee that the similar collection with
the items permuted also exists:
_:c1 rdf:type rdf:List .
_:c1 rdf:first <ex:aaa> .
_:c1 rdf:rest _:c2
_:c2 rdf:type rdf:List .
_:c2 rdf:first <ex:bbb> .
_:c2 rdf:rest rdf:nil .
does not entail
_:c3 rdf:type rdf:List .
_:c3 rdf:first <ex:bbb> .
_:c3 rdf:rest _:c4
_:c4 rdf:type rdf:List .
_:c4 rdf:first <ex:aaa> .
_:c4 rdf:rest rdf:nil .
Also, RDF imposes no 'wellformedness' conditions on the use of this vocabulary, so that it is possible to write RDF graphs which assert the existence of highly peculiar objects such as lists with forked or non-list tails, or multiple heads:
_:666 rdf:type rdf:List .
_:666 rdf:first <ex:aaa> .
_:666 rdf:first <ex:bbb> .
_:666 rdf:rest <ex:ccc> .
_:666 rdf:rest _:777 .
_:777 rdf:type rdf:List .
_:666 rdf:rest rdf:nil .
As this example shows, it is also possible to write a set of triples which
underspecify a collection by failing to specify its rdf:rest
property value.
Semantic extensions MAY place extra syntactic well-formedness restrictions
on the use of this vocabulary in order to rule out such graphs, and MAY
exclude interpretations of the collection vocabulary which violate the
convention that the subject of a 'linked' collection of three-triple items of
the form described above, ending with an item ending with
rdf:nil
, denotes a totally ordered sequence whose members are
the denotations of the rdf:first
values of the items, in the
order got by tracing the rdf:rest
properties from the subject to
rdf:nil
. This permits sequences which
contain other sequences.
RDFSchema extends RDF to include a larger reserved vocabulary rdfsV with more complex semantic constraints:
RDFS reserved vocabulary |
rdfs:domain rdfs:range rdfs:Resource rdfs:Literal
rdfs:XMLLiteral rdfs:Datatype rdfs:Class rdfs:subClassOf
rdfs:subPropertyOf rdfs:member rdfs:Container
rdfs:ContainerMembershipProperty rdfs:comment rdfs:seeAlso ,
rdfs:isDefinedBy rdfs:label |
(rdfs:comment, rdfs:seeAlso
,
rdfs:isDefinedBy
and rdfs:label
are included here
because some constraints which apply to their use can be stated using
rdfs:domain, rdfs:range
and rdfs:subPropertyOf
.
Other than this, the formal semantics does not assign them any particular
meanings.)
Although not strictly necessary, it is convenient to state the RDFS
semantics in terms of a new semantic construct, a 'class', i.e. a resource
which represents a set of things in the universe which all have that class as
the value of their rdf:type
property. Classes are defined to be
things of type rdfs:Class
. We will assume that there is a
mapping ICEXT (for the Class Extension in I) from classes to their
extensions; the first semantic condition in the table below amounts to the
following definition of this mapping in terms of the relational extension of
rdf:type
:
ICEXT(x) = {y | <y,x> is in IEXT(I(rdf:type
)) }
Notice that a class may have an empty class extension, and that (as noted
earlier) two different class entities could have the same class extension;
and that given the above definition, the class extension of
rdfs:Class
contains the class rdfs:Class
.
An
rdfs-interpretation of V is an rdf-interpretation I of (V union rdfV union rdfsV) which satisfies the following
semantic conditions and all the triples in the subsequent table, which we
will call axiomatic triples. The first condition can be understood
as a definition of ICEXT and hence of IC, the set of classes. Since I is an
rdf-interpretation, this means that IP =
ICEXT(I(rdf:Property
))
x is in ICEXT(y) iff <x,y> is in
IEXT(I( IC = ICEXT(I( IR = ICEXT(I( |
If <x,y> is in IEXT(I( |
If <x,y> is in IEXT(I( |
<x,y> is in
IEXT(I( |
<x,y> is in IEXT(I( |
If x is in ICEXT(I(rdfs:ContainerMembershipProperty ))
then <x,I(rdfs:member )> is in
IEXT(I(rdfs:subPropertyOf )) |
IC contains: I( |
IP contains: I( |
might want some marker here, even though you have said it above. ... contains the following axiomatic triples:
rdfs:subPropertyOf rdfs:domain rdf:Property . rdfs:domain rdf:Property
.rdfs:range rdfs:domain
rdf:Property .rdf:subject rdfs:domain
rdf:Statement .rdf:predicate rdfs:domain
rdf:Statement .rdf:object rdfs:domain
rdf:Statement .rdf:first rdfs:domain rdf:List
.rdfs:rest rdfs:domain rdf:List
.
rdf:_1 rdfs:subPropertyOf rdf:member
. . rdf:_2 rdfs:subPropertyOf rdf:member
. . |
The truth of the axiomatic triples could be stated as conditions on IEXT
and ICEXT, but it is convenient to use the truth-of-triples formulation.
Similarly, the conditions on IC and IP in the first table could be stated as
axiomatic triples with property rdf:type
and objects
rdfs:Class
and rdfs:Property
respectively.
The semantics given here for rdfs:range
and
rdfs:domain
do not entail that superclasses of domains or
ranges of a property must also be domains and ranges of that property. For
some purposes it may be more convenient to phrase the semantic conditions as
'iff' conditions, which would require imposing this as an extra condition on
the meanings of rdfs:range
and rdfs:domain
. This
extra condition will not effect any class-membership entailments on the
elements of the domains and ranges of the property. The semantics given here
was chosen because it is sufficient for all normal uses of these terms,
allows some subtleties in class reasoning, and places a lesser burden on
implementors. Some domain and range assertions are
omitted from the above table; in those cases, the domain or range of the
property may be taken to be rdfs:Resource
, i.e. the universe;
such range and domain assertions are essentially vacuous.
We will not attempt to give a pictorial diagram of an rdfs-interpretation.
The semantic conditions on rdfs-interpretations do not
include the condition that ICEXT(I(rdfs:Literal
)) must be a
subset of LV. While this would seem to be required for
conformance with RDFMS, there is no way to impose this
condition by any RDF assertion or syntactic closure rule. This
limitation is due to the fact that RDF does not allow literals to occur in
the subject position of a triple, so there are severe restrictions on what
can be said about literals in RDF. Similarly, while properties may be
asserted of the the class rdfs:Literal
, none of these can be
rdfs-validly transferred to literals
themselves.
For example, a triple of the form
<ex:a> rdf:type rdfs:Literal .
is consistent even though 'ex:a
' is a uriref rather than a
literal. What it says is that I(ex:a
) is a literal value, ie
that the uriref 'ex:a
' denotes a literal value. There is
however no way in current RDF to specify exactly which literal value it
denotes.
Note that the interpolation lemma guarantees that any triple containing a literal object entails a similar triple with a bnode as object:
<ex:a> <ex:b> "10"
.
entails
<ex:a> <ex:b> _:xxx .
This means that literal denotes 'something', which could therefore also be named, at least in principle, by a uriref.
A datatype is identified by a uriref and defines a set of character strings called lexical forms and a mapping from that set to a set of values. Exactly how these are defined is a matter external to RDF, but this is the minimal structure required in order to state a semantics. In operational terms, a reasoning engine would require that the uriref of a datatype provides access to a process which can determine, for any character string, whether or not it is a valid lexical form for that datatype, and for any two such valid character strings, whether or not they map to the same value under the lexical-to-value mapping. It may also use information about the identity of datatype values from different datatypes, if that information is available.
Since the set of possible datatypes is open-ended, we will assume that datatype interpretations are defined relative to a particular set of datatypes, and refer to D-interpretations where D is some set of datatypes, which we will call recognized datatypes. A 'datatype-aware' RDF engine SHOULD be competent to recognize at least the rdfs:XMLLiteral datatype and the set of all the XML Schema primitive datatypes. We will call this set XSD. and use the Qname prefix xsd: to refer to XML Schema datatypes in examples.
This is equisitely expressed, but does not seem to belong in the semantics doc. Move to concepts?
We will assume that there is a global mapping L2V from datatypes to their lexical-to-value mappings; the valid lexical forms of a datatype d constitute the domain of L2V(d), and the value spaces of d Care here: after your recent discussion with Henry, are we sure that the xsd notion of a value space is exactly the set of values of a datatype? Oh, and how many value spaces does a datatype have? s/spaces/space/. ... and the range of L2V(d) is the set of values in the value space of d. is the range of L2V(d). Recall that the set LV is defined to include all members of all datatype value spaces, so that L2V(d) must be a subset of LV.
A D-interpretation of a graph G is an rdfs-interpretation I of V, where V contains the vocabulary of G, which satisfies the following extra conditions:
ICEXT(I(rdfs:Datatype )) = D |
For any typed literal "sss"^^ddd in G, if I(ddd) is in D and 'sss' is a valid lexical form for I(ddd) then IL("sss"^^ddd) = L2V(I(ddd))(sss) |
For any typed literal "sss"^^ddd in G, if I(ddd) is in D and 'sss' is not a valid lexical form for I(ddd) then IL("sss"^^ddd) is not in LV |
If x is in D, then ICEXT(x) is the value space of L2V(x) ditto |
The first condition says that membership in the class
rdfs:Datatype
means the same as being a recognized datatype.
Thus, the inclusion of a triple of the form
<ex:ddd> rdf:type rdfs:Datatype .
in an RDF graph can be understood by a datatype-aware RDF
reasoner as a claim that ex:ddd
identifies a recognized
datatype. Such reasoners MAY post a warning or an error condition when they
are unable to access the relevant datatype information. While normal RDFS
reasoning is valid when applied to the datatype vocabulary, other
implications which depend on the properties of the datatype spaces may be
missed, and datatype clashes or other error conditions may be
undetectable.
The second condition says that the meaning of any typed literal which uses a 'recognized' datatype is the value of the literal character string under that datatype. For example, if I is an XSD-interpretation then I("15"^^xsd:integer) must be the number fifteen. Notice that this applies only to datatypes in D; typed literals whose type is not a recognized datatype are treated as before, i.e. as denoting some unknown thing. This means that their meanings can be further restricted by adding a suitable extra datatype to the set of recognized datatypes.
The third condition requires that an 'ill-formed' typed literal, i.e. one
where the literal string is not in the lexical space of the datatype, i/does/ not denote any literal
value. Intuitively, such a name does not denote any value, but in order to
avoid the semantic complexities which arise from empty names, we require such
a typed literal to denote an 'arbitrary' value. Thus for example, if D
contains the XML schema datatypes, then all that can be concluded about
I("arthur"^^xsd:integer) is that it is not in
ICEXT(I(rdfs:Literal
)). Datatype-aware RDF reasoners SHOULD post a warning or an error condition when such
literals are detected.
This comment is technical and beyond my competence. We have LV which is a subset of IR. "arthur"^^xsd:integer must denote something which is not in LV, you say here, I think. Are we sure there is anything in IR not in LV? Nowhere so far have we required IR to be a strict superset of LV. Or is that what you have just done with this requirement. I'm waiting to get the next para with baited breath, to see if the telepathy is still working.
The final condition indicates that RDF uses a datatype uriref in two ways: as a name for the datatype itself, and (when used as a class name) to indicate the class containing the elements of the value space ditto of the datatype.
Users should take care to distinguish the value space of a datatype from the class of its members. There you go again. telepathy in action. Different issue though. For example, the XML schema spec [XMLS] allows primitive datatypes whose elements are considered unequal as elements of the value space but which are identical when viewed as class members.
I would really like to stay away from this issue. You no doubt understood Henry better than I, but I get the impression there is still debate inside schema wg on this and the situation may change. Could it be that xsd datatypes don't really denote say integers, but pairs (primitive type, value). Or could it be that when we are talking about equality, xsd are using a notion of equals that takes four arguments xsd.equals(type1, val1, type2, val2). I dunno. I'm just thinking it would be good if we could write our stuff to stay completely away from that issue and as you say below, its up to the schema folks to explain, with our help of course, what exactly their model is.
RDF does not impose any identity conditions on elements in value spaces,
nor assume any subclass relationships between datatype value classes.
Information about such relationships should be obtained from the
specifications of the datatypes themselves. Similarly, RDF does not assume
that its literal strings are identical to elements of the class
xsd:string
, even though both are defined as sequences of unicode
characters. Users may wish to make such identifications, but are cautioned
that other users may disagree with any such claims or assumptions.
The treatment of unknown types provides a trivial proof of the following lemma:
Datatype monotonicity lemma. If D is a subset of D' and S D-entails E, then S D'-entails E.
It is possible for an RDF graph to have no D-interpretation which satisfies it. For example, since XML Schema requires that the value spaces of xsd:string and xsd:integer to be disjoint, so it is impossible to construct a XSD-interpretation satisfying the graph
<ex:a> <ex:b> "25"^^xsd:integer .
<ex:b> rdfs:range xsd:string .
This situation could be characterized by saying that the graph is
XSD-inconsistent, or as a datatype clash. Note that it is possible
to construct a satisfying rdfs-interpretation for this graph, but any such
interpretation, when extended to an XSD-interpretation, would violate the XSD
conditions, since the class extensions of I(xsd:string
) and
I(xsd:string
) would have a nonempty intersection. Although the definition of entailment means that a
D-inconsistent graph D-entails any RDF graph, datatype-aware RDF reasoners
SHOULD NOT publish conclusions derived from a recognized datatype-clash
contradiction.
This semantics for datatypes is minimal. It makes no provision for assigning a datatype to the range of a property,
Que?
prop rdfs:range xsd:integer .
We can say that right?
for example, and does not provide any way of explicitly asserting that a blank node denotes a particular value under a datatype mapping. There are several technical difficulties in extending this to a broader class of datatype usages while also preserving the simple 'core' of RDF. We expect that the datatyping machinery will be extended in later versions of RDF.
We will say that S rdf-entails E (S rdfs-entails E, S D-entails E) when every rdf-interpretation (every rdfs-interpretation, every interpretation datatyped with respect to D) which satisfies every member of S also satisfies E. This follows the wording of the definition of simple entailment in section 2, but refers only to rdf- , rdfs- or D-interpretations instead of all simple interpretations. These are examples of vocabulary entailment, i.e. entailment relative to a set of interpretations which satisfy extra semantic conditions on a reserved vocabulary.
It is easy to see that the lemmas in section 2 do not hold for vocabulary entailment. For example, the triple
rdf:type rdf:type rdf:Property .
is true in every rdf-interpretation, and hence rdf-entailed by the empty set, which immediately contradicts the interpolation lemma for rdf-entailment.
Rather than develop a separate theory of the syntactic conditions for recognising entailment for each reserved vocabulary, we will use a general technique for reducing these broader notions of entailment to simple entailment, by defining the closure of an RDF graph relative to a set of semantic conditions. The basic idea is to rewrite the semantic conditions as a set of syntactic inference rules, and define the closure to be the result of applying those rules to exhaustion. The resulting graphs will contain RDF triples which explicitly state all the special meanings embodied in the extra semantic conditions, in effect axiomatizing them in RDF itself. A graph rdf-entails (rdfs-entails) another just when its rdf-closure (rdfs-closure) simply entails it. It is not possible to provide such a tight result for D-entailment closures since the relevant semantic conditions require identities which cannot be stated in RDF.
The notion of closure used here is purely a formal device to relate two notions of entailment. We do not mean to suggest that closure rules should be used as a computational technique, or that actually generating the full closure would be the best process to use in order to determine vocabulary entailment. Implementors who wish to check any kind of entailment should use a process which is optimised for the combinatorics of the particular set of use cases that are most likely to arise in a given application area. In many cases it may be more efficient to use a process of backchaining on the closure rules, for example.
Closure rules correspond directly to implication axioms in the Lbase translation given in the appendix.
1. Add the following triple (which is true in any rdf-interpretation):
rdf:nil rdf:type rdf:List .
2. Apply the following rule recursively to generate all legal RDF triples (i.e. until none of the rules apply or the graph is unchanged.) Here xxx and yyy stand for any uriref, bNode or literal, aaa for any uriref.
if E contains | then add | |
rdf1 | xxx aaa yyy . | aaa rdf:type rdf:Property . |
Notice that this immediately generates the triple
rdf:type rdf:type rdf:Property .
which expresses a central semantic property of rdf interpretations.
The following lemma is the basic result on rdf-entailment, and illustrates a general pattern of how to characterize vocabulary entailment syntactically.
The result is rather obvious, but a complete proof is given in appendix B to illustrate the proof method.
RDFS closures require more complex rules to reflect the consequences of the more elaborate semantic constraints on the rdfs reserved vocabulary.
1. Add the RDFS axiomatic triples from the table in section 3.3 and all the following triples. There are many other triples which are true in every rdfs-interpretation, but they will be generated from these by the closure rules.
rdf:type rdfs:range rdfs:Class .
rdfs:Resource rdf:type rdfs:Class .
rdfs:Literal rdf:type rdfs:Class .
rdf:Statement rdf:type rdfs:Class .
rdf:nil rdf:type rdf:List .
rdf:subject rdf:type rdf:Property .
rdf:predicate rdf:type rdf:Property .
rdf:object rdf:type rdf:Property .
rdf:first rdf:type rdf:Property .
rdf:rest rdf:type rdf:Property .
2. Add all triples of the following forms. This is an infinite set because the RDF container vocabulary is infinite. However, since none of these triples entail any of the others, it is only necessary, in practice, to add the triples which use those container properties which actually occur in any particular graph or set of graphs in order to check the rdfs-entailment relation between those graphs.
rdf:_1 rdf:type rdfs:ContainerMembershipProperty .
rdf:_2 rdf:type rdfs:ContainerMembershipProperty .
...
3. Apply the following rules recursively to generate all legal RDF triples (i.e. until none of the rules apply or the graph is unchanged.) Here, xxx, yyy and zzz stand for any uriref, bNode or literal, aaa for any uriref, and uuu for any uriref or bNode (but not a literal).
If E contains: | then add: | |
---|---|---|
rdf1 | xxx aaa yyy . |
aaa rdf:type rdf:Property
. |
rdfs2 | xxx aaa yyy . |
xxx rdf:type zzz . |
rdfs3 | xxx aaa uuu . |
uuu rdf:type zzz . |
rdfs4a | xxx aaa yyy . | xxx rdf:type rdfs:Resource
. |
rdfs4b | xxx aaa uuu . | uuu rdf:type rdfs:Resource
. |
rdfs5 | aaa |
aaa rdfs:subPropertyOf ccc . |
rdfs6 | xxx aaa yyy . |
xxx bbb yyy . |
rdfs7 | xxx |
xxx rdfs:subClassOf
rdfs:Resource . |
rdfs8 | xxx |
xxx rdfs:subClassOf zzz . |
rdfs9 | xxx |
aaa rdf:type yyy . |
rdfs 10 | xxx rdf:type rdfs:ContainerMembershipProperty . |
xxx rdfs:subPropertyOf rdfs:member . |
rdfs 11 | xxx rdf:type rdfs:Datatype . |
xxx rdfs:subClassOf rdfs:Literal . |
The outputs of these rules will often trigger others. For example, these rules will generate the complete transitive closures of all subclass and subproperty heirarchies, together with all of the resulting type information about everything which can be inferred to be a member of any of the classes, and will propagate all assertions in the graph up the subproperty heirarchy, re-asserting them for all super-properties. rdfs1 will generate type assertions for all the property names used in the graph, and rdfs3 together with the first triple in the above list will add all the types for all the class names used. Any subproperty or subclass assertion will generate appropriate type assertions for its subject and object via rdfs 2 and 3 and the domain and range assertions in the RDFS axiomatic triple set. The rules will generate all assertions of the form
xxx rdf:type rdfs:Resource .
for every xxx in V, and of the form
xxx rdfs:subClassOf rdfs:Resource .
for every class name; and several more 'universal' facts, such as
rdf:Property
rdf:type
rdfs:Class
.
However, it is easy to see that (with the restriction noted of the infinite sets to those membership properties which occur in the graph) the rules will indeed terminate on any finite RDF graph, since there are only finitely many triples that can be formed from a given finite vocabulary.
A similar result applies here as in the case of rdf-entailment, though it takes considerably more work to prove:
We note in passing that the stronger 'iff' semantic conditions on
rdfs:domain
and rdfs:range
mentioned in section
3.3 would be captured by adding the additional rules
rdfs 2a | xxx rdfs:domain yyy . |
xxx rdfs:domain zzz |
rdfs 3a | xxx rdfs:range yyy . |
xxx rdfs:range zzz |
and that these would provide a redundant inference path to the conclusions of rdfs 2 and 3.
In order to capture datatype entailment in terms of assertions and closure rules, the rules need to refer to information about identity supplied by the datatypes themselves; and to state the rules it is necessary to assume syntactic conditions which can only be checked by consulting the datatype sources. Since such questions are beyond the scope of RDF, it is impossible to prove an entailment lemma for datatype closures analogous to those for RDF and RDFS.
Datatype information can be characterized abstractly as a (demunerably infinite) set of assertions about Unicode strings which specify, for each legal lexical form of each datatype, the fact that it is indeed legal, and which other lexical forms map to the same value under each datatype. Often the content of these infinite sets can be captured by familiar recursive rules on the lexical form of the equations and inequations, such as leading zero suppression for numerals, but for our purposes we can characterize the
The following rules are valid in every D-interpretation, provided that ddd indicates a datatype in D, sss and ttt are character strings which are both valid lexical forms for that datatype, and for the second rule that sss and ttt are mapped into the same value by the datatype.
In the first rule, _:xxx is a new bnode, i.e. one that does not appear elsewhere in the graph.
rdfD 1 | ddd |
aaa ppp _:xxx . |
rdfD 2 | ddd |
aaa ppp "ttt"^^ddd . |
These rules come as close as one can get in RDF to asserting that the typed literal denotes a literal value and that literals which map into the same values are equal. Notice that it would be invalid to make these inferences without checking that the literal string sss is in the lexical space of the datatype, so these cannot be considered valid rdfs-entailments.
These rules do not support any entailments based on identity between values of different datatypes. An obvious generalization of the second rule would permit such conclusions, but questions of identity between items in value spaces of two different datatypes should be referred to the authorities who defined the datatypes. These rules do however suffice to expose a datatype clash, by a chain of reasoning of the following form:
ppp rdfs:range ddd .
aaa ppp "sss"^^eee .
aaa ppp _:xxx .
(by rule rdfD 1)
_:xxx rdf:type eee .
_:xxx rdf:type ddd .
(by rule rdfs 3)
RDF/RDFS model theory summary |
|
---|---|
0. Domains and mappings of interpretation I |
|
vocab(I): set of urirefs ; LV: (global) set of literals ; IR: set of resources (universe); IP: subset of IR (properties) ; IC: subset of IR (classes). |
|
IS: vocab(I) -> IR IEXT: IP -> subsets of (IR x (IR union LV)) ICEXT: IC -> subsets of IR |
|
1. Basic equations |
|
E is: |
I(E) is: |
a literal |
E |
a uriref |
IS(E) |
a triple <s p o> |
true if <I(s), I(o)> is in IEXT(I(p)), otherwise false |
a ground RDF graph |
false if I(E') =false for any asserted triple E' in E, otherwise true |
an unlabeled node (blank node) |
not defined ; but I+A(E) = A(E) |
an RDF graph |
true if I+A'(E) = true for some A': anon(E) -> IR, otherwise false. |
2. Class extensions |
|
E is: |
I(E) is in IC; ICEXT(I(E)) includes: |
|
IR (The universe of the interpretation) |
|
IP (Properties; subset of IR, domain of IEXT) |
|
IC (Classes; subset of IR, domain of ICEXT) |
|
LV (Literals) |
3. Property extensions |
|
E is: |
I(E) is in IP; <x,y> is in IEXT(I(E)) iff: |
|
x is in ICEXT(y) |
|
ICEXT(x) is a subset of ICEXT(y) |
|
IEXT(x) is a subset of IEXT(y) |
E is: |
I(E) is in IP; if <x,y> is in IEXT(I(E)) then: |
|
if <u,v> is in IEXT(x) then u is in ICEXT(y) |
|
if <u,v> is in IEXT(x) then v is in ICEXT(y) |
4. Domain and Range | |
IEXT(I(rdfs:domain )) contains: |
<I( <I( <I( <I( <I( <I( |
IEXT(I(rdfs:range )) contains: |
<I( <I( <I( |
As noted in the introduction, an alternative way to specify RDF interpretations is to give a translation from RDF into a logical langauge with a model theory already attached, as it were. This 'axiomatic semantics' approach has been suggested and used previously (@@refs) with various alternative versions of the target logical language. Here we use a version of first-order logic which was designed to provide a semantic reference for such translations from web-based languages, called Lbase [LBASE].
To translate an RDF graph into the semantic reference language Lbase, apply the following rules to each expression noted. Each rule gives a translation TR[E] for the expression E, to be applied recursively. To achieve a translation which reflects a namespace entailment, add the axioms specified. Each namespace includes all axioms and rules for preceding namespaces, so that the RDFS translation of a graph should include the RDF translation as well as the RDFS axioms, and so on.
This translation uses the Lbase logical expressions
Lbase:String
and
Lbase:XMLthing
, which are true respectively of unicode
character strings, and anything that is denoted by a piece of well-formed XML
syntax; and it introduces some ad-hoc terminology in order to give a logical
account of the meanings implicit in the various literal constructions; for
example, the RDFS-D axioms use a predicate 'badLiteral' to flag cases of
typed literals which are illegally formed according to their attached
datatype.. The axioms given are sufficient to define the intended meanings of
the nonlogical vocabulary used.
The RDF-D axioms use two logical expressions whose meanings are not given
axiomatically but are intended to be described by the datatypes in D.
LegalLexicalForm(?y,?x)
is true when ?y is a character string
which is a legal lexical for for the datatype ?x, and L2V(?y,?x)
is a term denoting the value of the legal lexical form ?y under the
L2V mapping of the datatype ?x. Such meanings can usually be defined by a
countable set of Lbase axioms all conforming to some
regular pattern; for example, the datatype xsd:integer can be defined by
infinitely many axioms of the form
LegalLexicalForm('345', xsd:integer) and L2V('345',
xsd:integer)=345
We give the axioms for the built-in RDF datatype rdfs:XMLLiteral, which are trivial since they refer to the built-in Lbase category Lbase:XMLthing. The use of the datatype as a function name is defined by the axioms given: in effect, it means the same as the use of L2V when the argument is a legal lexical form, but has a different meaning if not.
RDF expression E | Lbase expression TR[E] |
a plain literal "sss" | 'sss', with all occurrences of the symbols ',/,(,),<,> prefixed with / |
a plain literal "sss"@tag | the term pair( TR["sss"],
'tag') |
a typed literal "sss"^^ddd | the term
TR[ddd]( TR["sss"]) |
rdfs:Resource | T |
any other uriref aaa | aaa |
a blank node | a variable (one distinct variable per blank node) |
a triple aaa bbb ccc . | TR[bbb]( TR[aaa],
TR[ccc]) |
an RDF graph | The existential closure of the conjunction of the translations of all the triples in the graph. |
|
|
|
Note, we did simply identify the use of the datatype function with the assertion of the datatype value, in order to avoid contradictions when describing a datatype error. This illustrates a general technique for handling error conditions when translating into Lbase.
Subgraph Lemma. A graph entails all its subgraphs.
Proof. Obvious, from definitions of subgraph and entailment. If the graph is true in I then for some A, all its triples are true in I+A, so every subset of triples is true in I. QED
Instance Lemma. A graph is entailed by all its instances.
Proof. Suppose I satisfies E' and E' is an instance of E. Then for some mapping A on the blank nodes of E', I+A satisfies every triple in E'. For each blank node b in E, define B(b)=I+A(c), where c is the blank node or name that is substituted for b in E', or c=b if nothing was substituted for it. Then I+B(E)=I+A(E')=true, so I satisfies E. But I was arbitrary; so E' entails E. QED.
If an instance of a graph E' is a subgraph of another graph E then E entails E'; this follows from the subgraph and instance lemmas. As we show below, this is in fact a necessary as well as sufficient condition for entailment, so it is useful to give a name to the syntactic condition that captures non-entailment. Say that a graph E' is separable from a graph E if no instance of E' is a subgraph of E. In particular, a ground graph is separable from E just when it is not a subgraph of E, and a ground triple is separable just in case it isn't in the graph. Graphs which are not separable from E are entailed by E; but for all others, there is a way to arrange the world so that they are false and E true.
For ground graphs, the subgraph lemma can be strengthened to provide simple necessary and sufficient conditions for entailment.
Conjunction Lemma.If E is ground, then I satisfies E if and only if it satisfies every triple in E.
Proof. Obvious, from definition of denotation for ground graphs. QED
Plain Subgraph Lemma. If E and E' are ground, then E entails E' if and only if E' is a subgraph of E.
Proof. 'If' follows directly from subgraph lemma; 'only if' follows from previous lemma and definition of entailment. QED
Herbrand Lemma. Any RDF graph has a satisfying interpretation.
Proof. We will construct the interpretation from the graph, by providing 'just enough' entities and extensions to make the graph true. Since the exact nature of the things in the universe is irrelevant, it is convenient to use the nodes of the graph themselves as their own denotations. (That was Herbrand's idea.)
The universe of I is defined to be the set of names and blank nodes in the graph.
Define IS to be the identity mapping on the vocabulary of the graph, and IEXT as follows: <x,y> is in IEXT(z) just when there is a triple in the graph of the form x z y . Define the mapping A to be the identity mapping on blank nodes of the graph.
Clearly I satisfies all ground triples in the graph, and I+A satisfies the entire graph; so I satisfies the graph. QED
An interpretation constructed in this way, so that the IS mapping is the identity mapping, is called a Herbrand interpretation. A Herbrand interpretation treats urirefs in the same way as literals, i.e. as denoting their own syntactic form. Of course this may not be what was intended by the writer of the RDF, but the lemma shows that any graph can be interpreted in this way.
If I satisfies E, then I may contain more information than is necessary to specify the truth of E; an interpretation - a world - can be larger than strictly needed to establish the truthvalues of a particular set of triples. It is therefore useful to define a notion of the minimal part of an interpretation which is just enough to make a given graph true.
Say that I' is a subinterpretation of I when vocab(I') is a subset of vocab(I), IR'is a subset of IR, I'(x)=I(x) wherever I'(x) is defined, and IEXT'(x) is a subset of IEXT(x) wherever IEXT'(x) is defined. Intuitively, I' defines a 'part' of the world defined by I. If a subinterpretation of I satisfies E, then I must also satisfy E.
Herbrand interpretations are minimal, and every minimal interpretation has a corresponding Herbrand interpretation which assigns the same truthvalues to every triple, and hence to every graph.
It is clear that if I satisfies E, then a minimal satisfying subinterpretation of I exists with a vocabulary precisely the vocabulary of E. The minimal interpretations can be characterized by the following lemma.
Minimality lemma. If I is a minimal satisfying interpretation of E, then I fails to satisfy every triple which has no instance in E.
Proof. We will argue by reductio. Suppose I satisfies some such triple S P O, i.e.. IEXT(I(P)) contains <I(S),I(O)>, and consider the subinterpretation I' which is like I except that IEXT(I'(P)) does not contain that pair. Since S P O has no instances in E, I'+A(x)=I+A(x) for any mapping A from blank nodes and any triple x in E, and I satisfies E, so I' satisfies E; so I is not minimal. QED
The property extensions in a minimal interpretation are 'shrink-wrapped' onto the assertions in the graph. Notice that every thing in the universe of a minimal interpretation of E must be the denotation of at least one node in E, and that every pair in any property extension must have at least one corresponding triple in E that it makes true; for if not, one could delete some of the interpretation and still satisfy E. We will make use of this property in later proofs.
Strong Herbrand Lemma. Any RDF graph E has a satisfying interpretation which does not satisfy any graph which is separable from E.
Proof.The construction in the proof of the Herbrand Lemma in fact establishes this result for arbitrary separable graphs. Consider the Herbrand interpretation I constructed in the proof of the Herbrand lemma, and let <S P O> be a triple which has no instances in E. Then either S is a name and there are no triples of the form S P O' in E, or O is a name and there are no triples of the form S' P O in E. Consider the first case (the other case is similar); then by the construction in the earlier proof, IEXT(I(P)) contains no pairs of the form <I(S), x>; so there is no mapping A from blank nodes to IR that could make the triple true in I+A; so the triple is false in I. Similarly for the other case. QED
Merging lemma. The merge of a set S of RDF graphs is entailed by S, and entails every member of S.
Proof. Obvious, from definitions of entailment and merge. All members of S are true iff all triples in the merge of S are true. QED.
Anonymity lemma 1. Suppose E is a lean graph and E' is a proper instance of E. Then E does not entail E'.
Proof. Since E' is a proper instance and E is lean, E' contains a triple S' P' O' obtained by replacing a blank node by a name in the triple S P O from E, while S' P' O' has no instances in E; for if not, the triple S P O would have had a proper instance in E. By the strong Herbrand lemma, there exists an interpretation which satisfies E but not E'. So E does not entail E'. QED
Proof. First we assume that the blank nodes occur in two distinct triples in E. Suppose that E contains the triples
S1 P1 _:x1 .
S2 P2 _:x2 .
where E' contains the triples
S1 P1 _:x .
S2 P2 _:x .
(The arguments for the cases where the blank nodes occur in other positions in the triples are similar.) Since E is lean, it contains no other triples of the form S1 P1 O' or S2 P2 O'. Let I be a Herbrand interpretation of E; then I(S1) is distinct from I(S2) and IEXT(I(P1)) ={<I(S1), _:x1>}and IEXT(I(P2))={<I(S2), _:x2>}. Let A be any mapping from the blank nodes of E' to IR, then in order for both triples to be true in I+A, I+A(_:x) would have to equal both _:x1 and _:x2; but these are distinct; so I does not satisfy E'.
The only remaining case is where E contains a single triple with two blank nodes which are identified in E':
_:x1 P _:x2 .
where E' contains
_:x P _:x .
The argument here is similar; the Herbrand interpretation I now has IEXT(I(P)) = {<_:x1,_:x2>} and there is no mapping from the second triple that could satisfy this, so again I satisfies E but not E'. QED.
Note that the 'minimal' nature of the Herbrand construction provides an interpretation that is sufficient to make a graph true, but only just sufficient. This is a basic technique for showing that one graph does not entail another and for establishing a precise correspondence between syntactic relationships and entailment.
Interpolation Lemma. S entails E if and only if a subgraph of the merge of S is an instance of E.
Proof. 'if' is a direct consequence of the merging and instance lemmas.
To prove 'only if' we will show the converse. This is just a re-statement of the strong Herbrand lemma. Assume that no subgraph of the merge of S is an instance of E, i.e. that all subgraphs of the merge of S fail to be instances of E; i.e., that E is separable from the merge of S. Then by the strong Herbrand lemma the merge of S does not entail E. So, by the merging lemma, S does not entail E. QED.
Skolemization is a syntactic transformation routinely used in automatic inference systems in which existential variables are replaced by 'new' functions - function names not used elsewhere - applied to any enclosing universal variables. While not itself strictly a valid operation, skolemization adds no new content to an expression, in the sense that a skolemized expression has the same entailments as the original expression provided they do not contain the new skolem functions.
In RDF, skolemization simplifies to the special case where an existential variable is replaced by a 'new' name, i.e. a uriref which is guaranteed to not occur anywhere else.(Using a literal would not do. Literals are never 'new' in the required sense, since their meaning is fixed.) To be precise, a skolemization of E (with respect to V) is a ground instance of E with respect to a vocabulary V which is disjoint from the vocabulary of E.
The following lemma shows that skolemization has the same properties in RDF as it has in conventional logics. Intuitively, this lemma shows that asserting a skolemization expresses a similar content to asserting the original graph, in many respects. In effect, it simply gives 'arbitrary' names to the anonymous entities whose existence was asserted by the use of blank nodes. However, care is needed, since these 'arbitrary' names have the same status as any other urirefs once published. Also, skolemization would not be an appropriate operation when applied to anything other than the antecendent of an entailment. A skolemization of a query would represent a completely different query.
Proof. sk(E) entails E by the interpolation lemma.
Now, suppose that sk(E) entails F where F shares no vocabulary with V; and suppose I is some interpretation satisfying E. Then for some mapping A from the blank nodes of E, I+A satisfies E. Define an interpretation I' of the vocabulary of sk(E) by: IR'=IR, IEXT'=IEXT, I'(x)=I(x) for x in the vocabulary of E, and I'(x)=I+A(y) for x in V, where y is the blank node in E that is replaced by x in sk(E).Clearly I' satisfies sk(E), so I' satisfies F. But I'(F)=I+A(F) since the vocabulary of F is disjoint from that of V; so I satisfies F. So E entails F. QED.
RDF closure lemma. Any satisfying rdf-interpretation of E satisfies the rdf-closure of E; and any minimal simple satisfying interpretation of the rdf-closure of E is a satisfying rdf-interpretation of E.
Proof. This follows from a comparison of the rdf closure rules with the semantic conditions on an rdf-interpretation. Although the argument is very simple in this case, we give it here in full to illustrate the general technique.
The first part follows from the fact that the closure rules are all rdf-valid. To show this, suppose I is an rdf-interpretation; then for any aaa in the vocabulary of I, if a triple of the form xxx aaa yyy is true in I, then IEXT(I(aaa)) is nonempty then I(aaa) is in IP, so IEXT(I(rdf:type)) contains <I(aaa),I(rdf:Property)>, so the triple aaa rdf:type rdf:Property is true in I. Since I is an rdf-interpretation, its vocabulary contains rdf:type and IP contains I(rdf:type), so in particular the triple
rdf:type rdf:type rdf:Property .
is true in I. That establishes that the closure rules are rdf-valid.
To prove the other part of the lemma we must show that the closure rules are together sufficient to force any minimal interpretation to be an rdf-interpretation of E. The simplest way to argue this is to show the converse, viz. that any minimal simple interpretation of the rdf-closure that violates one of the semantic conditions for an rdf-interpretation of E would thereby fail to satisfy the closure. Suppose therefore that I is a minimal simple interpretation of the rdf-closure of E.
If I violates the first constraint then IP does not contain I(rdf:type); in that case, the added triple in the first closure rule is false in I. So assume that I violates the second constraint. Then there is some x in IP for which IEXT(I(rdf:type)) does not contain <x,I(rdf:Property)>. Since I is minimal, there is some node aaa in E with I(aaa)=x; and since I(aaa) is in IP, there is a pair <y,z> in IEXT(I(aaa)) and a triple
bbb aaa ccc .
in E with I(bbb)=y and I(ccc)=z. Then the closure of E contains the triple
aaa
rdf:type rdf:Property .
which is false in I. So I fails to satisfy the rdf-closure. QED.
Notice the need for the minimality assumption, which 'forces' the semantic violation to be made explicit in the syntax of the graph itself. The second part of the lemma could be false for an arbitrary simple interpretation of the closure, which might fail to meet the required semantic conditions on some part of the universe that was not referred to in the graph itself. In general, one cannot infer, from the lack of an assertion in a graph, that what that assertion would say if it were in the graph must be false in a satisfying interpretation of the graph. Minimal interpretations, however, embody a 'closed world assumption' which would sanction such an inference. To prove an entailment we need to prove something about all interpretations; but to prove the converse, it is enough to show that a single interpretation exists with the right properties, and this is where the special properties of minimal interpretations are useful.
RDF entailment lemma. S rdf-entails E if and only if the rdf-closure of the merge of S simply entails E.
Proof. Follows from the merging lemma, the RDF closure lemma and the definition of entailment. By the merging lemma, we can identify S with the merge of S, i.e. we can treat a set of graphs as a single graph MS.
So suppose that MS rdf-entails E, and let I be a simple interpretation of the rdf-closure c(MS) of MS. Then there is a minimal simple subinterpretation I' of I which satisfies c(MS); so, by the previous lemma, I' is a satisfying rdf-interpretation of E. Therefore I satisfies E, since I' is a subinterpretation of I.
Conversely, suppose that c(MS) simply entails E, and let I be an rdf-interpretation of MS; then by the previous lemma, I satisfies c(MS), so I satisfies E (since every rdf-interpretation is a simple interpretation). QED.
RDFS Closure Lemma. Any satisfying rdfs-interpretation of E satisfies the rdfs-closure of E; and any minimal simple satisfying interpretation of the rdf-closure of E is a satisfying rdfs-interpretation of E.
Proof.(Sketch) As in the proof of the RDF closure lemma, this follows from a point-by-point comparison of the rdfs closure rules with the semantic conditions on an rdfs-interpretation. A full proof would be long but tedious.We will illustrate the form of the argument by considering some typical cases in detail.
The first part follows from the fact that the rdfs closure rules are all rdfs-valid, which can be checked case by case. For example, consider the closure rule rdfs5, and suppose I is an rdfs-interpretation which satisfies
aaa
rdfs:subPropertyOf
bbb .bbb
rdfs:subPropertyOf
ccc .Then by the semantic conditions on an rdfs-interpretation, IEXT(I(aaa)) is a subset of IEXT(I(bbb)) and IEXT(I(bbb)) is a subset of IEXT(I(ccc)); so IEXT(I(aaa)) is a subset of IEXT(I(ccc)); so I satisfies
aaa
rdfs:subPropertyOf
ccc .The other cases are similarly straightforward.
To demonstrate the other part of the lemma we must show that the closure rules are together sufficient to restrict any minimal interpretation to be an rdfs-interpretation. The simplest way to argue this is to show the converse, by demonstrating that any minimal simple interpretation of the rdfs-closure c(E) of E that violates one of the semantic conditions for an rdfs-interpretation of E would thereby make some triple in c(E) false. Again, this can be checked by a detailed examination of the cases that arise.
Suppose that I is a minimal simple interpretation of E. If I violates any of the conditions involving the rdfs-vocabulary then it is easy to check that one of the 'added' triples would be false, eg if IEXT(I(rdfs:domain) does not contain <I(rdfs:domain),I(rdf:Property)> then the triple
rdfs:domain rdfs:domain rdf:Property .
is false in I.
If I violates the condition on IEXT(I(
rdfs:range
)), then there exist x, y, u and v in IR with <x,y> in IEXT(I(rdfs:range
)), <u,v> in IEXT(x) but v not in ICEXT(y). Since I is a minimal interpretation and satisfies c(E), the closure must contain two triplesaaa
rdfs:range
bbb .ccc aaa ddd .
where I(aaa)=x, I(bbb)=y, I(ccc)=u and I(ddd)=v; but I makes the triple
ddd
rdf:type
bbb .false, since I(ddd) is not in ICEXT(I(bbb)); but by the closure rule rdfs3, this triple is in c(E); so I fails to satisfy c(E). The IEXT(I(
rdfs:domain
)) case is similar.Finally, suppose that I violates the condition on
rdfs:subClassOf
. Then for some x and y in IR, <x,y> is in IEXT(I(rdfs:subClassOf
)) but there is some z in ICEXT(x) but not in ICEXT(y). Again, since I is minimal, these entities must occur in triples in c(E) of the form respectivelyaaa
rdfs:subClassOf
bbb .ccc
rdf:type
aaa .where I(aaa)=x, I(bbb)=y and I(ccc)=z, and where the triple
ccc
rdf:type
bbb .is false in I; so I does not satisfy c(E) by a similar argument, using the closure rule rdfs9. Again, the case for a violation of the the condition on
rdfs:subPropertyOf
is similar.There is one final complication. The semantic conditions for rdfs:subClassOf require that if the class extension of aaa is a subset of that of bbb, then aaa rdfs:SubClassOf bbb is true. This condition however is not
QED.
RDFS Entailment Lemma. S rdfs-entails E iff the rdfs-closure of the merge of S simply entails E.
Proof. Exactly similar to proof of RDF entailment lemma. QED.
This document reflects the joint effort of the members of the RDF Core Working Group.
The use of an explicit extension mapping to allow self-application without violating the axiom of foundation was suggested by Chris Menzel. Peter Patel-Schneider found several errors in earlier drafts, and suggested several important technical improvements.
Changes from Working Draft March 2002