- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Thu, 27 Sep 2001 19:42:28 -0500
- To: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- Cc: www-rdf-logic@w3.org
>I thought I might give a stab at cleaning up some of the problems that I >see with Pat's model theory. Here are the results for RDF (without >reification or containers). Note that this should be considered a >derivative work from Pat's work, and not a completely separate model >theory. > >I have several themes here. First, I wanted to be very formal, so no >misconceptions can arise. To this end, I have produced a very sparse >document. (However, I have had to use some non-standard symbology for the >set-theoretic relationships. Make the obvious fixes.) Second, I wanted >to stick as close as possible to the ``Formal Model for RDF''. This drove >some of the changes from Pat's model theory. Third, I wanted to annotate >some of the issues and choices in the model theory. > > >Syntax: > >URI is the collection of URI names, i.e., some collection of strings > > [NB: This ignores all aspects of the structure of URIs.] > >L is the collection of literals, i.e., some collection of strings, >disjoint from URI > > [NB: This ignores non-string literals, but they wouldn't cause > any more problems.] > > [NB: I'm sticking with graphs, even though they have some problems for > RDF. Triples have their own problems with respect to RDF.] > >A basic untidy RDF graph, The major utility of the graph syntax arises from the tidiness property, so I would prefer to build it into the definition. > R, is a four-tuple (that can be considered to be a >partially node labeled, totally edge labeled, directed graph) > < N, E, LN, LE > > where N is the set of nodes in the graph > LN :(partial) N -> URI u L gives labels for nodes > LE :(total) E -> URI gives labels for edges > E <= N' x N is the set of edges in the graph > where N' = { n : LN(n) in URI } E <= N' x N does not allow for two arcs with different labels between the same two nodes. Better to have E be a set with a mapping into N' x N (or mappings into N' and N). But why bother with all this? The concept of a labelled graph is standard and uncontroversial, so this is one place where we can say some mathematics in a reasonably intuitive way without sacrificing either precision or readability by non-mathematicians. > [NB: This accounts for literals not being allowed as labels of edges, nor > as the labels of nodes that are heads of edges, but does not > account for edge labels being properties. It would be possible to > make the latter distinction here.] > > [NB: I don't think that Pat's definition of RDF graphs has the > requirement that heads of edges cannot be labeled with literals.] Right, that was a slip, as I've already acknowledged. Here is the suggested rewording (not yet approved by the WG): "An RDF graph is a partially labeled directed graph satisfying the following conditions: arcs are labelled with URIs; nodes from which arcs emerge are either unlabelled, or labelled with URIs; all other nodes are either unlabelled, or labelled with URIs or literals; and distinct labeled nodes have different labels. " >An basic untidy RDF graph is ground if LN is a total function on N. >An basic untidy RDF graph is tidy an untidy graph is tidy?? Yech. >(also called a basic RDF graph) if LN is >injective (not necessarily total). These are the same notions as those expressed above in English, but using more mathematical terminology. We decided on a less mathematical style deliberately, in view of the likely audience. If re-written for professional mathematical logicians, the entire model theory could be described in a single page. (1.5 pages using ASCII) >Literal Values: > >LV is a non-empty set, currently the set of strings. >XL : L -> LV, giving meaning to literals > > [NB: This ignores many aspects of the intended meaning of strings, > such as whether the mapping is surjective or injective.] > > >Interpretations: > >Let V be a subset of URI. > > [NB: Does V have to be non-empty, does it have to be finite? No. (Though maybe it should indeed always include rdf:type and hence be nonempty, as you pointed out.) Not necessarily finite, though, which will important later. >So far > there are no such requirements, but we have to be careful that this > is OK, and that we don't make unnecessary requirements on V.] >An interpretation, I, on a vocabulary V, is a four-tuple < IR, IP, IEXT, IS> > where IR is a non-empty set, called resources Subtle terminological point. I didn't want to say that IR is *all* the resources, or that the members of IR are called 'resources' by definition. This word, used in this context, is a technical term of art which as far as I am concerned belongs to the W3C, or maybe to Tim Berners-Lee, or someone; but in any case not to me. So my position is, I'm not *defining* it here, and moreover I don't really know what it means; but, the M&S says that RDF is about these things, so the model theory goes along with that and declares that the universe shall consist of them, whatever they are. I tried to make the point clear in the introductory comments of the model theory document. > IP <= IR, called properties > IEXT : IP -> powerset ( IR x (IR u LV) ) > IS : V -> IR > > [NB: There is no requirement that IR and LV be disjoint nor is there any > requirement that they be subsets of one another. We have to be > careful that this is OK, and that we don't accidentally disturb this > (non-)relationship.] Right, and I believe that the model theory does not disturb it. >Interpretations of ground basic untidy RDF graphs: > >Let R = < N, E, LN, LE > be a ground basic untidy RDF graph. > > [NB: There appears to be no harm in allowing untidy graphs here, but this > could easily be restricted to tidy graphs.] > >An interpretation I on the vocabulary V with > { LN(n) | n in N } ^ URI <= V > { LN(e) | e in E } <= V >is extended to R as follows > > [NB: I am being a bit vague here in just what is the domain of I.] > > [NB: It is explicit here that the vocabulary of the interpretation can > have ``names'' that do not appear in the graph. Pat's theory is > vague on this point.] I protest, it is quite clear. An interpretation is defined on a set V . Of course V may contain things that are not in the set of graph labels (see section 1..1, third paragraph.) In fact, there can even be graph labels that are not in V, and the interpretation will make any triples containing them false. There is a subtle but important aspect of mathematical style here. We don't want to *define* an interpretation to only apply to a vocabulary associated with a graph, since we may well want to discuss whether or not (or the extent to which) one graph is true or false in an interpretation of a different graph. Such matters come up immediately when proving things about skolemization or instances or mergings, for example. That is why I defined an interpretation relative to a vocabulary rather than to a graph. >1. if LN(n) is defined and in L then I(n) = XL(LN(n)) > >2. if LN(n) is defined and in V then I(n) = IS(LN(n)) Seems to me to be pointless to attach the interpretations to the nodes when nodes and labels are 1:1 in a ground graph. There is no need to introduce the LN and LE mappings into the equations, and they clutter up the equations to no useful purpose. > [NB: These two give an interpretation for all nodes in R.] > [NB: The ``defined'' part is used later.] > >3. if <f,g> is in E > then I(E) = true if IS(LE(<f,g>)) in IP and <I(f),I(g)> in >IEXT(IS(LE(<f,g>))), > I(E) = false otherwise Shouldn't that be I(<f,g>) =true if..... ? The terminology LE(<f,g>) presumes that an edge can be defined as a pair of its endpoints, which isn't generally true. It would be better to define a graph as a set E of edges - intuitively triples, since they define their endpoints - (with LE as here) and with an EE (Edge Ends) mapping to a pair of nodes. Then the condition is: 3. if e is in E then I(e)= true if IS(LE(e)) in IP and EE(e)=<f,g> and <I(f),I(g)> in IEXT(IS(LE(e))) I(E)=false otherwise. or possibly to have a pair of mappings S and O from edges to nodes, and then one gets 3. if e is in E then I(e)=true if IS(LE(e)) in IP and <I(S(e)),I(O(e))> in IEXT(IS(LE(e))) I(E)=false otherwise But all of these seem to me to be needlessly complex, and also confusing, in that the various mappings all have different meanings, eg in the last term, LE is a syntactic graph labelling and IS a semantic interpretation. > [NB: This cleans up the meaning of edges whose labels are not mapped to > properties. It seems to me that there is nothing to be cleaned up. There is no need to explicitly mention whether or not IS is defined on a label; it it is not, then the I(e)=true if... condition is evidently not satisfied (since it mentions membership in a set which is the value of that mapping, so if the mapping has no value then there is no such set), so the truth-conditions establish that I(e)=false. This usage of 'otherwise' is normal style in stating truth-conditions, but I will add a sentence (and maybe an example) to the document to make the point clear. Apart from the tighter use of mathematical terminology and style, the only significant change here is the emphasis on assigning interpretations to nodes rather than to their labels, which introduces the LN and LE mappings into the semantic equations. While strictly correct, I feel this is overly pedantic. The whole utility of the graph syntax is that the tidiness of the graph means that labelled nodes can be identified by their labels: each node label occurs on only one node. So on the labelled nodes, nodes and their labels are in exact 1:1 correspondence , and it is a familiar and conventional 'abuse' of terminology, used throughout working mathematics under such circumstance, to use one of the corresponding things to refer to the other (eg not bothering to distinguish notationally between a singleton set and its only member, or between an algebra and its underlying carrier set). Since the vocabulary *is* the labels rather than the nodes, to phrase the model theory in the way I did seems to me to be clearer and more intuitive while being just as precise, and I would propose to keep it that way. If we are dealing with untidy graphs, the graph syntax has no structural advantages over a lexical syntax, and it would then be preferable to simply attach the model theory directly to the N-triples notation (which in an earlier version of the model theory is what we in fact did, but that had problems of its own.). >It might also be possible to make graphs with such > edges ill-formed in some way.] I don't like that idea, it mixes up syntax and semantics inappropriately. Those 'undefined' URIs may be defined in a different interpretation. >4. I(R) = false if I(e) = false for some e in E, > I(R) = true otherwise > > >Interpretations of basic untidy RDF graphs: > > [NB: I'm writing out extended interpretations completely for purposes of > being completely pedantic. Right. I think this style is way too pendantic for most readers. The simpler language in the model theory document says the same thing much more concisely and isn't likely to be misunderstood by anyone. >An extended interpretation on a vocabularly V is a two-tuple > <<IR,IP,IEXT,IS>,A'> >where <IR,IP,IEXT,IS> is an interpretation on V > and A' is a mapping to IR. > > [NB: This means that unnamed nodes have to be resources, not literals > that are not resources. Right. Actually, unnamed nodes have to *denote* resources, to be very picky. Some literal values may well be resources, of course; the model theory is agnostic on this. >This may have consequences!] It would if we allowed literals to be instances of blank nodes, but in section 3.1 I was careful to only allow URIs. (It does mean that one needs to be sensitive about substitution when considering checking entailment, but that is a computational issue; still, I could add a warning comment about this point, and I will do so in the next update.) All this pussyfooting about literals would be greatly eased if RDF simply declared that all labels were URIs and that literals were a particular syntactic subclass of URIs with a fixed interpretation. Then there would be no need to introduce any syntactic distinction, LV could be a subset of IR, and all would be more straightforward. The correct treatment of literals in RDF is being discussed now, and it's not clear to me what the WG will decide on this issue; there are many agendas to be considered, not just model-theoretic elegance. But I am confident that the RDF model theory can be fitted onto whatever is decided, without serious alterations. However, Peter, a question for a professional Description Logician: this would allow literals to be assigned non-literal property values by an RDF assertion. Wouldn't that break DAML+OIL? > >Let R = < N, E, LN, LE > be a basic untidy RDF graph and >let U = { n in N | LN(n) is not defined } > >An extended interpretation I = <<IR,IP,IEXT,IS>,A'> on the vocabulary V with > { LN(n) | n in N } ^ URI <= V > { LN(e) | e in E } <= V > and the domain of A' = U >is extended to R as follows > >0. if LN(n) is not defined I(n) = A'(n) > >1. if LN(n) is defined and in L then I(n) = XL(LN(n)) > >2. if LN(n) is defined and in V then I(n) = IS(LN(n)) > >3. if <f,g> is in E > then I(E) = true if IS(LE(<f,g>)) in IP and <I(f),I(g)> in >IEXT(IS(LE(<f,g>))), > I(E) = false otherwise > >4. I(R) = false if I(e) = false for some e in E, > I(R) = true otherwise > >Let R = < N, E, LN, LE > be a basic untidy RDF graph and >let U = { n in N | LN(n) is not defined } > >An interpretation I = <IR,IP,IEXT,IS> on the vocabulary V with > { LN(n) | n in N } ^ URI <= V > { LN(e) | e in E } <= V >is extended to R as follows: > > I(R) = true if there is some mapping A' from U to IR such that ><I,A'>(R) = true, > I(R) = false otherwise > > >An interpretation I with vocabulary V is called a model of a >basic untidy RDF graph R = <N,E,LN,LE> iff >1. { LN(n) | n in N } ^ URI <= V > { LN(e) | e in E } <= V, and >2. I(R) = true > > > >Taking care of rdf:type: > >A core RDF interpretation, i.e., RDF without reification or containers, is >an interpretation over a vocabularly that includes rdf:type with the >following extra conditions > > 1. IS(rdf:type) is in IP > 2. IEXT(IS(rdf:type)) <= IR x IR Is there any real need for condition 2 here in RDF? I hope it can be avoided, since it would mean that a triple aaa rdf:type LLL . where LLL is a literal, comes close to being a contradiction. Right now, all it implies is that IR and LV overlap, but if anyone were to ever claim that they didn't overlap, then it would be. I don't like having land-mines concealed in the model theory. > >Now for the claims: > >All these sorts of claims have to be backed up with theorems and lemmas >like the ones that Pat has. I would go even further than Pat has, >extending to RDF query languages, at least whenever such become >well-defined. In an earlier draft I did indeed do this, but we decided to remove it since RDF query languages are not yet stable. (The comments on detecting entailment by variable substitution in consequents, at the end of section 3, are what is left of of that material, summarized. ) I agree that the model theory should cover queries eventually. >Claim 1: > >A core RDF interpretation that is a model of a basic untidy RDF graph R >contains everything (and more) that is in the intended core RDF (i.e., RDF >without reification or containers) meaning of R. In other words, a model >contains all the information implicitly (or explicity) in the graph, and >maybe more, and contradicts nothing that is implicitly (or explicitly) in >the graph. I'm not quite sure what this means. Can you express the notion of 'contains all the information' in model-theoretic terms? > >Claim 2: > >A core RDF interpretation that is not a model of a basic untidy RDF graph R >has something that is not compatible the intended core RDF meaning of R. >In other words, a non-model is missing or incorrect on something that is >implicitly (or explicitly) in the graph. This is what the anonymity lemmas try to make precise. > >Claim 3: > >For every basic untidy RDF graph R there is a core RDF interpretation that >captures exactly the closure of the intended core RDF meaning of R and that >is a model for R. That is, roughly, that there is a model that makes >everything implicitly (or explicitly) in the graph true, and everything >else false. Right. That is the strong Herbrand lemma. > [NB: This claim is not true for more-powerful representation systems.] Oh, indeed. Part of my motivation for stating these lemmas was to show how different RDF is from more expressive languages. (I didn't feel it was appropriate to spend a lot of prose on this in the document, however. Anyone who knows about the other systems will probably find it kind of obvious in any case.) Pat -- --------------------------------------------------------------------- IHMC (850)434 8903 home 40 South Alcaniz St. (850)202 4416 office Pensacola, FL 32501 (850)202 4440 fax phayes@ai.uwf.edu http://www.coginst.uwf.edu/~phayes
Received on Thursday, 27 September 2001 20:42:40 UTC