Re: model theory for RDF/S from Peter F. Patel-Schneider on 2001-09-28 (www-rdf-logic@w3.org from September 2001)

From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
Date: Thu, 27 Sep 2001 23:36:31 -0400
To: phayes@ai.uwf.edu
Cc: www-rdf-logic@w3.org
Message-Id: <20010927233631I.pfps@research.bell-labs.com>
From: Pat Hayes <phayes@ai.uwf.edu>
Subject: Re: model theory for RDF/S
Date: Thu, 27 Sep 2001 19:42:28 -0500

[...]

> >  R, is a four-tuple (that can be considered to be a
> >partially node labeled, totally edge labeled, directed graph)
> >		< N, E, LN, LE >
> >	where N is the set of nodes in the graph
> >	      LN :(partial) N -> URI u L gives labels for nodes
> >	      LE :(total) E -> URI gives labels for edges
> >	      E <= N' x N is the set of edges in the graph
> >		where N' = { n : LN(n) in URI }
> 
>   E <= N' x N  does not allow for two arcs with different labels 
> between the same two nodes. Better to have E be a set with a mapping 
> into N' x N (or mappings into N' and N).

I agree that I was wrong here, as this treatment does not allow for
multiple relations between the same nodes.
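A minimal Python sketch of the fix Pat suggests. It assumes edges are opaque identifiers with separate maps to their subject and object nodes. The class name EdgeGraph, the node and edge ids, and the ex: URIs are hypothetical, and the N' restriction (subjects must carry URI labels) is not enforced here. The sketch contrasts that representation with E <= N' x N, where labelling the same node pair a second time overwrites the first label.

```python
class EdgeGraph:
    """Sketch of a partially node-labelled, totally edge-labelled directed
    multigraph. Edges are opaque ids mapped to their ends by S and O, so
    two edges with different labels may join the same pair of nodes."""

    def __init__(self):
        self.nodes = set()
        self.LN = {}  # partial map: node -> label (URI or literal)
        self.LE = {}  # total map on edges: edge -> URI label
        self.S = {}   # edge -> subject node
        self.O = {}   # edge -> object node

    def add_edge(self, e, s, label, o):
        self.nodes.update((s, o))
        self.S[e], self.LE[e], self.O[e] = s, label, o

    def edges_between(self, s, o):
        return sorted(e for e in self.LE if self.S[e] == s and self.O[e] == o)


# Edges as pairs (E <= N' x N) with LE keyed on the pair:
# labelling the same pair twice overwrites the first label.
pair_LE = {}
pair_LE[("n1", "n2")] = "ex:knows"
pair_LE[("n1", "n2")] = "ex:likes"
print(len(pair_LE))  # 1

# Edges as identifiers: both arcs survive.
g = EdgeGraph()
g.add_edge("e1", "n1", "ex:knows", "n2")
g.add_edge("e2", "n1", "ex:likes", "n2")
print(g.edges_between("n1", "n2"))  # ['e1', 'e2']
```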

> But why bother with all this? The concept of a labelled graph is 
> standard and uncontroversial, so this is one place where we can say 
> some mathematics in a reasonably intuitive way without sacrificing 
> either precision or readability by non-mathematicians.

I thought that a labeled graph allows at most one edge between a pair
of nodes, so labeled graphs would not solve this problem.  Certainly I've
seen lots of treatments where labeled graphs are defined in this fashion.

[...]

> >   [NB: It is explicit here that the vocabulary of the interpretation can
> >        have ``names'' that do not appear in the graph.  Pat's theory is
> >        vague on this point.]
> 
> I protest, it is quite clear. An interpretation is defined on a set V.
> Of course V may contain things that are not in the set of graph
> labels (see section 1.1, third paragraph). In fact, there can even
> be graph labels that are not in V, and the interpretation will make 
> any triples containing them false.
> 
> There is a subtle but important aspect of mathematical style here. We 
> don't want to *define* an interpretation to only apply to a 
> vocabulary associated with a graph, since we may well want to discuss 
> whether or not (or the extent to which) one graph is true or false in 
> an interpretation of a different graph. Such matters come up 
> immediately when proving things about skolemization or instances or 
> mergings, for example. That is why I defined an interpretation 
> relative to a vocabulary rather than to a graph.

Agreed.  The potential confusion that I saw was the two different notions
of vocabulary, one defined as the URIs in a graph and one used in
interpretations.  This is, at best, only a minor point.
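To make the point about vocabularies concrete, here is a small Python sketch (not Pat's own formulation) of an interpretation defined on a vocabulary V and then applied to two different graphs. The class, the field layout and the ex: names are hypothetical; literals are left out, and triples are written with labels standing for their nodes, as in a tidy graph. A name in V need not occur in any graph, and a triple using a label outside V comes out false.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """Sketch of an interpretation defined on a vocabulary V rather than
    on any particular graph. Literals are ignored for simplicity."""
    V: set      # names the interpretation is defined on
    IR: set     # resources
    IP: set     # properties
    IEXT: dict  # property -> set of (subject, object) pairs
    IS: dict    # name in V -> resource in IR

    def triple_true(self, s, p, o):
        # A triple using a name outside V is simply false.
        if not {s, p, o} <= self.V:
            return False
        prop = self.IS[p]
        pairs = self.IEXT.get(prop, set())
        return prop in self.IP and (self.IS[s], self.IS[o]) in pairs

    def graph_true(self, triples):
        return all(self.triple_true(s, p, o) for s, p, o in triples)


interp = Interpretation(
    V={"ex:a", "ex:b", "ex:p", "ex:unused"},  # ex:unused occurs in no graph below
    IR={1, 2, 3},
    IP={3},
    IEXT={3: {(1, 2)}},
    IS={"ex:a": 1, "ex:b": 2, "ex:p": 3, "ex:unused": 1},
)
g1 = [("ex:a", "ex:p", "ex:b")]
g2 = [("ex:a", "ex:p", "ex:c")]  # ex:c is a graph label not in V
print(interp.graph_true(g1), interp.graph_true(g2))  # True False
```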

> The terminology LE(<f,g>) presumes that an edge can be defined as a 
> pair of its endpoints, which isn't generally true. It would be better 
> to define a graph as a set E of edges - intuitively triples, since 
> they define their endpoints - (with LE as here) and with an EE (Edge 
> Ends) mapping to a pair of nodes. Then the condition is:
> 
> 3. if e is in E
>      then I(e)=true if IS(LE(e)) in IP and EE(e)=<f,g> and
>           <I(f),I(g)> in IEXT(IS(LE(e)))
>      I(e)=false otherwise.
> 
> or possibly to have a pair of mappings S and O from edges to nodes,
> and then one gets
> 
> 3. if e is in E
>     then I(e)=true if IS(LE(e)) in IP and <I(S(e)),I(O(e))> in IEXT(IS(LE(e)))
>     I(e)=false otherwise
> 
> But all of these seem to me to be needlessly complex, and also 
> confusing, in that the various mappings all have different meanings, 
> e.g. in the last term, LE is a syntactic graph labelling and IS a
> semantic interpretation.

Yes, I goofed here.  
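Pat's second variant of condition 3 can be read almost directly as code. This Python sketch assumes a graph given as the four maps LN, LE, S and O and, as a simplification of how unlabelled nodes are really treated, a fixed assignment A giving them values; all function names, ids and values are hypothetical. The comment on prop marks the mix Pat objects to: LE is a syntactic labelling while IS is semantic.

```python
def interpret_node(n, LN, IS, A):
    """I(n): a labelled node gets IS(LN(n)); an unlabelled node gets the
    value chosen by A, a fixed assignment assumed for this sketch."""
    return IS[LN[n]] if n in LN else A[n]


def interpret_edge(e, graph, IS, IP, IEXT, A):
    """Condition 3 with S and O mappings: I(e) is true iff IS(LE(e)) is in IP
    and <I(S(e)), I(O(e))> is in IEXT(IS(LE(e))); otherwise I(e) is false."""
    LN, LE, S, O = graph["LN"], graph["LE"], graph["S"], graph["O"]
    prop = IS[LE[e]]  # LE is a syntactic labelling, IS is semantic
    if prop not in IP:
        return False
    pair = (interpret_node(S[e], LN, IS, A), interpret_node(O[e], LN, IS, A))
    return pair in IEXT.get(prop, set())


graph = {
    "LN": {"n1": "ex:a", "n2": "ex:b"},  # n3 is an unlabelled node
    "LE": {"e1": "ex:p", "e2": "ex:p"},
    "S": {"e1": "n1", "e2": "n1"},
    "O": {"e1": "n2", "e2": "n3"},
}
IS = {"ex:a": "A", "ex:b": "B", "ex:p": "P"}
IP = {"P"}
IEXT = {"P": {("A", "B")}}
A = {"n3": "C"}
print(interpret_edge("e1", graph, IS, IP, IEXT, A))  # True
print(interpret_edge("e2", graph, IS, IP, IEXT, A))  # False
```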

> Apart from the tighter use of mathematical terminology and style, the 
> only significant change here is the emphasis on assigning 
> interpretations to nodes rather than to their labels, which 
> introduces the LN and LE mappings into the semantic equations. While 
> strictly correct, I feel this is overly pedantic. The whole utility 
> of the graph syntax is that the tidiness of the graph means that 
> labelled nodes can be identified by their labels: each node label 
> occurs on only one node. So on the labelled nodes, nodes and their 
> labels are in exact 1:1 correspondence, and it is a familiar and
> conventional 'abuse' of terminology, used throughout working
> mathematics under such circumstances, to use one of the corresponding
> things to refer to the other (e.g. not bothering to distinguish
> notationally between a singleton set and its only member, or between 
> an algebra and its underlying carrier set). Since the vocabulary *is* 
> the labels rather than the nodes, to phrase the model theory in the 
> way I did seems to me to be clearer and more intuitive while being 
> just as precise, and I would propose to keep it that way.
> 
> If we are dealing with untidy graphs, the graph syntax has no 
> structural advantages over a lexical syntax, and it would then be 
> preferable to simply attach the model theory directly to the 
> N-triples notation (which in an earlier version of the model theory 
> is what we in fact did, but that had problems of its own).

There may be advantages to untidy graphs when looking at complex strategies
for literals.  In particular, in the model theory for DAML+OIL, merging
nodes whose labels are literals may change the meaning of the graph.  (Of
course the model theory for DAML+OIL doesn't use graphs, but if it did,
such merging would not necessarily be meaning-preserving.)
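The contrast between tidy and untidy graphs can be sketched as a merge step that identifies nodes carrying the same label. In this hypothetical Python sketch, may_merge decides which labels are merged, and literals are assumed, for illustration only, to be marked by a leading double quote. Merging everything yields a tidy graph; refusing to merge literal-labelled nodes, in case such merging would change the meaning, leaves the graph untidy. Edges that become parallel are kept as distinct edges.

```python
from collections import Counter


def is_tidy(LN):
    """Tidy: no label occurs on more than one node."""
    return all(c == 1 for c in Counter(LN.values()).values())


def tidy(LN, S, O, may_merge=lambda label: True):
    """Merge nodes that share a label into one representative node and
    redirect edge ends to it. may_merge decides which labels are merged."""
    rep = {}    # label -> representative node
    canon = {}  # merged node -> its representative
    for n in sorted(LN):
        if may_merge(LN[n]):
            canon[n] = rep.setdefault(LN[n], n)
    keep = {n: label for n, label in LN.items() if canon.get(n, n) == n}
    new_S = {e: canon.get(n, n) for e, n in S.items()}
    new_O = {e: canon.get(n, n) for e, n in O.items()}
    return keep, new_S, new_O


# Hypothetical convention: literal labels start with a double quote.
LN = {"n1": "ex:a", "n2": "ex:a", "n3": '"2"', "n4": '"2"'}
S = {"e1": "n1", "e2": "n2"}
O = {"e1": "n3", "e2": "n4"}

full = tidy(LN, S, O)
print(is_tidy(full[0]), full[2])  # True {'e1': 'n3', 'e2': 'n3'}

uri_only = tidy(LN, S, O, may_merge=lambda label: not label.startswith('"'))
print(is_tidy(uri_only[0]), uri_only[2])  # False {'e1': 'n3', 'e2': 'n4'}
```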

[...]

> All this pussyfooting about literals would be greatly eased if RDF 
> simply declared that all labels were URIs and that literals were a 
> particular syntactic subclass of URIs with a fixed interpretation. 
> Then there would be no need to introduce any syntactic distinction, 
> LV could be a subset of IR, and all would be more straightforward. 
> The correct treatment of literals in RDF is being discussed now, and 
> it's not clear to me what the WG will decide on this issue; there are 
> many agendas to be considered, not just model-theoretic elegance. But 
> I am confident that the RDF model theory can be fitted onto whatever 
> is decided, without serious alterations.
> 
> However, Peter, a question for a professional Description Logician: 
> this would allow literals to be assigned non-literal property values 
> by an RDF assertion. Wouldn't that break DAML+OIL?

DAML+OIL depends somewhat on the separation between resources and
literals.  Some Description Logics may break severely if their separation
between abstract (resources) and concrete (literals) domains is breached.

[...]

> >Taking care of rdf:type:
> >
> >A core RDF interpretation, i.e., RDF without reification or containers, is
> >an interpretation over a vocabulary that includes rdf:type with the
> >following extra conditions
> >
> >      1. IS(rdf:type) is in IP
> >      2. IEXT(IS(rdf:type)) <= IR x IR
> 
> Is there any real need for condition 2 here in RDF? 

I don't know.  The condition is directly stated in M&S.  It says that
literals cannot have instances, which is probably a good thing.  I'm not sure
what the instance of "2" could be.

> I hope it can be 
> avoided, since it would mean that a triple
> 
> aaa rdf:type LLL .
> 
> where LLL is a literal, comes close to being a contradiction. Right 
> now, all it implies is that IR and LV overlap, but if anyone were to 
> ever claim that they didn't overlap, then it would be. I don't like 
> having land-mines concealed in the model theory.

I'm not sure how you get this implication, nor am I sure why literals need
to have instances.
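Pat's worry about condition 2 can be checked on a toy interpretation. This Python sketch tests the two extra conditions on rdf:type, assuming the literal LLL denotes a member of LV; the resource values, the string "hello" standing in for the value of LLL, and the function name core_rdf_ok are all hypothetical. Once aaa rdf:type LLL is made true, condition 2 holds only if that value is also in IR, i.e. only if IR and LV overlap.

```python
def core_rdf_ok(IS, IP, IR, IEXT):
    """The two extra conditions on a core RDF interpretation:
    1. IS(rdf:type) is in IP
    2. IEXT(IS(rdf:type)) <= IR x IR"""
    t = IS["rdf:type"]
    cond1 = t in IP
    cond2 = all(x in IR and y in IR for x, y in IEXT.get(t, set()))
    return cond1 and cond2


IS = {"rdf:type": "TYPE", "aaa": "x"}
IP = {"TYPE"}
LV = {"hello"}  # assumed value of the literal LLL
# Making  aaa rdf:type LLL  true puts <I(aaa), I(LLL)> into IEXT(IS(rdf:type)).
IEXT = {"TYPE": {("x", "hello")}}

IR_disjoint = {"x", "TYPE"}       # IR and LV do not overlap
IR_overlap = IR_disjoint | LV     # IR and LV overlap
print(core_rdf_ok(IS, IP, IR_disjoint, IEXT))  # False
print(core_rdf_ok(IS, IP, IR_overlap, IEXT))   # True
```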

> Pat

peter
Received on Thursday, 27 September 2001 23:37:53 UTC