Re: RDF Semantics: rdfs entailment lemma from herman.ter.horst@philips.com on 2003-11-13 (www-rdf-comments@w3.org from October to December 2003)

From: <herman.ter.horst@philips.com>
Date: Thu, 13 Nov 2003 17:20:04 +0100
To: pat hayes <phayes@ihmc.us>
Cc: Dan Connolly <connolly@w3.org>, www-rdf-comments@w3.org
Message-ID: <OF0F1F2CED.1D22750C-ONC1256DDD.0058E175-C1256DDD.0059CAB1@diamond.philips.com>
>>>>This is a review of part of the RDF Semantics
>>>>document, editorial version LC2.5.
>>>>In this message I mainly focus on the rdfs entailment lemma.
>>>>
>>>>The proof of this lemma is based on the claim that
>>>>the RDFS Herbrand interpretation of an RDF graph is an
>>>>RDFS interpretation.
>>>>This claim seems to be false: the first condition for
>>>>RDF interpretations is not satisfied.
>>>>In order to show this, note that this condition amounts,
>>>>in this case, to the equivalence
>>>>     v in IP iff <v,Property> in IEXT(type).
>>>>Suppose that the graph G has triples
>>>>     v p l
>>>>and
>>>>     p range Property
>>>>where l is a plain literal.
>>>>By rule lg, the RDFS closure D of G contains the triple
>>>>     v p b
>>>>where b is allocated to l and where b = sur(l).
>>>>By rule rdfs3, D contains the triple
>>>>     b type Property = sur(l) type Property.
>>>>Therefore, <l,Property> in IEXT(type).
>>>>However, we cannot have  l in IP, since that would mean
>>>>that D contains the triple
>>>>     l type Property.
>>>
>>>Yes. The SH definition of IP should refer to the surrogate, just as
>>>the IEXT definition does, ie should read:
>>>
>>>IP<SH> = {x: D contains a triple sur(x) rdf:type rdf:Property . }
>>>
>>>then l can (indeed will) be in IP. that is, all the semantic
>>>conditions should be 'read off' from the graph via the surrogates.
>>>
>>>I will make this change.
>>
>>Clear.  This was the problem here.
>>Now the first condition on rdf-interpretations holds.
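The counterexample above, and the fix of reading IP off the graph through surrogates, can be replayed on Herman's two-triple graph. This is a minimal sketch under stated assumptions: the `Lit`/`BNode` classes, the abbreviated names "type", "range", "Property", and the simplified versions of rules lg and rdfs3 are all hypothetical stand-ins, not the document's own definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    """Hypothetical representation of a plain literal (lexical form only)."""
    lex: str

@dataclass(frozen=True)
class BNode:
    """Hypothetical representation of a blank node."""
    name: str

# Abbreviated stand-ins for rdf:type, rdfs:range and rdf:Property.
TYPE, RANGE, PROPERTY = "type", "range", "Property"

def rule_lg(graph):
    """Simplified rule lg: allocate one blank node per literal and add a
    copy of each literal-object triple with the blank node as object."""
    alloc, out = {}, set(graph)
    for s, p, o in graph:
        if isinstance(o, Lit):
            if o not in alloc:
                alloc[o] = BNode(f"b{len(alloc)}")
            out.add((s, p, alloc[o]))
    return out, alloc

def rule_rdfs3(graph):
    """Simplified rule rdfs3, iterated to a fixpoint. Literal objects are
    skipped, since a triple cannot have a literal subject."""
    g = set(graph)
    while True:
        ranges = {(s, o) for s, p, o in g if p == RANGE}
        new = {(o, TYPE, c) for s, p, o in g for rp, c in ranges
               if p == rp and not isinstance(o, Lit)} - g
        if not new:
            return g
        g |= new

# Herman's example: v p l, p range Property, with l a plain literal.
l = Lit("abc")
G = {("v", "p", l), ("p", RANGE, PROPERTY)}
D0, alloc = rule_lg(G)
D = rule_rdfs3(D0)

def sur(x):
    """Surrogate: the allocated blank node for a literal, identity otherwise."""
    return alloc.get(x, x)

IR = {t for triple in D for t in triple}
IEXT_type = {(x, y) for x in IR for y in IR if (sur(x), TYPE, sur(y)) in D}
IP_old = {x for x in IR if (x, TYPE, PROPERTY) in D}       # read off directly
IP_new = {x for x in IR if (sur(x), TYPE, PROPERTY) in D}  # via surrogates

def first_condition_holds(IP):
    """x in IP iff <x, Property> in IEXT(type), for every x in IR."""
    return all((x in IP) == ((x, PROPERTY) in IEXT_type) for x in IR)

print(first_condition_holds(IP_old))  # False: l is the counterexample
print(first_condition_holds(IP_new))  # True
```

Only the definition of IP changes between the two checks; IEXT(type) is read through sur in both, which is exactly the mismatch Herman points out.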
>>
>>>
>>>>===
>>>>
>>>>It should be made explicit what the domain and range of the
>>>>function sur are: I assume that these sets are both IR.
>>>
>>>The domain is IR and the range is a subset of IR consisting of
>>>vocabulary items. So the range is a subset of IR.
>>>
>>>>When this assumption is made explicit, there seems to be a
>>>>circularity in the definition of LV for the RDFS Herbrand
>>>>interpretation:
>>>>the definition of LV depends on sur, the definition of sur
>>>>depends on IR, the definition of IR depends on LV.
>>>>In view of this circularity, the definition of LV becomes
>>>>incomprehensible.  I believe that the definition of LV should
>>>>be made explicit.
>>>
>>>OK, though I don't accept that it is ambiguous at present.
>>>
>>>>From the given definition, I would guess that the intention
>>>>is that LV is the union of five sets:
>>>>    strings
>>>>    pairs of strings and language tags
>>>
>>>ie plain literals, yes...
>>>
>>>>    XML values of well-typed XML literals in D
>>>>    {v in voc(D):  the triple  v type Literal  is in D }
>>>>    {v in voc(D): v a typed, non-XML literal such that
>>>>     b type Literal is in D, where b is the blank node allocated
>>>>     to v by rule lg }
>>>
>>>I do not think that this way of phrasing it is appropriate. The central
>>>intuition is that in a simple Herbrand interpretation, IR consists of
>>>the vocabulary items (including bnodes) in the graph, and the
>>>interpretation is simply read off the graph. Here we need to modify
>>>this by treating some vocabulary items as surrogates for the more
>>>special values required by the semantic conditions, and adding
>>>required items (all plain literals) which may not be in the graph;
>>>otherwise, the construction should mirror the simple Herbrand
>>>construction.
>>>
>>>I propose to rephrase the definition as follows, modeled on the
>>>definition used in the RDF lemma:
>>>
>>>-------
>>>If lll is a well-formed XML literal, let xml(lll) be the XML value of
>>>lll; and for each XML value xml(lll) of any well-formed XML literal
>>>lll in D, let sur(xml(lll)) be the blank node allocated to lll by
>>>rule lg; for any other literal lll in D, let sur(lll) be the blank
>>>node allocated to lll by rule lg, and extend sur to be the identity
>>>mapping on URI references and blank nodes in D. The domain of this
>>>mapping is the universe IR<SH>, defined below, and the range contains
>>>only URI references and blank nodes which occur in D.
>>(Actually the document augments this with the definition that
>>sur is also the identity mapping on "other plain literals")
>>I still believe that the text contains a circularity that makes the
>>construction hard to understand:
>>sur is defined with reference to the universe IR "*defined below*",
>>then LV is defined in terms of sur, then IR is defined in terms
>>of LV.
>>It seems that the last paragraph of yours that I cite here contains the
>>solution to the problem:
>>Wouldn't it be clearer to
>>- first define IR, as you indicated, in four parts,
>>- then define sur: IR -> nodes(D) intersect (URIs u blankNodes)
>>- then define LV = plainLiterals u {x in IR: sur(x) type Literal in D}
>>- then IP = {x in IR: sur(x) type Property in D}
>>etc.?
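The ordering Herman proposes (IR first, then sur, then LV and IP read off D through sur) can be written so that each step only uses what earlier steps already defined, which removes the circularity. A minimal sketch, assuming a hand-written graph D in which rule lg has allocated b0 to a literal; the helper names and the simplification of IR to the vocabulary of D (leaving out XML literal values) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    """Hypothetical representation of a literal (lexical form only)."""
    lex: str

@dataclass(frozen=True)
class BNode:
    """Hypothetical representation of a blank node."""
    name: str

TYPE, LITERAL, PROPERTY = "type", "Literal", "Property"

def herbrand_parts(D, alloc):
    """Define IR, sur, LV and IP in that order; each step uses only
    what the previous steps have already defined."""
    # 1. IR: here simply the vocabulary of D (a simplification that
    #    leaves out XML literal values).
    IR = frozenset(t for triple in D for t in triple)

    # 2. sur: IR -> URI references and blank nodes of D.
    def sur(x):
        if x not in IR:
            raise ValueError(f"sur is undefined outside IR: {x!r}")
        return alloc.get(x, x)

    # 3. LV and 4. IP, both read off D through the surrogates.
    LV = frozenset(x for x in IR if (sur(x), TYPE, LITERAL) in D)
    IP = frozenset(x for x in IR if (sur(x), TYPE, PROPERTY) in D)
    return IR, sur, LV, IP

# A hand-written graph D in which rule lg allocated b0 to the literal l.
l, b0 = Lit("chat"), BNode("b0")
D = {("s", "p", l), ("s", "p", b0), (b0, TYPE, LITERAL), ("p", TYPE, PROPERTY)}
IR, sur, LV, IP = herbrand_parts(D, {l: b0})
print(l in LV, sorted(IP))  # True ['p']
```

LV and IP then have exactly the same shape, which is the analogy Herman draws between them.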
>
>I have rewritten the text along these lines but without altering the 
>actual table (except as noted below). I hope this is now sufficiently 
>clear.
>

The text now includes all literals in IR (= the domain of sur).
It seems that the well-formed XML literals themselves should not
be in; only their values are.
Note, for example, that sur is not defined for them.

>>See further below.
>>
>>>-------
>>>
>>>>===
>>>>
>>>>"Define B(x) as before, then clearly [SH+B] satisfies D ..."
>>>>There seems to be a problem with this conclusion.
>>>>Making this explicit, it seems that B:blank(D)->IR
>>>>needs to be defined by
>>>>B(v)=xml(l) if v is a blank node allocated to the well-formed
>>>>XML literal l,
>>>>B(v)=l if v is a blank node allocated to a typed, non-XML
>>>>literal l,
>>>>otherwise B(v)=v.
>>>
>>>The wording is careless, forgive me. The intention was that the
>>>second case would include ALL other literals, ie non-well-typed XML,
>>>other typed and plain, if v has been allocated to that literal. I
>>>will spell this out more carefully:
>>>-----
>>>Define B(x) as follows: if x is a blank node allocated to a
>>>well-formed XML literal lll in D then B(x) = xml(lll); if it is
>>>allocated to any other literal lll in D then B(x)=lll; and otherwise
>>>B(x)=x.
>>>-----
>>>
>>>I am not sure if this point solves your next comment, because I am
>>>not sure what the force of the comment is.
>>
>>This indeed seems to be the required definition of B.
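The corrected definition of B distinguishes three cases for a blank node x. A minimal sketch, assuming a hypothetical `alloc` table from literals to the blank nodes rule lg allocated to them; the well-formedness test and `xml()` below are rough stand-ins (no canonicalisation), not the definitions in RDF Concepts.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

@dataclass(frozen=True)
class Lit:
    """Hypothetical literal: plain if datatype is None."""
    lex: str
    datatype: Optional[str] = None

@dataclass(frozen=True)
class BNode:
    name: str

@dataclass(frozen=True)
class XMLValue:
    """Stand-in for the XML value of a well-formed XML literal."""
    text: str

XMLLITERAL = "XMLLiteral"  # abbreviated stand-in for rdf:XMLLiteral

def is_well_formed_xml(lit):
    """Rough check: an XML literal whose lexical form parses as content."""
    if lit.datatype != XMLLITERAL:
        return False
    try:
        ET.fromstring(f"<wrap>{lit.lex}</wrap>")
        return True
    except ET.ParseError:
        return False

def xml(lit):
    """Hypothetical xml(lll): no canonicalisation is attempted."""
    return XMLValue(lit.lex)

def make_B(alloc):
    """alloc maps each literal lll in D to the blank node allocated to it by rule lg."""
    allocated_to = {b: lit for lit, b in alloc.items()}

    def B(x):
        lit = allocated_to.get(x)
        if lit is None:
            return x              # x is not allocated to any literal
        if is_well_formed_xml(lit):
            return xml(lit)       # well-formed XML literal: its XML value
        return lit                # any other literal: the literal itself
    return B

alloc = {
    Lit("<a/>", XMLLITERAL): BNode("b0"),   # well-formed XML literal
    Lit("<a>", XMLLITERAL): BNode("b1"),    # ill-formed XML literal
    Lit("chat"): BNode("b2"),               # plain literal
}
B = make_B(alloc)
```

Here B(BNode("b0")) is the XML value, B(BNode("b1")) and B(BNode("b2")) are the literals themselves, and any other node is mapped to itself.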
>>
>>[...]
>>
>>I have skipped here a rather long sequence of earlier remarks by me
>>and comments on those by you.  Much becomes clear.
>>
>>It seems that there remains only one problem in the proof of the
>>rdfs entailment lemma, which can be clearly localized.
>>
>>The claim on which the proof is based, that the RDFS Herbrand
>>interpretation of an RDF graph is an RDFS interpretation,
>>still seems to be false.
>>
>>To give an example, let A be a URI reference and let G
>>be the RDF graph consisting of just the two triples
>>   type domain A
>>   A type Class
>>Let l be a plain literal.
>
>OK.
>
>>Then A in IC, type in IP, <type,A> in IEXT(domain) and
>><l,Literal> in IEXT(type).
>>The semantic condition on domain shows that
>>l in CEXT(A), i.e. <l,A> in IEXT(type).
>>Therefore D should contain the triple
>>   sur(l) type A
>>Since sur(l)=l, D should contain the triple
>>   l type A
>>which is clearly nonsense.
>>
>>It seems that similar small counterexamples to the claim
>>can be given with other vocabulary instead of domain:
>>range, subClassOf, subPropertyOf.
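Herman's counterexample can be checked mechanically: with the old special case that puts <l,Literal> into IEXT(type) for a plain literal l outside D, the domain condition on rdf:type demands <l,A> in IEXT(type), which D cannot supply. A minimal sketch, assuming abbreviated names for the RDF(S) vocabulary and closing G under rule rdfs2 only (a partial closure that happens to suffice for this graph); all helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    """Hypothetical plain literal."""
    lex: str

TYPE, DOMAIN, CLASS, LITERAL = "type", "domain", "Class", "Literal"

def rdfs2_closure(graph):
    """Rule rdfs2 only (p domain c . s p o  =>  s type c), to a fixpoint.
    A partial closure, enough for this example."""
    g = set(graph)
    while True:
        doms = {(s, o) for s, p, o in g if p == DOMAIN}
        new = {(s, TYPE, c) for s, p, o in g for dp, c in doms if p == dp} - g
        if not new:
            return g
        g |= new

def iext_type(D, plain_literals, special_case):
    """IEXT(type) read off D; optionally add the old special case
    <l, Literal> for plain literals l outside D (where sur(l) = l)."""
    pairs = {(s, o) for s, p, o in D if p == TYPE}
    if special_case:
        pairs |= {(lit, LITERAL) for lit in plain_literals}
    return pairs

def domain_condition_holds(D, ext_type):
    """<p, c> in IEXT(domain) and <x, y> in IEXT(p) imply <x, c> in IEXT(type)."""
    for p, q, c in D:
        if q != DOMAIN:
            continue
        ext_p = ext_type if p == TYPE else {(s, o) for s, r, o in D if r == p}
        if any((x, c) not in ext_type for x, _ in ext_p):
            return False
    return True

G = {(TYPE, DOMAIN, "A"), ("A", TYPE, CLASS)}
D = rdfs2_closure(G)
l = Lit("abc")  # a plain literal that does not occur in D

old = iext_type(D, {l}, special_case=True)
new = iext_type(D, {l}, special_case=False)
print(domain_condition_holds(D, old))  # False: would need "l type A" in D
print(domain_condition_holds(D, new))  # True
```

Dropping the special case, as proposed further on in this thread, is what makes the second check succeed.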
>>
>>Moreover, the last part of the proof of the rdfs entailment
>>lemma has a conclusion that cannot be justified:
>>"IEXT<I>(p) contains <SH+A(s),SH+A(o)>
>>i.e. D contains a triple
>>   sur(SH+A(s)) p sur(SH+A(o))"
>>The problem with this conclusion "i.e." is that p can be type
>>and <SH+A(s),SH+A(o)> can be <l,Literal> or <l,Resource>
>>for some plain literal l.
>
>Which is not in D; that, I presume, is your point. Yes, point taken.
>
>>The proof of semantic conditions for domain, range, subClassOf
>>and subPropertyOf make a similar unjustified step.
>>
>>It seems that these problems can be solved by slightly
>>generalizing the notion of simple interpretations and
>>thereby also the other kinds of interpretations
>>(rdf, rdfs, D).
>>The assumption that LV contains all plain literals might
>>be dropped.
>
>Indeed. This does greatly simplify everything and, as you say, is not 
>actually used anywhere. At the time it was written there seemed to be 
>an expository value in considering plain literals to be 'outside' the 
>vocabulary, but this small value, if any, is clearly outweighed by 
>the formal awkwardness that results.  I will refrain from drawing the 
>obvious moral, since it redounds to my disadvantage.
>
>On reflection I have made the following changes:
>
>1. A plain literal is considered a name (so vocabularies may contain 
>plain literals).
>2. The definition of simple interpretation of V requires that LV 
>contain plain literals *in V*
>3. The first two semantic conditions on plain literals in a simple 
>interpretation refer to plain literals *in V*.
>4. The definitions of LV in the three Herbrand constructions are 
>modified appropriately to refer to plain literals in the appropriate 
>graph (respectively in G, C and D)

Not in the third one: there LV is {x in IR : sur(x) type Literal in D},
as we noted.

>5. As you suggest, the special case of IEXT<SH> is dropped.
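Changes 2 and 3 restrict the plain-literal conditions to plain literals *in V*. A minimal sketch of that restricted check, assuming a hypothetical `PlainLit` class, a denotation function `I`, and "ex:a" as an illustrative URI; a plain literal outside V no longer needs to be in LV at all.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PlainLit:
    """Hypothetical plain literal, with an optional language tag."""
    lex: str
    lang: Optional[str] = None

def value_of(lit):
    """The value a plain literal must denote: its string, or a <string, tag> pair."""
    return lit.lex if lit.lang is None else (lit.lex, lit.lang)

def plain_literal_conditions_hold(V, I, LV):
    """Check, for plain literals *in V* only, that I(E) is the required
    value and that this value is in LV. Literals outside V are ignored."""
    return all(I(E) == value_of(E) and value_of(E) in LV
               for E in V if isinstance(E, PlainLit))

V = {"ex:a", PlainLit("cat"), PlainLit("chat", "fr")}
I = value_of  # plain literals denote themselves
print(plain_literal_conditions_hold(V, I, {"cat", ("chat", "fr")}))  # True
print(plain_literal_conditions_hold(V, I, {"cat"}))                  # False
```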
>
>I also noted a slip in the definition of the mapping B in the RDF 
>lemma proof and corrected it:
>
>Define a mapping B on blank nodes in C as follows: B(x)=xml(lll) if x 
>is allocated to a well-formed XML literal lll, otherwise B(x)=x

Correct.  I had already understood it in this way.

>
>>   It seems that the only thing that is essential
>>to assume about plain literals is that they denote themselves.
>>It seems that the assumption that an arbitrary string, or pair
>>consisting of a string and a language tag, is actually contained
>>in LV is used nowhere in the entire document. XMLLiterals and
>>other typed literals are not given this privileged treatment.
>>The assumption does not seem to match well with a clean and correct
>>development of the Herbrand construction in the case of RDFS.
>
>Agreed.
>
>>I would consider the RDFS entailment lemma proved once
>>this change is made to the definition and, moreover,
>>the following natural adjustments are made to the
>>construction of the RDFS Herbrand interpretation:
>>- drop the literals outside the vocabulary of D
>>from IR (this is the second part of the definition mentioned
>>above, each of these literals is plain)
>>- drop the plain literals from LV, the definition would
>>become completely analogous to that of IP:
>>   LV = {x in IR : sur(x) type Literal in D}
>>- in the definition of IEXT drop the special case IEXT(type),
>>which causes the horrible problems I just mentioned.
>>
>
>Done. So it is proved :-)
>
>Pat
>
>>>Pat
>>>
>>>PS. The changes described above are now visible in the copy on my
>>>website:
>>>
>>>http://www.ihmc.us/users/phayes/RDF_Semantics_LC2.5.html
>>>
>>>I have also added an explanatory paragraph just before the proof of
>>>the RDF entailment lemma, and some explanatory prose in the proof of
>>>the RDFS entailment lemma concerning the role of literal surrogates.
>>>
>>>Please let me know if this response adequately deals with the issues you
>>>raise.
>>>
>>>Pat
>>>
>>>>
>>>>
>>>>Herman ter Horst
>>>
>>>
>>>--
>>>---------------------------------------------------------------------
>>>IHMC            (850)434 8903 or (650)494 3973   home
>>>40 South Alcaniz St.            (850)202 4416   office
>>>Pensacola                                               (850)202 4440 fax
>>>FL 32501                                                (850)291 0667 cell
>>>phayes@ihmc.us       http://www.ihmc.us/users/phayes
>>>
>>>
>>
>>Herman
>
Received on Thursday, 13 November 2003 11:22:54 UTC