Re: RDF Semantics: rdfs entailment lemma from herman.ter.horst@philips.com on 2003-11-12 (www-rdf-comments@w3.org from October to December 2003)

From: <herman.ter.horst@philips.com>
Date: Wed, 12 Nov 2003 15:08:20 +0100
To: pat hayes <phayes@ihmc.us>
Cc: www-rdf-comments@w3.org
Message-ID: <OFB4DE55C9.4D365A10-ONC1256DDC.002CDAA1-C1256DDC.004DBBB1@diamond.philips.com>
>>This is a review of part of the RDF Semantics
>>document, editorial version LC2.5.
>>In this message I mainly focus on the rdfs entailment lemma.
>>
>>The proof of this lemma is based on the claim that
>>the RDFS Herbrand interpretation of an RDF graph is an
>>RDFS interpretation.
>>This claim seems to be false: the first condition for
>>RDF interpretations is not satisfied.
>>In order to show this, note that this condition amounts,
>>in this case, to the equivalence
>>    v in IP iff <v,Property> in IEXT(type).
>>Suppose that the graph G has triples
>>    v p l
>>and
>>    p range Property
>>where l is a plain literal.
>>By rule lg, the RDFS closure D of G contains the triple
>>    v p b
>>where b is allocated to l and where b = sur(l).
>>By rule rdfs3, D contains the triple
>>    b type Property = sur(l) type Property.
>>Therefore, <l,Property> in IEXT(type).
>>However, we cannot have  l in IP, since that would mean
>>that D contains the triple
>>    l type Property.
>
>Yes. The SH definition of IP should refer to the surrogate, just as 
>the IEXT definition does, ie should read:
>
>IP<SH> = {x: D contains a triple sur(x) rdf:type rdf:Property . }
>
>then l can (indeed will) be in IP. that is, all the semantic 
>conditions should be 'read off' from the graph via the surrogates.
>
>I will make this change.

Clear.  This was the problem here.
Now the first condition on rdf-interpretations holds.

>
>>===
>>
>>It should be made explicit what the domain and range of the
>>function sur are: I assume that these sets are both IR.
>
>The domain is IR and the range is a subset of IR consisting of 
>vocabulary items. So the range is a subset of IR.
>
>>When this assumption is made explicit, there seems to be a
>>circularity in the definition of LV for the RDFS Herbrand
>>interpretation:
>>the definition of LV depends on sur, the definition of sur
>>depends on IR, the definition of IR depends on LV.
>>In view of this circularity, the definition of LV becomes
>>incomprehensible.  I believe that the definition of LV should
>>be made explicit.
>
>OK, though I don't accept that it is ambiguous at present.
>
>>From the given definition, I would guess that the intention
>>is that LV is the union of five sets:
>>   strings
>>   pairs of strings and language tags
>
>ie plain literals, yes...
>
>>   XML values of well-typed XML literals in D
>>   {v in voc(D):  the triple  v type Literal  is in D }
>>   {v in voc(D): v a typed, non-XML literal such that
>>    b type Literal is in D, where b is the blank node allocated
>>    to v by rule lg }
>
>I do not think that this way of phrasing it is appropriate. The central 
>intuition is that in a simple Herbrand interpretation, IR consists of 
>the vocabulary items (including bnodes) in the graph, and the 
>interpretation is simply read off the graph. Here we need to modify 
>this by treating some vocabulary items as surrogates for the more 
>special values required by the semantic conditions, and adding 
>required items (all plain literals) which may not be in the graph; 
>otherwise, the construction should mirror the simple Herbrand 
>construction.
>
>I propose to rephrase the definition as follows, modeled on the 
>definition used in the RDF lemma:
>
>-------
>If lll is a well-formed XML literal, let xml(lll) be the XML value of 
>lll; and for each XML value xml(lll) of any well-formed XML literal 
>lll in D, let sur(xml(lll)) be the blank node allocated to lll by 
>rule lg; for any other literal lll in D, let sur(lll) be the blank 
>node allocated to lll by rule lg, and extend sur to be the identity 
>mapping on URI references and blank nodes in D. The domain of this 
>mapping is the universe IR<SH>, defined below, and the range contains 
>only URI references and blank nodes which occur in D.
(Actually the document augments this with the definition that
sur is also the identity mapping on "other plain literals")

I still believe that the text contains a circularity that makes the
construction hard to understand:
sur is defined with reference to the universe IR "*defined below*",
then LV is defined in terms of sur, then IR is defined in terms
of LV.
It seems that your last paragraph I cite here contains the 
solution to the problem:
Wouldn't it be clearer to 
- first define IR, as you indicated, in four parts, 
- then define sur: IR -> nodes(D) intersect (URIs u blankNodes)
- then define LV = plainLiterals u {x in IR: sur(x) type Literal in D}
- then IP = {x in IR: sur(x) type Property in D}
etc.?

See further below.

>-------
>
>>===
>>
>>"Define B(x) as before, then clearly [SH+B] satisfies D ..."
>>There seems to be a problem with this conclusion.
>>Making this explicit, it seems that B:blank(D)->IR
>>needs to be defined by
>>B(v)=xml(l) if v is a blank node allocated to the well-formed
>>XML literal l,
>>B(v)=l if v is a blank node allocated to a typed, non-XML
>>literal l,
>>otherwise B(v)=v.
>
>The wording is careless, forgive me. The intention was that the 
>second case would include ALL other literals, ie non-well-typed XML, 
>other typed and plain, if v has been allocated to that literal. I 
>will spell this out more carefully:
>-----
>Define B(x) as follows: if x is a blank node allocated to a 
>well-formed XML literal lll in D then B(x) = xml(lll); if it is 
>allocated to any other literal lll in D then B(x)=lll; and otherwise 
>B(x)=x.
>-----
>
>I am not sure if this point solves your next comment, because I am 
>not sure what the force of the comment is.

This indeed seems to be the required definition of B.

[...]

I have skipped here a rather long sequence of earlier remarks by me 
and comments on those by you.  Much becomes clear.

It seems that there remains only one problem in the proof of the
rdfs entailment lemma, which can be clearly localized.

The claim on which the proof is based, that the RDFS Herbrand 
interpretation of an RDF graph is an RDFS interpretation,
still seems to be false.

To give an example, let A be a URI reference and let G
be the RDF graph consisting of just the two triples
  type domain A
  A type Class
Let l be a plain literal.
Then A in IC, type in IP, <type,A> in IEXT(domain) and
<l,Literal> in IEXT(type).
The semantic condition on domain shows that 
l in CEXT(A), or <l,A> in IEXT(type)
Therefore D should contain the triple
  sur(l) type A
Since sur(l)=l, D should contain the triple
  l type A
which is clearly nonsense.
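
A minimal sketch of this counterexample, under stated simplifications: the closure applies only rule rdfs2 (no axiomatic triples, no other rules), terms are plain strings, and the special case of IEXT(type) is modelled for a single plain literal l. The helper names rdfs2_closure, iext and violations are hypothetical.

```python
TYPE, DOMAIN = "rdf:type", "rdfs:domain"
CLASS, LITERAL = "rdfs:Class", "rdfs:Literal"

def rdfs2_closure(graph):
    """Fixpoint of rule rdfs2: (p domain c) and (s p o) give (s type c)."""
    d = set(graph)
    while True:
        new = {(s, TYPE, c)
               for (p, r, c) in d if r == DOMAIN
               for (s, q, o) in d if q == p}
        if new <= d:
            return d
        d |= new

G = {(TYPE, DOMAIN, "ex:A"), ("ex:A", TYPE, CLASS)}
D = rdfs2_closure(G)
l = '"some plain literal"'  # not in D, so sur(l) = l

def iext(p):
    # Property extension read off D (sur is the identity on terms of D).
    pairs = {(s, o) for (s, q, o) in D if q == p}
    if p == TYPE:
        pairs.add((l, LITERAL))  # the special case for plain literals
    return pairs

# Domain condition: <x,y> in IEXT(domain) and <u,v> in IEXT(x)
# require <u,y> in IEXT(type).
violations = [(u, y)
              for (x, y) in iext(DOMAIN)
              for (u, v) in iext(x)
              if (u, y) not in iext(TYPE)]
print(violations)
```

With these assumptions the only violation is the pair (l, "ex:A"): repairing it would need the triple l type A in D, which no closure rule can produce because a literal cannot be a subject.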

It seems that similar small counterexamples to the claim
can be given with other vocabulary instead of domain:
range, subClassOf, subPropertyOf.

Moreover, the last part of the proof of the rdfs entailment
lemma has a conclusion that cannot be justified:
"IEXTI(p) contains <SH+A(s),SH+A(o)> 
i.e. D contains a triple
  sur(SH+A(s)) p sur(SH+A(o))"
The problem with this conclusion "i.e." is that p can be type
and <SH+A(s),SH+A(o)> can be <l,Literal> or <l,Resource>
for some plain literal l.

The proofs of the semantic conditions for domain, range, subClassOf
and subPropertyOf make a similar unjustified step.

It seems that these problems can be solved by slightly 
generalizing the notion of simple interpretations and 
thereby also the other kinds of interpretations 
(rdf, rdfs, D).
The assumption that LV contains all plain literals might
be dropped.  It seems that the only thing that is essential
to assume about plain literals is that they denote themselves.
Nowhere in the entire document does the assumption seem to be
used that an arbitrary string, or pair consisting of a string 
and a language tag, is actually contained in LV.  XMLLiterals and
other typed literals are not given this privileged treatment.
The assumption does not seem to match well with a clean and correct
development of the Herbrand construction in the case of RDFS.

I would consider the RDFS entailment lemma proved if
this change were made to the definition and if, 
moreover, the following natural adjustments 
were made to the construction of the RDFS Herbrand 
interpretation:
- drop the literals outside the vocabulary of D
from IR (this is the second part of the definition mentioned
above, each of these literals is plain)
- drop the plain literals from LV, the definition would
become completely analogous to that of IP:
  LV = {x in IR : sur(x) type Literal in D}
- in the definition of IEXT drop the special case IEXT(type),
which causes the horrible problems I just mentioned.
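
As a rough sketch of the adjusted construction, assuming a simplified closure that applies only rules rdf1 and rdfs2 (no axiomatic triples) and a graph without literals, so that sur is the identity: IR is just the vocabulary of D, LV and IP are read off in the same way, and IEXT(type) has no special case. All helper names are hypothetical.

```python
TYPE, DOMAIN = "rdf:type", "rdfs:domain"
CLASS, LITERAL, PROPERTY = "rdfs:Class", "rdfs:Literal", "rdf:Property"

def closure(graph):
    """Fixpoint of rdf1 (s p o gives p type Property) and rdfs2."""
    d = set(graph)
    while True:
        new = {(p, TYPE, PROPERTY) for (s, p, o) in d}
        new |= {(s, TYPE, c)
                for (p, r, c) in d if r == DOMAIN
                for (s, q, o) in d if q == p}
        if new <= d:
            return d
        d |= new

G = {(TYPE, DOMAIN, "ex:A"), ("ex:A", TYPE, CLASS)}
D = closure(G)

IR = {term for triple in D for term in triple}  # vocabulary of D only

def sur(x):
    return x  # identity: this example graph contains no literals

def members(cls):
    # {x in IR : sur(x) type cls in D}
    return {x for x in IR if (sur(x), TYPE, cls) in D}

LV = members(LITERAL)
IP = members(PROPERTY)

def iext(p):
    # No special case for type: everything is read off D.
    return {(s, o) for (s, q, o) in D if q == p}

domain_ok = all((u, y) in iext(TYPE)
                for (x, y) in iext(DOMAIN)
                for (u, v) in iext(x))
print(sorted(IP), sorted(LV), domain_ok)
```

In this simplified setting LV is empty, IP contains exactly type and domain, and the domain condition holds for the graph of the counterexample, since no plain literal outside D is forced into IEXT(type).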


>Pat
>
>PS. the changes described above are now visible in the copy on my 
>website:
>
>http://www.ihmc.us/users/phayes/RDF_Semantics_LC2.5.html
>
>I have also added an explanatory paragraph just before the proof of 
>the RDF entailment lemma, and some explanatory prose in the proof of 
>the RDFS entailment lemma concerning the role of literal surrogates.
>
>Please let me know if this response adequately deals with the issues you 
>raise.
>
>Pat
>
>>
>>
>>Herman ter Horst
>
>
>-- 
>---------------------------------------------------------------------
>IHMC            (850)434 8903 or (650)494 3973   home
>40 South Alcaniz St.            (850)202 4416   office
>Pensacola                                               (850)202 4440 fax
>FL 32501                                                (850)291 0667   cell
>phayes@ihmc.us       http://www.ihmc.us/users/phayes
>
>

Herman
Received on Wednesday, 12 November 2003 09:09:11 UTC