Re: RDF Semantics: rdfs entailment lemma from pat hayes on 2003-11-12 (www-rdf-comments@w3.org from October to December 2003)

From: pat hayes <phayes@ihmc.us>
Date: Wed, 12 Nov 2003 12:59:58 -0600
To: herman.ter.horst@philips.com
Cc: www-rdf-comments@w3.org, Dan Connolly <connolly@w3.org>
Message-Id: <p06001f02bbd8040ff9af@[10.1.31.1]>
>  >>This is a review of part of the RDF Semantics
>>>document, editorial version LC2.5.
>>>In this message I mainly focus on the rdfs entailment lemma.
>>>
>>>The proof of this lemma is based on the claim that
>>>the RDFS Herbrand interpretation of an RDF graph is an
>>>RDFS interpretation.
>>>This claim seems to be false: the first condition for
>>>RDF interpretations is not satisfied.
>>>In order to show this, note that this condition amounts,
>>>in this case, to the equivalence
>>>     v in IP iff <v,Property> in IEXT(type).
>>>Suppose that the graph G has triples
>>>     v p l
>>>and
>>>     p range Property
>>>where l is a plain literal.
>>>By rule lg, the RDFS closure D of G contains the triple
>>>     v p b
>>>where b is allocated to l and where b = sur(l).
>>>By rule rdfs3, D contains the triple
>>>     b type Property = sur(l) type Property.
>>>Therefore, <l,Property> in IEXT(type).
>>>However, we cannot have  l in IP, since that would mean
>>>that D contains the triple
>>>     l type Property.
>>
>>Yes. The SH definition of IP should refer to the surrogate, just as
>>the IEXT definition does, ie should read:
>>
>>IP<SH> = {x: D contains a triple sur(x) rdf:type rdf:Property . }
>>
>>then l can (indeed will) be in IP. that is, all the semantic
>>conditions should be 'read off' from the graph via the surrogates.
>>
>>I will make this change.
>
>Clear.  This was the problem here.
>Now the first condition on rdf-interpretations holds.
>
>>
>>>===
>>>
>>>It should be made explicit what the domain and range of the
>>>function sur are: I assume that these sets are both IR.
>>
>>The domain is IR and the range is a subset of IR consisting of
>>vocabulary items. So the range is a subset of IR.
>>
>>>When this assumption is made explicit, there seems to be a
>>>circularity in the definition of LV for the RDFS Herbrand
>>>interpretation:
>>>the definition of LV depends on sur, the definition of sur
>>>depends on IR, the definition of IR depends on LV.
>>>In view of this circularity, the definition of LV becomes
>>>incomprehensible.  I believe that the definition of LV should
>>be made explicit.
>>
>>OK, though I don't accept that it is ambiguous at present.
>>
>>>From the given definition, I would guess that the intention
>>>is that LV is the union of five sets:
>>>    strings
>>>    pairs of strings and language tags
>>
>>ie plain literals, yes...
>>
>>>    XML values of well-typed XML literals in D
>>>    {v in voc(D):  the triple  v type Literal  is in D }
>>>    {v in voc(D): v a typed, non-XML literal such that
>>>     b type Literal is in D, where b is the blank node allocated
>>>     to v by rule lg }
>>
>>I do not think that this way of phrasing it appropriate. The central
>  >intuition is that in a simple Herbrand interpretation, IR consists of
>>the vocabulary items (including bnodes) in the graph, and the
>>interpretation is simply read off the graph. Here we need to modify
>>this by treating some vocabulary items as surrogates for more the
>>special values required by the semantic conditions, and adding
>>required items (all plain literals) which may not be in the graph;
>>otherwise, the construction should mirror the simple Herbrand
>>construction.
>  >
>>I propose to rephrase the definition as follows, modeled on the
>>definition used in the RDF lemma:
>  >
>>-------
>>If lll is a well-formed XML literal, let xml(lll) be the XML value of
>>lll; and for each XML value xml(lll) of any well-formed XML literal
>>lll in D, let sur(xml(lll)) be the blank node allocated to lll by
>>rule lg; for any other literal lll in D, let sur(lll) be the blank
>>node allocated to lll by rule lg, and extend sur to be the identity
>>mapping on URI references and blank nodes in D. The domain of this
>  >mapping is the universe IR<SH>, defined below, and the range contains
>>only URI references and blank nodes which occur in D.
>(Actually the document augments this with the definition that
>sur is also the identity mapping on "other plain literals")
>I still believe that the text contains a circularity that makes the
>construction hard to understand:
>sur is defined with reference to the universe IR "*defined below*",
>then LV is defined in terms of sur, then IR is defined in terms
>of LV.
>It seems that your last paragraph I cite here contains the
>solution to the problem:
>Wouldn't it be clearer to
>- first define IR, as you indicated, in four parts,
>- then define sur: IR -> nodes(D) intersect (URIs u blankNodes)
>- then define LV = plainLiterals u {x in IR: sur(x) type Literal in D}
>- then IP = {x in IR: sur(x) type Property in D}
>etc.?

I have rewritten the text along these lines but without altering the 
actual table (except as noted below). I hope this is now sufficiently 
clear.

>See further below.
>
>>-------
>>
>>>===
>>>
>>>"Define B(x) as before, then clearly [SH+B] satisfies D ..."
>>>There seems to be a problem with this conclusion.
>>>Making this explicit, it seems that B:blank(D)->IR
>>>needs to be defined by
>>>B(v)=xml(l) if v is a blank node allocated to the well-formed
>>>XML literal l,
>>>B(v)=l if v is a blank node allocated to a typed, non-XML
>>>literal l,
>>>otherwise B(v)=v.
>>
>>The wording is careless, forgive me. The intention was that the
>>second case would include ALL other literals, ie non-well-typed XML,
>>other typed and plain, if v has been allocated to that literal. I
>>will spell this out more carefully:
>>-----
>>Define B(x) as follows: if x is a blank node allocated to a
>>well-formed XML literal lll in D then B(x) = xml(lll); if it is
>>allocated to any other literal lll in D then B(x)=lll; and otherwise
>>B(x)=x.
>>-----
>>
>>I am not sure if this point solves your next comment, because I am
>>not sure what the force of the comment is.
>
>This indeed seems to be the required definition of B.
>
>[...]
>
>I have skipped here a rather long sequence of earlier remarks by me
>and comments on those by you.  Much becomes clear.
>
>It seems that there remains only one problem in the proof of the
>rdfs entailment lemma, which can be clearly localized.
>
>The claim on which the proof is based, that the RDFS Herbrand
>interpretation of an RDF graph is an RDFS interpretation,
>still seems to be false.
>
>To give an example, let A be a URI reference and let G
>be the RDF graph consisting of just the two triples
>   type domain A
>   A type Class
>Let l be a plain literal.

OK.

>Then A in IC, type in IP, <type,A> in IE(domain) and
><l,Literal> in IEXT(type).
>The semantic condition on domain shows that
>l in CEXT(A), or <l,A> in IEXT(type)
>Therefore D should contain the triple
>   sur(l) type A
>Since sur(l)=l, D should contain the triple
>   l type A
>which is clearly nonsense.
>
>It seems that similar small counterexamples to the claim
>can be given with other vocabulary instead of domain:
>range, subClassOf, subPropertyOf.
>
>Moreover, the last part of the proof of the rdfs entailment
>lemma has a conclusion that cannot be justified:
>"IEXTI(p) contains <SH+A(s),SH+A(o)>
>i.e. D contains a triple
>   sur(SH+A(s)) p sur(SH+A(o))"
>The problem with this conclusion "i.e." is that p can be type
>and <SH+A(s),SH+A(o)> can be <l,Literal> or <l,Resource>
>for some plain literal l.

not in D, that is, I presume is your point.  Yes, point taken.

>The proof of semantic conditions for domain, range, subClassOf
>and subPropertyOf make a similar unjustified step.
>
>It seems that these problems can be solved by slightly
>generalizing the notion of simple interpretations and
>thereby also the other kinds of interpretations
>(rdf, rdfs, D).
>The assumption that LV contains all plain literals might
>be dropped.

Indeed. This does greatly simplify everything and, as you say, is not 
actually used anywhere. At the time it was written there seemed to be 
an expository value in considering plain literals to be 'outside' the 
vocabulary, but this small value, if any, is clearly outweighed by 
the formal awkwardness that results.  I will refrain from drawing the 
obvious moral, since it resounds to my disadvantage.

On reflection I have made the following changes:

1. A plain literal is considered a name (so vocabularies may contain 
plain literals).
2. The definition of simple interpretation of V requires that LV 
contain plain literals *in V*
3. The first two semantic conditions on plain literals in a simple 
interpretation refer to plain literals *in V*.
4. The definitions of LV in the three Herbrand constructions are 
modified appropriately to refer to plain literals in the appropriate 
graph (respectively in G, C and D)
5. As you suggest, the special case of IEXT<SH> is dropped.

I also noted a slip in the definition of the mapping B in the RDF 
lemma proof and corrected it:

Define a mapping B on blank nodes in C as follows: B(x)=xml(lll) if x 
is allocated to a well-formed XML literal lll, otherwise B(x)=x

>   It seems that the only thing that is essential
>to assume about plain literals is that they denote themselves.
>It seems that nowhere in the entire document the assumption
>is used that an arbitrary string or pair consisting of string
>and literal is actually contained in LV.  XMLLiterals and
>other typed literals are not given this priviliged treatment.
>The assumption does not seem match well with a clean and correct
>development of the Herbrand construction in the case of RDFS.

Agreed.

>I consider the RDFS entailment lemma to be proved when
>this change would be made to the definition, and when,
>moreover, the following naturally following adjustments
>are made to the construction of the RDFS Herbrand
>interpretation:
>- drop the literals outside the vocabulary of D
>from IR (this is the second part of the definition mentioned
>above, each of these literals is plain)
>- drop the plain literals from LV, the definition would
>become completely analogous to that of IP:
>   LV = {x in IR : sur(x) type Literal in D}
>- in the definition of IEXT drop the special case IEXT(type),
>which causes the horrible problems I just mentioned.
>

Done. So it is proved :-)

Pat

>  >Pat
>>
>>PS. the changes made described above are now visible in the copy on my
>website:
>>
>>http://www.ihmc.us/users/phayes/RDF_Semantics_LC2.5.html
>>
>>I have also added an explanatory paragraph just before the proof of
>>the RDF entailment lemma, and some explanatory prose in the proof of
>>the RDFS entailment lemma concerning the role of literal surrogates.
>>
>>Please let me know if this response adequately deals with the issues you
>raise.
>>
>>Pat
>>
>>>
>>>
>>>Herman ter Horst
>>
>>
>>--
>>---------------------------------------------------------------------
>>IHMC            (850)434 8903 or (650)494 3973   home
>>40 South Alcaniz St.            (850)202 4416   office
>>Pensacola                                               (850)202 4440 fax
>>FL 32501                                                (850)291 0667
>cell
>>phayes@ihmc.us       http://www.ihmc.us/users/phayes
>>
>>
>
>Herman


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 12 November 2003 14:00:03 UTC