Re: RDF Semantics: rdfs entailment lemma from pat hayes on 2003-11-06 (www-rdf-comments@w3.org from October to December 2003)

From: pat hayes <phayes@ihmc.us>
Date: Thu, 6 Nov 2003 15:57:30 -0600
To: herman.ter.horst@philips.com
Cc: www-rdf-comments@w3.org
Message-Id: <p06001f6abbd051c24f45@[10.1.31.1]>
>This is a review of part of the RDF Semantics
>document, editorial version LC2.5.
>In this message I mainly focus on the rdfs entailment lemma.
>
>The proof of this lemma is based on the claim that
>the RDFS Herbrand interpretation of an RDF graph is an
>RDFS interpretation.
>This claim seems to be false: the first condition for
>RDF interpretations is not satisfied.
>In order to show this, note that this condition amounts,
>in this case, to the equivalence
>    v in IP iff <v,Property> in IEXT(type).
>Suppose that the graph G has triples
>    v p l
>and
>    p range Property
>where l is a plain literal.
>By rule lg, the RDFS closure D of G contains the triple
>    v p b
>where b is allocated to l and where b = sur(l).
>By rule rdfs3, D contains the triple
>    b type Property = sur(l) type Property.
>Therefore, <l,Property> in IEXT(type).
>However, we cannot have  l in IP, since that would mean
>that D contains the triple
>    l type Property.

Yes. The SH definition of IP should refer to the surrogate, just as 
the IEXT definition does, ie should read:

IP<SH> = {x: D contains a triple sur(x) rdf:type rdf:Property . }

Then l can (indeed will) be in IP. That is, all the semantic 
conditions should be 'read off' from the graph via the surrogates.

I will make this change.

>===
>
>It should be made explicit what the domain and range of the
>function sur are: I assume that these sets are both IR.

The domain is IR, and the range is a subset of IR consisting of 
vocabulary items.

>When this assumption is made explicit, there seems to be a
>circularity in the definition of LV for the RDFS Herbrand
>interpretation:
>the definition of LV depends on sur, the definition of sur
>depends on IR, the definition of IR depends on LV.
>In view of this circularity, the definition of LV becomes
>incomprehensible.  I believe that the definition of LV should
>be made explicit.

OK, though I don't accept that it is ambiguous at present.

>From the given definition, I would guess that the intention
>is that LV is the union of five sets:
>   strings
>   pairs of strings and language tags

ie plain literals, yes...

>   XML values of well-typed XML literals in D
>   {v in voc(D):  the triple  v type Literal  is in D }
>   {v in voc(D): v a typed, non-XML literal such that
>    b type Literal is in D, where b is the blank node allocated
>    to v by rule lg }

I do not think that this way of phrasing it is appropriate. The central 
intuition is that in a simple Herbrand interpretation, IR consists of 
the vocabulary items (including bnodes) in the graph, and the 
interpretation is simply read off the graph. Here we need to modify 
this by treating some vocabulary items as surrogates for the more 
special values required by the semantic conditions, and adding 
required items (all plain literals) which may not be in the graph; 
otherwise, the construction should mirror the simple Herbrand 
construction.

I propose to rephrase the definition as follows, modeled on the 
definition used in the RDF lemma:

-------
If lll is a well-formed XML literal, let xml(lll) be the XML value of 
lll; and for each XML value xml(lll) of any well-formed XML literal 
lll in D, let sur(xml(lll)) be the blank node allocated to lll by 
rule lg; for any other literal lll in D, let sur(lll) be the blank 
node allocated to lll by rule lg, and extend sur to be the identity 
mapping on URI references and blank nodes in D. The domain of this 
mapping is the universe IR<SH>, defined below, and the range contains 
only URI references and blank nodes which occur in D.
-------
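
A minimal Python sketch of this definition follows, under stated assumptions. The function build_sur, the dictionary allocated (literal to the blank node rule lg allocated to it), and the dictionary xml_value (a stand-in for xml(lll), defined only on well-formed XML literals) are hypothetical names introduced for illustration.

```python
def build_sur(uris_and_bnodes, allocated, xml_value):
    """
    uris_and_bnodes: URI references and blank nodes occurring in D
    allocated:       dict, literal in D -> blank node allocated to it by rule lg
    xml_value:       dict, well-formed XML literal -> its XML value
                     (hypothetical stand-in for xml(lll))
    """
    sur = {}
    for literal, bnode in allocated.items():
        # Well-formed XML literals are keyed by their XML value;
        # every other literal is keyed by the literal itself.
        key = xml_value.get(literal, literal)
        sur[key] = bnode
    for term in uris_and_bnodes:
        sur[term] = term  # identity on URI references and blank nodes
    return sur

# Made-up example data, not taken from the document.
terms = {"ex:s", "ex:p", "_:x1", "_:x2"}
allocated = {'"<a/>"^^rdf:XMLLiteral': "_:x1", '"hello"': "_:x2"}
xml_value = {'"<a/>"^^rdf:XMLLiteral': ("xml", "<a/>")}
sur = build_sur(terms, allocated, xml_value)
```

In this sketch every value of sur is a URI reference or blank node occurring in D, which matches the stated range of the mapping.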

>===
>
>"Define B(x) as before, then clearly [SH+B] satisfies D ..."
>There seems to be a problem with this conclusion.
>Making this explicit, it seems that B:blank(D)->IR
>needs to be defined by
>B(v)=xml(l) if v is a blank node allocated to the well-formed
>XML literal l,
>B(v)=l if v is a blank node allocated to a typed, non-XML
>literal l,
>otherwise B(v)=v.

The wording is careless, forgive me. The intention was that the 
second case would include ALL other literals, ie non-well-typed XML, 
other typed and plain, if v has been allocated to that literal. I 
will spell this out more carefully:
-----
Define B(x) as follows: if x is a blank node allocated to a 
well-formed XML literal lll in D then B(x) = xml(lll); if it is 
allocated to any other literal lll in D then B(x)=lll; and otherwise 
B(x)=x.
-----
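
The three cases can be sketched in Python as below. This is an illustration under assumptions, not text from the document: make_B, allocated and xml_value are hypothetical names, and the example literals are made up.

```python
def make_B(allocated, xml_value):
    """
    allocated: dict, literal in D -> blank node allocated to it by rule lg
    xml_value: dict, well-formed XML literal -> its XML value
               (hypothetical stand-in for xml(lll))
    """
    # Invert the allocation so a blank node can be mapped back to its literal.
    literal_of = {bnode: literal for literal, bnode in allocated.items()}

    def B(x):
        if x in literal_of:
            literal = literal_of[x]
            # Case 1: well-formed XML literal -> its XML value.
            # Case 2: any other literal -> the literal itself.
            return xml_value.get(literal, literal)
        # Case 3: every other blank node is mapped to itself.
        return x

    return B

# Made-up example data.
allocated = {'"<a/>"^^rdf:XMLLiteral': "_:x1", '"hello"': "_:x2"}
xml_value = {'"<a/>"^^rdf:XMLLiteral': ("xml", "<a/>")}
B = make_B(allocated, xml_value)
```

Here B("_:x2") gives back the plain literal, B("_:x1") gives the XML value, and an unallocated blank node such as "_:y" is returned unchanged.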

I am not sure if this point solves your next comment, because I am 
not sure what the force of the comment is.

>(The second case is not exactly as before, but seems to be
>needed to develop a complete proof of the condition
>LV = ICEXT(Literal).)
>
>Given a triple vpw in D, rule rdf1 shows that D contains the
>triple
>   p type Property
>so that p in IP.  In order to prove that SH+B satisfies vpw,
>i.e. that <SH+B(v),SH+B(w)> in IE(p), it is sufficient to
>prove that D contains the triple
>  * sur(SH+B(v)) p sur(SH+B(w)).
>Note that
>   sur(SH+B(v)) = v (when v in nodes(D) - literals)
>   sur(SH+B(v)) = b (when b is the blank node allocated to
>      v in nodes(D) intersection literals)
>(this can be checked for each of many different cases).
>So it can be concluded that D contains the triple * when
>lg can be applied in each step of the construction of D.
>However, rule lg is only used as the first step.

What is the problem? Once the surrogate blank node has been 
introduced, all the entailment rules apply to triples containing it 
whenever they would have applied to the similar triple containing the 
literal (and of course to some new triples which would have been 
illegal using the literal in subject position). So if any triple 
containing a literal is in D, then so is the similar triple 
containing its surrogate.  So, in effect, once the surrogate has been 
introduced and the rule applied in every possible way once (so as to 
reproduce the entire sub-graph of all triples which contain that 
literal, with the literal replaced by the surrogate), the literals 
can in effect be ignored completely, and the closure can proceed 
using only blank nodes and URI references. Provided we take care 
to then map allocated blank nodes back to their appropriate literal 
values, and all other blank nodes to themselves, everything works out 
fine.

Your point would be well taken if there were rules which introduced 
new literals which did not occur in the original graph, but there are 
no such rules.
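
The argument can be sketched in Python as follows. The sketch assumes that literals are strings beginning with a double quote, that the generated blank node names do not clash with names already in the graph, and that rule lg only rewrites literals in object position. The names is_literal, apply_lg and literals_mirrored are hypothetical.

```python
from itertools import count

def is_literal(term):
    # Assumed convention for this sketch: literals start with a double quote.
    return term.startswith('"')

def apply_lg(graph):
    """Apply rule lg once in every possible way; return (new graph, allocation)."""
    fresh = count(1)
    allocated = {}
    out = set(graph)
    for s, p, o in sorted(graph):
        if is_literal(o):
            if o not in allocated:
                allocated[o] = f"_:lg{next(fresh)}"  # one blank node per literal
            out.add((s, p, allocated[o]))
    return out, allocated

def literals_mirrored(graph, allocated):
    """Check: every triple with a literal has a twin using the literal's surrogate."""
    return all(
        o in allocated and (s, p, allocated[o]) in graph
        for s, p, o in graph
        if is_literal(o)
    )

# Made-up example graph with one literal used in two triples.
G = {("ex:v", "ex:p", '"l"'), ("ex:w", "ex:q", '"l"')}
D0, allocated = apply_lg(G)
```

Since no later rule introduces a new literal, the property checked by literals_mirrored holds after the first step and is never disturbed by the rest of the closure.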

>It seems that this problem would be solved when rule lg can
>always be used in the construction of D
>
>===
>
>There seem to be problems with the proof of the condition
>IR = ICEXT(Resource).
>It only needs to be proved that if x is in IR, then
><x,Resource> in IE(type), as the opposite is trivial.
>(Note that the document states the opposite.

Good point, I will change that.

>Note also that for the proof of LV=ICEXT(Literal), the
>document only states an if statement instead of two
>statements.)

I will change that also. The case of interest, again, is the 'if' 
case, since the other case is trivial by construction.

>However, there are many cases.  The proof is not clear.

Can you say which parts are unclear? I believe that the table covers 
all the cases that are syntactically possible: URIs in subject, 
predicate and object position: bnodes in subject and object, and 
literals in object (2 cases).  I have rephrased the entries in the 
table slightly so as to make the connection with the notation used in 
the proof more evident.

>
>It seems that the proof uses and needs to use the triple
>** Literal subClassOf Resource,
>which however is not an axiomatic triple, to my surprise.
>Shouldn't this triple ** be made into an axiomatic triple?

It follows from the use of Literal as a range, so its being a Class, 
so its being a subClass of Resource by rdfs8.  The derivation is 
given in the table (first row, second sub-row)
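
One way to reproduce such a derivation is sketched below in Python. The rule shapes used for rdfs3 and rdfs8, and the choice of the two starting triples (assumed here to be among the RDFS axiomatic triples), are assumptions of this sketch; the table row itself is not reproduced in this message.

```python
def rdfs3(g):
    # Assumed rule shape: aaa rdfs:range xxx . uuu aaa vvv .  =>  vvv rdf:type xxx .
    ranges = {(a, x) for a, p, x in g if p == "rdfs:range"}
    return {(v, "rdf:type", x) for a, x in ranges for u, p, v in g if p == a}

def rdfs8(g):
    # Assumed rule shape: uuu rdf:type rdfs:Class .  =>  uuu rdfs:subClassOf rdfs:Resource .
    return {
        (u, "rdfs:subClassOf", "rdfs:Resource")
        for u, p, c in g
        if p == "rdf:type" and c == "rdfs:Class"
    }

# Starting triples, assumed to be axiomatic: Literal is used as a range,
# and the range of rdfs:range is rdfs:Class.
axioms = {
    ("rdfs:range", "rdfs:range", "rdfs:Class"),
    ("rdfs:comment", "rdfs:range", "rdfs:Literal"),
}

g = set(axioms)
g |= rdfs3(g)   # yields rdfs:Literal rdf:type rdfs:Class
g |= rdfs8(g)   # yields rdfs:Literal rdfs:subClassOf rdfs:Resource
```

Under these assumptions the triple in question is derived in two rule applications, so it does not need to be listed as an axiomatic triple.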

>
>The last lines of the four proof parts consist of the
>triple
>    x type Resource
>If x is a URI this suffices to prove that
><x,Resource> in IE(type), however when x is a blank node
>or a literal this is not sufficient.
>

It cannot be a literal, in subject position. Why is it not sufficient 
for a blank node? The argument is exactly similar to the standard 
case for a simple Herbrand interpretation: the surrogate for a blank 
node is itself. (Its potential role as itself being a surrogate for a 
literal is irrelevant at this point.)

Pat

PS. The changes described above are now visible in the copy on my website:

http://www.ihmc.us/users/phayes/RDF_Semantics_LC2.5.html

I have also added an explanatory paragraph just before the proof of 
the RDF entailment lemma, and some explanatory prose in the proof of 
the RDFS entailment lemma concerning the role of literal surrogates.

Please let me know if this response adequately deals with the issues you raise.

Pat

>
>
>Herman ter Horst


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 6 November 2003 16:57:33 UTC