Re: RDF(S) tests and semantics from Dave Reynolds on 2003-08-06 (www-rdf-interest@w3.org from August 2003)

From: Dave Reynolds <der@hplb.hpl.hp.com>
Date: Wed, 06 Aug 2003 10:15:29 +0100
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
CC: www-rdf-interest@w3.org
Message-ID: <3F30C731.CBB2C89E@hplb.hpl.hp.com>
Summary: I agree with your observations and have already fed something similar
back to the WG in:
  http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0076.html

> According to
> http://www.w3.org/2000/10/rdf-tests/rdfcore/rdfms-seq-representation/Manifest.rdf#test002 the empty document should entail:
> 
> <rdf:_1> <rdf:type> <rdfs:ContainerMembershipProperty> .
> 
> But Section 4.2/2 of the RDF Semantics document
> (http://www.w3.org/TR/rdf-mt/) says:
> "Add all triples of the following forms [rdf:_* rdf:type CMP]. This is an
> infinite set because the RDF container vocabulary is infinite. However,
> since none of these triples entail any of the others, it is only
> necessary, in practice, to add the triples which use those container
> properties which actually occur in any particular graph or set of graphs
> in order to check the rdfs-entailment relation between those graphs."
> 
> The MT document's description seems more sensible to me as otherwise there
> is no way to finitely resolve the query
> (?foo, <rdf:type>, <rdfs:ContainerMembershipProperty>)

Agreed.  I believe test004 of that group is similar. 

My guess is that this arises because the tests are framed in terms of mutual
entailment between two graphs rather than query results over a graph. The
phrasing of the section you quote, "set of graphs .. to check .. between those
graphs", seems to suggest that if you are testing entailment between two graphs
you check *both* of them for container membership properties and run the
relevant comprehension rule over the union of the results. Thus, arguably, the
closure triples corresponding to both the premise and conclusion documents from
test002 should be added, which would make the test correct.

However, in practice, APIs tend to implement query operations rather than
entailment checks, because querying is the more useful operation. Thus a natural
way to interpret the WG test cases is to translate the conclusion document into
a query and test whether that query returns a result when applied to the premise
document. In that case it is not reasonable to apply the comprehension rules to
the query, only to the premise document.
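
To make the two readings concrete, here is a rough Python sketch of test002.
It is only an illustration: triples are plain tuples of URI strings, and the
helper names (membership_properties, with_cmp_axioms) are invented for this
example rather than taken from any toolkit. Because the conclusion is ground,
a subset check is enough to stand in for both the entailment test and the
query.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = RDF + "type"
CMP = RDFS + "ContainerMembershipProperty"

# Matches the full URIs of rdf:_1, rdf:_2, ...
_MEMBER = re.compile(re.escape(RDF) + r"_[1-9][0-9]*$")

def membership_properties(graph):
    """Return every rdf:_n URI that occurs anywhere in the graph."""
    return {term for triple in graph for term in triple
            if isinstance(term, str) and _MEMBER.match(term)}

def with_cmp_axioms(graph, vocabulary_from):
    """Add (rdf:_n rdf:type rdfs:CMP) for each rdf:_n used in vocabulary_from."""
    props = set()
    for g in vocabulary_from:
        props |= membership_properties(g)
    return set(graph) | {(p, RDF_TYPE, CMP) for p in props}

premise = set()                              # the empty document
conclusion = {(RDF + "_1", RDF_TYPE, CMP)}   # what test002 expects

# Entailment-check reading: vocabulary is drawn from both graphs.
closed_both = with_cmp_axioms(premise, [premise, conclusion])
# Test-by-query reading: vocabulary is drawn from the premise only.
closed_premise_only = with_cmp_axioms(premise, [premise])

entailment_passes = conclusion <= closed_both
query_passes = conclusion <= closed_premise_only
```

Under these assumptions the test succeeds only when the conclusion's
vocabulary is allowed to feed the closure, which is exactly the step a query
API has no reason to perform.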

The WG test cases started life as illustrations of the consequences of working
group decisions, not conformance tests for software. If they were to be used as
software tests (e.g. as part of the evidence for moving to PR) then it would be
more helpful if tests like this one were reframed so that test-by-query was
valid, by moving any required comprehension axioms into the premise documents.

> I'm also concerned that it's not possible to efficiently implement the new
> Literal and Datatype entailments, and they do not seem to be optional.
> Adding them would add another few million triples to our production KB and
> seems to produce confusing entailments. What is the intended meaning if you
> add
>         <foo> <bar> "10"
> which (I think) entails
>         <foo> <bar> _:something                   (by
>         _:something <rdf:type> <rdfs:Literal>      rdfD 1)
> 
> It seems to just produce unhelpful results if you query for
> (<foo>, <bar>, ?o)

Again, agreed, it is unhelpful.

By the way, this behaviour is needed to pass the entailment test:
   xmlsch-02/Manifest.rdf#whitespace-facet-3

> What is the scope of _:something, I gather its supposed to be the
> same ID for all literals that are byte-for-byte identical and have the
> same datatype, 

I don't think so - that would be sufficient but not necessary. Equivalent but
non-identical bNodes can be mapped onto each other by the interpolation lemma,
so you should be free to introduce distinct bNodes for every literal. In any
case, they only *need* to be distinct for literals that are distinct in the
value space, not for literals that merely differ byte-for-byte.
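
As a small illustration of the value-space point, the Python sketch below
hands out one bNode label per distinct value rather than per distinct lexical
form. The helper names are hypothetical, only xsd:integer is interpreted, and
Python's int() is used as a stand-in for a real lexical-to-value mapping (it
is more permissive than the XML Schema lexical space).

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def value_key(lexical, datatype):
    """Key that is equal when two literals are treated as the same value.

    Only xsd:integer is interpreted here; any other datatype falls back
    to comparing the lexical form, which is an assumption of this sketch.
    """
    if datatype == XSD_INTEGER:
        return (datatype, int(lexical))
    return (datatype, lexical)

def bnode_for(literal, table):
    """Hypothetical helper: return one bNode label per distinct value."""
    key = value_key(*literal)
    return table.setdefault(key, "_:lit%d" % len(table))

table = {}
a = bnode_for(("10", XSD_INTEGER), table)
b = bnode_for(("010", XSD_INTEGER), table)   # same value, different bytes
c = bnode_for(("11", XSD_INTEGER), table)    # different value
```

Here a and b share a label while c gets its own. Sharing is a choice, not a
requirement: giving every occurrence a fresh label would be just as valid.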

> but what happens when you merge graphs with different
> nodes for _:something? How do you associate a given literal with the
> corresponding bNode?

The literal is not "associated" with any of the bNodes; you can simply choose to
use the literal to substitute for any of the bNodes in doing the entailment test
via the instance lemma.

> Is it incorrect for two identical literals to have
> different bNodes, or identical bNodes to have different literals and if
> so, how do you avoid this?

Again, the literals don't "have" the bNodes, they simply entail (a whole lot of)
graphs with bNodes in. The graph version with multiple bNodes and the graph with
a single bNode mutually entail each other.
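
The mutual entailment can be checked mechanically. The following is a minimal
Python sketch of simple entailment in the spirit of the interpolation lemma:
it searches for a mapping from the conclusion's bNodes to terms of the premise
that turns every conclusion triple into a premise triple. The representation
(tuples of strings, bNodes written with a "_:" prefix, the literal written as
'"10"') and the function names are assumptions made for the example.

```python
def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")

def simply_entails(premise, conclusion):
    """True if some instance of the conclusion is a subgraph of the premise.

    bNodes in the conclusion act as variables; everything in the premise,
    including its own bNodes, is treated as a constant.
    """
    premise = set(premise)
    goals = list(conclusion)

    def unify(pattern, triple, binding):
        new = dict(binding)
        for p, t in zip(pattern, triple):
            if is_bnode(p):
                # Bind the bNode, or check it against an earlier binding.
                if new.setdefault(p, t) != t:
                    return None
            elif p != t:
                return None
        return new

    def search(i, binding):
        if i == len(goals):
            return True
        for triple in premise:
            extended = unify(goals[i], triple, binding)
            if extended is not None and search(i + 1, extended):
                return True
        return False

    return search(0, {})

FOO, BAR, TYPE, LITERAL = "foo", "bar", "rdf:type", "rdfs:Literal"

single = {(FOO, BAR, "_:x"), ("_:x", TYPE, LITERAL)}
multiple = {(FOO, BAR, "_:a"), ("_:a", TYPE, LITERAL),
            (FOO, BAR, "_:b"), ("_:b", TYPE, LITERAL)}
ground = {(FOO, BAR, '"10"')}

both_ways = simply_entails(single, multiple) and simply_entails(multiple, single)
literal_to_bnode = simply_entails(ground, {(FOO, BAR, "_:x")})
bnode_to_literal = simply_entails({(FOO, BAR, "_:x")}, ground)
```

The sketch covers only simple entailment. Deriving the rdfs:Literal typing
triple from the ground graph still needs the closure rules discussed above.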


For practical purposes I believe these datatype closure rules are best omitted.
If one were to implement a tool to perform RDF entailment checks, one would need
to include them, but in that case the tool would directly implement simple RDF
entailment (as captured in the interpolation lemma) and it all comes out in the
wash.

However, software systems like Jena and 3store are RDF access and query APIs
and, as such, already have direct ways of testing that a node is a literal. To
take your example above, if the application developer is asking "is there a
value for the <bar> property on <foo> which is also a Literal" then they can ask
that directly through the query APIs, without the need to actually generate
these closure bNodes and then retrieve them. In principle, it would be possible
to build an "entailment test to query" translator which handled all this. In
practice, I believe application developers start with the query anyway, so all
we need is to ensure that the relevant queries can be expressed and executed.
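
As a toy illustration of "ask the query directly", here is a Python sketch
over an in-memory list of triples. It does not use or imitate the Jena or
3store APIs; the Literal class and both helper functions are hypothetical
stand-ins for whatever node-type test a real API provides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = ""   # empty string stands for a plain literal

def objects(graph, subject, predicate):
    """All objects of (subject, predicate, ?o), in graph order."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def literal_objects(graph, subject, predicate):
    """Objects of (subject, predicate, ?o) that are literals.

    The literal test is applied to the node itself, so no closure
    bNodes or rdfs:Literal typing triples have to be materialised.
    """
    return [o for o in objects(graph, subject, predicate)
            if isinstance(o, Literal)]

graph = [
    ("foo", "bar", Literal("10")),
    ("foo", "bar", "baz"),        # a resource-valued object
]

all_values = objects(graph, "foo", "bar")
literal_values = literal_objects(graph, "foo", "bar")
```

The plain query (<foo>, <bar>, ?o) returns just the two asserted objects, and
the literal-only variant filters them without any extra entailed triples.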

Dave
Received on Wednesday, 6 August 2003 05:16:49 UTC