Re: test change proposal: same graph in default and named graph parts of RDF dataset

Chimezie Ogbuji wrote:
> On Thu, 2007-10-04 at 13:30 -0400, Lee Feigenbaum wrote:
>> To test Glitter+Anzo, I read all the data graphs into our quad store.
>> When the graph is queried in this type of query, the blank node is the
>> same whether it's treated as the default graph or a named graph, and so
>> I get results for these queries (and thus fail the tests).
> 
> Hmm.. So, it sounds as if (from the text below), depending on how you
> construct a dataset specified in the body of the query (either at the
> point of query, by actually dereferencing the graph IRIs at each point
> where they are used, or beforehand), you could end up with two answers
> for the same query against the same data.  This would especially be the
> case if the representation is RDF/XML - where the blank node identifiers
> are scoped to the document (representation). 
> 
> "The FROM and FROM NAMED keywords allow a query to specify an RDF
> dataset by reference; they indicate that the dataset should include
> graphs that are obtained from representations of the resources
> identified by the given IRIs (i.e. the absolute form of the given IRI
> references)."
> 
> This seems like significantly counter-intuitive behavior and there is no
> health-warning to this effect (unless I'm mistaken).  I'm sure this
> simply follows from the general issues with matching non-lean RDF
> graphs, but I can imagine that (for 'disconnected' agents) this corner
> case would occur quite frequently unless the agent adhered to a thorough
> caching regimen (few web agents do).  I'd imagine most would simply
> dereference <g.ttl> twice (resulting in *two* isomorphic graphs with
> distinct blank nodes).

It's not related to leaning. The issue arises in a dataset description if the 
same URI is mentioned twice.  The two cases are

FROM <g>
FROM <g>
   and
FROM <g>
FROM NAMED <g>

How that declaration gets turned into data for the query is not a SPARQL issue 
as it will depend on the environment - two possible designs are "read on each 
mention" or "read once, use twice".
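
To make the two designs concrete, here is a minimal sketch in Python. It is 
not how any particular engine works: parse(), DOCUMENT_G and the dataset 
dictionaries are hypothetical stand-ins, and the only behaviour assumed is 
that every parse of a document allocates fresh blank nodes.

```python
import itertools

# Hypothetical stand-in for a parser's blank node allocator.
_fresh = itertools.count()

# Toy "document" behind <g>: "_:a" is a label local to the document.
DOCUMENT_G = [("_:a", "ex:p", "ex:o")]


def parse(document):
    """Return a set of triples, replacing document-local labels with
    blank nodes that are fresh for this parse."""
    mapping = {}

    def term(t):
        if t.startswith("_:"):
            if t not in mapping:
                mapping[t] = f"_:b{next(_fresh)}"
            return mapping[t]
        return t

    return {tuple(term(t) for t in triple) for triple in document}


def dataset_read_on_each_mention():
    # FROM <g> and FROM NAMED <g> each trigger their own parse.
    return {"default": parse(DOCUMENT_G), "named": {"g": parse(DOCUMENT_G)}}


def dataset_read_once_use_twice():
    # One parse, used for both the default graph and the named graph.
    g = parse(DOCUMENT_G)
    return {"default": g, "named": {"g": g}}


def shared_subjects(dataset):
    """Subjects that occur both in the default graph and in named graph <g>.
    This mimics a query joining the same variable inside and outside
    GRAPH <g> { ... }."""
    default_subjects = {s for s, _, _ in dataset["default"]}
    named_subjects = {s for s, _, _ in dataset["named"]["g"]}
    return default_subjects & named_subjects
```

With "read on each mention" shared_subjects() comes back empty; with "read 
once, use twice" it contains the single blank node. Same declaration, same 
document, different answers.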

Aside: something recent about bnodes and labels:
http://lists.w3.org/Archives/Public/public-owl-dev/2007OctDec/0049.html

Of course, if read twice, you might not even get the same triples, quite 
apart from any bNode issues, because the graph may change between GET requests.

A really clever processor could even read twice, know they are the same graph 
(how, I don't know - maybe by looking at HTTP header information?) and smush 
the bNodes together.
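
A rough sketch of what that smushing could look like, in Python. It rests on 
two assumptions that nothing above guarantees: that the processor has somehow 
decided the two reads returned the same representation, and that the parser 
exposes the document-local labels it saw. parse_with_labels() and smush() are 
hypothetical names, not part of any real library.

```python
def parse_with_labels(document, prefix):
    """Parse toy triples and return (graph, label_map), where label_map maps
    each document-local label to the blank node allocated for it."""
    label_map = {}

    def term(t):
        if t.startswith("_:"):
            # Allocate a blank node the first time a label is seen.
            return label_map.setdefault(t, f"_:{prefix}{len(label_map)}")
        return t

    graph = {tuple(term(t) for t in triple) for triple in document}
    return graph, label_map


def smush(second_graph, first_labels, second_labels):
    """Rename the blank nodes of the second read to those of the first read,
    matching them up through the shared document-local labels."""
    rename = {
        second_labels[label]: first_labels[label]
        for label in second_labels
        if label in first_labels
    }
    return {tuple(rename.get(t, t) for t in triple) for triple in second_graph}


# Two reads of the same toy document give disjoint blank nodes ...
doc = [("_:a", "ex:p", "_:b"), ("_:b", "ex:q", "ex:o")]
g1, labels1 = parse_with_labels(doc, "x")
g2, labels2 = parse_with_labels(doc, "y")

# ... and smushing the second read onto the first makes them one graph again.
g2_smushed = smush(g2, labels1, labels2)
```

This only works when the labels in the two representations really do 
correspond; if the document changed between reads, matching by label would 
be wrong.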

 Andy

> 
>> I've talked this over with Andy and Eric, and we all agree that nothing 
>> in the spec mandates that the default graph need be different from all 
>> the named graphs that make up the RDF Dataset.
> 
> Yes.
> 
>> In order to still maintain our test coverage (testing that bnode ids are 
>> not shared between different dataset graphs), I'm going to propose on 
>> Tuesday that we remove the above tests and replace them with
>>
>> dataset/manifest#dawg-dataset-09b
>> dataset/manifest#dawg-dataset-10b
>> dataset/manifest#dawg-dataset-12b
>> graph/manifest#dawg-graph-10b
>>
>> These are identical tests except that the graphs in the default graph 
>> and the named graph part of the dataset have different URIs.
> 
> Yes, I agree.  My only concern is that even if we don't have an explicit
> test that exposes this counter-intuitive behavior, there (perhaps) should
> be some kind of indication so it is not *completely* out of left field.
> Even if we don't add an explicit warning, perhaps leave a test case
> which exercises this anomaly, without any WG approval?
> 


Received on Friday, 5 October 2007 14:12:11 UTC