Re: test change proposal: same graph in default and named graph parts of RDF dataset from Chimezie Ogbuji on 2007-10-04 (public-rdf-dawg@w3.org from October to December 2007)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Thu, 04 Oct 2007 14:47:38 -0400
To: "Lee Feigenbaum" <lee@thefigtrees.net>
cc: "'RDF Data Access Working Group'" <public-rdf-dawg@w3.org>
Message-ID: <1191523658.23042.26.camel@otherland>

On Thu, 2007-10-04 at 13:30 -0400, Lee Feigenbaum wrote:
> To test Glitter+Anzo, I read all the data graphs into our quad store.
> When the graph is queried in this type of query, the blank node is the
> same whether it's treated as the default graph or a named graph, and so
> I get results for these queries (and thus fail the tests).

Hmm. So it sounds as if (from the text below), depending on how you
construct a dataset specified in the body of the query (either at query
time, by actually dereferencing the graph IRIs at each point where they
are used, or beforehand), you could end up with two different answers
for the same query against the same data.  This would especially be the
case if the representation is RDF/XML, where the blank node identifiers
are scoped to the document (representation).

"The FROM and FROM NAMED keywords allow a query to specify an RDF
dataset by reference; they indicate that the dataset should include
graphs that are obtained from representations of the resources
identified by the given IRIs (i.e. the absolute form of the given IRI
references)."

This seems like significantly counter-intuitive behavior, and there is no
health warning to this effect (unless I'm mistaken).  I'm sure this
simply follows from the general issues with matching non-lean RDF
graphs, but I can imagine that (for 'disconnected' agents) this corner
case would occur quite frequently unless the agent adhered to a thorough
caching regimen (few web agents do).  I'd imagine most would simply
dereference <g.ttl> twice (resulting in *two* isomorphic graphs with
distinct blank nodes).
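A minimal sketch of the effect described above, assuming a toy triple representation (plain Python tuples, not any real RDF library). `G_TTL`, `dereference` and `shared_blank_subjects` are hypothetical names chosen for illustration. Each call to `dereference` mints fresh blank nodes, mimicking a parser whose blank node labels are scoped to one document. The join on a blank-node subject stands in for a query that matches the same `?s` in the default graph and in a named graph both loaded from <g.ttl>.

```python
import itertools
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical contents of <g.ttl>; "_:b" is a document-scoped blank node label.
G_TTL: List[Triple] = [("_:b", "ex:p", "ex:o")]

_fresh = itertools.count()

def dereference(doc: List[Triple]) -> List[Triple]:
    """Simulate one fetch-and-parse: every blank node label in the document
    is mapped to a brand-new blank node, so two parses never share bnodes."""
    mapping: Dict[str, str] = {}

    def term(t: str) -> str:
        if t.startswith("_:"):
            if t not in mapping:
                mapping[t] = f"_:n{next(_fresh)}"
            return mapping[t]
        return t

    return [(term(s), term(p), term(o)) for s, p, o in doc]

def shared_blank_subjects(default: List[Triple], named: List[Triple]) -> Set[str]:
    """Blank-node subjects present in both graphs, i.e. the bindings a query
    joining ?s across the default graph and a named graph could produce."""
    d = {s for s, _, _ in default if s.startswith("_:")}
    n = {s for s, _, _ in named if s.startswith("_:")}
    return d & n

# Quad-store style: <g.ttl> is read once and used for both dataset parts.
once = dereference(G_TTL)
print(len(shared_blank_subjects(once, once)))      # 1: the join succeeds

# Dereference-at-each-use style: two isomorphic graphs, distinct blank nodes.
first, second = dereference(G_TTL), dereference(G_TTL)
print(len(shared_blank_subjects(first, second)))   # 0: the join finds nothing
```

The same data and the same query pattern yield one result or none, depending only on whether the graph was loaded once or dereferenced separately for the default and named parts of the dataset.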

> I've talked this over with Andy and Eric, and we all agree that nothing 
> in the spec mandates that the default graph need be different from all 
> the named graphs that make up the RDF Dataset.

Yes.

> In order to still maintain our test coverage (testing that bnode ids are 
> not shared between different dataset graphs), I'm going to propose on 
> Tuesday that we remove the above tests and replace them with
> 
> dataset/manifest#dawg-dataset-09b
> dataset/manifest#dawg-dataset-10b
> dataset/manifest#dawg-dataset-12b
> graph/manifest#dawg-graph-10b
> 
> These are identical tests except that the graphs in the default graph 
> and the named graph part of the dataset have different URIs.

Yes, I agree.  My only concern is that, even if we don't have an explicit
test that exposes this counter-intuitive behavior, there should (perhaps)
be some kind of indication so it is not *completely* out of left field.
Even if we don't add an explicit warning, perhaps we could leave a test
case that exercises this anomaly, without any WG approval?

-- 
Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org

Received on Thursday, 4 October 2007 18:48:10 UTC