- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 05 Oct 2007 15:11:47 +0100
- To: ogbujic@ccf.org
- CC: Lee Feigenbaum <lee@thefigtrees.net>, 'RDF Data Access Working Group' <public-rdf-dawg@w3.org>
Chimezie Ogbuji wrote:
> On Thu, 2007-10-04 at 13:30 -0400, Lee Feigenbaum wrote:
>> To test Glitter+Anzo, I read all the data graphs into our quad store.
>> When the graph is queried in this type of query, the blank node is the
>> same whether it's treated as the default graph or a named graph, and so
>> I get results for these queries (and thus fail the tests).
>
> Hmm.. So, sounds as if (from the text below) depending on how you
> construct a dataset specified in the body of the query (either at the
> point of query by an actual dereference of the graph IRs at each point
> where they are used or before hand) you could end up with two answers
> for the same query against the same data. This would especially be the
> case if the representation is RDF/XML - where the blank node identifiers
> are scoped to the document (representation).
>
> "The FROM and FROM NAMED keywords allow a query to specify an RDF
> dataset by reference; they indicate that the dataset should include
> graphs that are obtained from representations of the resources
> identified by the given IRIs (i.e. the absolute form of the given IRI
> references)."
>
> This seems like significantly counter-intuitive behavior and there is no
> health-warning to this effect (unless I'm mistaken). I'm sure this
> simply follows from the general issues with matching non-lean RDF
> graphs, but I can imagine that (for 'disconnected' agents) this corner
> case would occur quite frequently unless the agent adhered to a thorough
> caching regiment (few web agents do). I'd imagine most would simply
> dereference <g.ttl> twice (resulting in *two* isomorphic graphs with
> distinct blank nodes).

It's not related to leaning. The issue arises in a dataset description if
the same URI is mentioned twice.

The two cases are

  FROM <g>
  FROM <g>

and

  FROM <g>
  FROM NAMED <g>

How that declaration gets turned into data for the query is not a SPARQL
issue as it will depend on the environment - two possible designs are
"read on each mention" or "read once, use twice".

Aside: something recent about bnodes and labels:
http://lists.w3.org/Archives/Public/public-owl-dev/2007OctDec/0049.html

Of course, if read twice, you might not even get the same triples aside
from any bNode issues because the graph changes between GET requests.

A really clever processor could even read twice, know they are the same
graph (how, I don't know - maybe by looking at HTTP header information?)
and smush the bNodes together.

	Andy

>
>> I've talked this over with Andy and Eric, and we all agree that nothing
>> in the spec mandates that the default graph need be different from all
>> the named graphs that make up the RDF Dataset.
>
> Yes.
>
>> In order to still maintain our test coverage (testing that bnode ids are
>> not shared between different dataset graphs), I'm going to propose on
>> Tuesday that we remove the above tests and replace them with
>>
>> dataset/manifest#dawg-dataset-09b
>> dataset/manifest#dawg-dataset-10b
>> dataset/manifest#dawg-dataset-12b
>> graph/manifest#dawg-graph-10b
>>
>> These are identical tests except that the graphs in the default graph
>> and the named graph part of the dataset have different URIs.
>
> Yes, I agree. My only concern is that even if we don't have an explicit
> test that exposes this counter-intuitive behavior there (perhaps) should
> be some kind of indication so it is not *completely* out left field.
> Even if we don't add an explicit warning, perhaps leave a test case
> which exercises this anomaly, without any WG approval?
>

-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
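The two loading designs named in the message ("read on each mention" versus "read once, use twice") can be illustrated with a small toy model. This is only a sketch and not part of any SPARQL implementation: `load_graph`, `DOC_G`, the two `dataset_*` builders and `shared_subjects` are hypothetical names, and the "document" is a hand-written list of triples standing in for a real representation of <g>. The one assumption carried over from the thread is that each parse scopes blank node labels to that parse.

```python
import itertools

# Counter used to mint blank node identities that are unique per parse.
_fresh = itertools.count()


def load_graph(doc):
    """Parse a toy document into a set of triples.

    Blank node labels (terms starting with "_:") are scoped to this one
    parse: each call maps them to fresh identities, so two parses of the
    same document yield isomorphic graphs with distinct blank nodes.
    """
    mapping = {}

    def term(t):
        if t.startswith("_:"):
            if t not in mapping:
                mapping[t] = f"_:b{next(_fresh)}"
            return mapping[t]
        return t

    return {tuple(term(t) for t in triple) for triple in doc}


# Hypothetical content of <g>: a single triple with a blank node subject.
DOC_G = [("_:x", "<http://example/p>", '"v"')]


def dataset_read_each_mention():
    """FROM <g> FROM NAMED <g>, dereferencing <g> once per mention."""
    default = load_graph(DOC_G)
    named = {"<g>": load_graph(DOC_G)}
    return default, named


def dataset_read_once():
    """FROM <g> FROM NAMED <g>, reading <g> once and using it twice."""
    g = load_graph(DOC_G)
    return g, {"<g>": g}


def shared_subjects(default, named):
    """Subjects present in both the default graph and named graph <g>.

    Roughly the bindings of ?s for a query that matches ?s in the
    default graph and again inside GRAPH <g> { ... }.
    """
    in_default = {s for (s, _, _) in default}
    in_named = {s for (s, _, _) in named["<g>"]}
    return in_default & in_named


if __name__ == "__main__":
    print(len(shared_subjects(*dataset_read_each_mention())))
    print(len(shared_subjects(*dataset_read_once())))
```

Under these assumptions the "read on each mention" dataset shares no blank node between the default graph and the named graph, while the "read once, use twice" dataset shares one, which is how the same query over the same data can give two different answers.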
Received on Friday, 5 October 2007 14:12:11 UTC