Re: Persistent blank node identifiers from Ted Thibodeau Jr on 2012-05-15 (public-rdb2rdf-wg@w3.org from May 2012)

From: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Date: Tue, 15 May 2012 18:16:17 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: David McNeil <dmcneil@revelytix.com>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <70565904-8EDB-4BCD-9A0E-A233AC3251A6@openlinksw.com>

Richard --

On May 15, 2012, at 04:46 PM, Richard Cyganiak wrote:
> On 15 May 2012, at 21:24, David McNeil wrote:
>> On today's call we got into a discussion of whether R2RML blank node identifiers must be "persistent". That is if a client could expect the same blank node identifier to be generated for a given row across separate SPARQL queries.
>> 
>> While I expect in practice the blank node identifiers will be persistent. I think per the spec, the user cannot assume they are. I base this conclusion on this paragraph from section 11 of the R2RML draft:
>> 
>> "Conforming R2RML processors MAY rename blank nodes when providing access to the output dataset. This means that client applications may see actual blank node identifiers that differ from those produced by the R2RML mapping. Client applications SHOULD NOT rely on the specific text of the blank node identifier for any purpose."
>> 
>> To my reading, the last sentence precludes the user from assuming that blank node identifiers are persistent.
> 
> 1. It says SHOULD NOT. This doesn't preclude the assumption. Cf. RFC2119.

You're right; it doesn't preclude the assumption.  

But it waves a very large warning flag, and users acting on that 
assumption would seem not to be acting in their own best interest.


> 2. I read this differently. In the case of the Jena API, the Jena API implementation is part of the R2RML processor. The Jena API here is just another way of accessing the output dataset, just like SPARQL endpoints or RDF dumps or whatever other methods you can dream up. The client application in this case is the application that accesses the Jena API. The client app indeed SHOULD NOT rely on the blank node label. But the Jena API implementation (the R2RML processor) internally needs “persistent” blank node identifiers to fulfil the contract of the API.


It doesn't matter whether the user-facing element is Jena, SPARQL,
or a face-yet-to-be-drawn. Bnodes *are not* consistent across
multiple "exports" from RDB to RDF -- even if they *appear* to 
be so! -- and they should not be treated as if they were.

Jena may make the API contract you describe in the context of a 
Gsnap (a result set, an RDF Graph, a dump of RDB data in RDF form).

Jena may *not* make this contract in the context of a Gbox (an RDB 
schema, an RDF Graph container, an R2RML mapping of RDB schema
to RDF ontology).

Bnodes are scoped to RDF Graphs, to Gsnaps.  They are not scoped
to database queries, to Gboxes, etc.

Any time Jena goes back to the origin, and (re-)maps RDB instance
data into RDF instance data, the Bnodes are different -- even 
if they look the same! -- because the *RDF Graph* to which they
are scoped is different.
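
The scoping point can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: GraphSnapshot and
BlankNode are hypothetical names, not part of Jena or R2RML, and the
label scheme ("row" plus the key) is made up for the example.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical: every export of the RDB (every "Gsnap") gets its own id.
_graph_ids = count(1)


@dataclass(frozen=True)
class BlankNode:
    graph_id: int  # the RDF graph this node is scoped to
    label: str     # the label text, here derived from the row key


class GraphSnapshot:
    """One export of RDB rows to RDF; it owns the blank nodes it mints."""

    def __init__(self, rows):
        self.graph_id = next(_graph_ids)
        self.nodes = {
            key: BlankNode(self.graph_id, f"row{key}") for key in rows
        }


rows = {1: "alice", 2: "bob"}
first = GraphSnapshot(rows)
second = GraphSnapshot(rows)  # re-mapping the very same rows

a = first.nodes[1]
b = second.nodes[1]
same_label = a.label == b.label  # the labels look identical
same_node = a == b               # the nodes are not the same node
```

Identity here depends on the graph as well as the label, so two
exports that produce identical label text still produce different
nodes.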

Now, if Jena executes one query against the backend RDBMS, and
holds all the data it gets in cache (memory or otherwise), and
makes a contract about *that* with its clients -- that's fine!

The client gets a Bnode from one SPARQL query against this
"temporary RDF Graph", and wants to learn more?  Another query
against the *same* temporary RDF Graph is valid and may be 
valuable!

But Jena cannot legitimately retrieve *any* additional information 
from the back-end at this point -- the Bnode in Jena's hands has 
no relation, no connection to the data in the RDBMS -- no matter 
*how* the Bnode label's construction has been defined.
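
A rough Python sketch of the only contract that does hold, assuming
a hypothetical CachedGraph class that stands in for the "temporary
RDF Graph" (the class, its methods, and the "_:row" labels are all
invented for illustration):

```python
class CachedGraph:
    """Hypothetical holder of one export; every lookup stays inside it."""

    def __init__(self, rows):
        # Data is copied at export time, keyed by blank node label.
        self._triples = {f"_:row{k}": dict(v) for k, v in rows.items()}

    def find(self, predicate, value):
        """Return labels of nodes whose predicate has the given value."""
        return [
            node
            for node, props in self._triples.items()
            if props.get(predicate) == value
        ]

    def describe(self, node):
        """Follow-up query, answered from the cached graph only."""
        return self._triples[node]


backend = {1: {"name": "alice", "dept": "ops"}}
graph = CachedGraph(backend)

(node,) = graph.find("name", "alice")
backend[1]["dept"] = "sales"    # the RDBMS moves on after the export
details = graph.describe(node)  # still reflects the export, not the RDBMS
```

The follow-up query is meaningful because it runs against the same
graph that issued the Bnode. Nothing in the sketch uses the label to
reach back into the backend.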

Simply put, the API contract you describe is unenforceable and 
invalid, and were it any other kind of contract, I'd suggest 
its signatories consult with an attorney.

Ted



> Do you think this is a reasonable reading? If so, any suggestions for rewording that make it clearer?
> 
> Best,
> Richard

--
A: Yes.                      http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
Senior Support & Evangelism  //        mailto:tthibodeau@openlinksw.com
                             //              http://twitter.com/TallTed
OpenLink Software, Inc.      //              http://www.openlinksw.com/
         10 Burlington Mall Road, Suite 265, Burlington MA 01803
     Weblog   -- http://www.openlinksw.com/blogs/
     LinkedIn -- http://www.linkedin.com/company/openlink-software/
     Twitter  -- http://twitter.com/OpenLink
     Google+  -- http://plus.google.com/100570109519069333827/
     Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers
Received on Tuesday, 15 May 2012 22:16:48 UTC