Re: Persistent blank node identifiers

David,

Again, this may be overcome by events as the rr:RowBlankNode idea might be off the table, but…

On 15 May 2012, at 23:46, David McNeil wrote:
>> On 15 May 2012, at 21:24, David McNeil wrote:
>> > On today's call we got into a discussion of whether R2RML blank node identifiers must be "persistent". That is, whether a client can expect the same blank node identifier to be generated for a given row across separate SPARQL queries.
>> >
>> > While I expect that in practice the blank node identifiers will be persistent, I think that per the spec the user cannot assume they are. I base this conclusion on this paragraph from section 11 of the R2RML draft:
>> >
>> > "Conforming R2RML processors MAY rename blank nodes when providing access to the output dataset. This means that client applications may see actual blank node identifiers that differ from those produced by the R2RML mapping. Client applications SHOULD NOT rely on the specific text of the blank node identifier for any purpose."
>> >
>> > To my reading, the last sentence precludes the user from assuming that blank node identifiers are persistent.
>> 
>> 1. It says SHOULD NOT. This doesn't preclude the assumption. Cf. RFC2119.
> 
> Ermm... let me try again using different words. My reading of this paragraph and in particular the last sentence is that R2RML implementors are free to produce blank node identifiers which are not persistent. For (a contrived) example, an implementor could decide to put the current date/time stamp in the blank node identifiers. Then each query would produce different identifiers. This would be perfectly valid under the R2RML spec (according to my reading). From the user's perspective, if they want to avoid being wrong they will not assume blank node identifiers are persistent.

RFC 2119 says, slightly paraphrased: “SHOULD NOT means that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.”

One example where the particular behavior (relying on the blank node label) is acceptable and useful might be if the R2RML processor explicitly documents what labels it produces. Another example might be if the API through which I interact with the R2RML processor explicitly requires that blank node labels be persistent (otherwise the API implementation is broken).

>> 2. I read this differently. In the case of the Jena API, the Jena API implementation is part of the R2RML processor. The Jena API here is just another way of accessing the output dataset, just like SPARQL endpoints or RDF dumps or whatever other methods you can dream up. The client application in this case is the application that accesses the Jena API. The client app indeed SHOULD NOT rely on the blank node label. But the Jena API implementation (the R2RML processor) internally needs “persistent” blank node identifiers to fulfil the contract of the API.
>> 
>> Do you think this is a reasonable reading? If so, any suggestions for rewording that make it clearer?
> 
> To my reading, if an implementor wants to expose the triples via Jena then they can do that. Furthermore they can make the blank node identifiers persistent. However, if you take an implementation of the Jena API and point it at an off-the-shelf R2RML implementation then that Jena API implementation becomes the client and SHOULD NOT assume that the blank node identifiers are persistent.

That is correct.

> If we want to support this then I think we would need to add words saying that an R2RML implementation MUST (or SHOULD?) consistently produce the same blank node identifier for a given row.

I don't want to support this.

It's ok with me if the spec says that implementations can rename blank nodes arbitrarily. Because that means there's nothing wrong with an implementation that does *not* rename them. My larger point is that we intend to provide such an implementation, and thus it's important to us that such an implementation is *possible*, even though we don't want the spec to *mandate* persistent blank node identifiers.
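
To make the distinction concrete, here is a minimal Python sketch of the two behaviours the paragraph from section 11 allows. Everything in it is an assumption for illustration only: the names persistent_label and RenamingView, the "_:r" and "_:b" label formats, and the use of a hash over table name and key values are hypothetical and come neither from the R2RML draft nor from any existing implementation.

```python
import hashlib
import itertools


def persistent_label(table: str, key_values: tuple) -> str:
    """Hypothetical deterministic label: the same row always maps to the same label."""
    raw = table + "|" + "|".join(str(v) for v in key_values)
    return "_:r" + hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]


class RenamingView:
    """Hypothetical access layer that renames blank nodes, as the spec permits.

    Labels are stable within one view but may differ between views,
    so a client cannot rely on them across separate queries.
    """

    def __init__(self) -> None:
        self._counter = itertools.count()
        self._renamed = {}

    def rename(self, label: str) -> str:
        # Assign the next free label the first time we see this blank node.
        if label not in self._renamed:
            self._renamed[label] = "_:b%d" % next(self._counter)
        return self._renamed[label]


row_a = persistent_label("EMP", (7369,))
row_b = persistent_label("EMP", (7499,))

view1 = RenamingView()
view2 = RenamingView()
view2.rename(row_b)  # view2 happens to encounter a different row first

print(row_a == persistent_label("EMP", (7369,)))     # non-renaming processor
print(view1.rename(row_a) == view2.rename(row_a))    # renaming processor
```

A processor that exposes persistent_label directly and one that wraps it in RenamingView would both be conforming under my reading; only the first gives an API layer built on top of it the persistence it needs.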

Best,
Richard

Received on Thursday, 17 May 2012 20:26:32 UTC