Re: Persistent blank node identifiers

On Tue, May 15, 2012 at 3:46 PM, Richard Cyganiak <richard@cyganiak.de> wrote:

> Hi David,
>
> On 15 May 2012, at 21:24, David McNeil wrote:
> > On today's call we got into a discussion of whether R2RML blank node
> > identifiers must be "persistent". That is, whether a client can expect the
> > same blank node identifier to be generated for a given row across separate
> > SPARQL queries.
> >
> > While I expect that in practice the blank node identifiers will be
> > persistent, I think that per the spec the user cannot assume they are. I
> > base this conclusion on this paragraph from section 11 of the R2RML draft:
> >
> > "Conforming R2RML processors MAY rename blank nodes when providing
> > access to the output dataset. This means that client applications may see
> > actual blank node identifiers that differ from those produced by the R2RML
> > mapping. Client applications SHOULD NOT rely on the specific text of the
> > blank node identifier for any purpose."
> >
> > To my reading, the last sentence precludes the user from assuming that
> > blank node identifiers are persistent.
>
> 1. It says SHOULD NOT. This doesn't preclude the assumption. Cf. RFC2119.
>
>
Ermm... let me try again using different words. My reading of this
paragraph and in particular the last sentence is that R2RML implementors
are free to produce blank node identifiers which are not persistent. For (a
contrived) example, an implementor could decide to put the current
date/time stamp in the blank node identifiers. Then each query would
produce different identifiers. This would be perfectly valid under the
R2RML spec (according to my reading). From the user's perspective, if they
want to avoid being wrong they will not assume blank node identifiers
are persistent.
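
To make the contrived example concrete, here is a minimal Python sketch. The
function name `timestamped_bnode_label` and the label format are hypothetical,
not taken from the R2RML draft or from any real implementation. The only point
is that the same row gets a different label each time it is generated, so a
client that compares labels across queries would wrongly conclude it is
looking at two different nodes.

```python
import itertools
import time

# Hypothetical per-process counter. It is mixed into the label so the
# illustration does not depend on the clock resolution of the platform.
_label_counter = itertools.count()


def timestamped_bnode_label(table: str, row_key: str) -> str:
    """Return a blank node label that changes on every call (non-persistent)."""
    stamp = time.time_ns()  # "current date/time stamp" from the example above
    return f"_:{table}_{row_key}_{stamp}_{next(_label_counter)}"


# Two separate "queries" that touch the same row (table and key are made up).
first = timestamped_bnode_label("EMP", "7369")
second = timestamped_bnode_label("EMP", "7369")

# Same row, different labels: comparing label text tells the client nothing.
labels_match = first == second
```

Here `labels_match` is False even though both labels stand for the same row.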


> 2. I read this differently. In the case of the Jena API, the Jena API
> implementation is part of the R2RML processor. The Jena API here is just
> another way of accessing the output dataset, just like SPARQL endpoints or
> RDF dumps or whatever other methods you can dream up. The client
> application in this case is the application that accesses the Jena API. The
> client app indeed SHOULD NOT rely on the blank node label. But the Jena API
> implementation (the R2RML processor) internally needs “persistent” blank
> node identifiers to fulfil the contract of the API.
>
> Do you think this is a reasonable reading? If so, any suggestions for
> rewording that make it clearer?
>

To my reading, if an implementor wants to expose the triples via Jena then
they can do that. Furthermore they can make the blank node identifiers
persistent. However, if you take an implementation of the Jena API and
point it at an off-the-shelf R2RML implementation then that Jena API
implementation becomes the client and SHOULD NOT assume that the blank node
identifiers are persistent.

If we want to support this use case then I think we would need to add
wording saying that an R2RML implementation MUST (or SHOULD?) consistently
produce the same blank node identifier for a given row.
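
As a rough illustration of what such a requirement could look like on the
implementation side, here is a minimal Python sketch. It assumes a row can be
identified by its table name plus its key values; the function name
`persistent_bnode_label`, the hashing scheme and the label format are all
hypothetical and are not something the R2RML draft specifies.

```python
import hashlib


def persistent_bnode_label(table: str, key_values: tuple) -> str:
    """Derive a blank node label from the row identity alone.

    Nothing time- or session-dependent goes into the label, so the same
    (table, key_values) pair always yields the same label.
    """
    # Join the parts with a separator unlikely to occur in the data, so that
    # ("AB", ("C",)) and ("A", ("BC",)) do not produce the same input string.
    raw = "\x1f".join((table, *(str(value) for value in key_values)))
    digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]
    return f"_:b{digest}"


# The same row yields the same label on every "query" (made-up table and keys).
label_q1 = persistent_bnode_label("EMP", (7369,))
label_q2 = persistent_bnode_label("EMP", (7369,))
other_row = persistent_bnode_label("EMP", (7499,))
```

With this scheme `label_q1` and `label_q2` are equal, while `other_row`
differs. A processor would still be free to rename blank nodes under the
current text of section 11 unless that wording changes as well.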

-David

Received on Tuesday, 15 May 2012 22:47:08 UTC