Re: Fixing an omission in R2RML: syntax of blank node labels

Richard,

why is that long note on the various syntaxes necessary here? It looks to me like an implementation-dependent detail that does not belong in the Rec.

Ivan


On Apr 26, 2012, at 03:56 , Richard Cyganiak wrote:

> The test case reviews have highlighted an oversight in R2RML.
> 
> A change to the semantics will be necessary to fix this. It's essentially just a simple bugfix, although it requires a somewhat lengthy informative explanation.
> 
> Section 11.2 of R2RML defines the “term generation rules”, and has the following to say about generating blank nodes:
> 
> [[
> If the term type is rr:BlankNode: Return a blank node whose blank node identifier is the natural RDF lexical form corresponding to value.
> ]]
> http://www.w3.org/TR/2012/CR-r2rml-20120223/#generated-rdf-term
> 
> There is a problem though. “Value” at this point could be an arbitrary SQL value containing any characters. Blank node identifiers, however, are syntactically restricted in the various syntaxes. Worse, the restrictions differ between syntaxes. So the spec as written asks implementations to do something that might generate illegal blank node identifiers.
> 
> The fix involves lots of handwaving because, given the different syntaxes, I don't think it's practical to specify a single escaping scheme that works everywhere. Since blank node labels are not semantically meaningful, we can leave the choice of escaping scheme up to the implementations. But this requires some explaining. So I'd like to change the phrasing above to:
> 
> [[
> If the term type is rr:BlankNode: Return a blank node generated by applying the implementation-dependent blank node labelling function to the natural RDF lexical form corresponding to value.
> ]]
> 
> “blank node labelling function” would then be defined like this, including a very long NOTE:
> 
> [[
> The blank node labelling function is an arbitrary implementation-dependent function whose inputs are strings, and whose outputs are blank nodes. The function MUST be bijective, that is, the inputs and outputs are in a 1:1 correspondence.
> 
> NOTE: In the various syntaxes and access interfaces for RDF, blank nodes are generally represented by a blank node identifier. The precise syntax and allowed characters for blank node identifiers differ between syntaxes and interfaces. An R2RML processor must have the ability to generate valid blank node identifiers from arbitrary input strings. This is the task of the blank node labelling function. R2RML processors may have to use different blank node labelling functions for different output syntaxes or access interfaces.
> 
> A string matching the regular expression [a-zA-Z_](([a-zA-Z_0-9-])*[a-zA-Z_0-9.-])? is a valid blank node identifier in Turtle, SPARQL, N-Triples and RDF/XML. The following algorithm is a simple blank node labelling function that produces such valid blank node identifiers (but not very readable ones) from any input string:
> 
> 1. Turn the input string into a byte sequence by UTF-8 encoding.
> 2. Turn each byte into a two-digit hexadecimal number.
> 3. Concatenate all digits into a string, prepend “blank”, and generate a blank node with this blank node identifier.
> 
> For example, the string “:-)” would yield a blank node identifier “blank3A2D29”.
> ]]
> 
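For illustration, the three steps in the quoted note can be sketched in Python. This is only a sketch under stated assumptions: the function name blank_node_label and the constant SAFE_LABEL are hypothetical, and uppercase hexadecimal digits are assumed because the “blank3A2D29” example uses them. The regular expression is copied from the quoted note.

```python
import re

# Regular expression copied from the quoted note; the name SAFE_LABEL is hypothetical.
SAFE_LABEL = re.compile(r"[a-zA-Z_](([a-zA-Z_0-9-])*[a-zA-Z_0-9.-])?")


def blank_node_label(value: str) -> str:
    """Hypothetical helper implementing the three steps from the note."""
    # Step 1: turn the input string into a byte sequence by UTF-8 encoding.
    data = value.encode("utf-8")
    # Step 2: turn each byte into a two-digit hexadecimal number
    # (uppercase is an assumption based on the example in the note).
    hex_digits = "".join(f"{byte:02X}" for byte in data)
    # Step 3: concatenate all digits and prepend "blank".
    return "blank" + hex_digits


if __name__ == "__main__":
    label = blank_node_label(":-)")
    print(label)  # blank3A2D29
    print(SAFE_LABEL.fullmatch(label) is not None)
```

Distinct input strings give distinct labels here, because UTF-8 encoding and fixed-width hex encoding are both reversible.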


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Thursday, 26 April 2012 09:17:44 UTC