Re: Proposal for “per-row blank node maps” in R2RML from Richard Cyganiak on 2012-05-17 (public-rdb2rdf-wg@w3.org from May 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 17 May 2012 21:16:40 +0100
To: David McNeil <dmcneil@revelytix.com>
Cc: ashok.malhotra@oracle.com, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <C1AFC125-20F0-4B20-85E1-50FD2A792B89@cyganiak.de>
David,

It looks like this proposal (rr:RowBlankNode) is off the table anyway, but I wanted to reply to a few points.

On 14 May 2012, at 17:44, David McNeil wrote:
>> • If the term type is rr:RowBlankNode, then you don't specify rr:column/rr:template/rr:constant, and you get a fresh blank node for each row. (That's the new part.)
> 
> I think the proposed spec changes need clarification defining within what context the same fresh blank node is valid. For example, what if the same TriplesMap is employed multiple times in a given SPARQL query?

R2RML is formally defined as a mapping to an RDF graph (well, RDF dataset to be precise). So, formally, a SPARQL query doesn't employ TriplesMaps. It just may or may not hit any given triple in the output RDF graph. Thus the only question is: How exactly does the mapping from rows to blank nodes work? Are there any circumstances where the output graph can contain multiple blank nodes that were mapped from the same row? Once that question is answered, you know the shape of the RDF graph. The rest is just vanilla SPARQL semantics.

>> Conforming R2RML processors MAY treat R2RML mappings that use per-row blank node maps over R2RML views as an error.
> 
> This makes me nervous because it seems to be strongly hinting at the underlying implementation details. However, without understanding the implementation details this restriction would not make sense to users. I think this indicates that the idea of a RowBlankNode is a leaky abstraction.

The restriction is there because it's all that's needed for the problem at hand — representing the DM in R2RML. This was pointed out by Souri. I don't see any particular technical reason for restricting it in this way.

(All abstractions are leaky. That in itself is no reason not to do something.)

> I would need to think about this more, but it seems that we can only support the RowBlankNode feature in cases where a single table is being queried? I am not sure I fully understand how other R2RML features could be combined to create a context in which a table is joined and thus no row id is available.

Even in the DM, if your database has only tables without PKs, you can still join them in a single query by joining on literals:

SELECT * WHERE {
  ?x <TABLE1#NAME> ?name .
  ?y <TABLE2#NAME> ?name .
}
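
To make this concrete, here is a minimal Python sketch. It assumes a simplified DM-style output in which each row of a PK-less table becomes a fresh blank node subject and each column value becomes a literal object. The table contents and the `dm_triples` helper are hypothetical. The sketch evaluates the query's two triple patterns and shows that the join needs only the shared literal, never a row id.

```python
from itertools import count

# Hypothetical PK-less tables: each row is a dict of column -> value.
TABLE1 = [{"NAME": "Alice"}, {"NAME": "Bob"}]
TABLE2 = [{"NAME": "Bob"}, {"NAME": "Carol"}]

_fresh = count()

def dm_triples(table_name, rows):
    """Simplified DM-style output: one fresh blank node per row and
    one literal-valued triple per column."""
    triples = []
    for row in rows:
        bnode = f"_:b{next(_fresh)}"  # fresh blank node; no row id is exposed
        for column, value in row.items():
            triples.append((bnode, f"<{table_name}#{column}>", value))
    return triples

graph = dm_triples("TABLE1", TABLE1) + dm_triples("TABLE2", TABLE2)

# Evaluate the basic graph pattern from the query above:
#   ?x <TABLE1#NAME> ?name . ?y <TABLE2#NAME> ?name .
solutions = [
    {"x": s1, "y": s2, "name": o1}
    for (s1, p1, o1) in graph if p1 == "<TABLE1#NAME>"
    for (s2, p2, o2) in graph if p2 == "<TABLE2#NAME>" and o2 == o1
]
print(solutions)
```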

>> It is possible to define multiple per-row blank node maps over a single logical table. In this case, each of the maps produces distinct blank nodes. In the following example, two unique blank nodes are generated for each logical table row, one as the subject and one as the object of the generated ex:p triples.
>> 
>>    <#map1>  a rr:TriplesMap;
>>        rr:logicalTable <#someLogicalTable>;
>>        rr:subjectMap [ rr:termType rr:RowBlankNode; ];
>>        rr:predicateObjectMap [
>>            rr:predicate ex:p;
>>            rr:objectMap [ rr:termType rr:RowBlankNode; ];
>>        ];
>>        .
> 
> I struggle to think of a motivating use-case for generating two unique blank nodes for a given logical row.

The use case is tables that are denormalized with respect to the target ontology. This is somewhat orthogonal to the DM issue, but can be genuinely useful. Let's say we have a table CUSTOMER with columns:

   CUST_ID, CUST_NAME, ADDRESS, ZIP, CITY, COUNTRY

But in the target RDF we want to create *two* resources from this table: an ex:Customer resource and an ex:Address resource. If we want a blank node for the address, then a fresh rr:RowBlankNode would be handy. In my experience, mappings that create several such auxiliary blank nodes from one row are not uncommon.
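
Because rr:RowBlankNode was never adopted, this can't be written as a working R2RML mapping. What follows is a minimal Python sketch of the intended output. The sample rows, the ex: property names, the customer IRI template and the `RowBlankNodeMap` class are all hypothetical. The customer gets an IRI built from CUST_ID. The address has no key of its own, so it gets one fresh blank node per row.

```python
from itertools import count

# Hypothetical sample rows for the CUSTOMER table described above.
CUSTOMER = [
    {"CUST_ID": 1, "CUST_NAME": "Acme", "ADDRESS": "1 Main St",
     "ZIP": "12345", "CITY": "Springfield", "COUNTRY": "US"},
    {"CUST_ID": 2, "CUST_NAME": "Globex", "ADDRESS": "2 High St",
     "ZIP": "54321", "CITY": "Shelbyville", "COUNTRY": "US"},
]

_labels = count()

class RowBlankNodeMap:
    """Hypothetical stand-in for a term map with rr:termType rr:RowBlankNode:
    each instance yields exactly one fresh blank node per row."""

    def __init__(self):
        self._by_row = {}

    def node_for(self, row_index):
        # Mint a label on first use and reuse it for the same row afterwards.
        if row_index not in self._by_row:
            self._by_row[row_index] = f"_:b{next(_labels)}"
        return self._by_row[row_index]

address_map = RowBlankNodeMap()

triples = []
for i, row in enumerate(CUSTOMER):
    # Customer: IRI from the key column (hypothetical template).
    cust = f"<http://example.com/customer/{row['CUST_ID']}>"
    # Address: auxiliary per-row blank node.
    addr = address_map.node_for(i)
    triples += [
        (cust, "rdf:type", "ex:Customer"),
        (cust, "ex:name", row["CUST_NAME"]),
        (cust, "ex:address", addr),
        (addr, "rdf:type", "ex:Address"),
        (addr, "ex:street", row["ADDRESS"]),
        (addr, "ex:zip", row["ZIP"]),
        (addr, "ex:city", row["CITY"]),
        (addr, "ex:country", row["COUNTRY"]),
    ]

for t in triples:
    print(t)
```

Additional auxiliary resources from the same row would each use their own `RowBlankNodeMap` instance.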

> This seems to me like we are trying to add too much to R2RML.
>  
>> But in the following example, each generated triple will have the same blank node as subject and object, because the same per-row blank node map is referenced as both the subject map and the object map.
>> 
>>    <#map1>  a rr:TriplesMap;
>>        rr:logicalTable <#someLogicalTable>;
>>        rr:subjectMap <#blankNodes>;
>>        rr:predicateObjectMap [
>>            rr:predicate ex:p;
>>            rr:objectMap <#blankNodes>;
>>        ];
>>        .
>>    <#blankNodes>  rr:termType rr:RowBlankNode .
>> ]]
> 
> This seems quite esoteric to me. I submit that the two mappings above would be considered equivalent by most users, but the proposal is to bury subtly different behavior in these.

That's why the spec points out the difference. I think this is a purely didactic issue. The difference can be motivated via examples and so on.

> I think that is a bad idea.
> 
> Section 9.1 of the R2RML draft says:
> 
> "If the same blank node identifier occurs in multiple RDF triples that are in the same graph, then the triples will share the same blank node"
> 
> This was addressing how two separate blank node term maps could produce references to the same blank node. Does it make sense to talk about doing the same thing with this new RowBlankNode feature?

I guess this would say: Two blank nodes that are both generated by rr:RowBlankNode are the same if and only if:

  1. they were generated by the same rr:TermMap instance
  2. they were generated from the same unique table row
  3. they are in the same graph (named or default)

The second point would require some additional clarification, I suppose: we assume an arbitrary but stable ordering that gives each of several indistinguishable duplicate rows its own identity.
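
The three conditions can be read as a lookup key. Here is a minimal Python sketch that uses a hypothetical `row_blank_node` helper and `RowBlankNodeMap` class; nothing like this exists in R2RML. A blank node is minted once per key of term map instance, row position and graph. List position stands in for the arbitrary but stable ordering of duplicate rows. The sketch replays the two `<#map1>` examples quoted above over a table with two duplicate rows.

```python
from itertools import count

class RowBlankNodeMap:
    """Hypothetical per-row blank node term map. It defines no __eq__ or
    __hash__, so as a dict key it compares by identity, not by structure."""

_labels = count()
_minted = {}

def row_blank_node(term_map, row_index, graph):
    """Same node iff same term map instance, same row position in a
    stable ordering, and same graph; otherwise mint a fresh one."""
    key = (term_map, row_index, graph)
    if key not in _minted:
        _minted[key] = f"_:b{next(_labels)}"
    return _minted[key]

# Two indistinguishable duplicate rows; enumerate() supplies the stable order.
rows = [("x",), ("x",)]
DEFAULT = "default"

# First example: two separate, structurally identical term map instances.
subj_map, obj_map = RowBlankNodeMap(), RowBlankNodeMap()
ex1 = [(row_blank_node(subj_map, i, DEFAULT), "ex:p",
        row_blank_node(obj_map, i, DEFAULT)) for i, _ in enumerate(rows)]

# Second example: one instance (<#blankNodes>) as both subject and object map.
shared = RowBlankNodeMap()
ex2 = [(row_blank_node(shared, i, DEFAULT), "ex:p",
        row_blank_node(shared, i, DEFAULT)) for i, _ in enumerate(rows)]

print(ex1)
print(ex2)
```

The first example yields four distinct nodes, two per row. The second yields one node per row, used as both subject and object. Keying on the term map object itself, rather than its contents, is what makes two structurally identical maps count as different.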

Best,
Richard
Received on Thursday, 17 May 2012 20:17:12 UTC