Re: Proposal for “per-row blank node maps” in R2RML from David McNeil on 2012-05-14 (public-rdb2rdf-wg@w3.org from May 2012)

From: David McNeil <dmcneil@revelytix.com>
Date: Mon, 14 May 2012 11:44:08 -0500
To: ashok.malhotra@oracle.com
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <CA+8VvdxtahGwNfVeY2cw+u0ZAJ_R-KM+J6Y0L=kv0VDdxTnb0w@mail.gmail.com>
On Thu, May 10, 2012 at 5:33 PM, ashok malhotra
<ashok.malhotra@oracle.com> wrote:

> I have seen no mail about this so let me ask if we can all agree to
> Richard's proposal below


I don't think we should make this change to R2RML.

The premise behind the effort to add this to R2RML is that the Direct
Mapping cannot be implemented on R2RML without changing R2RML. I think that
premise is incorrect. I think the Direct Mapping of tables without keys can
be implemented on R2RML by using R2RML Views.

At the risk of over-simplifying I think this is the situation:

* for valid reasons the Direct Mapping wants to preserve the cardinality of
duplicate rows in tables without keys
* for the case of query translation (i.e. treating the RDF as virtual
triples), there is not a standard SQL way of solving this problem. But, for
most databases there are reasonable vendor-specific SQL queries to
accomplish this.
* so a Direct Mapping for a given table in a given vendor's database can
produce an R2RML mapping that includes a vendor-specific R2RML view to
accomplish the blank node generation that preserves the cardinality.
* there may be some database somewhere for which this is
impossible/impractical. That is ok.
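
A minimal sketch of such a vendor-specific R2RML view, assuming an Oracle
database and a hypothetical keyless table SOMETABLE with a NAME column (the
table, column, and ex: names are illustrative and do not come from the
proposal). Oracle's ROWID pseudo-column gives each physical row a distinct
value, so duplicate rows yield distinct blank nodes:

```turtle
@base <http://example.com/base/> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .

# R2RML view: expose Oracle's ROWID as an ordinary column.
# SOMETABLE and NAME are hypothetical names used for illustration only.
<#SomeTableView> rr:sqlQuery """
    SELECT ROWIDTOCHAR(ROWID) AS ROW_ID, t.* FROM SOMETABLE t
    """ .

<#SomeTableMap> a rr:TriplesMap ;
    rr:logicalTable <#SomeTableView> ;
    # One blank node per row, keyed on the row id rather than on the
    # (possibly duplicated) column values.
    rr:subjectMap [
        rr:column "ROW_ID" ;
        rr:termType rr:BlankNode ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "NAME" ] ;
    ] .
```

Other databases would need a different query in rr:sqlQuery (for example a
different row-identifier pseudo-column), which is why the view is
vendor-specific while the rest of the mapping is plain R2RML.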

So I propose that we leave the Direct Mapping and the R2RML spec as-is with
respect to duplicate rows in tables without keys.

We at Revelytix discussed the current proposal and our specific
comments/questions are inline below.

-David

====

> • If the term type is rr:RowBlankNode, then you don't specify
> rr:column/rr:template/rr:constant, and you get a fresh blank node for each
> row. (That's the new part.)
>

I think the proposed spec changes need clarification defining within what
context the same fresh blank node is valid. For example, what if the same
TriplesMap is employed multiple times in a given SPARQL query?

> Conforming R2RML processors MAY treat R2RML mappings that use per-row blank
> node maps over R2RML views as an error.
>

This makes me nervous because it seems to be strongly hinting at the
underlying implementation details. However, without understanding the
implementation details this restriction would not make sense to users. I
think this indicates that the idea of a RowBlankNode is a leaky abstraction.

I would need to think about this more, but it seems that we can only
support the RowBlankNode feature in cases where a single table is being
queried? I am not sure I have my head around the way that other R2RML
features can be combined that would create a context in which a table is
being joined and thus a row id is not available.

> It is possible to define multiple per-row blank node maps over a single
> logical table. In this case, each of the maps produce distinct blank nodes.
> In the following example, two unique blank nodes are generated for each
> logical table row, one as the subject and one as the object of the
> generated ex:p triples.
>
>    <#map1>  a rr:TriplesMap;
>        rr:logicalTable <#someLogicalTable>;
>        rr:subjectMap [ rr:termType rr:RowBlankNode; ];
>        rr:predicateObjectMap [
>            rr:property ex:p;
>            rr:objectMap [ rr:termType rr:RowBlankNode; ];
>        ];
>        .
>

I struggle to think of a motivating use-case for generating two unique
blank nodes for a given logical row. This seems to me like we are trying to
add too much to R2RML.


> But in the following example, each generated triple will have the same
> blank node as subject and object, because the same per-row blank node map
> is referenced as the subject map and object map.
>
>    <#map1>  a rr:TriplesMap;
>        rr:logicalTable <#someLogicalTable>;
>        rr:subjectMap <#blankNodes>;
>        rr:predicateObjectMap [
>            rr:property ex:p;
>            rr:objectMap <#blankNodes>;
>        ];
>        .
>    <#blankNodes>  rr:termType rr:RowBlankNode.
>

This seems quite esoteric to me. I submit that the two mappings above would
be considered equivalent by most users, but the proposal is to bury subtly
different behavior in these. I think that is a bad idea.

Section 9.1 of the R2RML draft says:

"If the same blank node identifier
<http://www.w3.org/2001/sw/rdb2rdf/r2rml/#dfn-blank-node-identifier> occurs
in multiple RDF triples
<http://www.w3.org/2001/sw/rdb2rdf/r2rml/#dfn-rdf-triple> that are in the
same graph, then the triples will share the same blank node"

This was addressing how two separate blank node term maps could produce
references to the same blank node. Does it make sense to talk about doing
the same thing with this new RowBlankNode feature?
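
For comparison, a sketch of how two separate blank node term maps can refer
to the same blank node in R2RML as currently drafted, without any new term
type. It assumes the same hypothetical Oracle view over SOMETABLE as a
logical table; the table, the ROW_ID alias, and ex:p are illustrative names
only. Both term maps generate the same blank node identifier for a given
row, so the subject and object of each triple are the same blank node:

```turtle
@base <http://example.com/base/> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .

# Hypothetical vendor-specific view exposing a per-row identifier.
<#SomeTableView> rr:sqlQuery """
    SELECT ROWIDTOCHAR(ROWID) AS ROW_ID, t.* FROM SOMETABLE t
    """ .

<#SelfLinkMap> a rr:TriplesMap ;
    rr:logicalTable <#SomeTableView> ;
    # Two distinct term maps, but identical templates: for a given row both
    # produce the same blank node identifier.
    rr:subjectMap [
        rr:template "row{ROW_ID}" ;
        rr:termType rr:BlankNode ;
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:p ;
        rr:objectMap [
            rr:template "row{ROW_ID}" ;
            rr:termType rr:BlankNode ;
        ] ;
    ] .
```

Here the sharing is visible in the mapping itself (same template, same
column), rather than depending on whether one term map resource is reused.
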
Received on Monday, 14 May 2012 16:44:41 UTC