Re: R2RML Views for unique blank nodes

Hi David,

It's a very complicated design space.

There are many possible approaches to generating these “artificial row IDs”. Generally the approaches are not portable, and generally they require full table scans and hence are unacceptably slow on large tables.
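To make the portability problem concrete, here is a minimal sketch of one vendor-specific approach, using SQLite's implicit `rowid` column through Python's standard `sqlite3` module. The `likes` table, its columns, and the blank node label format are all hypothetical, chosen only for illustration. Other databases expose different mechanisms, or none at all, so this is not meant as a general solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with no primary key, so duplicate rows are allowed.
conn.execute("CREATE TABLE likes (person TEXT, thing TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [("alice", "tea"), ("alice", "tea"), ("bob", "coffee")],
)

# SQLite-specific: ordinary tables carry an implicit integer "rowid".
# This is an assumption about the backend and is not portable SQL.
rows_with_ids = conn.execute(
    "SELECT rowid, person, thing FROM likes ORDER BY rowid"
).fetchall()

# Mint one blank node label per row; duplicate rows get distinct labels.
blank_nodes = [f"_:likes_row{rowid}" for rowid, _, _ in rows_with_ids]
print(blank_nodes)
```

The two identical `("alice", "tea")` rows end up with different labels, which is exactly what a cardinality-preserving mapping needs, but only because the backend happens to offer a hidden row identifier.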

Eric has pointed out that you only need to generate an “artificial row ID” if the SPARQL query (or other kind of query) being answered actually requires returning the row blank nodes. In other cases, you can get correct results with fairly simple SQL, and Eric argues that this is the common case.
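A small sketch of that case, again with a hypothetical `likes` table in SQLite via Python's `sqlite3` module: the SPARQL query in the comment matches the row node but never returns it, so the SQL translation shown here (my own illustrative translation, not the output of any particular R2RML engine) needs no row ID and still preserves duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with duplicate rows and no primary key.
conn.execute("CREATE TABLE likes (person TEXT, thing TEXT)")
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [("alice", "tea"), ("alice", "tea"), ("bob", "coffee")],
)

# Roughly: SELECT ?thing WHERE { ?row :person "alice" ; :thing ?thing }
# ?row is matched but not projected, so a plain SELECT is enough.
# SQL bag semantics keep both duplicate rows without any artificial ID.
things = conn.execute(
    "SELECT thing FROM likes WHERE person = ?", ("alice",)
).fetchall()

# Counting solutions works the same way: no row identifier is involved.
count = conn.execute(
    "SELECT COUNT(*) FROM likes WHERE person = ?", ("alice",)
).fetchone()[0]

print(things, count)
```

Only a query that actually projects `?row` would force the implementation to produce a distinct identifier for each of the two identical rows.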

If you use an R2RML view to generate the “artificial row ID”, then the R2RML implementation cannot optimize the case where the row ID doesn't need to be returned. We have to hope that the database's own SQL optimizer can unravel the query; otherwise we'll get terrible performance.

If it's up to the R2RML implementation to generate the “artificial row ID” only if required, then it's much clearer that we can achieve good performance at least for *some* queries. Users can be advised to avoid returning row blank nodes in their SPARQL queries. (My experience is that users are generally willing to rewrite their mappings or their SPARQL queries if this makes things fast enough. If things can't be made fast enough, they will ditch the product. This isn't too surprising.)

I'm also not confident that implementing the Direct Mapping (DM) using R2RML views is possible on all DBs. (As usual, my strongest doubts are about MySQL. The DM *can* be implemented if one can do things like creating a temporary view, but that again is something that isn't expressible in R2RML.)

As I said, it's a complicated design space that hasn't been sufficiently explored.

The non-cardinality-preserving variant of the DM is much simpler and it's well-known how to implement SPARQL over it.

Best,
Richard


On 8 May 2012, at 19:14, David McNeil wrote:

> One of the key issues from today's working group discussion was whether the Direct Mapping of a table with duplicate rows could be represented in R2RML. The observation was made that it cannot be represented in R2RML. Is this still the case if R2RML Views are considered? It seems to me that in many cases a Direct Mapping of a table with duplicate rows could be represented in R2RML with an R2RML View. I think that is a reasonable way to handle the issue in the case of a custom mapping; perhaps it could be used for the Direct Mapping as well? I suppose this might not be acceptable to some if it relied on vendor-specific SQL.
> 
> -David

Received on Tuesday, 8 May 2012 20:41:03 UTC