Re: Minimalist proposal to resolve no-primary-key issue

* ashok malhotra <ashok.malhotra@oracle.com> [2012-05-15 15:19-0700]
> I think we just need to fix the DM.  If you disagree, please indicate what else needs to be said.

But what exactly is broken in the DM?

That's a somewhat glib question, but the point I made during the call today (which I thought actually caught some momentum) was this:
  1 The DM is able to preserve cardinality over tables with potentially repeated rows.
  2 R2RML is not able to preserve cardinality over tables with potentially repeated rows while staying within pure SQL (that is, you may be able to use e.g. rownums or assignable variables in different flavors of SQL, but in the SQL that we're targeting, the required behavior exceeds the expressivity of SQL).
  3 For every situation where an R2RML processor would be unable to produce a DM as a default behavior (that is, those where the DM preserves cardinality and R2RML does not), the users need to be warned that, because they have potentially repeated rows in non-unique tables, the R2RML representation will lose some of the information in the database.
  4 Future versions of R2RML will likely address this issue, enabling a generic R2RML processor to capture all of the information in repeated rows, and therefore to use the DM for these cases.
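
To make points 1 and 2 concrete, here is a minimal Python sketch (not either spec's algorithm). It assumes a hypothetical IOUs table with columns fname and amount and no unique key, models an RDF graph as a set of triples, and contrasts a DM-style row node (a fresh blank node per row) with a subject derived only from the row's column values, which is roughly what a mapping restricted to pure SQL has to work with. The function names and the blank node labels are made up for illustration.

```python
from itertools import count

# Hypothetical IOUs table with no unique key; the first two rows are identical.
columns = ("fname", "amount")
rows = [("bob", 5), ("bob", 5), ("sue", 3)]


def dm_style(rows):
    """Mint one fresh blank node per row, so identical rows stay distinct."""
    fresh = count()
    graph = set()  # an RDF graph is a set of triples
    for row in rows:
        subject = f"_:row{next(fresh)}"
        for col, val in zip(columns, row):
            graph.add((subject, f"<IOUs#{col}>", val))
    return graph


def value_keyed(rows):
    """Derive the subject from the row's values, so identical rows collide."""
    graph = set()
    for row in rows:
        subject = "_:" + "_".join(str(v) for v in row)
        for col, val in zip(columns, row):
            graph.add((subject, f"<IOUs#{col}>", val))
    return graph


def subjects(graph):
    """Return the distinct subjects (row nodes) in a graph."""
    return {s for s, _, _ in graph}


dm_subjects = len(subjects(dm_style(rows)))
keyed_subjects = len(subjects(value_keyed(rows)))
print(dm_subjects, keyed_subjects)
```

With these three rows the DM-style graph has three row nodes while the value-keyed graph has two: the duplicate ("bob", 5) row has disappeared, which is the information loss that point 3 says users need to be warned about.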

This points to following Ivan's proposal <http://www.w3.org/mid/FD9565BB-380D-474B-9453-60C7CAF6072E@w3.org> (add caveat text about when the DM is not the default or non-configured R2RML behavior). Adding text to the R2RML text in Ivan's proposal would help users understand the issues and the outcome. The current text is point 2 in Ivan's mail:
[[
2. Add to the R2RML document (probably in the intro part): "R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key"
]]
Adding this text would specify the behavior when there is no unique key:
[[
For tables with no unique key and which have multiple identical rows, the output dataset produced by the default mapping will be equivalent to the Direct Mapping over the unique rows in that table.
]]

It's possible that we'll want to s/mapping will be equivalent/mapping MAY be equivalent/ because the simple mapping for SPARQL queries analogous to conventional SQL queries, e.g.
  SELECT ?who ?owes { ?debt <IOUs#fname> ?who ; <IOUs#amount> ?owes }
or
  SELECT ?fname (SUM(?owes) AS ?payupnow) { ?debt <IOUs#fname> ?fname ; <IOUs#amount> ?owes } GROUP BY ?fname
would preserve cardinality unless one specifically invoked a subselect which grouped by all of the unique columns. (This consistency problem will arise in R2RML regardless of whether DM is relaxed to potentially lose cardinality.)
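
The effect on the SUM query can be sketched with Python's sqlite3 module. This is only an illustration under assumptions: the table contents are invented, the first SQL query stands in for a straightforward translation of the SPARQL query above, and the second uses a DISTINCT subselect to stand in for "the unique rows in that table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IOUs (fname TEXT, amount INTEGER)")
# Invented sample data: no unique key, and bob's debt of 5 is recorded twice.
conn.executemany(
    "INSERT INTO IOUs VALUES (?, ?)",
    [("bob", 5), ("bob", 5), ("sue", 3)],
)

# Straightforward translation: aggregates over every row, preserving cardinality.
all_rows = dict(
    conn.execute("SELECT fname, SUM(amount) FROM IOUs GROUP BY fname")
)

# Unique-rows semantics: duplicates are collapsed by a subselect before summing.
unique_rows = dict(
    conn.execute(
        "SELECT fname, SUM(amount) "
        "FROM (SELECT DISTINCT fname, amount FROM IOUs) "
        "GROUP BY fname"
    )
)

print(all_rows, unique_rows)
```

Here bob owes 10 under the first query and 5 under the second, so a processor answering from the table and one answering from a materialized graph of unique rows would disagree unless the text allows (MAY) rather than requires the unique-rows result.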


> The DM spec says:
> [[The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language <http://www.w3.org/TR/2012/CR-r2rml-20120223/> [R2RML] <http://www.w3.org/TR/rdb-direct-mapping/#R2RML>. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]
> 
> Add an asterisk after the first sentence and a footnote.  The footnote says:
> [[Except in the case of tables or views without a primary key.  In this case, identical rows may be kept distinct
> by the DM and collapsed into a single row by R2RML]]
> 
> R2RML says:
> [[This specification has a companion that defines a direct mapping from relational databases to RDF <http://www.w3.org/TR/rdb-direct-mapping/> [DM <http://www.w3.org/TR/r2rml/#DM>]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.]]
> 
> No change needed.
> -- 
> All the best, Ashok

-- 
-ericP

Received on Wednesday, 16 May 2012 03:24:28 UTC