- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 16 May 2012 16:21:29 -0400
- To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
- Cc: ashok malhotra <ashok.malhotra@oracle.com>
* ashok malhotra <ashok.malhotra@oracle.com> [2012-05-16 05:59-0700]
> Eric:
> The statement that the DM provides default behavior appears in the DM spec, so it needs to
> be addressed there.

Apologies, quite right.

> I do not think there is disagreement with your points 1 to 3 but we need a succinct statement
> that captures the situation. I have no real quarrel with the words Richard suggested except that
> I want to say the same thing in fewer words.

I understand your intent, but think that some extra words are useful. Below are the three proposals that I see on the table. I've laid them out side-by-side (just below) and sequentially with long lines (further below). Changes are embedded in >><<s. The first included paragraph of the R2RML Intro is the same for all proposals; Ivan's proposal appends some text to it, and I've included it for context.

---- Ivan ----                  ---- Richard ----               ---- Ashok ----
=DM Intro=                      =DM Intro=                      =DM Intro=
The Direct Mapping is intended  >>This specification has a      The Direct Mapping is intended
to provide a default behavior   companion, the R2RML mapping    to provide a default behavior
for R2RML: RDB to RDF Mapping   language [R2RML], that allows   for R2RML: RDB to RDF Mapping
Language [R2RML] >>for tables   the creation of customized      Language [R2RML]>>₁<<. It can
which have at least one unique  mapping from relational data    also be used to materialize
key<<. It can also be used to   to RDF. R2RML defines a         RDF graphs or define virtual
materialize RDF graphs or       relaxed variant of the Direct   graphs, which can be queried
define virtual graphs, which    Mapping intended as a default   by SPARQL or traversed by an
can be queried by SPARQL or     mapping for further             RDF graph API.]]
traversed by an RDF graph       customization.<< It can also
API.]]                          be used to materialize RDF      >>₁ Except in the case of
                                graphs or define virtual        tables or views without a
                                graphs, which can be queried    primary key. In this case,
                                by SPARQL or traversed by an    identical rows may be kept
                                RDF graph API.]]                distinct by the DM and
                                                                collapsed into a single row
                                                                by R2RML<<

=R2RML Intro=                   =R2RML Intro=                   =R2RML Intro=
This specification has a        This specification has a        This specification has a
companion that defines a        companion that defines a        companion that defines a
direct mapping from relational  direct mapping from relational  direct mapping from relational
databases to RDF [DM]. In the   databases to RDF [DM]. In the   databases to RDF [DM]. In the
direct mapping of a database,   direct mapping of a database,   direct mapping of a database,
the structure of the resulting  the structure of the resulting  the structure of the resulting
RDF graph directly reflects     RDF graph directly reflects     RDF graph directly reflects
the structure of the database,  the structure of the database,  the structure of the database,
the target RDF vocabulary       the target RDF vocabulary       the target RDF vocabulary
directly reflects the names of  directly reflects the names of  directly reflects the names of
database schema elements, and   database schema elements, and   database schema elements, and
neither structure nor target    neither structure nor target    neither structure nor target
vocabulary can be               vocabulary can be               vocabulary can be
changed. With R2RML on the      changed. With R2RML on the      changed. With R2RML on the
other hand, a mapping author    other hand, a mapping author    other hand, a mapping author
can define highly customized    can define highly customized    can define highly customized
views over the relational       views over the relational       views over the relational
data.                           data.                           data.

>>R2RML implementations are     >>==4.4 Default Mapping==
encouraged to provide a         An R2RML processor MAY include
default mapping equivalent to   an *R2RML default mapping
the Direct Mapping for tables   generator*. This is a facility
which have at least one unique  that introspects the schema of
key. For tables with no unique  the input database and
key and which have multiple     generates a *default mapping
identical rows, the output      document* intended for further
dataset produced by the         customization by a mapping
default mapping will be         author. The R2RML mapping
equivalent to the Direct        expressed in the default
Mapping over the unique rows    mapping document SHOULD be
in that table.<<                such that its output is the
                                Direct Graph [DM]
=R2RML 6.1=                     corresponding to the input
>>Because rr:IRI and            database.
rr:BlankNode subject labels
are generated from column       Duplicate row preservation:
values, R2RML mappings do not   For tables without a primary
preserve repeated rows in SQL   key, the Direct Graph requires
databases.<<                    that a fresh blank node is
                                created for each row. This
                                ensures that duplicate rows in
                                such tables are preserved.
                                This requirement is relaxed
                                for R2RML default mappings:
                                They MAY re-use the same blank
                                node for multiple duplicate
                                rows. This behaviour does not
                                preserve duplicate rows.
                                Implementations that provide
                                default mappings based on the
                                Direct Graph MUST document
                                whether they preserve
                                duplicate rows or not.<<

Again, in linear format with long lines:

---- Ivan ----
=DM Intro=
The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML] >>for tables which have at least one unique key<<. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]

=R2RML Intro=
This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data. >>R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key. For tables with no unique key and which have multiple identical rows, the output dataset produced by the default mapping will be equivalent to the Direct Mapping over the unique rows in that table.<<

=R2RML 6.1=
>>Because rr:IRI and rr:BlankNode subject labels are generated from column values, R2RML mappings do not preserve repeated rows in SQL databases.<<

---- Richard ----
=DM Intro=
>>This specification has a companion, the R2RML mapping language [R2RML], that allows the creation of customized mapping from relational data to RDF. R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.<< It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]

=R2RML Intro=
This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.

>>==4.4 Default Mapping==
An R2RML processor MAY include an *R2RML default mapping generator*. This is a facility that introspects the schema of the input database and generates a *default mapping document* intended for further customization by a mapping author. The R2RML mapping expressed in the default mapping document SHOULD be such that its output is the Direct Graph [DM] corresponding to the input database.

Duplicate row preservation: For tables without a primary key, the Direct Graph requires that a fresh blank node is created for each row. This ensures that duplicate rows in such tables are preserved. This requirement is relaxed for R2RML default mappings: They MAY re-use the same blank node for multiple duplicate rows. This behaviour does not preserve duplicate rows. Implementations that provide default mappings based on the Direct Graph MUST document whether they preserve duplicate rows or not.<<

---- Ashok ----
=DM Intro=
The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML]>>₁<<. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]

>>₁ Except in the case of tables or views without a primary key. In this case, identical rows may be kept distinct by the DM and collapsed into a single row by R2RML<<

=R2RML Intro=
This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.

> Feel free to suggest text.
> All the best, Ashok
>
> On 5/15/2012 8:23 PM, Eric Prud'hommeaux wrote:
> >* ashok malhotra <ashok.malhotra@oracle.com> [2012-05-15 15:19-0700]
> >>I think we just need to fix the DM. If you disagree, please indicate what else needs to be said.
> >But what exactly is broken in the DM?
> >
> >That's a somewhat glib question, but the point I made during the call today (which I thought actually caught some momentum) was this:
> >  1 The DM is able to preserve cardinality over tables with potentially repeated rows.
> >  2 R2RML is not able to preserve cardinality over tables with potentially repeated rows while staying within pure SQL (that is, you may be able to use e.g. rownums or assignable variables in different flavors of SQL, but in the SQL that we're targeting, the required behavior exceeds the expressivity of SQL).
> >  3 For every situation where an R2RML processor would be unable to produce a DM as a default behavior (that is, those where the DM preserved cardinality and R2RML does not), the users need to be warned that, because they have potentially repeated rows in non-unique tables, the R2RML representation will lose some of the information in the database.
> >  4 Future versions of R2RML will likely address this issue, enabling a generic R2RML processor to capture all of the information in repeated rows, and therefore able to use the DM for these cases.
> >
> >This points to following Ivan's proposal <http://www.w3.org/mid/FD9565BB-380D-474B-9453-60C7CAF6072E@w3.org> (add caveat text about when the DM is not the default or non-configured R2RML behavior). Adding text to the R2RML text in Ivan's proposal would help users understand the issues and the outcome. The current text is point 2 in Ivan's mail:
> >[[
> >2. Add to the R2RML document (probably in the intro part): "R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key"
> >]]
> >Adding this text would specify the behavior when there is no unique key:
> >[[
> >For tables with no unique key and which have multiple identical rows, the output dataset produced by the default mapping will be equivalent to the Direct Mapping over the unique rows in that table.
> >]]
> >
> >It's possible that we'll want to s/mapping will be equivalent/mapping MAY be equivalent/ because the simple mapping for SPARQL queries analogous to conventional SQL queries, e.g.
> >   SELECT ?who ?owes { ?debt <IOUs#fname> ?who ; <IOUs#amount> ?owes }
> >or
> >   SELECT ?fname (SUM(?owes) AS ?payupnow) { ?debt <IOUs#fname> ?fname ; <IOUs#amount> ?owes } GROUP BY ?fname
> >would preserve cardinality unless one specifically invoked a subselect which grouped by all of the unique columns. (This consistency problem will arise in R2RML regardless of whether DM is relaxed to potentially lose cardinality.)
> >
> >
> >>The DM spec says:
> >>[[The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language <http://www.w3.org/TR/2012/CR-r2rml-20120223/> [R2RML] <http://www.w3.org/TR/rdb-direct-mapping/#R2RML>. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]
> >>
> >>Add an asterisk after the first sentence and a footnote. The footnote says:
> >>[[Except in the case of tables or views without a primary key. In this case, identical rows may be kept distinct
> >>by the DM and collapsed into a single row by R2RML]]
> >>
> >>R2RML says:
> >>[[This specification has a companion that defines a direct mapping from relational databases to RDF <http://www.w3.org/TR/rdb-direct-mapping/> [DM <http://www.w3.org/TR/r2rml/#DM>]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.]]
> >>
> >>No change needed.
> >>--
> >>All the best, Ashok

--
-ericP
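
To illustrate points 1 and 2 in the quoted list, here is a minimal Python sketch of the cardinality difference. It is not taken from either spec and uses no RDF library: graphs are plain sets of triples, and the sample rows, the blank node labelling scheme, and the function names are all assumptions made for illustration. One function mints a fresh blank node per row, as the Direct Mapping requires for tables without a primary key; the other derives the node label from the column values, which is roughly what happens when subject labels are generated from column values.

```python
from collections import defaultdict
from itertools import count

# Hypothetical IOUs table with no primary key; Bob owes 5 twice.
ROWS = [("Bob", 5), ("Bob", 5), ("Sue", 3)]


def fresh_node_graph(rows):
    """One fresh blank node per row, so duplicate rows stay distinct."""
    ids = count()
    graph = set()
    for fname, amount in rows:
        node = f"_:row{next(ids)}"
        graph.add((node, "IOUs#fname", fname))
        graph.add((node, "IOUs#amount", amount))
    return graph


def value_labelled_graph(rows):
    """Blank node label derived from column values (assumed scheme).

    Identical rows yield identical triples, and a graph is a set,
    so the duplicates collapse into one subject.
    """
    graph = set()
    for fname, amount in rows:
        node = f"_:{fname}_{amount}"
        graph.add((node, "IOUs#fname", fname))
        graph.add((node, "IOUs#amount", amount))
    return graph


def total_owed(graph):
    """Rough analogue of the SUM(?owes) ... GROUP BY ?fname query."""
    names = {s: o for s, p, o in graph if p == "IOUs#fname"}
    totals = defaultdict(int)
    for s, p, o in graph:
        if p == "IOUs#amount":
            totals[names[s]] += o
    return dict(totals)


print(total_owed(fresh_node_graph(ROWS)))      # Bob's two debts both count
print(total_owed(value_labelled_graph(ROWS)))  # Bob's duplicate row is lost
```

With these sample rows the first graph holds six triples and sums Bob's debts to 10, while the second holds four triples and reports 5, which is the information loss that users would need to be warned about.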
Received on Wednesday, 16 May 2012 20:22:06 UTC