- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 17 May 2012 16:58:00 -0400
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>, ashok malhotra <ashok.malhotra@oracle.com>
just to be clear, i have every confidence that we are working towards the same design, and that you'll document it well. i am, however, happy to tool on the wording.

* Richard Cyganiak <richard@cyganiak.de> [2012-05-17 20:52+0100]
> Eric,
>
> Comments inline.
>
> On 17 May 2012, at 04:53, Eric Prud'hommeaux wrote:
> > I think I favor the explicitness of Richard's with a couple textual proposals below:
> >
> >
> >> ---- Ivan ----                  ---- Richard ----               ---- Ashok ----
>
> Three-column side-by-side text? O_o
>
> >> =DM Intro=                      =DM Intro=                      =DM Intro=
> >> The Direct Mapping is intended  >>This specification has a      The Direct Mapping is intended
> >> to provide a default behavior   companion, the R2RML mapping    to provide a default behavior
> >> for R2RML: RDB to RDF Mapping   language [R2RML], that allows   for R2RML: RDB to RDF Mapping
> >> Language [R2RML] >>for tables   the creation of customized      Language [R2RML]>>₁<<. It can
> >> which have at least one unique  mapping from relational data    also be used to materialize
> >> key<<. It can also be used to   to RDF. R2RML defines a         RDF graphs or define virtual
> >> materialize RDF graphs or       relaxed variant of the Direct   graphs, which can be queried
> >> define virtual graphs, which    Mapping intended as a default   by SPARQL or traversed by an
> >> can be queried by SPARQL or     mapping for further             RDF graph API.
> >> traversed by an RDF graph       customization.<< It can also
> >> API.                            be used to materialize RDF      >>₁ Except in the case of
> >>                                 graphs or define virtual        tables or views without a
> >>                                 graphs, which can be queried    primary key. In this case,
> >>                                 by SPARQL or traversed by an    identical rows may be kept
> >>                                 RDF graph API.                  distinct by the DM and
> >>                                                                 collapsed into a single row
> >>                                                                 by R2RML<<
> >
> > Like Ashok, I was tempted to be explicit about what a "relaxed variant" is. As it turns out, it's identical to the DM over the unique rows.
> > I think it might be a bit awkward so I'm tempted to use Richard's wording directly,
>
> This is just the introduction; the purpose is just to give a brief account of how the two specs relate. The imprecise phrase “relaxed variant” should be a link directly to the new section of R2RML, so anyone who wonders what it means just needs to click.

works for me

> > but if folks think it's worth the extra noise, here's what I wrote:
> > [[
> > s/R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.
> > /R2RML uses the Direct Mapping as a default mapping for further customization. For tables with no unique keys, R2RML implementations may use the Direct Mapping over only the unique rows in tables with no unique key.
> > /
>
> Yeah, this would be ok too, although it seems too much detail for the introduction.

let's leave it out of DM.

> > The other minor mod is s/It can also/The Direct Mapping can also/ 'cause the antecedent has gotten stale by the time you get there.
>
> The intention in my proposal was to move the sentence starting “It can also” before the sentence(s) that explains the R2RML relationship. Either way is ok.
>
> >> are generated from column       Duplicate row preservation:
> >> values, R2RML mappings do not   For tables without a primary
> >> preserve repeated rows in SQL   key, the Direct Graph requires
> >> databases.<<                    that a fresh blank node is
> >>                                 created for each row. This
> >>                                 ensures that duplicate rows in
> >>                                 such tables are
> >>                                 preserved. This requirement is
> >>                                 relaxed for R2RML default
> >>                                 mappings: They MAY re-use the
> >>                                 same blank node for multiple
> >>                                 duplicate rows. This behaviour
> >>                                 does not preserve duplicate
> >>                                 rows. Implementations that
> >>                                 provide default mappings based
> >>                                 on the Direct Graph MUST
> >>                                 document whether they preserve
> >>                                 duplicate rows or not.<<
> >
> > In order to make users' lives easier, let's add that they must be consistent about using the DM or the uniquified variant:
> >
> > s/Graph MUST document whether
> > /Graph MUST be consistent about whether or not duplicate rows are exposed in the output dataset, and document whether
> > /
>
> This seems imprecise. It says that *implementations* must be consistent. The language should make clear which of these is allowed:
>
> • One implementation that supports multiple different DB engines, and generates a preserving default mapping for Oracle and a non-preserving for MySQL
>
> • An implementation that has a switch where the user can choose the behaviour when invoking the default mapping generator
>
> • An implementation that generates a preserving default mapping if and only if it knows that it has write access to the DB
>
> • An implementation that generates a default mapping that preserves duplicate rows only in the unlikely case that a unique key (but no primary key) is present
>
> • An implementation that generates a default mapping that preserves duplicate rows over base tables, but not over views
>
> I think one could argue that all of these are reasonable and should be allowed, as long as it's properly documented and users know what's going on. But regardless, making the phrasing sufficiently precise to discriminate between these cases may make it too complicated to be worth it.

How about just ruling out an implementation which preserves cardinality for some operations but treats the table as a set for others?
For example, an implementation which provides a non-materialized view of a non-unique table mustn't treat the table as unique when answering queries with variable predicates ("SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER regex(str(?p), '^http://foo.example/db/IOUs/') }") but preserve cardinality when answering queries with fixed predicates ("SELECT ?who ?amount WHERE { ?x IOUs:fname ?who ; IOUs:owes ?amount }"). Any idea how to say that?

> > (This is a forward ref to output dataset, ugh.)
>
> It wouldn't be the first one in the R2RML spec :-( One could make this Section 12 instead of 4.4 to avoid the forward ref, but I'm not sure that's better in the end.
>
> Best,
> Richard
>

-- 
-ericP
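
A minimal sketch of the two behaviours discussed above, for readers who want to see the difference concretely. It assumes a hypothetical IOUs table with columns fname and owes, no primary key, and one duplicated row; the names direct_graph and who_owes are illustrative only and come from neither spec. The second function mimics the fixed-predicate query from the message in plain Python rather than running SPARQL.

```python
from itertools import count

# Hypothetical table without a primary key; the first two rows are duplicates.
ROWS = [("Bob", 5), ("Bob", 5), ("Sue", 3)]
COLUMNS = ("fname", "owes")
BASE = "http://foo.example/db/IOUs/"  # prefix borrowed from the query in the message


def direct_graph(rows, preserve_duplicates):
    """Return the set of (subject, predicate, object) triples for `rows`.

    preserve_duplicates=True  -> a fresh blank node for every row.
    preserve_duplicates=False -> one blank node shared by identical rows.
    """
    fresh = count()
    node_for_row = {}
    triples = set()  # an RDF graph is a set, so repeated triples collapse
    for row in rows:
        if preserve_duplicates:
            subject = f"_:b{next(fresh)}"
        else:
            if row not in node_for_row:
                node_for_row[row] = f"_:b{next(fresh)}"
            subject = node_for_row[row]
        for column, value in zip(COLUMNS, row):
            triples.add((subject, BASE + column, value))
    return triples


def who_owes(graph):
    """Mimic: SELECT ?who ?amount WHERE { ?x IOUs:fname ?who ; IOUs:owes ?amount }"""
    names = [(s, o) for s, p, o in graph if p == BASE + "fname"]
    debts = {}
    for s, p, o in graph:
        if p == BASE + "owes":
            debts.setdefault(s, []).append(o)
    return sorted((who, amount) for s, who in names for amount in debts.get(s, []))


preserving = who_owes(direct_graph(ROWS, preserve_duplicates=True))
collapsing = who_owes(direct_graph(ROWS, preserve_duplicates=False))
print(preserving)  # Bob's debt appears once per duplicate row
print(collapsing)  # Bob's debt appears once
```

With a fresh blank node per row the query returns one solution per table row; with a shared blank node the duplicate rows produce identical triples and the graph, being a set, keeps only one copy. The consistency concern raised in the message is that a single implementation should not answer some queries from the first graph and others from the second.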
Received on Thursday, 17 May 2012 20:58:33 UTC