- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Fri, 18 May 2012 11:57:18 +0100
- To: ashok.malhotra@oracle.com
- Cc: public-rdb2rdf-wg@w3.org
Ashok,

On 18 May 2012, at 00:18, ashok malhotra wrote:
> Since we seem to be converging on your proposal could you send mail with the suggested words.
> Eric's 3 column format is cool but I cannot cut and paste from it.

We're still discussing this proposal here:
http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0084.html

We're considering some possible tweaks to this wording.

> We are dealing with a corner case so we should not give too much importance to it with a large
> number of words. "Brevity" as Polonius says in Hamlet "is the soul of wit".

If the relationship between DM and R2RML was simple and obvious, then we wouldn't need to say much about it. I would have much preferred a simple and obvious relationship between them.

Unfortunately the situation is *not* simple and obvious. So I'm afraid that we need to spell this out:

1. the notion of a default mapping for R2RML
2. point out that it's a good idea to use the DM as a default mapping
3. point out that it's acceptable to use a slightly altered version of the DM as an R2RML default mapping

Pointing out these things in the R2RML spec means that Eric can have his unaltered cardinality-preserving DM, and I can implement a simple non-cardinality-preserving R2RML default mapping generator, and we both can claim conformance to something, and our stuff will be interoperable in all cases except for the corner case you mention.

The proposal above does all of this in two paragraphs and I don't think that's too much text.

Best,
Richard

> All the best, Ashok
>
> On 5/17/2012 1:58 PM, Eric Prud'hommeaux wrote:
>> just to be clear, i have every confidence that we are working towards the same design, and that you'll document it well. i am, however, happy to tool on the wording.
>>
>>
>> * Richard Cyganiak<richard@cyganiak.de> [2012-05-17 20:52+0100]
>>> Eric,
>>>
>>> Comments inline.
>>>
>>> On 17 May 2012, at 04:53, Eric Prud'hommeaux wrote:
>>>> I think I favor the explicitness of Richard's with a couple textual proposals below:
>>>>
>>>>
>>>>> ---- Ivan ----                  ---- Richard ----               ---- Ashok ----
>>> Three-column side-by-side text? O_o
>>>
>>>>> =DM Intro=                      =DM Intro=                      =DM Intro=
>>>>> The Direct Mapping is intended  >>This specification has a      The Direct Mapping is intended
>>>>> to provide a default behavior   companion, the R2RML mapping    to provide a default behavior
>>>>> for R2RML: RDB to RDF Mapping   language [R2RML], that allows   for R2RML: RDB to RDF Mapping
>>>>> Language [R2RML]>>for tables    the creation of customized      Language [R2RML]>>₁<<. It can
>>>>> which have at least one unique  mapping from relational data    also be used to materialize
>>>>> key<<. It can also be used to   to RDF. R2RML defines a         RDF graphs or define virtual
>>>>> materialize RDF graphs or       relaxed variant of the Direct   graphs, which can be queried
>>>>> define virtual graphs, which    Mapping intended as a default   by SPARQL or traversed by an
>>>>> can be queried by SPARQL or     mapping for further             RDF graph API.
>>>>> traversed by an RDF graph       customization.<< It can also
>>>>> API.                            be used to materialize RDF      >>₁ Except in the case of
>>>>>                                 graphs or define virtual        tables or views without a
>>>>>                                 graphs, which can be queried    primary key. In this case,
>>>>>                                 by SPARQL or traversed by an    identical rows may be kept
>>>>>                                 RDF graph API.                  distinct by the DM and
>>>>>                                                                 collapsed into a single row
>>>>>                                                                 by R2RML<<
>>>> Like Ashok, I was tempted to be explicit about what a "relaxed variant" is. As it turns out, it's identical to the DM over the unique rows.
>>>> I think it might be a bit awkward so I'm tempted to use Richard's wording directly,
>>> This is just the introduction; the purpose is just to give a brief account of how the two specs relate. The imprecise phrase “relaxed variant” should be a link directly to the new section of R2RML, so anyone who wonders what it means just needs to click.
>> works for me
>>
>>>> but if folks think it's worth the extra noise, here's what I wrote:
>>>> [[
>>>> s/R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.
>>>> /R2RML uses the Direct Mapping as a default mapping for further customization. For tables with no unique keys, R2RML implementations may use the Direct Mapping over only the unique rows in tables with no unique key.
>>>> /
>>> Yeah, this would be ok too, although it seems too much detail for the introduction.
>> let's leave it out of DM.
>>
>>>> The other minor mod is s/It can also/The Direct Mapping can also/ 'cause the antecedent has gotten stale by the time you get there.
>>> The intention in my proposal was to move the sentence starting “It can also” before the sentence(s) that explains the R2RML relationship. Either way is ok.
>>>
>>>>> are generated from column       Duplicate row preservation:
>>>>> values, R2RML mappings do not   For tables without a primary
>>>>> preserve repeated rows in SQL   key, the Direct Graph requires
>>>>> databases.<<                    that a fresh blank node is
>>>>>                                 created for each row. This
>>>>>                                 ensures that duplicate rows in
>>>>>                                 such tables are
>>>>>                                 preserved. This requirement is
>>>>>                                 relaxed for R2RML default
>>>>>                                 mappings: They MAY re-use the
>>>>>                                 same blank node for multiple
>>>>>                                 duplicate rows. This behaviour
>>>>>                                 does not preserve duplicate
>>>>>                                 rows. Implementations that
>>>>>                                 provide default mappings based
>>>>>                                 on the Direct Graph MUST
>>>>>                                 document whether they preserve
>>>>>                                 duplicate rows or not.<<
>>>> In order to make users' lives easier, let's add that they must be consistent about using the DM or the uniquified variant:
>>>>
>>>> s/Graph MUST document whether
>>>> /Graph MUST be consistent about whether or not duplicate rows are exposed in the output dataset, and document whether
>>>> /
>>> This seems imprecise. It says that *implementations* must be consistent. The language should make clear which of these is allowed:
>>>
>>> • One implementation that supports multiple different DB engines, and generates a preserving default mapping for Oracle and a non-preserving for MySQL
>>>
>>> • An implementation that has a switch where the user can choose the behaviour when invoking the default mapping generator
>>>
>>> • An implementation that generates a preserving default mapping if and only if it knows that it has write access to the DB
>>>
>>> • An implementation that generates a default mapping that preserves duplicate rows only in the unlikely case that a unique key (but no primary key) is present
>>>
>>> • An implementation that generates a default mapping that preserves duplicate rows over base tables, but not over views
>>>
>>> I think one could argue that all of these are reasonable and should be allowed, as long as it's properly documented and users know what's going on. But regardless, making the phrasing sufficiently precise to discriminate between these cases may make it too complicated to be worth it.
>> How about just ruling out an implementation which preserves cardinality for some operations but treats the table as a set for others?
>> For example, an implementation which provides a non-materialized view of a non-unique table mustn't treat the table as unique when answering queries with variable predicates ("SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER regex(?p, '^http://foo.example/db/IOUs/') }") but preserve cardinality when answering queries with fixed predicates ("SELECT ?who ?amount WHERE { ?x IOUs:fname ?who ; IOUs:owes ?amount }").
>>
>> Any idea how to say that?
>>
>>>> (This is a forward ref to output dataset, ugh.)
>>> It wouldn't be the first one in the R2RML spec :-( One could make this Section 12 instead of 4.4 to avoid the forward ref, but I'm not sure that's better in the end.
>>>
>>> Best,
>>> Richard
>>>
>
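
The two behaviours debated in this thread for tables without a primary key can be illustrated with a small sketch. It is only an illustration under stated assumptions, not text or code from either spec: the IOUs rows, the BASE prefix, and the helpers map_rows and subject_count are hypothetical, and triples are modelled as plain Python tuples rather than with an RDF library.

```python
from itertools import count

# Hypothetical IOUs table with no primary key; the last row duplicates the first.
COLUMNS = ("fname", "owes")
ROWS = [("Bob", 5), ("Alice", 7), ("Bob", 5)]
BASE = "http://foo.example/db/IOUs#"  # hypothetical predicate prefix


def map_rows(rows, preserve_duplicates):
    """Return a set of (subject, predicate, object) triples for a keyless table."""
    fresh = count()
    node_for_row = {}  # only consulted when duplicate rows are collapsed
    triples = set()
    for row in rows:
        if preserve_duplicates:
            # Direct Mapping behaviour: a fresh blank node for every row.
            subject = f"_:row{next(fresh)}"
        else:
            # Relaxed behaviour: identical rows share one blank node.
            if row not in node_for_row:
                node_for_row[row] = f"_:row{next(fresh)}"
            subject = node_for_row[row]
        for column, value in zip(COLUMNS, row):
            triples.add((subject, BASE + column, value))
    return triples


def subject_count(triples):
    """Number of distinct row nodes in the generated graph."""
    return len({s for s, _, _ in triples})


if __name__ == "__main__":
    print(subject_count(map_rows(ROWS, preserve_duplicates=True)))   # 3
    print(subject_count(map_rows(ROWS, preserve_duplicates=False)))  # 2
```

With the duplicate "Bob" row, the preserving variant yields three row nodes and the relaxed variant two, which is the cardinality difference a query such as the fixed-predicate SELECT quoted above would observe.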
Received on Friday, 18 May 2012 10:57:50 UTC