Re: comparison of no-functional-change proposes for no-primary-key issue from Richard Cyganiak on 2012-05-18 (public-rdb2rdf-wg@w3.org from May 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 18 May 2012 11:57:18 +0100
To: ashok.malhotra@oracle.com
Cc: public-rdb2rdf-wg@w3.org
Message-Id: <31648BF3-4921-4292-BC2B-3B1B70A6ABCF@cyganiak.de>

Ashok,

On 18 May 2012, at 00:18, ashok malhotra wrote:
> Since we seem to be converging on your proposal could you send mail with the suggested words.
> Eric's 3 column format is cool but I cannot cut and paste from it.

We're still discussing this proposal here:
http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0084.html

We're considering some possible tweaks to this wording.

> We are dealing with a corner case so we should not give too much importance to it with a large
> number of words.  "Brevity" as Polonius says in Hamlet "is the soul of wit".

If the relationship between DM and R2RML were simple and obvious, then we wouldn't need to say much about it. I would have much preferred a simple and obvious relationship between them. Unfortunately, the situation is *not* simple and obvious.

So I'm afraid that we need to spell this out:

1. the notion of a default mapping for R2RML
2. point out that it's a good idea to use the DM as a default mapping
3. point out that it's acceptable to use a slightly altered version of the DM as an R2RML default mapping

Pointing out these things in the R2RML spec means that Eric can have his unaltered cardinality-preserving DM, and I can implement a simple non-cardinality-preserving R2RML default mapping generator, and we both can claim conformance to something, and our stuff will be interoperable in all cases except for the corner case you mention.
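
To make the difference concrete, here is a minimal Python sketch of the two behaviours on a table without a primary key. It is an illustration only, not the algorithm from either spec: the function names, the blank node labels, and the sample rows are hypothetical, the column names fname and owes are borrowed from the IOUs example query quoted further down, and predicate IRIs and rdf:type triples are left out for brevity.

```python
from itertools import count

# Hypothetical rows of a table with no primary key; the first two are duplicates.
rows = [
    {"fname": "Bob", "owes": 5},
    {"fname": "Bob", "owes": 5},
    {"fname": "Sue", "owes": 7},
]

def direct_mapping(rows):
    """Cardinality-preserving sketch: a fresh blank node for every row."""
    ids = count(1)
    triples = set()
    for row in rows:
        node = f"_:row{next(ids)}"  # new label even if the row is a duplicate
        triples.update((node, col, val) for col, val in row.items())
    return triples

def relaxed_default_mapping(rows):
    """Non-preserving sketch: the blank node is derived from the column
    values, so duplicate rows share one node and collapse."""
    labels = {}
    triples = set()
    for row in rows:
        key = tuple(sorted(row.items()))
        node = labels.setdefault(key, f"_:row{len(labels) + 1}")
        triples.update((node, col, val) for col, val in row.items())
    return triples

def subjects(triples):
    """Distinct subject nodes in a set of triples."""
    return {s for s, _, _ in triples}

print(len(subjects(direct_mapping(rows))))           # 3 row nodes
print(len(subjects(relaxed_default_mapping(rows))))  # 2 row nodes
```

With these sample rows the first function yields three row nodes and the second yields two, which is exactly the corner case: the outputs differ only when a table without a primary key contains duplicate rows.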

The proposal above does all of this in two paragraphs and I don't think that's too much text.

Best,
Richard



> All the best, Ashok
> 
> On 5/17/2012 1:58 PM, Eric Prud'hommeaux wrote:
>> just to be clear, i have every confidence that we are working towards the same design, and that you'll document it well. i am, however, happy to tool on the wording.
>> 
>> 
>> * Richard Cyganiak<richard@cyganiak.de>  [2012-05-17 20:52+0100]
>>> Eric,
>>> 
>>> Comments inline.
>>> 
>>> On 17 May 2012, at 04:53, Eric Prud'hommeaux wrote:
>>>> I think I favor the explicitness of Richard's with a couple textual proposals below:
>>>> 
>>>> 
>>>>> ---- Ivan ----                      ---- Richard ----                  ---- Ashok ----
>>> Three-column side-by-side text? O_o
>>> 
>>>>> =DM Intro=                        =DM Intro=                         =DM Intro=
>>>>> The Direct Mapping is intended>>This specification has a         The Direct Mapping is intended
>>>>> to provide a default behavior     companion, the R2RML mapping       to provide a default behavior
>>>>> for R2RML: RDB to RDF Mapping     language [R2RML], that allows      for R2RML: RDB to RDF Mapping
>>>>> Language [R2RML]>>for tables     the creation of customized         Language [R2RML]>>₁<<. It can
>>>>> which have at least one unique    mapping from relational data       also be used to materialize
>>>>> key<<. It can also be used to     to RDF. R2RML defines a            RDF graphs or define virtual
>>>>> materialize RDF graphs or         relaxed variant of the Direct      graphs, which can be queried
>>>>> define virtual graphs, which      Mapping intended as a default      by SPARQL or traversed by an
>>>>> can be queried by SPARQL or       mapping for further                RDF graph API.
>>>>> traversed by an RDF graph         customization.<<  It can also
>>>>> API.                              be used to materialize RDF>>₁ Except in the case of
>>>>>                                   graphs or define virtual           tables or views without a
>>>>>                                   graphs, which can be queried       primary key.  In this case,
>>>>>                                   by SPARQL or traversed by an       identical rows may be kept
>>>>>                                   RDF graph API.                     distinct by the DM and
>>>>>                                                                      collapsed into a single row
>>>>>                                                                      by R2RML<<
>>>> Like Ashok, I was tempted to be explicit about what a "relaxed variant" is. As it turns out, it's identical to the DM over the unique rows.
>>>> I think it might be a bit awkward so I'm tempted to use Richard's wording directly,
>>> This is just the introduction; the purpose is just to give a brief account of how the two specs relate. The imprecise phrase “relaxed variant” should be a link directly to the new section of R2RML, so anyone who wonders what it means just needs to click.
>> works for me
>> 
>>>> but if folks think it's worth the extra noise, here's what I wrote:
>>>> [[
>>>> s/R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.
>>>> /R2RML uses the Direct Mapping as a default mapping for further customization. For tables with no unique keys, R2RML implementations may use the Direct Mapping over only the unique rows in tables with no unique key.
>>>> /
>>> Yeah, this would be ok too, although it seems too much detail for the introduction.
>> let's leave it out of DM.
>> 
>>>> The other minor mod is s/It can also/The Direct Mapping can also/ 'cause the antecedent has gotten stale by the time you get there.
>>> The intention in my proposal was to move the sentence starting “It can also” before the sentence(s) that explains the R2RML relationship. Either way is ok.
>>> 
>>>>> are generated from column         Duplicate row preservation:
>>>>> values, R2RML mappings do not     For tables without a primary
>>>>> preserve repeated rows in SQL     key, the Direct Graph requires
>>>>> databases.<<                       that a fresh blank node is
>>>>>                                  created for each row. This
>>>>>                                  ensures that duplicate rows in
>>>>>                                  such tables are
>>>>>                                  preserved. This requirement is
>>>>>                                  relaxed for R2RML default
>>>>>                                  mappings: They MAY re-use the
>>>>>                                  same blank node for multiple
>>>>>                                  duplicate rows. This behaviour
>>>>>                                  does not preserve duplicate
>>>>>                                  rows. Implementations that
>>>>>                                  provide default mappings based
>>>>>                                  on the Direct Graph MUST
>>>>>                                  document whether they preserve
>>>>>                                  duplicate rows or not.<<
>>>> In order to make users' lives easier, let's add that they must be consistent about using the DM or the uniquified variant:
>>>> 
>>>> s/Graph MUST document whether
>>>> /Graph MUST be consistent about whether or not duplicate rows are exposed in the output dataset, and document whether
>>>> /
>>> This seems imprecise. It says that *implementations* must be consistent. The language should make clear which of these is allowed:
>>> 
>>> • One implementation that supports multiple different DB engines, and generates a preserving default mapping for Oracle and a non-preserving for MySQL
>>> 
>>> • An implementation that has a switch where the user can choose the behaviour when invoking the default mapping generator
>>> 
>>> • An implementation that generates a preserving default mapping if and only if it knows that it has write access to the DB
>>> 
>>> • An implementation that generates a default mapping that preserves duplicate rows only in the unlikely case that a unique key (but no primary key) is present
>>> 
>>> • An implementation that generates a default mapping that preserves duplicate rows over base tables, but not over views
>>> 
>>> I think one could argue that all of these are reasonable and should be allowed, as long as it's properly documented and users know what's going on. But regardless, making the phrasing sufficiently precise to discriminate between these cases may make it too complicated to be worth it.
>> How about just ruling out an implementation which preserves cardinality for some operations but treats the table as a set for others? For example, an implementation which provides a non-materialized view of a non-unique table mustn't treat the table as unique when answering queries with variable predicates ("SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER regex(?p, '^http://foo.example/db/IOUs/') }") but preserve cardinality when answering queries with fixed predicates ("SELECT ?who ?amount WHERE { ?x IOUs:fname ?who ; IOUs:owes ?amount }").
>> 
>> Any idea how to say that?
>> 
>>>> (This is a forward ref to output dataset, ugh.)
>>> It wouldn't be the first one in the R2RML spec :-( One could make this Section 12 instead of 4.4 to avoid the forward ref, but I'm not sure that's better in the end.
>>> 
>>> Best,
>>> Richard
>>> 
>
Received on Friday, 18 May 2012 10:57:50 UTC