Re: comparison of no-functional-change proposals for no-primary-key issue from Richard Cyganiak on 2012-05-18 (public-rdb2rdf-wg@w3.org from May 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Fri, 18 May 2012 11:44:26 +0100
To: Eric Prud'hommeaux <eric@w3.org>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>, ashok malhotra <ashok.malhotra@oracle.com>
Message-Id: <AEFE15DF-13DD-4151-A147-AADD0E29DE24@cyganiak.de>
On 17 May 2012, at 21:58, Eric Prud'hommeaux wrote:
> just to be clear, i have every confidence that we are working towards the same design, and that you'll document it well. i am, however, happy to tool on the wording.

+1

>>> In order to make users' lives easier, let's add that they must be consistent about using the DM or the uniquified variant:
>>> 
>>> s/Graph MUST document whether
>>> /Graph MUST be consistent about whether or not duplicate rows are exposed in the output dataset, and document whether
>>> /
>> 
>> This seems imprecise. It says that *implementations* must be consistent. The language should make clear which of these is allowed:
>> 
>> • One implementation that supports multiple different DB engines, and generates a preserving default mapping for Oracle and a non-preserving for MySQL
>> 
>> • An implementation that has a switch where the user can choose the behaviour when invoking the default mapping generator
>> 
>> • An implementation that generates a preserving default mapping if and only if it knows that it has write access to the DB
>> 
>> • An implementation that generates a default mapping that preserves duplicate rows only in the unlikely case that a unique key (but no primary key) is present
>> 
>> • An implementation that generates a default mapping that preserves duplicate rows over base tables, but not over views
>> 
>> I think one could argue that all of these are reasonable and should be allowed, as long as it's properly documented and users know what's going on. But regardless, making the phrasing sufficiently precise to discriminate between these cases may make it too complicated to be worth it.
> 
> How about just ruling out an implementation which preserves cardinality for some operations but treats the table as a set for others? For example, an implementation which provides a non-materialized view of a non-unique table mustn't treat the table as unique when answering queries with variable predicates ("SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER regex(str(?p), '^http://foo.example/db/IOUs/') }") but preserve cardinality when answering queries with fixed predicates ("SELECT ?who ?amount WHERE { ?x IOUs:fname ?who ; IOUs:owes ?amount }").

I believe that the current wording already rules out such implementations.

The way R2RML works formally is by saying which triples exist in the output RDF graph (or output RDF dataset). The results of a SPARQL query over the output RDF graph are then trivially given by SPARQL semantics. The results of dumping the RDF graph to N-Triples are trivially given by the definition of N-Triples. And so on.

So, if an implementation provides SPARQL access to the virtual output of an R2RML mapping, then the SPARQL results must be the same as if we materialized the output of the R2RML mapping and ran SPARQL queries over that. R2RML spells out unambiguously how to materialize. SPARQL spells out unambiguously how to run queries over that.

Putting it another way: The decision whether to preserve cardinality or not is captured in the generated default mapping. Once the default mapping generator has written it into a file, it has already made up its mind about whether duplicates are preserved or not. We can just look at the R2RML views in the mapping and tell whether they discard duplicates or not. A conforming R2RML processor then obviously has to behave consistently with that written-down mapping across all operations.
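To make this concrete, here is a minimal Python sketch. It is not R2RML and not any real processor; the table, the namespace (borrowed from the IOUs example above) and all function names are hypothetical. It assumes a "preserving" mapping that gives every row its own fresh blank node, in the spirit of the Direct Mapping's treatment of tables without a primary key, and a "non-preserving" mapping that behaves roughly like an R2RML view using SELECT DISTINCT. Once the output graph (a set of triples) is fixed, the variable-predicate and fixed-predicate queries from Eric's example necessarily agree on cardinality.

```python
from itertools import count

# Hypothetical namespace, borrowed from the IOUs example in this thread.
IOUS = "http://foo.example/db/IOUs/"

# Hypothetical table with no primary key; the first two rows are duplicates.
ROWS = [("Bob", 10), ("Bob", 10), ("Sue", 20)]


def materialize(rows, preserve_duplicates):
    """Return the output RDF graph as a *set* of (s, p, o) triples."""
    if not preserve_duplicates:
        # Assumption: roughly what an R2RML view using SELECT DISTINCT does.
        rows = sorted(set(rows))
    fresh = count()
    graph = set()
    for fname, owes in rows:
        subject = f"_:row{next(fresh)}"  # one fresh blank node per row
        graph.add((subject, IOUS + "fname", fname))
        graph.add((subject, IOUS + "owes", owes))
    return graph


def query_fixed_predicates(graph):
    """SELECT ?who ?amount WHERE { ?x IOUs:fname ?who ; IOUs:owes ?amount }"""
    # Assumes one fname per subject, which holds for this toy data.
    who = {s: o for s, p, o in graph if p == IOUS + "fname"}
    return sorted((who[s], o) for s, p, o in graph
                  if p == IOUS + "owes" and s in who)


def query_variable_predicates(graph):
    """SELECT ?s ?p ?o with a FILTER on the IOUs namespace, then the
    matching triples regrouped into rows by subject."""
    by_subject = {}
    for s, p, o in graph:
        if p.startswith(IOUS):
            by_subject.setdefault(s, {})[p[len(IOUS):]] = o
    return sorted((r["fname"], r["owes"]) for r in by_subject.values())


for preserve in (True, False):
    g = materialize(ROWS, preserve)
    print(preserve, query_fixed_predicates(g), query_variable_predicates(g))
```

With preserve_duplicates=True both queries return the Bob row twice; with False both return it once. Neither query can "choose" a different cardinality, because the choice was already made when the graph was produced.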

This could perhaps be clarified a bit in the proposed text by stressing that cardinality preservation is a property of the generated *default mapping document*, and not necessarily of the default mapping generator implementation. So, instead of this:

[[
Implementations that provide default mappings based on the Direct Graph MUST document whether they preserve duplicate rows or not.
]]

we could say:

[[
Implementations that provide default mappings based on the Direct Graph MUST document whether the generated default mapping document preserves duplicate rows or not.
]]

Best,
Richard

> 
> Any idea how to say that?
> 
>>> (This is a forward ref to output dataset, ugh.)
>> 
>> It wouldn't be the first one in the R2RML spec :-( One could make this Section 12 instead of 4.4 to avoid the forward ref, but I'm not sure that's better in the end.
>> 
>> Best,
>> Richard
>> 
> 
> -- 
> -ericP
>
Received on Friday, 18 May 2012 10:44:57 UTC