Re: comparison of no-functional-change proposals for no-primary-key issue

http://www.w3.org/mid/20120516202128.GI3282@w3.org
* Eric Prud'hommeaux <eric@w3.org> [2012-05-16 16:21-0400]
> * ashok malhotra <ashok.malhotra@oracle.com> [2012-05-16 05:59-0700]
> > Eric:
> > The statement that the DM provides default behavior appears in the DM spec, so it needs to
> > be addressed there.
> 
> Apologies, quite right.
> 
> > I do not think there is disagreement with your points 1 to 3 but we need a succinct statement
> > that captures the situation.  I have no real quarrel with the words Richard suggested except that
> > I want to say the same thing in fewer words.
> 
> I understand your intent, but think that some extra words are useful. Below are the three proposals that I see on the table. I've laid them out side-by-side (just below) and sequentially with long lines (further below). Changes are embedded in >><<s. The first included paragraph of the R2RML Intro is the same for all proposals; Ivan's proposal appends some text to it, and I've included it for context.

I think I favor the explicitness of Richard's, with a couple of textual proposals below:


> ---- Ivan ----                      ---- Richard ----                  ---- Ashok ----                         
> =DM Intro=                        =DM Intro=                         =DM Intro=                        
> The Direct Mapping is intended    >>This specification has a         The Direct Mapping is intended    
> to provide a default behavior     companion, the R2RML mapping       to provide a default behavior             
> for R2RML: RDB to RDF Mapping     language [R2RML], that allows      for R2RML: RDB to RDF Mapping             
> Language [R2RML] >>for tables     the creation of customized         Language [R2RML]>>₁<<. It can     
> which have at least one unique    mapping from relational data       also be used to materialize               
> key<<. It can also be used to     to RDF. R2RML defines a            RDF graphs or define virtual              
> materialize RDF graphs or         relaxed variant of the Direct      graphs, which can be queried              
> define virtual graphs, which      Mapping intended as a default      by SPARQL or traversed by an              
> can be queried by SPARQL or       mapping for further                RDF graph API.            
> traversed by an RDF graph         customization.<< It can also                                         
> API.                              be used to materialize RDF         >>₁ Except in the case of         
>                                   graphs or define virtual           tables or views without a         
>                                   graphs, which can be queried       primary key.  In this case,               
>                                   by SPARQL or traversed by an       identical rows may be kept                
>                                   RDF graph API.                     distinct by the DM and                    
>                                                                      collapsed into a single row               
>                                                                      by R2RML<<                                

Like Ashok, I was tempted to be explicit about what a "relaxed variant" is. As it turns out, it's identical to the DM over the unique rows.
Spelling that out might be a bit awkward, so I'm tempted to use Richard's wording directly, but if folks think it's worth the extra noise, here's what I wrote:
[[
s/R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.
 /R2RML uses the Direct Mapping as a default mapping for further customization. For tables with no unique key, R2RML implementations may use the Direct Mapping over only the unique rows.
 /
]]

The other minor mod is s/It can also/The Direct Mapping can also/ 'cause the antecedent has gotten stale by the time you get there.
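
For anyone who wants to see the "identical to the DM over the unique rows" claim concretely, here is a rough Python sketch. It is a toy model, not an implementation of either spec: the function names, the blank node labels and the sample rows are all made up for illustration, and the comparison only handles simple row-shaped graphs with no foreign keys.

```python
from itertools import count

# Hypothetical table with no primary key; the last two rows are identical.
ROWS = [
    {"fname": "Bob", "amount": 5},
    {"fname": "Sue", "amount": 3},
    {"fname": "Sue", "amount": 3},
]

def direct_mapping(rows, table="IOUs"):
    """DM-style: mint a fresh blank node for every row."""
    fresh = count()
    triples = set()
    for row in rows:
        subject = f"_:row{next(fresh)}"
        for column, value in row.items():
            triples.add((subject, f"<{table}#{column}>", value))
    return triples

def value_keyed_mapping(rows, table="IOUs"):
    """R2RML-style: derive the blank node label from the column values.
    (Toy labelling scheme; it does not guard against label collisions.)"""
    triples = set()
    for row in rows:
        label = "_".join(str(row[c]) for c in sorted(row))
        for column, value in row.items():
            triples.add((f"_:{label}", f"<{table}#{column}>", value))
    return triples

def unique_rows(rows):
    """Drop repeated rows, keeping first occurrences in order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def shape(triples):
    """Compare graphs up to blank node renaming by grouping
    (predicate, value) pairs per subject and forgetting the labels."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, set()).add((p, o))
    return sorted((sorted(pairs, key=repr) for pairs in by_subject.values()), key=repr)

print(len(shape(direct_mapping(ROWS))))       # 3 row nodes: duplicates preserved
print(len(shape(value_keyed_mapping(ROWS))))  # 2 row nodes: duplicates collapsed
print(shape(value_keyed_mapping(ROWS)) == shape(direct_mapping(unique_rows(ROWS))))  # True
```

The last line is the point: under these assumptions, the value-keyed output has the same shape as the fresh-blank-node output computed over the distinct rows only.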


> =R2RML Intro=                     =R2RML Intro=                      =R2RML Intro=                                     
> This specification has a          This specification has a           This specification has a          
> companion that defines a          companion that defines a           companion that defines a          
> direct mapping from relational    direct mapping from relational     direct mapping from relational                                                
> databases to RDF [DM]. In the     databases to RDF [DM]. In the      databases to RDF [DM]. In the             
> direct mapping of a database,     direct mapping of a database,      direct mapping of a database,             
> the structure of the resulting    the structure of the resulting     the structure of the resulting            
> RDF graph directly reflects       RDF graph directly reflects        RDF graph directly reflects               
> the structure of the database,    the structure of the database,     the structure of the database,            
> the target RDF vocabulary         the target RDF vocabulary          the target RDF vocabulary                 
> directly reflects the names of    directly reflects the names of     directly reflects the names of            
> database schema elements, and     database schema elements, and      database schema elements, and             
> neither structure nor target      neither structure nor target       neither structure nor target              
> vocabulary can be                 vocabulary can be                  vocabulary can be                         
> changed. With R2RML on the        changed. With R2RML on the         changed. With R2RML on the                
> other hand, a mapping author      other hand, a mapping author       other hand, a mapping author              
> can define highly customized      can define highly customized       can define highly customized              
> views over the relational         views over the relational          views over the relational                 
> data.                             data.                              data.                                     
>                                   
> >>R2RML implementations are       >>==4.4 Default Mapping==     
> encouraged to provide a           An R2RML processor MAY include
> default mapping equivalent to     an *R2RML default mapping     
> the Direct Mapping for tables     generator*. This is a facility                                   
> which have at least one unique    that introspects the schema of   
> key. For tables with no unique    the input database and                                               
> key and which have multiple       generates a *default mapping                                 
> identical rows, the output        document* intended for further       
> dataset produced by the           customization by a mapping           
> default mapping will be           author. The R2RML mapping            
> equivalent to the Direct          expressed in the default             
> Mapping over the unique rows      mapping document SHOULD be           
> in that table.<<                  such that its output is the   
>                                   Direct Graph [DM]             
> =R2RML 6.1=                       corresponding to the input    
> >>Because rr:IRI and              database.                     
> rr:BlankNode subject labels                                     
> are generated from column         Duplicate row preservation:   
> values, R2RML mappings do not     For tables without a primary  
> preserve repeated rows in SQL     key, the Direct Graph requires
> databases.<<                      that a fresh blank node is    
>                                   created for each row. This    
>                                   ensures that duplicate rows in       
>                                   such tables are                      
>                                   preserved. This requirement is
>                                   relaxed for R2RML default     
>                                   mappings: They MAY re-use the        
>                                   same blank node for multiple         
>                                   duplicate rows. This behaviour       
>                                   does not preserve duplicate   
>                                   rows. Implementations that           
>                                   provide default mappings based       
>                                   on the Direct Graph MUST             
>                                   document whether they preserve
>                                   duplicate rows or not.<<      

To make users' lives easier, let's add that implementations must be consistent about using the DM or the uniquified variant:

s/Graph MUST document whether
 /Graph MUST be consistent about whether or not duplicate rows are exposed in the output dataset, and document whether
 /

(This is a forward ref to output dataset, ugh.)
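
To illustrate why this matters to users, here is a small Python sketch of an aggregate in the spirit of the SUM query quoted further down. The IOUs rows are invented for illustration, and the two variants are modeled with plain lists and sets rather than with any real DM or R2RML processor.

```python
from collections import defaultdict

# Illustrative (fname, amount) rows; Sue has two identical IOUs of 3.
IOUS = [("Bob", 5), ("Sue", 3), ("Sue", 3)]

def sum_by_fname(rows):
    """Total owed per fname, analogous to SUM(?owes) ... GROUP BY ?fname."""
    totals = defaultdict(int)
    for fname, amount in rows:
        totals[fname] += amount
    return dict(totals)

# Duplicate rows kept distinct, as with the DM's fresh blank nodes.
preserved = sum_by_fname(IOUS)

# Duplicate rows merged, as with the uniquified variant.
collapsed = sum_by_fname(sorted(set(IOUS)))

print(preserved)  # {'Bob': 5, 'Sue': 6}
print(collapsed)  # {'Bob': 5, 'Sue': 3}
```

The same query gives a different total for Sue depending on which variant produced the data, so an implementation that switched between the two, or did not say which one it uses, would leave users unable to interpret the result.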


> Again, in linear format with long lines:
> 
> ---- Ivan ----
> =DM Intro=
> The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML] >>for tables which have at least one unique key<<. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.
> 
> =R2RML Intro=
> This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.
> 
> >>R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key. For tables with no unique key and which have multiple identical rows, the output dataset produced by the default mapping will be equivalent to the Direct Mapping over the unique rows in that table.<<
> 
> =R2RML 6.1=
> >>Because rr:IRI and rr:BlankNode subject labels are generated from column values, R2RML mappings do not preserve repeated rows in SQL databases.<<
> 
> 
> ---- Richard ----
> =DM Intro=
> >>This specification has a companion, the R2RML mapping language [R2RML], that allows the creation of customized mapping from relational data to RDF. R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.<< It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.
> 
> =R2RML Intro=
> This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.
> 
> >>==4.4 Default Mapping==
> An R2RML processor MAY include an *R2RML default mapping generator*. This is a facility that introspects the schema of the input database and generates a *default mapping document* intended for further customization by a mapping author. The R2RML mapping expressed in the default mapping document SHOULD be such that its output is the Direct Graph [DM] corresponding to the input database.
> 
> Duplicate row preservation: For tables without a primary key, the Direct Graph requires that a fresh blank node is created for each row. This ensures that duplicate rows in such tables are preserved. This requirement is relaxed for R2RML default mappings: They MAY re-use the same blank node for multiple duplicate rows. This behaviour does not preserve duplicate rows. Implementations that provide default mappings based on the Direct Graph MUST document whether they preserve duplicate rows or not.<<
> 
> 
> ---- Ashok ----
> =DM Intro=
> The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language [R2RML]>>₁<<. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.
> >>₁ Except in the case of tables or views without a primary key.  In this case, identical rows may be kept distinct by the DM and collapsed into a single row by R2RML<<
> 
> =R2RML Intro=
> This specification has a companion that defines a direct mapping from relational databases to RDF [DM]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.
> 
> 
> > Feel free to suggest text.
> > All the best, Ashok
> > 
> > On 5/15/2012 8:23 PM, Eric Prud'hommeaux wrote:
> > >* ashok malhotra<ashok.malhotra@oracle.com>  [2012-05-15 15:19-0700]
> > >>I think we just need to fix the DM.  If you disagree, please indicate what else needs to be said.
> > >But what exactly is broken in the DM?
> > >
> > >That's a somewhat glib question, but the point I made during the call today (which I thought actually caught some momentum) was this:
> > >   1 The DM is able to preserve cardinality over tables with potentially repeated rows.
> > >   2 R2RML is not able to preserve cardinality over tables with potentially repeated rows while staying within pure SQL (that is, you may be able to use e.g. rownums or assignable variables in different flavors of SQL, but in the SQL that we're targeting, the required behavior exceeds the expressivity of SQL).
> > >   3 For every situation where an R2RML processor would be unable to produce a DM as a default behavior (that is, those where the DM preserved cardinality and R2RML does not), the users need to be warned that, because they have potentially repeated rows in non-unique tables, the R2RML representation will lose some of the information in the database.
> > >   4 Future versions of R2RML will likely address this issue, enabling a generic R2RML processor to capture all of the information in repeated rows, and therefore to use the DM for these cases.
> > >
> > >This points to following Ivan's proposal<http://www.w3.org/mid/FD9565BB-380D-474B-9453-60C7CAF6072E@w3.org>  (add caveat text about when the DM is not the default or non-configured R2RML behavior). Adding text to the R2RML text in Ivan's proposal would help users understand the issues and the outcome. The current text is point 2 in Ivan's mail:
> > >[[
> > >2. Add to the R2RML document (probably in the intro part): "R2RML implementations are encouraged to provide a default mapping equivalent to the Direct Mapping for tables which have at least one unique key"
> > >]]
> > >Adding this text would specify the behavior when there is no unique key:
> > >[[
> > >For tables with no unique key and which have multiple identical rows, the output dataset produced by the default mapping will be equivalent to the Direct Mapping over the unique rows in that table.
> > >]]
> > >
> > >It's possible that we'll want to s/mapping will be equivalent/mapping MAY be equivalent/ because the simple mapping for SPARQL queries analogous to conventional SQL queries, e.g.
> > >   SELECT ?who ?owes { ?debt <IOUs#fname> ?who ; <IOUs#amount> ?owes }
> > >or
> > >   SELECT ?fname (SUM(?owes) AS ?payupnow) { ?debt <IOUs#fname> ?fname ; <IOUs#amount> ?owes } GROUP BY ?fname
> > >would preserve cardinality unless one specifically invoked a subselect which grouped by all of the unique columns. (This consistency problem will arise in R2RML regardless of whether DM is relaxed to potentially lose cardinality.)
> > >
> > >
> > >>The DM spec says:
> > >>[[The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language<http://www.w3.org/TR/2012/CR-r2rml-20120223/>  [R2RML]<http://www.w3.org/TR/rdb-direct-mapping/#R2RML>. It can also be used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.]]
> > >>
> > >>Add an asterisk after the first sentence and a footnote.  The footnote says:
> > >>[[Except in the case of tables or views without a primary key.  In this case, identical rows may be kept distinct
> > >>by the DM and collapsed into a single row by R2RML]]
> > >>
> > >>R2RML says:
> > >>[[This specification has a companion that defines a direct mapping from relational databases to RDF<http://www.w3.org/TR/rdb-direct-mapping/>  [DM<http://www.w3.org/TR/r2rml/#DM>]. In the direct mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. With R2RML on the other hand, a mapping author can define highly customized views over the relational data.]]
> > >>
> > >>No change needed.
> > >>-- 
> > >>All the best, Ashok
> 
> -- 
> -ericP

-- 
-ericP

Received on Thursday, 17 May 2012 03:54:17 UTC