Re: definition of natural mapping

* Richard Cyganiak <richard@cyganiak.de> [2012-01-12 07:55+0000]
> 
> On 11 Jan 2012, at 21:47, Eric Prud'hommeaux wrote:
> 
> > * Richard Cyganiak <richard@cyganiak.de> [2012-01-11 17:43+0000]
> >> There is already a Note that explicitly encourages appropriate mapping of vendor types.
> > 
> > Agreed, but I don't know from that if it overrides or supplements the normative text which says that all non-SQL types are expressed as plain literals.
> 
> Well, at least for R2RML this should be clear. The Conformance section of the R2RML spec states:
> 
> [[
> This specification defines R2RML for databases that conform to Core SQL 2008, as defined in ISO/IEC 9075-1:2008 [SQL1] and ISO/IEC 9075-2:2008 [SQL2]. Processors and mappings may have to deviate from the R2RML specification in order to support databases that do not conform to this version of SQL.
> ]]
> 
> So, a database that contains columns of, say, an XML datatype, does not conform to Core SQL 2008.

Right, so these are exactly in the purview of §10.2 •4 and the note below it.


> The approach taken in the R2RML spec is to define tightly what a processor is supposed to do with standard SQL 2008, and to occasionally do some useful hand-waving to guide implementers when it comes to real-world, non-standard SQL.
> 
> > If I have a database:
> > 
> >  Products
> >  │ ID │ NAME │ HTMLDESC                                    │
> >  │  8 │ toy1 │ <html xmlns="http://www.w3.org/1999/xhtml"> │
> >  │    │      │   <head>…<title>T</title></head>            │
> >  │    │      │   <body>                                    │
> >  │    │      │     <p>blah blah blah</p>                   │
> >  │    │      │   …                                         │
> >  │    │      │   </body>                                   │
> >  │    │      │ </html>                                     │
> 
> I assume HTMLDESC is a (non-Core-2008) XML datatype?

Yep.


> > , MAY/MUST I produce:
> > 
> >  <Products/ID.8> <Products/NAME> "toy1" ;
> >                  <Products/HTMLDESC> "<html>…</html>"^^rdf:XMLLiteral .
> > 
> > , or:
> > 
> >  <Products/ID.8> <Products/NAME> "toy1" ;
> >                  <Products/HTMLDESC> "<html>…</html>" .
> > 
> > , or:
> > 
> >  <Products/ID.8> <Products/NAME> "toy1" ;
> >                  <Products/HTMLDESC> "<html>…</html>"^^rdf:XMLLiteral ;
> >                  <Products/HTMLDESC> "<html>…</html>" .
> > 
> > ?
> 
> The R2RML specification does not normatively answer that question because it only handles Core SQL 2008.
> 
> “Processors and mappings may have to deviate from the R2RML specification in order to support databases that do not conform to this version of SQL.”
> 
> However, if you implement the R2RML specification to the letter, and then ignore the fact that the implementation is only supposed to be used with Core SQL 2008 conforming databases, and use the implementation with a non-conforming database anyways, then the implementation will still do the Right Thing and produce the second output. #2 is a better behaviour than various other possible behaviours (e.g., silently produce no triple, or die horribly) that implementers might choose if we didn't say anything.
> 
> The spec also has an informative note that explicitly encourages going beyond this normatively defined behaviour by producing the first variant instead. That would be an R2RML extension. I think all bases are covered…

I can't confidently read that note as meaning "instead" rather than "in addition". Plus, it's informative text which is supposed to override normative text, namely point 4 of the definition of a natural RDF literal.

I propose that the following achieves the desired effect:
  move point 4 into the "vendor-specific types" note just below (and remove the R2RML-specific clause):
  [[
  *Note* The natural rdf literal is defined for SQL 2008 datatypes. The natural rdf literal may-or-should be extended to map vendor-specific datatypes to RDF by behaving as if the table above contained additional rows that associate the SQL datatypes with appropriate RDF-compatible datatypes (e.g., the XML Schema built-in types [XMLSCHEMA2]), and appropriate lexical transformations where required. If there is no appropriate datatype, the value may be <a href="#dfn-cast-to-string">cast to string</a> and expressed as an <a href="http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-plain-literal">RDF plain literal</a>. Future versions of this specification may define mappings for vendor-specific datatypes or datatypes added to the SQL specification.
  ]]
  modify the forward reference in §10.1 ¶2 to say
  [[
  INTERVAL, Vendor-specific types and types added to future SQL specifications may-or-should be mapped to RDF with extensions to the mapping to natural rdf literal.
  ]].

I expect there to be a discussion about whether may-or-should should be "may" or "should", but let's see if we can first agree on the structure. Since this text is not normative, the "may" or "should" should not be written as an RFC2119 keyword.

I've mocked this up with "should" in <http://www.w3.org/2012/01/§10>.
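
To make the proposed structure concrete, here is a minimal sketch in Python. It is not spec text and not taken from any implementation. The names natural_rdf_literal, CORE_ROWS, CHARACTER_TYPES and extension_rows are hypothetical, the table is a heavily abbreviated stand-in for the one in §10.2, and NULL handling is omitted. It only illustrates the shape: string types become plain literals, listed types become typed literals, extension rows behave as if they were extra rows in the table, and anything left over is cast to string and expressed as a plain literal.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str]  # None stands for an RDF plain literal


# A table row: RDF datatype IRI plus a lexical transformation.
Row = Tuple[str, Callable[[Any], str]]

# Assumption: abbreviated stand-in for the SQL 2008 rows of the table.
CORE_ROWS: Dict[str, Row] = {
    "INTEGER": (XSD + "integer", str),
    "BOOLEAN": (XSD + "boolean", lambda v: "true" if v else "false"),
}

# Assumption: a small subset of the character string types.
CHARACTER_TYPES = {"CHARACTER", "CHARACTER VARYING", "VARCHAR"}


def natural_rdf_literal(
    value: Any,
    sql_type: str,
    extension_rows: Optional[Dict[str, Row]] = None,
) -> Literal:
    """Map a non-NULL SQL value to a literal (illustrative sketch only)."""
    # Extension rows act as if the table contained additional rows.
    rows = dict(CORE_ROWS)
    rows.update(extension_rows or {})

    if sql_type in CHARACTER_TYPES:
        return Literal(str(value), None)
    if sql_type in rows:
        datatype, to_lexical = rows[sql_type]
        return Literal(to_lexical(value), datatype)
    # No appropriate datatype known: cast to string, plain literal.
    return Literal(str(value), None)


# Made-up sample value; the HTMLDESC content in the example is elided.
sample = "<p>blah blah blah</p>"

# Processor with no knowledge of the XML type: plain literal.
fallback = natural_rdf_literal(sample, "XML")

# Processor extended with a row for the vendor XML type: typed literal.
extended = natural_rdf_literal(
    sample, "XML", {"XML": (RDF + "XMLLiteral", str)}
)
```

In this sketch, fallback corresponds to the second output in the Products example and extended to the first. Whether the extension is a "may" or a "should" does not change the structure.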


> >> However, where implementations have no knowledge what an “appropriate mapping” might be for a given type, they should map it to a plain literal – that's the right approach both in DM and R2RML, and it is important that this is explicitly stated.
> > 
> > I support the intent, so long as we don't have to support a legacy which prevents us from standardizing these types in the future. 
> 
> Well, at that point you'll have to support the legacy anyways. Implementers will implement *some* behaviour for these types, no matter if we leave the behaviour undefined in the spec now or define a fallback behaviour.

There's a substantial difference between a legacy which was in a spec and a legacy which derives from people extending the spec. The SPARQL charter mandated backward compatibility with SPARQL 1.0. The current SPARQL LC is incompatible with vendor-specific extensions to 1.0 around e.g. aggregates, comma-separated SELECTs, BINDs, LETs etc. Being explicit about what's standard and what's a good idea is good for future extensibility and good for consumers who need to know what they can count on and what may change.


> > Otherwise, standardizing beyond our use cases will be ultimately harmful.
> 
> Handling vendor-specific datatypes (both known ones, and ones that the implementer has never heard about) is one of my use cases. I feel that the R2RML design addresses that well.
> 
> Best,
> Richard
> 
> 
> 
> 
> > 
> > 
> >> The current design has been approved by a WG resolution.
> >> 
> >> I see no reason to take further action here.
> >> 
> >> Best,
> >> Richard
> >> 
> >> 
> >> On 11 Jan 2012, at 16:59, Eric Prud'hommeaux wrote:
> >> 
> >>> * Richard Cyganiak <richard@cyganiak.de> [2012-01-11 16:07+0000]
> >>>> On 11 Jan 2012, at 15:38, Eric Prud'hommeaux wrote:
> >>>>> My proposal is editorial.
> >>>> 
> >>>> I prefer the current R2RML wording.
> >>>> 
> >>>>> Digging into the SQL, I think we can learn that bullet 2, "character string type", covers the six types with the word "CHARACTER" in them. My concern is for the reader who doesn't dig deeply and wonders if CHARACTER(1) is a "string type".
> >>>> 
> >>>> If they consider it as a non-string type, the result is the same, because vendor extension types are handled in the same way as strings.
> >>>> 
> >>>>> Looking at the SQL 2006 in front of me, I think that bullet 3 covers the rest of the SQL data types. What then does bullet 4 apply to?
> >>>> 
> >>>> Vendor extensions.
> >>> 
> >>> Ahh, but I recall that the verbal agreement was that the vendor extensions specifically not have a normative expression precisely because we wanted to later be able to define lexical forms and data types for e.g. XML and lat/long datatypes. The XML type is extremely popular and it is inappropriate that the DM expose it as anything other than XMLLiteral. In R2RML, this is perhaps less pernicious because the user is able to map the literal "<root>stuff</root>" to an XMLLiteral without exposing the former.
> >>> 
> >>> 
> >>>> Best,
> >>>> Richard
> >>>> 
> >>>> 
> >>>> 
> >>>>> 
> >>>>> 
> >>>>>>> (E.g. many DBs have native support for XML but we probably don't want to imply that XML is serialized as a (canonicalized?) plain literal.)
> >>>>>> 
> >>>>>> The spec explicitly states:
> >>>>>> 
> >>>>>> [[
> >>>>>> Note: R2RML processor implementations that handle vendor-specific types or user-defined types beyond the standard SQL 2008 datatypes are expected to do so by behaving as if the table above contained additional rows that associate the SQL datatypes with appropriate RDF-compatible datatypes (e.g., the XML Schema built-in types [XMLSCHEMA2]), and appropriate lexical transformations where required.
> >>>>>> ]]
> >>>>>> 
> >>>>>>> I proposed to strike the line.
> >>>>>> 
> >>>>>> The line is backed by a formal WG resolution.
> >>>>>> 
> >>>>>>> An alternative would be to refine http://www.w3.org/2001/sw/rdb2rdf/directMapping/LC/#defn-literal_map to say [[
> >>>>>>> Definition literal map: a mapping from an SQL value with a datatype to:
> >>>>>>> 
> >>>>>>> • For row nodes, a canonical RDF literal representation of the column value as defined in points 1-3 in R2RML section 10.2 Natural Mapping of SQL Values.
> >>>>>>> • For other R2RML natural RDF literal representation of the column value as defined in points 1-3 in R2RML section 10.2 Natural Mapping of SQL Values.
> >>>>>>> ]] (note the added "points 1-3 in" text).
> >>>>>> 
> >>>>>> See above.
> >>>>>> 
> >>>>>> Many commenters stated that DM and R2RML should use the *same* mapping and should *not* have random exceptions in weird corner cases. I made an LC comment to that effect. We have a WG resolution that supports this. The WG has spent a *lot* of time crafting a compromise design that makes sense both for R2RML and for the DM and that was – at least at that time – acceptable to everyone. I'd rather not re-open this can of worms.
> >>>>>> 
> >>>>>> So can we please stick to the original plan for the DM?
> >>>>>> 
> >>>>>> [[
> >>>>>> 1. when creating literals, use the “natural RDF literal” corresponding to the SQL data value
> >>>>>> http://www.w3.org/2001/sw/rdb2rdf/r2rml/#dfn-natural-rdf-literal
> >>>>>> 2. when creating IRIs, use the “canonical RDF lexical form” corresponding to the SQL data value
> >>>>>> http://www.w3.org/2001/sw/rdb2rdf/r2rml/#dfn-canonical-rdf-lexical-form
> >>>>>> 3. perhaps add an informative link to this section:
> >>>>>> http://www.w3.org/2001/sw/rdb2rdf/r2rml/#xsd-summary
> >>>>>> 4. suggest any editorial changes to section 10 that make integration with the DM smoother or otherwise improve the section
> >>>>> 
> >>>>> Yep, I'm making editorial proposals.
> >>>>> 
> >>>>> 
> >>>>>> ]]
> >>>>>> http://www.w3.org/2001/sw/rdb2rdf/track/actions/174
> >>>>>> 
> >>>>>> Thanks,
> >>>>>> Richard
> >>>>>> 
> >>>>>> 
> >>>>>>> 
> >>>>>>> -- 
> >>>>>>> -ericP
> >>>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> -- 
> >>>>> -ericP
> >>>> 
> >>> 
> >>> -- 
> >>> -ericP
> >> 
> > 
> > -- 
> > -ericP
> > 
> 

-- 
-ericP

Received on Friday, 13 January 2012 01:03:39 UTC