Re: Proposal for the Direct Mapping

* Juan Sequeda <juanfederico@gmail.com> [2011-08-02 08:05-0500]
> Eric, all
> 
> This is my proposal. Just a few changes, and added subsections.
> 
>    PROPOSAL: that the English definition of the direct mapping be defined
> as:
> 
> [[
> 
> Section 3: The Direct Mapping
> 
> 
> The Direct Mapping is a formula for creating an RDF graph from the rows of
> all the tables in a database.
> 
> 
> A base IRI defines a web space for the labels in this graph; all labels are
> generated by appending to the base.
> 
> 
> The functions scalar and reference extract the scalar and reference
> attributes (those participating in a foreign key) respectively:
> 
> 
> dfn scalars: the attributes in a table which are NOT in any foreign key.
> 
> 
> dfn references: the attributes in a table's foreign keys.
> 
> 
> dfn non-unary references: the references for which the table's foreign key
> is NOT composed of a single attribute.
> 
> 
> Section 3.1: Generating Row Identifiers
> 
> 
> SQL table and attribute identifiers compose RDF IRIs in the direct graph.
> These identifiers are separated by the punctuation characters '#', ',', '/'
> and '='. All SQL identifiers are escaped following URL-encoding
> 
> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> 
> except that only the above punctuation and the characters not permitted in
> RDF IRIs are escaped.
> 
> 
> In the direct graph, there is an identifier for each row in a database
> table. If the row is in a table with a primary key, this is formed from the
> table name and the attribute names and values of each attribute in the
> primary key. If there is no primary key for the table, the row identifier is
> a fresh blank node:
> 
> 
> dfn row identifier:
> 
> 
>   if the table has a primary key with attributes, the relative IRI for
> 
>   the row identifier is the concatenation of the table name, '/', and
> 
>   a ','-separated concatenation of each attribute name, '=', and the
> 
>   attribute value.
> 
> 
>   if the table has no primary key, the row identifier is a fresh blank
> 
>   node.
> 
> 
> A (potentially unary) list of attribute names in a table form a
> 
> property IRI:
> 
> 
> dfn property IRI: the concationation of the table name, '/', and a
> 
>   ','-separated concatonation of each attribute name, and a '#' at
> 
>   the end of the property IRI.
> 
> 
> Section 3.2: Mapping database values to RDF Literals
> 
> 
> The values in a row are mapped to RDF literals:
> 
> 
> dfn literal map: a mapping from a SQL value with a datatype to an RDF
> 
>   literal with and XML Schema datatype where the RDF literal has a
> 
>   lexical value equivalent to the SQL lexical value and the datatype
> 
>   mapping is found in this table:
> 
> 
> SQL     XSD datatype
> 
> ___     ____________
> 
> INT     http://www.w3.org/TR/xmlschema-2/#integer
> 
> FLOAT   http://www.w3.org/TR/xmlschema-2/#float
> 
> DATE    http://www.w3.org/TR/xmlschema-2/#date
> 
> TIME    http://www.w3.org/TR/xmlschema-2/#time
> 
> TIMESTAMP       http://www.w3.org/TR/xmlschema-2/#dateTime
> 
> CHAR    plain literal
> 
> VARCHAR plain literal
> 
> STRING  plain literal
> 
> 
> Section 3.3: Generating RDF Triples
> 
> 
> The Direct Mapping is defined by a set of mapping functions from table
> 
> rows to RDF triples:
> 
> 
> dfn direct mapping: the set of RDF triples produced by invoking the <table
> mapping> on each table in a database.
> 
> 
> dfn table mapping: the set of RDF triples created by invoking the <row
> mapping> on each row in a table.
> 
> 
> dfn row mapping: using a Row Identifier S for each row,
> 
>  the type triple:
> 
>    (S, rdf:type, <table type>)
> 
>  plus the scalar triples:
> 
>    for each attribute in the list of <scalars> where the attribute value is
> non-NULL:
> 
>      (S, the <property IRI> for the attribute, the <literal map> for the
> attribute value).
> 
>  plus the reference triples:
> 
>    for each list of attributes in the <non-unary references> where none of
> the attribute values are NULL:
> 
>      (S, the <property IRI> for the attributes, the <row identifier> for the
> referenced triple)
> 
> ]]

Thank you for the careful review and for correcting typos.
Ignoring whitespace, I see:

  • added numbered section headings:
    I propose that we first agree on the definition and do markup
    separately.

  • my precious typos were corrected.
    I can live without them.

  • re-ordered dfn references: and dfn scalars:
    sure.

  • DM is for "all the tables in a database"
    I debated this; I didn't want to alarm folks who would think
    they'd have to expose everything if they didn't want to. The
    alternative is to parametrize; neither is terribly attractive. I
    guess "all tables" is fine.

  • s/an SQL/a SQL/
    This depends on whether you call it "S Q L" or "sequel". The SQL
    spec uses "an", e.g. "Effects of SQL-statements in an SQL-transaction".

  • row mapping defined to be over each row.
    The calling function <table mapping> already "invokes the <row
    mapping> on each row in a table" so the row mapping should just be
    for a single row.

  • capitalize "Row Identifier" in dfn row mapping.
    I suspect this wasn't an intended change proposal.
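
To illustrate the layering behind the row mapping point above, here is a
rough Python sketch. The function names mirror the dfns; the "ID" column,
the placeholder subject string and the sample data are assumptions made up
for the example, not part of the proposal.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]


def row_mapping(table: str, row: Row) -> Set[Triple]:
    """Triples for ONE row; this function never iterates over rows."""
    # Placeholder subject (assumption): the real row identifier rules
    # are given in the proposal text.
    subject = f"{table}/ID={row['ID']}"
    return {(subject, "rdf:type", table)}


def table_mapping(table: str, rows: Iterable[Row]) -> Set[Triple]:
    """Invokes the row mapping on each row in a table."""
    triples: Set[Triple] = set()
    for row in rows:
        triples |= row_mapping(table, row)
    return triples


def direct_mapping(database: Dict[str, List[Row]]) -> Set[Triple]:
    """Invokes the table mapping on each table in a database."""
    triples: Set[Triple] = set()
    for table, rows in database.items():
        triples |= table_mapping(table, rows)
    return triples
```

Because table_mapping owns the loop, a row_mapping defined "for each row"
would iterate twice over the same thing.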

Below are the diffs and an incorporated proposal.

== white-space-normalized diffs ==
@@ -1,18 +1,23 @@
+Section 3: The Direct Mapping
+
 The Direct Mapping is a formula for creating an RDF graph from the
-rows in a table. A base IRI defines a web space for the labels in
+rows of all the tables in a database. A base IRI defines a web space for the labels in
 this graph; all labels are generated by appending to the base.
 
 The functions scalar and reference extract the scalar and reference
 attributes (those participating in a foreign key) respectively:
 
-dfn references: the attributes in a table's foreign keys.
-
 dfn scalars: the attributes in a table which are NOT in any foreign
    key.
 
+dfn references: the attributes in a table's foreign keys.
+
 dfn non-unary references: the references for which the table's
    foreign key is NOT composed of a single attribute.
 
+
+Section 3.1: Generating Row Identifiers
+
 SQL table and attribute identifiers compose RDF IRIs in the direct
 graph. These identifiers are separated by the punctuation characters
 '#', ',', '/' and '='. All SQL identifiers are escaped following URL-
@@ -30,8 +35,8 @@
 dfn row identifier:
 
    if the table has a primary key with attributes, the relative IRI for
-   the row identifier is the concationation of the table name, '/', and
-   a ','-separated concatonation of each attribute name, '=', and the
+   the row identifier is the concatenation of the table name, '/', and
+   a ','-separated concatenation of each attribute name, '=', and the
    attribute value.
 
    if the table has no primary key, the row identifier is a fresh blank
@@ -44,9 +49,11 @@
    ','-separated concatonation of each attribute name, and a '#' at
    the end of the property IRI.
 
+Section 3.2: Mapping database values to RDF Literals
+
 The values in a row are mapped to RDF literals:
 
-dfn litaral map: a mapping from an SQL value with a datatype to an RDF
+dfn literal map: a mapping from a SQL value with a datatype to an RDF
    literal with and XML Schema datatype where the RDF literal has a
    lexical value equivalent to the SQL lexical value and the datatype
    mapping is found in this table:
@@ -62,16 +69,18 @@
 VARCHAR plain literal
 STRING  plain literal
 
-The Direct Maping is defined by a set of mapping functions from table
+Section 3.3: Generating RDF Triples
+
+The Direct Mapping is defined by a set of mapping functions from table
 rows to RDF triples:
 
-dfn direct mapping: the set of triples produced by invoking the
+dfn direct mapping: the set of RDF triples produced by invoking the
    <table mapping> on each table in a database.
 
 dfn table mapping: the set of RDF triples created by invoking the
    <row mapping> on each row in a table.
 
-dfn row mapping: using a row identifier S for the row,
+dfn row mapping: using a Row Identifier S for each row,
   the type triple:
     (S, rdf:type, <table type>)
   plus the scalar triples:


== incorporated proposal ==
PROPOSAL: that the English definition of the direct mapping be defined as:

[[
The Direct Mapping is a formula for creating an RDF graph from the
rows of each table in a database. A base IRI defines a web space for
the labels in this graph; all labels are generated by appending to the
base.

The functions scalar and reference extract the scalar and reference
attributes (those participating in a foreign key) respectively:

dfn scalars: the attributes in a table which are NOT in any foreign
   key.

dfn references: the attributes in a table's foreign keys.

dfn non-unary references: the references for which the table's
   foreign key is NOT composed of a single attribute.

SQL table and attribute identifiers compose RDF IRIs in the direct
graph. These identifiers are separated by the punctuation characters
'#', ',', '/' and '='. All SQL identifiers are escaped following URL-
encoding
<http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
except that only the above punctuation and the characters not
permitted in RDF IRIs are escaped.

In the direct graph, there is an identifier for each row in a database
table. If the row is in a table with a primary key, this is formed
from the table name and the attribute names and values of each attribute
in the primary key. If there is no primary key for the table, the row
identifier is a fresh blank node:

dfn row identifier:

   if the table has a primary key with attributes, the relative IRI for
   the row identifier is the concatenation of the table name, '/', and
   a ','-separated concatenation of each attribute name, '=', and the
   attribute value.

   if the table has no primary key, the row identifier is a fresh blank
   node.

A (potentially unary) list of attribute names in a table forms a
property IRI:

dfn property IRI: the concatenation of the table name, '/', and a
   ','-separated concatenation of each attribute name, and a '#' at
   the end of the property IRI.

The values in a row are mapped to RDF literals:

dfn literal map: a mapping from an SQL value with a datatype to an RDF
   literal with an XML Schema datatype where the RDF literal has a
   lexical value equivalent to the SQL lexical value and the datatype
   mapping is found in this table:

SQL        XSD datatype
___        ____________
INT        http://www.w3.org/TR/xmlschema-2/#integer
FLOAT      http://www.w3.org/TR/xmlschema-2/#float
DATE       http://www.w3.org/TR/xmlschema-2/#date
TIME       http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR       plain literal
VARCHAR    plain literal
STRING     plain literal

The Direct Mapping is defined by a set of mapping functions from table
rows to RDF triples:

dfn direct mapping: the set of RDF triples produced by invoking the
   <table mapping> on each table in a database.

dfn table mapping: the set of RDF triples created by invoking the
   <row mapping> on each row in a table.

dfn row mapping: using a row identifier S for the row,
  the type triple:
    (S, rdf:type, <table type>)
  plus the scalar triples:
    for each attribute in the list of <scalars> where the attribute
      value is non-NULL:
      (S,
       the <property IRI> for the attribute,
       the <literal map> for the attribute value).
  plus the reference triples:
    for each list of attributes in the <non-unary references> where none
      of the attribute values are NULL:
      (S,
       the <property IRI> for the attributes,
       the <row identifier> for the referenced row)
]]
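
For anyone who wants to see what the identifier and literal rules in the
proposal produce, here is a minimal Python sketch. It is an illustration
under assumptions, not a reference implementation: the base IRI, table and
attribute names are hypothetical, the "_:rowN" blank node labels are my own
convention, and escape() percent-encodes more than the proposal requires
(everything outside unreserved ASCII, rather than only the four separators
and the characters not permitted in IRIs).

```python
import itertools
from urllib.parse import quote

_blank_ids = itertools.count()  # source of fresh blank node labels

# Datatype IRIs copied from the table in the proposal; None = plain literal.
DATATYPES = {
    "INT": "http://www.w3.org/TR/xmlschema-2/#integer",
    "FLOAT": "http://www.w3.org/TR/xmlschema-2/#float",
    "DATE": "http://www.w3.org/TR/xmlschema-2/#date",
    "TIME": "http://www.w3.org/TR/xmlschema-2/#time",
    "TIMESTAMP": "http://www.w3.org/TR/xmlschema-2/#dateTime",
    "CHAR": None,
    "VARCHAR": None,
    "STRING": None,
}


def escape(identifier) -> str:
    # Assumption: conservative escaping, so '#', ',', '/' and '=' can
    # never appear unescaped inside a table name, attribute name or value.
    return quote(str(identifier), safe="")


def row_identifier(base, table, primary_key):
    """primary_key is a list of (attribute name, value) pairs; empty if none."""
    if not primary_key:
        return f"_:row{next(_blank_ids)}"  # fresh blank node
    pairs = ",".join(f"{escape(n)}={escape(v)}" for n, v in primary_key)
    return f"{base}{escape(table)}/{pairs}"


def property_iri(base, table, attributes):
    """attributes is a (potentially unary) list of attribute names."""
    return f"{base}{escape(table)}/{','.join(escape(a) for a in attributes)}#"


def literal_map(lexical, sql_type):
    """Returns (lexical value, datatype IRI or None for a plain literal)."""
    return (lexical, DATATYPES[sql_type])
```

With a base of "http://example.org/db/", a People row whose primary key is
ID=7 gets the identifier "http://example.org/db/People/ID=7", and the
attribute list (city, state) of an Addresses table gets the property IRI
"http://example.org/db/Addresses/city,state#".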


> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
> 
> 
> On Tue, Aug 2, 2011 at 6:44 AM, Juan Sequeda <juanfederico@gmail.com> wrote:
> 
> > Eric,
> >
> > This is great. I was planning to write up a proposal myself, but you saved
> > my time. I do have some comments and suggestions. I'm writing up a new
> > proposal based on what you have. I should have it done before the meeting
> >
> > Juan Sequeda
> > www.juansequeda.com
> >
> > On Aug 2, 2011, at 2:01 AM, Michael Hausenblas <
> > michael.hausenblas@deri.org> wrote:
> >
> > >
> > > Eric,
> > >
> > >> PROPOSAL: that the English definition of the direct mapping be defined
> > as:
> > >> [[
> > >> The Direct Mapping is a formula for creating an RDF graph from the
> > >> rows in a table. A base IRI defines a web space for the labels in
> > >
> > > ...
> > >
> > > Thanks a lot for this proposal, Eric! I'm wondering if we're ready to
> > resolve this today or if the WG feels that we need to discuss a bit more. In
> > any case I'm flexible to change today's agenda [1] if the WG thinks it makes
> > sense ...
> > >
> > > Cheers,
> > >    Michael
> > >
> > > [1]
> > http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Jul/0183.html
> > > --
> > > Dr. Michael Hausenblas, Research Fellow
> > > LiDRC - Linked Data Research Centre
> > > DERI - Digital Enterprise Research Institute
> > > NUIG - National University of Ireland, Galway
> > > Ireland, Europe
> > > Tel. +353 91 495730
> > > http://linkeddata.deri.ie/
> > > http://sw-app.org/about.html
> > >
> > > On 2 Aug 2011, at 00:22, Eric Prud'hommeaux wrote:
> > >
> > >> * Richard Cyganiak <richard@cyganiak.de> [2011-07-26 19:41+0100]
> > >>> Hi all,
> > >>>
> > >>> The Direct Mapping document is stuck because we have a stalemate
> > between the editors. With Last Call approaching, we need *some* way of
> > breaking the stalemate. So here's a proposal. This is a possible new outline
> > for the document, along with assignments of separate sections to separate
> > editors.
> > >>>
> > >>>
> > >>>   1. Introduction
> > >>>      - What is this?
> > >>>      - How does it relate to R2RML
> > >>>      - Target audience, assumed level of knowledge
> > >>>      - RDF terms and SQL/relational terms are used as defined in
> > >>>        documents XXX and YYY
> > >>>
> > >>>   2. Example (Informative)
> > >>>      - A simple two-table example
> > >>>      - Quick explanation of foreign key handling
> > >>>      - Quick explanation of tables w/o PKs
> > >>>
> > >>>   3. The Direct Mapping [in Plain English]
> > >>>      - “The Direct Graph of a database is the union of the Table Graphs
> > >>>         of all tables in the database.”
> > >>>      - “The Table Graph of a table is the union of the Row Graphs...”
> > >>>      - “The Row Graph of a row is ...”
> > >>>      - ...
> > >>
> > >> PROPOSAL: that the English definition of the direct mapping be defined
> > as:
> > >> [[
> > >> The Direct Mapping is a formula for creating an RDF graph from the
> > >> rows in a table. A base IRI defines a web space for the labels in
> > >> this graph; all labels are generated by appending to the base.
> > >>
> > >> The functions scalar and reference extract the scalar and reference
> > >> attributes (those participating in a foreign key) respectively:
> > >>
> > >> dfn references: the attributes in a table's foreign keys.
> > >>
> > >> dfn scalars: the attributes in a table which are NOT in any foreign
> > >>  key.
> > >>
> > >> dfn: non-unary references: the references for which the table's
> > >>  foreign key is NOT composed of a single attribute.
> > >>
> > >> SQL table and attribute identifiers compose RDF IRIs in the direct
> > >> graph. These identifiers are separated by the punctuation characters
> > >> '#', ',', '/' and '='. All SQL identifiers are escaped following URL-
> > >> encoding
> > >> <
> > http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data
> > >
> > >> except that only the above punctuation and the characters not
> > >> permitted in RDF IRIs are escaped.
> > >>
> > >> In the direct graph, there is an identifier for each row in a database
> > >> table. If the row is in a table with a primary key, this is formed
> > >> from the table name and the attribute names and values of each attribute
> > >> in the primary key. If there is no primary key for the table, the row
> > >> identifier is a fresh blank node:
> > >>
> > >> dfn row identifier:
> > >>
> > >>  if the table has a primary key with attributes, the relative IRI for
> > >>  the row identifier is the concationation of the table name, '/', and
> > >>  a ','-separated concatonation of each attribute name, '=', and the
> > >>  attribute value.
> > >>
> > >>  if the table has no primary key, the row identifier is a fresh blank
> > >>  node.
> > >>
> > >> A (potentially unary) list of attribute names in a table form a
> > >> property IRI:
> > >>
> > >> dfn property IRI: the concationation of the table name, '/', and a
> > >>  ','-separated concatonation of each attribute name, and a '#' at
> > >>  the end of the property IRI.
> > >>
> > >> The values in a row are mapped to RDF literals:
> > >>
> > >> dfn litaral map: a mapping from an SQL value with a datatype to an RDF
> > >>  literal with and XML Schema datatype where the RDF literal has a
> > >>  lexical value equivalent to the SQL lexical value and the datatype
> > >>  mapping is found in this table:
> > >>
> > >> SQL      XSD datatype
> > >> ___     ____________
> > >> INT    http://www.w3.org/TR/xmlschema-2/#integer
> > >> FLOAT    http://www.w3.org/TR/xmlschema-2/#float
> > >> DATE    http://www.w3.org/TR/xmlschema-2/#date
> > >> TIME    http://www.w3.org/TR/xmlschema-2/#time
> > >> TIMESTAMP    http://www.w3.org/TR/xmlschema-2/#dateTime
> > >> CHAR    plain literal
> > >> VARCHAR plain literal
> > >> STRING    plain literal
> > >>
> > >> The Direct Maping is defined by a set of mapping functions from table
> > >> rows to RDF triples:
> > >>
> > >> dfn direct mapping: the set of triples produced by invoking the
> > >>  <table mapping> on each table in a database.
> > >>
> > >> dfn table mapping: the set of RDF triples created by invoking the
> > >>  <row mapping> on each row in a table.
> > >>
> > >> dfn row mapping: using a row identifier S for the row,
> > >> the type triple:
> > >>   (S, rdf:type, <table type>)
> > >> plus the scalar triples:
> > >>   for each attribute in the list of <scalars> where the attribute
> > >>     value is non-NULL:
> > >>     (S,
> > >>      the <property IRI> for the attribute,
> > >>      the <literal map> for the attribute value).
> > >> plus the reference triples:
> > >>   for each list of attributes in the <non-unary references> where none
> > >>     of the attribute values are NULL:
> > >>     (S,
> > >>      the <property IRI> for the attributes,
> > >>      the <row identifier> for the referenced triple)
> > >> ]]
> > >>
> > >>>   A. Appendix: Formalisms (Informative)
> > >>>      - should be crisp, short, precise, with only minimum explanation
> > >>>        and examples
> > >>>      A.1 Datalog Rules
> > >>>      A.2 Denotational Semantics
> > >>>      A.3 Set-Style Direct Mapping
> > >>>
> > >>>   B. Acknowledgements (Informative)
> > >>>
> > >>>   C. References
> > >>>
> > >>>
> > >>> I see Juan and Marcelo editing A.1.
> > >>>
> > >>> I see Alexandre editing A.2.
> > >>>
> > >>> I see Eric editing 2 (which he already wrote), 3 (which *mostly*
> > exists), and A.3.
> > >>>
> > >>> I don't know about 1, B, and C.
> > >>>
> > >>> My reasoning is that there is no objective way of picking any of the
> > formalisms over another formalism, so the normative expression should be the
> > lowest common denominator: plain English. By making the formalisms all
> > informative, we free them from the burden of having to explain the direct
> > mapping itself in a generally accessible way. The focus can be totally on
> > presenting the formalisms in all their terseness to an audience that is
> > familiar with datalog/denotational semantics/whatever.
> > >>>
> > >>> I hope this proposal aids discussion.
> > >>>
> > >>> Best,
> > >>> Richard
> > >>
> > >> --
> > >> -ericP
> > >>
> > >
> > >
> >

-- 
-ericP

Received on Tuesday, 2 August 2011 14:19:53 UTC