Re: Proposal for the Direct Mapping

On Tue, Aug 2, 2011 at 9:19 AM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Juan Sequeda <juanfederico@gmail.com> [2011-08-02 08:05-0500]
> > Eric, all
> >
> > This is my proposal. Just a few changes, and added subsections.
> >
> >    PROPOSAL: that the English definition of the direct mapping be defined
> > as:
> >
> > [[
> >
> > Section 3: The Direct Mapping
> >
> >
> > The Direct Mapping is a formula for creating an RDF graph from the rows of
> > all the tables in a database.
> >
> >
> > A base IRI defines a web space for the labels in this graph; all labels are
> > generated by appending to the base.
> >
> >
> > The functions scalar and reference extract the scalar and reference
> > attributes (those participating in a foreign key) respectively:
> >
> >
> > dfn scalars: the attributes in a table which are NOT in any foreign key.
> >
> >
> > dfn references: the attributes in a table's foreign keys.
> >
> >
> > dfn non-unary references: the references for which the table's foreign key
> > is NOT composed of a single attribute.
> >
> >
> > Section 3.1: Generating Row Identifiers
> >
> >
> > SQL table and attribute identifiers compose RDF IRIs in the direct graph.
> > These identifiers are separated by the punctuation characters '#', ',', '/'
> > and '='. All SQL identifiers are escaped following URL-encoding
> > <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > except that only the above punctuation and the characters not permitted in
> > RDF IRIs are escaped.
> >
> >
> > In the direct graph, there is an identifier for each row in a database
> > table. If the row is in a table with a primary key, this is formed from the
> > table name and the attribute names and values of each attribute in the
> > primary key. If there is no primary key for the table, the row identifier is
> > a fresh blank node:
> >
> >
> > dfn row identifier:
> >
> >
> >   if the table has a primary key with attributes, the relative IRI for
> >   the row identifier is the concatenation of the table name, '/', and
> >   a ','-separated concatenation of each attribute name, '=', and the
> >   attribute value.
> >
> >   if the table has no primary key, the row identifier is a fresh blank
> >   node.
> >
> >
> > A (potentially unary) list of attribute names in a table forms a
> > property IRI:
> >
> >
> > dfn property IRI: the concatenation of the table name, '/', and a
> >   ','-separated concatenation of each attribute name, and a '#' at
> >   the end of the property IRI.
> >
> >
> > Section 3.2: Mapping database values to RDF Literals
> >
> >
> > The values in a row are mapped to RDF literals:
> >
> >
> > dfn literal map: a mapping from a SQL value with a datatype to an RDF
> >   literal with an XML Schema datatype where the RDF literal has a
> >   lexical value equivalent to the SQL lexical value and the datatype
> >   mapping is found in this table:
> >
> >
> > SQL         XSD datatype
> > ___         ____________
> > INT         http://www.w3.org/TR/xmlschema-2/#integer
> > FLOAT       http://www.w3.org/TR/xmlschema-2/#float
> > DATE        http://www.w3.org/TR/xmlschema-2/#date
> > TIME        http://www.w3.org/TR/xmlschema-2/#time
> > TIMESTAMP   http://www.w3.org/TR/xmlschema-2/#dateTime
> > CHAR        plain literal
> > VARCHAR     plain literal
> > STRING      plain literal
> >
> >
> > Section 3.3: Generating RDF Triples
> >
> >
> > The Direct Mapping is defined by a set of mapping functions from table
> > rows to RDF triples:
> >
> >
> > dfn direct mapping: the set of RDF triples produced by invoking the
> > <table mapping> on each table in a database.
> >
> >
> > dfn table mapping: the set of RDF triples created by invoking the <row
> > mapping> on each row in a table.
> >
> >
> > dfn row mapping: using a Row Identifier S for each row,
> >  the type triple:
> >    (S, rdf:type, <table type>)
> >  plus the scalar triples:
> >    for each attribute in the list of <scalars> where the attribute value is
> >    non-NULL:
> >      (S, the <property IRI> for the attribute, the <literal map> for the
> >      attribute value).
> >  plus the reference triples:
> >    for each list of attributes in the <non-unary references> where none of
> >    the attribute values are NULL:
> >      (S, the <property IRI> for the attributes, the <row identifier> for the
> >      referenced triple)
> >
> > ]]
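
For illustration only, the datatype table in Section 3.2 above can be read as a simple lookup. The sketch below is a minimal Python rendering under stated assumptions: the function name literal_map, the DATATYPES dictionary, and the N-Triples-style output syntax are all hypothetical choices made for this example and are not part of the proposal. The datatype IRIs are copied from the table as written.

```python
from typing import Dict, Optional

# Datatype IRIs copied from the proposal's table; None stands for "plain literal".
DATATYPES: Dict[str, Optional[str]] = {
    "INT": "http://www.w3.org/TR/xmlschema-2/#integer",
    "FLOAT": "http://www.w3.org/TR/xmlschema-2/#float",
    "DATE": "http://www.w3.org/TR/xmlschema-2/#date",
    "TIME": "http://www.w3.org/TR/xmlschema-2/#time",
    "TIMESTAMP": "http://www.w3.org/TR/xmlschema-2/#dateTime",
    "CHAR": None,
    "VARCHAR": None,
    "STRING": None,
}


def literal_map(lexical: str, sql_type: str) -> str:
    """Render a SQL lexical value as an RDF literal (N-Triples-like syntax).

    Assumption: the SQL lexical form is reused unchanged as the RDF lexical
    form, as the proposal's "equivalent lexical value" wording suggests.
    """
    datatype = DATATYPES[sql_type.upper()]  # KeyError for types not in the table
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    if datatype is None:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{datatype}>'
```

A SQL type that is missing from the table raises KeyError here; the proposal does not say what should happen in that case, so the sketch does not guess.
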
>
> Thank you for the careful review and for correcting typos.
> Ignoring whitespace, I see:
>
>  • added numbered section headings:
>    I propose that we first agree on the definition and do markup
>    separately.
>

Ok. But I think that adding the subsections is crucial.

>
>  • my precious typos were corrected.
>    I can live without them.
>
>  • re-ordered dfn references: and dfn scalars:
>    sure.
>
>  • DM is for "all the tables in a database"
>    I debated this; I didn't want to alarm folks who would think
>    they'd have to expose everything if they didn't want to. The
>    alternative is to parametrize; neither is terribly attractive. I
>    guess "all tables" is fine.
>

I understand. I was a bit hesitant about this too, but just wrote it to see
if you would catch it :)

"all tables" is fine with me.


>
>  • s/an SQL/a SQL/
>    This depends on whether you call it "S Q L" or "sequel". The SQL
>    spec uses "an", e.g. "Effects of SQL-statements in an SQL-transaction".
>
>  • row mapping defined to be over each row.
>    The calling function <table mapping> already "invokes the <row
>    mapping> on each row in a table" so the row mapping should just be
>    for a single row.
>
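
To make the layering concrete, here is a minimal Python sketch of the three definitions: the direct mapping calls the table mapping once per table, the table mapping calls the row mapping once per row, and the row mapping itself only ever sees a single row. All names (direct_mapping, table_mapping, type_only_row_mapping) and the dictionary-based database are assumptions made for this example; the stand-in row mapping emits only the type triple and assumes a primary key column called "ID".

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]
Triple = Tuple[str, str, str]
RowMapping = Callable[[str, Row], Set[Triple]]


def table_mapping(table: str, rows: List[Row], row_mapping: RowMapping) -> Set[Triple]:
    """Union of the row mapping applied to each row of one table."""
    triples: Set[Triple] = set()
    for row in rows:
        triples |= row_mapping(table, row)  # the row mapping handles one row
    return triples


def direct_mapping(database: Dict[str, List[Row]], row_mapping: RowMapping) -> Set[Triple]:
    """Union of the table mapping applied to each table in the database."""
    triples: Set[Triple] = set()
    for table, rows in database.items():
        triples |= table_mapping(table, rows, row_mapping)
    return triples


def type_only_row_mapping(table: str, row: Row) -> Set[Triple]:
    """Hypothetical stand-in: emits just (S, rdf:type, <table type>)."""
    subject = f"{table}/ID={row['ID']}"  # assumes a single-column key named ID
    return {(subject, "rdf:type", table)}


db = {"People": [{"ID": 7}, {"ID": 8}], "Addresses": [{"ID": 18}]}
graph = direct_mapping(db, type_only_row_mapping)
```

With this shape, the iteration over rows lives in exactly one place, which is the point made above about the row mapping being defined for a single row.
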
>  • capitalize "Row Identifier" in dfn row mapping.
>    I suspect this wasn't an intended change proposal.
>
> below are the diffs and an incorporated proposal
>
> == white-space-normalized diffs ==
> @@ -1,18 +1,23 @@
> +Section 3: The Direct Mapping
> +
>  The Direct Mapping is a formula for creating an RDF graph from the
> -rows in a table. A base IRI defines a web space for the labels in
> +rows of all the tables in a database. A base IRI defines a web space for the labels in
>  this graph; all labels are generated by appending to the base.
>
>  The functions scalar and reference extract the scalar and reference
>  attributes (those participating in a foreign key) respectively:
>
> -dfn references: the attributes in a table's foreign keys.
> -
>  dfn scalars: the attributes in a table which are NOT in any foreign
>    key.
>
> +dfn references: the attributes in a table's foreign keys.
> +
>  dfn non-unary references: the references for which the table's
>    foreign key is NOT composed of a single attribute.
>
> +
> +Section 3.1: Generating Row Identifiers
> +
>  SQL table and attribute identifiers compose RDF IRIs in the direct
>  graph. These identifiers are separated by the punctuation characters
>  '#', ',', '/' and '='. All SQL identifiers are escaped following URL-
> @@ -30,8 +35,8 @@
>  dfn row identifier:
>
>    if the table has a primary key with attributes, the relative IRI for
> -   the row identifier is the concationation of the table name, '/', and
> -   a ','-separated concatonation of each attribute name, '=', and the
> +   the row identifier is the concatenation of the table name, '/', and
> +   a ','-separated concatenation of each attribute name, '=', and the
>    attribute value.
>
>    if the table has no primary key, the row identifier is a fresh blank
> @@ -44,9 +49,11 @@
>     ','-separated concatonation of each attribute name, and a '#' at
>    the end of the property IRI.
>
> +Section 3.2: Mapping database values to RDF Literals
> +
>  The values in a row are mapped to RDF literals:
>
> -dfn litaral map: a mapping from an SQL value with a datatype to an RDF
> +dfn literal map: a mapping from a SQL value with a datatype to an RDF
>     literal with and XML Schema datatype where the RDF literal has a
>    lexical value equivalent to the SQL lexical value and the datatype
>    mapping is found in this table:
> @@ -62,16 +69,18 @@
>  VARCHAR plain literal
>  STRING         plain literal
>
> -The Direct Maping is defined by a set of mapping functions from table
> +Section 3.3: Generating RDF Triples
> +
> +The Direct Mapping is defined by a set of mapping functions from table
>  rows to RDF triples:
>
> -dfn direct mapping: the set of triples produced by invoking the
> +dfn direct mapping: the set of RDF triples produced by invoking the
>     <table mapping> on each table in a database.
>
>  dfn table mapping: the set of RDF triples created by invoking the
>    <row mapping> on each row in a table.
>
> -dfn row mapping: using a row identifier S for the row,
> +dfn row mapping: using a Row Identifier S for each row,
>    the type triple:
>     (S, rdf:type, <table type>)
>   plus the scalar triples:
>
>
> == incorporated proposal ==
> PROPOSAL: that the English definition of the direct mapping be defined as:
>
> [[
> The Direct Mapping is a formula for creating an RDF graph from the
> rows of each table in a database. A base IRI defines a web space for
> the labels in this graph; all labels are generated by appending to the
> base.
>
> The functions scalar and reference extract the scalar and reference
> attributes (those participating in a foreign key) respectively:
>
> dfn scalars: the attributes in a table which are NOT in any foreign
>   key.
>
> dfn references: the attributes in a table's foreign keys.
>
> dfn non-unary references: the references for which the table's
>   foreign key is NOT composed of a single attribute.
>
> SQL table and attribute identifiers compose RDF IRIs in the direct
> graph. These identifiers are separated by the punctuation characters
> '#', ',', '/' and '='. All SQL identifiers are escaped following
> URL-encoding
> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> except that only the above punctuation and the characters not
> permitted in RDF IRIs are escaped.
>
> In the direct graph, there is an identifier for each row in a database
> table. If the row is in a table with a primary key, this is formed
> from the table name and the attribute names and values of each attribute
> in the primary key. If there is no primary key for the table, the row
> identifier is a fresh blank node:
>
> dfn row identifier:
>
>   if the table has a primary key with attributes, the relative IRI for
>   the row identifier is the concatenation of the table name, '/', and
>   a ','-separated concatenation of each attribute name, '=', and the
>   attribute value.
>
>   if the table has no primary key, the row identifier is a fresh blank
>   node.
>
> A (potentially unary) list of attribute names in a table forms a
> property IRI:
>
> dfn property IRI: the concatenation of the table name, '/', and a
>   ','-separated concatenation of each attribute name, and a '#' at
>   the end of the property IRI.
>
> The values in a row are mapped to RDF literals:
>
> dfn literal map: a mapping from an SQL value with a datatype to an RDF
>   literal with an XML Schema datatype where the RDF literal has a
>   lexical value equivalent to the SQL lexical value and the datatype
>   mapping is found in this table:
>
> SQL     XSD datatype
> ___     ____________
> INT     http://www.w3.org/TR/xmlschema-2/#integer
> FLOAT   http://www.w3.org/TR/xmlschema-2/#float
> DATE    http://www.w3.org/TR/xmlschema-2/#date
> TIME    http://www.w3.org/TR/xmlschema-2/#time
> TIMESTAMP       http://www.w3.org/TR/xmlschema-2/#dateTime
> CHAR    plain literal
> VARCHAR plain literal
> STRING  plain literal
>
> The Direct Mapping is defined by a set of mapping functions from table
> rows to RDF triples:
>
> dfn direct mapping: the set of RDF triples produced by invoking the
>   <table mapping> on each table in a database.
>
> dfn table mapping: the set of RDF triples created by invoking the
>   <row mapping> on each row in a table.
>
> dfn row mapping: using a row identifier S for the row,
>  the type triple:
>    (S, rdf:type, <table type>)
>  plus the scalar triples:
>    for each attribute in the list of <scalars> where the attribute
>      value is non-NULL:
>      (S,
>       the <property IRI> for the attribute,
>       the <literal map> for the attribute value).
>  plus the reference triples:
>    for each list of attributes in the <non-unary references> where none
>      of the attribute values are NULL:
>      (S,
>       the <property IRI> for the attributes,
>       the <row identifier> for the referenced triple)
> ]]
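
As a rough check of the identifier rules in the incorporated proposal, the following Python sketch builds row identifiers and property IRIs relative to the base IRI. It is a sketch under assumptions, not a reference implementation: the function names are invented for this example, blank nodes are rendered as "_:rowN" labels, and escape() percent-encodes everything outside the unreserved ASCII set, which is stricter than the proposal (it also encodes non-ASCII characters that RDF IRIs would permit).

```python
from itertools import count
from typing import Dict, List
from urllib.parse import quote

_blank_ids = count()  # source of fresh blank node labels


def escape(identifier: object) -> str:
    """Percent-encode an identifier or value.

    Assumption: safe="" encodes '#', ',', '/' and '=' along with every other
    character outside letters, digits and '_.-~'.
    """
    return quote(str(identifier), safe="")


def row_identifier(table: str, primary_key: List[str], row: Dict[str, object]) -> str:
    """Relative IRI 'table/attr=value,...', or a fresh blank node without a key."""
    if not primary_key:
        return f"_:row{next(_blank_ids)}"
    pairs = ",".join(f"{escape(a)}={escape(row[a])}" for a in primary_key)
    return f"{escape(table)}/{pairs}"


def property_iri(table: str, attributes: List[str]) -> str:
    """Relative IRI 'table/attr1,attr2#' for a (possibly unary) attribute list."""
    return f"{escape(table)}/{','.join(escape(a) for a in attributes)}#"
```

For example, row_identifier("People", ["ID"], {"ID": 7}) gives "People/ID=7" and property_iri("Addresses", ["city", "state"]) gives "Addresses/city,state#"; both would then be appended to the base IRI.
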
>
>
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> >
> >
> > On Tue, Aug 2, 2011 at 6:44 AM, Juan Sequeda <juanfederico@gmail.com> wrote:
> >
> > > Eric,
> > >
> > > This is great. I was planning to write up a proposal myself, but you saved
> > > my time. I do have some comments and suggestions. I'm writing up a new
> > > proposal based on what you have. I should have it done before the meeting.
> > >
> > > Juan Sequeda
> > > www.juansequeda.com
> > >
> > > On Aug 2, 2011, at 2:01 AM, Michael Hausenblas <
> > > michael.hausenblas@deri.org> wrote:
> > >
> > > >
> > > > Eric,
> > > >
> > > >> PROPOSAL: that the English definition of the direct mapping be defined as:
> > > >> [[
> > > >> The Direct Mapping is a formula for creating an RDF graph from the
> > > >> rows in a table. A base IRI defines a web space for the labels in
> > > >
> > > > ...
> > > >
> > > > Thanks a lot for this proposal, Eric! I'm wondering if we're ready to
> > > > resolve this today or if the WG feels that we need to discuss a bit more. In
> > > > any case I'm flexible to change today's agenda [1] if the WG thinks it makes
> > > > sense ...
> > > >
> > > > Cheers,
> > > >    Michael
> > > >
> > > > [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Jul/0183.html
> > > > --
> > > > Dr. Michael Hausenblas, Research Fellow
> > > > LiDRC - Linked Data Research Centre
> > > > DERI - Digital Enterprise Research Institute
> > > > NUIG - National University of Ireland, Galway
> > > > Ireland, Europe
> > > > Tel. +353 91 495730
> > > > http://linkeddata.deri.ie/
> > > > http://sw-app.org/about.html
> > > >
> > > > On 2 Aug 2011, at 00:22, Eric Prud'hommeaux wrote:
> > > >
> > > >> * Richard Cyganiak <richard@cyganiak.de> [2011-07-26 19:41+0100]
> > > >>> Hi all,
> > > >>>
> > > >>> The Direct Mapping document is stuck because we have a stalemate
> > > >>> between the editors. With Last Call approaching, we need *some* way of
> > > >>> breaking the stalemate. So here's a proposal. This is a possible new
> > > >>> outline for the document, along with assignments of separate sections
> > > >>> to separate editors.
> > > >>>
> > > >>>
> > > >>>   1. Introduction
> > > >>>      - What is this?
> > > >>>      - How does it relate to R2RML
> > > >>>      - Target audience, assumed level of knowledge
> > > >>>      - RDF terms and SQL/relational terms are used as defined in
> > > >>>        documents XXX and YYY
> > > >>>
> > > >>>   2. Example (Informative)
> > > >>>      - A simple two-table example
> > > >>>      - Quick explanation of foreign key handling
> > > >>>      - Quick explanation of tables w/o PKs
> > > >>>
> > > >>>   3. The Direct Mapping [in Plain English]
> > > >>>      - “The Direct Graph of a database is the union of the Table Graphs
> > > >>>         of all tables in the database.”
> > > >>>      - “The Table Graph of a table is the union of the Row Graphs...”
> > > >>>      - “The Row Graph of a row is ...”
> > > >>>      - ...
> > > >>
> > > >> PROPOSAL: that the English definition of the direct mapping be defined as:
> > > >> [[
> > > >> The Direct Mapping is a formula for creating an RDF graph from the
> > > >> rows in a table. A base IRI defines a web space for the labels in
> > > >> this graph; all labels are generated by appending to the base.
> > > >>
> > > >> The functions scalar and reference extract the scalar and reference
> > > >> attributes (those participating in a foreign key) respectively:
> > > >>
> > > >> dfn references: the attributes in a table's foreign keys.
> > > >>
> > > >> dfn scalars: the attributes in a table which are NOT in any foreign
> > > >>  key.
> > > >>
> > > >> dfn: non-unary references: the references for which the table's
> > > >>  foreign key is NOT composed of a single attribute.
> > > >>
> > > >> SQL table and attribute identifiers compose RDF IRIs in the direct
> > > >> graph. These identifiers are separated by the punctuation characters
> > > >> '#', ',', '/' and '='. All SQL identifiers are escaped following
> > > >> URL-encoding
> > > >> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > > >> except that only the above punctuation and the characters not
> > > >> permitted in RDF IRIs are escaped.
> > > >>
> > > >> In the direct graph, there is an identifier for each row in a database
> > > >> table. If the row is in a table with a primary key, this is formed
> > > >> from the table name and the attribute names and values of each attribute
> > > >> in the primary key. If there is no primary key for the table, the row
> > > >> identifier is a fresh blank node:
> > > >>
> > > >> dfn row identifier:
> > > >>
> > > >>  if the table has a primary key with attributes, the relative IRI for
> > > >>  the row identifier is the concationation of the table name, '/', and
> > > >>  a ','-separated concatonation of each attribute name, '=', and the
> > > >>  attribute value.
> > > >>
> > > >>  if the table has no primary key, the row identifier is a fresh blank
> > > >>  node.
> > > >>
> > > >> A (potentially unary) list of attribute names in a table form a
> > > >> property IRI:
> > > >>
> > > >> dfn property IRI: the concationation of the table name, '/', and a
> > > >>  ','-separated concatonation of each attribute name, and a '#' at
> > > >>  the end of the property IRI.
> > > >>
> > > >> The values in a row are mapped to RDF literals:
> > > >>
> > > >> dfn litaral map: a mapping from an SQL value with a datatype to an RDF
> > > >>  literal with and XML Schema datatype where the RDF literal has a
> > > >>  lexical value equivalent to the SQL lexical value and the datatype
> > > >>  mapping is found in this table:
> > > >>
> > > >> SQL      XSD datatype
> > > >> ___     ____________
> > > >> INT    http://www.w3.org/TR/xmlschema-2/#integer
> > > >> FLOAT    http://www.w3.org/TR/xmlschema-2/#float
> > > >> DATE    http://www.w3.org/TR/xmlschema-2/#date
> > > >> TIME    http://www.w3.org/TR/xmlschema-2/#time
> > > >> TIMESTAMP    http://www.w3.org/TR/xmlschema-2/#dateTime
> > > >> CHAR    plain literal
> > > >> VARCHAR plain literal
> > > >> STRING    plain literal
> > > >>
> > > >> The Direct Maping is defined by a set of mapping functions from table
> > > >> rows to RDF triples:
> > > >>
> > > >> dfn direct mapping: the set of triples produced by invoking the
> > > >>  <table mapping> on each table in a database.
> > > >>
> > > >> dfn table mapping: the set of RDF triples created by invoking the
> > > >>  <row mapping> on each row in a table.
> > > >>
> > > >> dfn row mapping: using a row identifier S for the row,
> > > >> the type triple:
> > > >>   (S, rdf:type, <table type>)
> > > >> plus the scalar triples:
> > > >>   for each attribute in the list of <scalars> where the attribute
> > > >>     value is non-NULL:
> > > >>     (S,
> > > >>      the <property IRI> for the attribute,
> > > >>      the <literal map> for the attribute value).
> > > >> plus the reference triples:
> > > >>   for each list of attributes in the <non-unary references> where none
> > > >>     of the attribute values are NULL:
> > > >>     (S,
> > > >>      the <property IRI> for the attributes,
> > > >>      the <row identifier> for the referenced triple)
> > > >> ]]
> > > >>
> > > >>>   A. Appendix: Formalisms (Informative)
> > > >>>      - should be crisp, short, precise, with only minimum explanation
> > > >>>        and examples
> > > >>>      A.1 Datalog Rules
> > > >>>      A.2 Denotational Semantics
> > > >>>      A.3 Set-Style Direct Mapping
> > > >>>
> > > >>>   B. Acknowledgements (Informative)
> > > >>>
> > > >>>   C. References
> > > >>>
> > > >>>
> > > >>> I see Juan and Marcelo editing A.1.
> > > >>>
> > > >>> I see Alexandre editing A.2.
> > > >>>
> > > >>> I see Eric editing 2 (which he already wrote), 3 (which *mostly*
> > > >>> exists), and A.3.
> > > >>>
> > > >>> I don't know about 1, B, and C.
> > > >>>
> > > >>> My reasoning is that there is no objective way of picking any of the
> > > >>> formalisms over another formalism, so the normative expression should
> > > >>> be the lowest common denominator: plain English. By making the
> > > >>> formalisms all informative, we free them from the burden of having to
> > > >>> explain the direct mapping itself in a generally accessible way. The
> > > >>> focus can be totally on presenting the formalisms in all their
> > > >>> terseness to an audience that is familiar with datalog/denotational
> > > >>> semantics/whatever.
> > > >>>
> > > >>> I hope this proposal aids discussion.
> > > >>>
> > > >>> Best,
> > > >>> Richard
> > > >>
> > > >> --
> > > >> -ericP
> > > >>
> > > >
> > > >
> > >
>
> --
> -ericP
>

Received on Tuesday, 2 August 2011 14:25:50 UTC