Re: Proposal for the Direct Mapping

Eric, all

This is my proposal. It makes just a few changes to yours and adds subsections.

   PROPOSAL: that the English definition of the direct mapping be defined
as:

[[

Section 3: The Direct Mapping


The Direct Mapping is a formula for creating an RDF graph from the rows of
all the tables in a database.


A base IRI defines a web space for the labels in this graph; all labels are
generated by appending to the base.


The functions scalar and reference extract, respectively, the scalar
attributes and the reference attributes (those participating in a foreign
key):


dfn scalars: the attributes in a table which are NOT in any foreign key.


dfn references: the attributes in a table's foreign keys.


dfn non-unary references: the references for which the table's foreign key
is NOT composed of a single attribute.
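
(Illustration only, not part of the proposed text.) The three definitions above could be sketched roughly as follows. The function name, the representation of a foreign key as a tuple of column names, and the reading of "non-unary references" as the column lists of multi-column foreign keys are all my assumptions:

```python
def partition_attributes(columns, foreign_keys):
    """Split a table's columns into scalars, references and non-unary references.

    columns      -- column names in table order
    foreign_keys -- one tuple of column names per foreign key (assumed shape)
    """
    # references: every attribute that takes part in at least one foreign key
    references = [c for c in columns if any(c in fk for fk in foreign_keys)]
    # scalars: the attributes that are NOT in any foreign key
    scalars = [c for c in columns if c not in references]
    # non-unary references: foreign keys made of more than one attribute
    # (assumed reading: kept grouped per foreign key)
    non_unary = [tuple(fk) for fk in foreign_keys if len(fk) > 1]
    return scalars, references, non_unary


scalars, references, non_unary = partition_attributes(
    ["ID", "name", "deptName", "deptCity", "manager"],
    [("deptName", "deptCity"), ("manager",)],
)
```

Here "manager" is a reference but not a non-unary one, because its foreign key has a single attribute.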


Section 3.1: Generating Row Identifiers


SQL table and attribute identifiers compose RDF IRIs in the direct graph.
These identifiers are separated by the punctuation characters '#', ',', '/'
and '='. All SQL identifiers are escaped following URL-encoding

<http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>

except that only the above punctuation and the characters not permitted in
RDF IRIs are escaped.
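
(Illustration only, not part of the proposed text.) A minimal sketch of the escaping rule, under stated assumptions: the set of "characters not permitted in RDF IRIs" below is a small, non-exhaustive ASCII sample that I picked; '%' is escaped as well so that the output stays unambiguous; and a space becomes %20 rather than '+'. The function name is hypothetical:

```python
SEPARATORS = "#,/="             # the punctuation named in the text
NOT_IN_IRI = ' <>"{}|\\^`%'     # assumption: non-exhaustive sample, plus '%'


def escape_identifier(identifier):
    """Percent-encode separators and (some) characters not allowed in IRIs."""
    out = []
    for ch in identifier:
        if ch in SEPARATORS or ch in NOT_IN_IRI or ord(ch) < 0x20:
            # percent-encode every UTF-8 byte of the character
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            # everything else, including non-ASCII letters, is left as-is
            out.append(ch)
    return "".join(out)
```

For example, escape_identifier("Dept/City") gives "Dept%2FCity", while an identifier such as "Straße" passes through unchanged.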


In the direct graph, there is an identifier for each row in a database
table. If the row is in a table with a primary key, this is formed from the
table name and the attribute names and values of each attribute in the
primary key. If there is no primary key for the table, the row identifier is
a fresh blank node:


dfn row identifier:

  if the table has a primary key with attributes, the relative IRI for
  the row identifier is the concatenation of the table name, '/', and
  a ','-separated concatenation of each attribute name, '=', and the
  attribute value.

  if the table has no primary key, the row identifier is a fresh blank
  node.
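
(Illustration only, not part of the proposed text.) The row identifier rule might be sketched like this. The function name, the dict representation of a row, and the "_:bN" blank node labels are my assumptions; escaping of names and values is left out to keep the sketch short:

```python
import itertools

_blank_ids = itertools.count()  # source of fresh blank node labels


def row_identifier(table, primary_key, row):
    """Return a relative IRI for the row, or a fresh blank node label."""
    if not primary_key:
        # no primary key: every call yields a new blank node
        return "_:b{}".format(next(_blank_ids))
    # attribute name, '=', attribute value, joined with ','
    pairs = ",".join("{}={}".format(col, row[col]) for col in primary_key)
    return "{}/{}".format(table, pairs)


print(row_identifier("People", ["ID"], {"ID": 7, "name": "Bob"}))
# prints People/ID=7
```

With a composite key the pairs are comma-separated, e.g. "Department/name=accounting,city=Cambridge".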


A (potentially unary) list of attribute names in a table forms a
property IRI:

dfn property IRI: the concatenation of the table name, '/', a
  ','-separated concatenation of each attribute name, and a
  trailing '#'.
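
(Illustration only, not part of the proposed text.) Read literally, the property IRI definition comes down to a single string concatenation. The function name and the example base IRI are hypothetical, and escaping is again omitted:

```python
def property_iri(base, table, attributes):
    """Base IRI + table name + '/' + ','-joined attribute names + '#'."""
    return "{}{}/{}#".format(base, table, ",".join(attributes))


BASE = "http://example.org/db/"  # hypothetical base IRI
print(property_iri(BASE, "People", ["name"]))
# prints http://example.org/db/People/name#
```

A multi-attribute list such as ["deptName", "deptCity"] yields ".../People/deptName,deptCity#".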


Section 3.2: Mapping Database Values to RDF Literals


The values in a row are mapped to RDF literals:


dfn literal map: a mapping from a SQL value with a datatype to an RDF
  literal with an XML Schema datatype where the RDF literal has a
  lexical value equivalent to the SQL lexical value and the datatype
  mapping is found in this table:


SQL         XSD datatype
___         ____________
INT         http://www.w3.org/TR/xmlschema-2/#integer
FLOAT       http://www.w3.org/TR/xmlschema-2/#float
DATE        http://www.w3.org/TR/xmlschema-2/#date
TIME        http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP   http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR        plain literal
VARCHAR     plain literal
STRING      plain literal
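
(Illustration only, not part of the proposed text.) The table can be sketched as a lookup that produces an N-Triples-style literal. Note one substitution on my part: the table links to sections of the XML Schema Part 2 document, whereas the sketch uses the datatype IRIs in the http://www.w3.org/2001/XMLSchema# namespace. The function name and the string output format are assumptions:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# None stands for "plain literal" (no datatype)
DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": None,
    "VARCHAR": None,
    "STRING": None,
}


def literal_map(lexical, sql_type):
    """Map a SQL lexical value and its SQL type name to an RDF literal string."""
    datatype = DATATYPES[sql_type.upper()]
    # escape backslashes and double quotes inside the lexical form
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    if datatype is None:
        return '"{}"'.format(escaped)
    return '"{}"^^<{}>'.format(escaped, datatype)
```

A SQL type that is not in the table raises KeyError here; the text does not say what should happen in that case.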


Section 3.3: Generating RDF Triples


The Direct Mapping is defined by a set of mapping functions from table
rows to RDF triples:


dfn direct mapping: the set of RDF triples produced by invoking the
<table mapping> on each table in a database.


dfn table mapping: the set of RDF triples created by invoking the
<row mapping> on each row in a table.


dfn row mapping: using a <row identifier> S for the row,

 the type triple:

   (S, rdf:type, <table type>)

 plus the scalar triples:

   for each attribute in the list of <scalars> where the attribute value
   is non-NULL:

     (S, the <property IRI> for the attribute, the <literal map> for the
     attribute value)

 plus the reference triples:

   for each list of attributes in the <non-unary references> where none of
   the attribute values are NULL:

     (S, the <property IRI> for the attributes, the <row identifier> for the
     referenced row)
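
(Illustration only, not part of the proposed text.) Putting the pieces together, the row mapping above might look like the sketch below. Everything about the data shapes is assumed: rows are dicts, a foreign key is a dict with "columns", "ref_table" and "ref_columns", <table type> is taken to be the table name, the object of a reference triple is taken to be the identifier of the referenced row, and the foreign key is assumed to point at the referenced table's primary key. Literals are left untyped, the table is assumed to have a primary key, and escaping is omitted:

```python
def row_id(table, key_columns, values):
    """Relative IRI: table name, '/', then ','-separated name=value pairs."""
    return table + "/" + ",".join(
        "{}={}".format(c, v) for c, v in zip(key_columns, values)
    )


def prop_iri(table, columns):
    """Table name, '/', ','-separated attribute names, trailing '#'."""
    return table + "/" + ",".join(columns) + "#"


def row_mapping(table, primary_key, scalars, foreign_keys, row):
    s = row_id(table, primary_key, [row[c] for c in primary_key])
    triples = [(s, "rdf:type", table)]  # assumption: <table type> = table name
    # scalar triples: one per non-NULL scalar attribute
    for col in scalars:
        if row[col] is not None:
            triples.append((s, prop_iri(table, [col]), '"{}"'.format(row[col])))
    # reference triples: following the text literally, only foreign keys with
    # more than one attribute, and only when none of the values is NULL
    for fk in foreign_keys:
        cols = fk["columns"]
        if len(cols) > 1 and all(row[c] is not None for c in cols):
            target = row_id(fk["ref_table"], fk["ref_columns"],
                            [row[c] for c in cols])
            triples.append((s, prop_iri(table, cols), target))
    return triples


fk = {"columns": ["deptName", "deptCity"],
      "ref_table": "Department", "ref_columns": ["name", "city"]}
row = {"ID": 7, "name": "Bob", "deptName": "accounting", "deptCity": "Cambridge"}
for triple in row_mapping("People", ["ID"], ["ID", "name"], [fk], row):
    print(triple)
```

The table mapping and direct mapping would then just collect the results of row_mapping over every row of every table.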

]]

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Tue, Aug 2, 2011 at 6:44 AM, Juan Sequeda <juanfederico@gmail.com> wrote:

> Eric,
>
> This is great. I was planning to write up a proposal myself, but you saved
> me the time. I do have some comments and suggestions. I'm writing up a new
> proposal based on what you have. I should have it done before the meeting.
>
> Juan Sequeda
> www.juansequeda.com
>
> On Aug 2, 2011, at 2:01 AM, Michael Hausenblas <
> michael.hausenblas@deri.org> wrote:
>
> >
> > Eric,
> >
> >> PROPOSAL: that the English definition of the direct mapping be
> >> defined as:
> >> [[
> >> The Direct Mapping is a formula for creating an RDF graph from the
> >> rows in a table. A base IRI defines a web space for the labels in
> >
> > ...
> >
> > Thanks a lot for this proposal, Eric! I'm wondering if we're ready to
> > resolve this today or if the WG feels that we need to discuss a bit more.
> > In any case I'm flexible to change today's agenda [1] if the WG thinks it
> > makes sense ...
> >
> > Cheers,
> >    Michael
> >
> > [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Jul/0183.html
> > --
> > Dr. Michael Hausenblas, Research Fellow
> > LiDRC - Linked Data Research Centre
> > DERI - Digital Enterprise Research Institute
> > NUIG - National University of Ireland, Galway
> > Ireland, Europe
> > Tel. +353 91 495730
> > http://linkeddata.deri.ie/
> > http://sw-app.org/about.html
> >
> > On 2 Aug 2011, at 00:22, Eric Prud'hommeaux wrote:
> >
> >> * Richard Cyganiak <richard@cyganiak.de> [2011-07-26 19:41+0100]
> >>> Hi all,
> >>>
> >>> The Direct Mapping document is stuck because we have a stalemate
> >>> between the editors. With Last Call approaching, we need *some* way of
> >>> breaking the stalemate. So here's a proposal. This is a possible new
> >>> outline for the document, along with assignments of separate sections
> >>> to separate editors.
> >>>
> >>>
> >>>   1. Introduction
> >>>      - What is this?
> >>>      - How does it relate to R2RML
> >>>      - Target audience, assumed level of knowledge
> >>>      - RDF terms and SQL/relational terms are used as defined in
> >>>        documents XXX and YYY
> >>>
> >>>   2. Example (Informative)
> >>>      - A simple two-table example
> >>>      - Quick explanation of foreign key handling
> >>>      - Quick explanation of tables w/o PKs
> >>>
> >>>   3. The Direct Mapping [in Plain English]
> >>>      - “The Direct Graph of a database is the union of the Table Graphs
> >>>         of all tables in the database.”
> >>>      - “The Table Graph of a table is the union of the Row Graphs...”
> >>>      - “The Row Graph of a row is ...”
> >>>      - ...
> >>
> >> PROPOSAL: that the English definition of the direct mapping be
> >> defined as:
> >> [[
> >> The Direct Mapping is a formula for creating an RDF graph from the
> >> rows in a table. A base IRI defines a web space for the labels in
> >> this graph; all labels are generated by appending to the base.
> >>
> >> The functions scalar and reference extract the scalar and reference
> >> attributes (those participating in a foreign key) respectively:
> >>
> >> dfn references: the attributes in a table's foreign keys.
> >>
> >> dfn scalars: the attributes in a table which are NOT in any foreign
> >>  key.
> >>
> >> dfn non-unary references: the references for which the table's
> >>  foreign key is NOT composed of a single attribute.
> >>
> >> SQL table and attribute identifiers compose RDF IRIs in the direct
> >> graph. These identifiers are separated by the punctuation characters
> >> '#', ',', '/' and '='. All SQL identifiers are escaped following URL-
> >> encoding
> >> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> >> except that only the above punctuation and the characters not
> >> permitted in RDF IRIs are escaped.
> >>
> >> In the direct graph, there is an identifier for each row in a database
> >> table. If the row is in a table with a primary key, this is formed
> >> from the table name and the attribute names and values of each attribute
> >> in the primary key. If there is no primary key for the table, the row
> >> identifier is a fresh blank node:
> >>
> >> dfn row identifier:
> >>
> >>  if the table has a primary key with attributes, the relative IRI for
> >>  the row identifier is the concatenation of the table name, '/', and
> >>  a ','-separated concatenation of each attribute name, '=', and the
> >>  attribute value.
> >>
> >>  if the table has no primary key, the row identifier is a fresh blank
> >>  node.
> >>
> >> A (potentially unary) list of attribute names in a table form a
> >> property IRI:
> >>
> >> dfn property IRI: the concatenation of the table name, '/', and a
> >>  ','-separated concatenation of each attribute name, and a '#' at
> >>  the end of the property IRI.
> >>
> >> The values in a row are mapped to RDF literals:
> >>
> >> dfn literal map: a mapping from an SQL value with a datatype to an RDF
> >>  literal with an XML Schema datatype where the RDF literal has a
> >>  lexical value equivalent to the SQL lexical value and the datatype
> >>  mapping is found in this table:
> >>
> >> SQL         XSD datatype
> >> ___         ____________
> >> INT         http://www.w3.org/TR/xmlschema-2/#integer
> >> FLOAT       http://www.w3.org/TR/xmlschema-2/#float
> >> DATE        http://www.w3.org/TR/xmlschema-2/#date
> >> TIME        http://www.w3.org/TR/xmlschema-2/#time
> >> TIMESTAMP   http://www.w3.org/TR/xmlschema-2/#dateTime
> >> CHAR        plain literal
> >> VARCHAR     plain literal
> >> STRING      plain literal
> >>
> >> The Direct Mapping is defined by a set of mapping functions from table
> >> rows to RDF triples:
> >>
> >> dfn direct mapping: the set of triples produced by invoking the
> >>  <table mapping> on each table in a database.
> >>
> >> dfn table mapping: the set of RDF triples created by invoking the
> >>  <row mapping> on each row in a table.
> >>
> >> dfn row mapping: using a row identifier S for the row,
> >> the type triple:
> >>   (S, rdf:type, <table type>)
> >> plus the scalar triples:
> >>   for each attribute in the list of <scalars> where the attribute
> >>     value is non-NULL:
> >>     (S,
> >>      the <property IRI> for the attribute,
> >>      the <literal map> for the attribute value).
> >> plus the reference triples:
> >>   for each list of attributes in the <non-unary references> where none
> >>     of the attribute values are NULL:
> >>     (S,
> >>      the <property IRI> for the attributes,
> >>      the <row identifier> for the referenced triple)
> >> ]]
> >>
> >>>   A. Appendix: Formalisms (Informative)
> >>>      - should be crisp, short, precise, with only minimum explanation
> >>>        and examples
> >>>      A.1 Datalog Rules
> >>>      A.2 Denotational Semantics
> >>>      A.3 Set-Style Direct Mapping
> >>>
> >>>   B. Acknowledgements (Informative)
> >>>
> >>>   C. References
> >>>
> >>>
> >>> I see Juan and Marcelo editing A.1.
> >>>
> >>> I see Alexandre editing A.2.
> >>>
> >>> I see Eric editing 2 (which he already wrote), 3 (which *mostly*
> >>> exists), and A.3.
> >>>
> >>> I don't know about 1, B, and C.
> >>>
> >>> My reasoning is that there is no objective way of picking any of the
> >>> formalisms over another formalism, so the normative expression should
> >>> be the lowest common denominator: plain English. By making the
> >>> formalisms all informative, we free them from the burden of having to
> >>> explain the direct mapping itself in a generally accessible way. The
> >>> focus can be totally on presenting the formalisms in all their
> >>> terseness to an audience that is familiar with datalog/denotational
> >>> semantics/whatever.
> >>>
> >>> I hope this proposal aids discussion.
> >>>
> >>> Best,
> >>> Richard
> >>
> >> --
> >> -ericP
> >>
> >
> >
>

Received on Tuesday, 2 August 2011 13:05:51 UTC