Re: Proposal for the Direct Mapping

* Richard Cyganiak <richard@cyganiak.de> [2011-07-26 19:41+0100]
> Hi all,
> 
> The Direct Mapping document is stuck because we have a stalemate between the editors. With Last Call approaching, we need *some* way of breaking the stalemate. So here's a proposal. This is a possible new outline for the document, along with assignments of separate sections to separate editors.
> 
> 
>     1. Introduction
>        - What is this?
>        - How does it relate to R2RML
>        - Target audience, assumed level of knowledge
>        - RDF terms and SQL/relational terms are used as defined in 
>          documents XXX and YYY
> 
>     2. Example (Informative)
>        - A simple two-table example
>        - Quick explanation of foreign key handling
>        - Quick explanation of tables w/o PKs
> 
>     3. The Direct Mapping [in Plain English]
>        - “The Direct Graph of a database is the union of the Table Graphs
>           of all tables in the database.”
>        - “The Table Graph of a table is the union of the Row Graphs...”
>        - “The Row Graph of a row is ...”
>        - ...

PROPOSAL: that the English definition of the direct mapping be:
[[
The Direct Mapping is a formula for creating an RDF graph from the
rows in a table. A base IRI defines a web space for the labels in
this graph; all labels are generated by appending to the base.

The functions scalar and reference extract, respectively, the scalar
attributes and the reference attributes (those participating in a
foreign key):

dfn references: the attributes in a table's foreign keys.

dfn scalars: the attributes in a table which are NOT in any foreign
   key.

dfn non-unary references: the references for which the table's
   foreign key is NOT composed of a single attribute.
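
As an illustration only, the three definitions above can be sketched in
Python. This assumes a table is described by an ordered list of attribute
names plus a list of foreign keys, each foreign key being a tuple of
attribute names. That representation, and the example table, are
hypothetical and not part of the proposal.

```python
def references(foreign_keys):
    """All attributes that take part in at least one foreign key."""
    return {attr for fk in foreign_keys for attr in fk}


def scalars(attributes, foreign_keys):
    """Attributes that are in no foreign key, kept in table order."""
    refs = references(foreign_keys)
    return [attr for attr in attributes if attr not in refs]


def non_unary_references(foreign_keys):
    """Foreign keys that are composed of more than one attribute."""
    return [fk for fk in foreign_keys if len(fk) > 1]


# Hypothetical table: one unary and one two-attribute foreign key.
attributes = ["ID", "name", "addr", "deptName", "deptCity"]
foreign_keys = [("addr",), ("deptName", "deptCity")]

print(scalars(attributes, foreign_keys))
print(non_unary_references(foreign_keys))
```

Note that under this reading "addr" is a reference but not a non-unary
reference.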

SQL table and attribute identifiers compose RDF IRIs in the direct
graph. These identifiers are separated by the punctuation characters
'#', ',', '/' and '='. All SQL identifiers are escaped following URL-
encoding
<http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
except that only the above punctuation and the characters not
permitted in RDF IRIs are escaped.
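
A minimal sketch of the escaping rule, in Python. The text above does not
list the characters "not permitted in RDF IRIs", so the set NOT_IN_IRI
below is an assumption: a conservative set of ASCII characters, plus '%'
so that escaped output stays unambiguous. Escaping a space as %20 rather
than '+' is also an assumption. The function name is hypothetical.

```python
SEPARATORS = "#,/="

# Assumption: ASCII characters treated here as not allowed in an IRI,
# plus '%' itself so that an escaped identifier can be decoded again.
NOT_IN_IRI = ' <>"{}|\\^`%'


def escape_identifier(identifier):
    """Percent-escape separators and non-IRI characters in an SQL identifier."""
    out = []
    for ch in identifier:
        is_control = ord(ch) < 0x20 or ord(ch) == 0x7F
        if ch in SEPARATORS or ch in NOT_IN_IRI or is_control:
            # Escape each UTF-8 byte of the character as %XX.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            # Everything else, including non-ASCII letters, is kept as-is.
            out.append(ch)
    return "".join(out)


print(escape_identifier("Dept Name"))   # Dept%20Name
print(escape_identifier("a/b=c"))       # a%2Fb%3Dc
```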

In the direct graph, there is an identifier for each row in a database
table. If the row is in a table with a primary key, this is formed
from the table name and the attribute names and values of each attribute
in the primary key. If there is no primary key for the table, the row
identifier is a fresh blank node:

dfn row identifier:

   if the table has a primary key with attributes, the relative IRI for
   the row identifier is the concatenation of the table name, '/', and
   a ','-separated concatenation of each attribute name, '=', and the
   attribute value.

   if the table has no primary key, the row identifier is a fresh blank
   node.
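
The row identifier rule could be sketched as follows, assuming a row is a
Python dict from attribute name to value and a primary key is a list of
attribute names (empty when the table has none). The blank node labels
"_:b0", "_:b1", ... are a hypothetical way of producing fresh nodes, and
identifier escaping is left out to keep the sketch short.

```python
import itertools

# Hypothetical source of fresh blank node labels.
_blank_ids = itertools.count()


def row_identifier(table, primary_key, row):
    """Relative IRI for a keyed row, a fresh blank node label otherwise."""
    if primary_key:
        pairs = ",".join("{}={}".format(attr, row[attr]) for attr in primary_key)
        return "{}/{}".format(table, pairs)
    return "_:b{}".format(next(_blank_ids))


print(row_identifier("People", ["ID"], {"ID": 7, "name": "Bob"}))
# People/ID=7
print(row_identifier("Department", ["name", "city"],
                     {"name": "accounting", "city": "Cambridge"}))
# Department/name=accounting,city=Cambridge
```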

A (potentially unary) list of attribute names in a table forms a
property IRI:

dfn property IRI: the concatenation of the table name, '/', and a
   ','-separated concatenation of each attribute name, and a '#' at
   the end of the property IRI.
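
For illustration, a Python sketch of the property IRI rule as worded
above. The function name is hypothetical, and escaping of the table and
attribute names is omitted.

```python
def property_iri(table, attributes):
    """Relative property IRI for a list of one or more attribute names."""
    return "{}/{}#".format(table, ",".join(attributes))


print(property_iri("People", ["name"]))                  # People/name#
print(property_iri("People", ["deptName", "deptCity"]))  # People/deptName,deptCity#
```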

The values in a row are mapped to RDF literals:

dfn literal map: a mapping from an SQL value with a datatype to an RDF
   literal with an XML Schema datatype where the RDF literal has a
   lexical value equivalent to the SQL lexical value and the datatype
   mapping is found in this table:

SQL datatype   XSD datatype
____________   ____________
INT            http://www.w3.org/TR/xmlschema-2/#integer
FLOAT          http://www.w3.org/TR/xmlschema-2/#float
DATE           http://www.w3.org/TR/xmlschema-2/#date
TIME           http://www.w3.org/TR/xmlschema-2/#time
TIMESTAMP      http://www.w3.org/TR/xmlschema-2/#dateTime
CHAR           plain literal
VARCHAR        plain literal
STRING         plain literal
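
The table can be read as a simple lookup. The Python sketch below assumes
an RDF literal is represented as a (lexical value, datatype) pair, with
None as the datatype of a plain literal. That representation is
hypothetical. The lexical value is passed through unchanged; whether any
SQL lexical form needs rewriting is not settled by the text above.

```python
XSD = "http://www.w3.org/TR/xmlschema-2/#"

# Datatype column of the table above.
DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
}

# SQL datatypes that map to plain (untyped) literals.
PLAIN = {"CHAR", "VARCHAR", "STRING"}


def literal_map(lexical, sql_type):
    """Map an SQL lexical value and datatype to a (lexical, datatype) pair."""
    sql_type = sql_type.upper()
    if sql_type in PLAIN:
        return (lexical, None)
    # Raises KeyError for SQL datatypes the table does not cover.
    return (lexical, DATATYPES[sql_type])


print(literal_map("42", "INT"))
print(literal_map("Bob", "VARCHAR"))
```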

The Direct Mapping is defined by a set of mapping functions from table
rows to RDF triples:

dfn direct mapping: the set of triples produced by invoking the
   <table mapping> on each table in a database.

dfn table mapping: the set of RDF triples created by invoking the
   <row mapping> on each row in a table.

dfn row mapping: using a row identifier S for the row,
  the type triple:
    (S, rdf:type, <table type>)
  plus the scalar triples:
    for each attribute in the list of <scalars> where the attribute
      value is non-NULL:
      (S,
       the <property IRI> for the attribute,
       the <literal map> for the attribute value).
  plus the reference triples:
    for each list of attributes in the <non-unary references> where none
      of the attribute values are NULL:
      (S,
       the <property IRI> for the attributes,
       the <row identifier> for the referenced row)
]]
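
A rough Python sketch of the row mapping in the proposal, to make the
three kinds of triples concrete. Several things here are assumptions: the
proposal does not define <table type>, so the table name stands in for
it; a row is a dict with None for NULL; and literal_map and
target_identifier are hypothetical caller-supplied callbacks standing in
for the literal map and for the row identifier of the referenced row.

```python
def property_iri(table, attributes):
    """Relative property IRI: table name, '/', attribute names, '#'."""
    return "{}/{}#".format(table, ",".join(attributes))


def row_mapping(table, subject, row, scalar_attrs, non_unary_refs,
                literal_map, target_identifier):
    """Return the list of triples for one row."""
    # Type triple; the table name is a placeholder for <table type>.
    triples = [(subject, "rdf:type", table)]
    # Scalar triples, skipping NULL values.
    for attr in scalar_attrs:
        if row[attr] is not None:
            triples.append((subject, property_iri(table, [attr]),
                            literal_map(attr, row[attr])))
    # Reference triples, only when no attribute of the key is NULL.
    for fk in non_unary_refs:
        if all(row[attr] is not None for attr in fk):
            triples.append((subject, property_iri(table, fk),
                            target_identifier(fk, row)))
    return triples


# Hypothetical row with one NULL scalar and one two-attribute foreign key.
row = {"ID": 7, "name": "Bob", "nick": None,
       "deptName": "accounting", "deptCity": "Cambridge"}
triples = row_mapping(
    "People", "People/ID=7", row,
    scalar_attrs=["ID", "name", "nick"],
    non_unary_refs=[("deptName", "deptCity")],
    literal_map=lambda attr, value: str(value),
    target_identifier=lambda fk, r: "Department/name={},city={}".format(
        r["deptName"], r["deptCity"]),
)
for triple in triples:
    print(triple)
```

The NULL "nick" value produces no triple, so this row yields four triples:
one type triple, two scalar triples and one reference triple.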

>     A. Appendix: Formalisms (Informative)
>        - should be crisp, short, precise, with only minimum explanation
>          and examples
>        A.1 Datalog Rules
>        A.2 Denotational Semantics
>        A.3 Set-Style Direct Mapping
> 
>     B. Acknowledgements (Informative)
> 
>     C. References
> 
> 
> I see Juan and Marcelo editing A.1.
> 
> I see Alexandre editing A.2.
> 
> I see Eric editing 2 (which he already wrote), 3 (which *mostly* exists), and A.3.
> 
> I don't know about 1, B, and C.
> 
> My reasoning is that there is no objective way of picking any of the formalisms over another formalism, so the normative expression should be the lowest common denominator: plain English. By making the formalisms all informative, we free them from the burden of having to explain the direct mapping itself in a generally accessible way. The focus can be totally on presenting the formalisms in all their terseness to an audience that is familiar with datalog/denotational semantics/whatever.
> 
> I hope this proposal aids discussion.
> 
> Best,
> Richard

-- 
-ericP

Received on Monday, 1 August 2011 23:22:47 UTC