Re: Proposal for the Direct Mapping

Eric,

This is great. I was planning to write up a proposal myself, but you saved me the time. I do have some comments and suggestions, so I'm writing up a new proposal based on what you have. I should have it done before the meeting.

Juan Sequeda
www.juansequeda.com

On Aug 2, 2011, at 2:01 AM, Michael Hausenblas <michael.hausenblas@deri.org> wrote:

> 
> Eric,
> 
>> PROPOSAL: that the English definition of the direct mapping be defined as:
>> [[
>> The Direct Mapping is a formula for creating an RDF graph from the
>> rows in a table. A base IRI defines a web space for the labels in
> 
> ...
> 
> Thanks a lot for this proposal, Eric! I'm wondering whether we're ready to resolve this today or whether the WG feels we need to discuss it a bit more. In any case, I'm happy to change today's agenda [1] if the WG thinks it makes sense.
> 
> Cheers,
>    Michael
> 
> [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Jul/0183.html
> --
> Dr. Michael Hausenblas, Research Fellow
> LiDRC - Linked Data Research Centre
> DERI - Digital Enterprise Research Institute
> NUIG - National University of Ireland, Galway
> Ireland, Europe
> Tel. +353 91 495730
> http://linkeddata.deri.ie/
> http://sw-app.org/about.html
> 
> On 2 Aug 2011, at 00:22, Eric Prud'hommeaux wrote:
> 
>> * Richard Cyganiak <richard@cyganiak.de> [2011-07-26 19:41+0100]
>>> Hi all,
>>> 
>>> The Direct Mapping document is stuck because we have a stalemate between the editors. With Last Call approaching, we need *some* way of breaking the stalemate. So here's a proposal. This is a possible new outline for the document, along with assignments of separate sections to separate editors.
>>> 
>>> 
>>>   1. Introduction
>>>      - What is this?
>>>      - How does it relate to R2RML
>>>      - Target audience, assumed level of knowledge
>>>      - RDF terms and SQL/relational terms are used as defined in
>>>        documents XXX and YYY
>>> 
>>>   2. Example (Informative)
>>>      - A simple two-table example
>>>      - Quick explanation of foreign key handling
>>>      - Quick explanation of tables w/o PKs
>>> 
>>>   3. The Direct Mapping [in Plain English]
>>>      - “The Direct Graph of a database is the union of the Table Graphs
>>>         of all tables in the database.”
>>>      - “The Table Graph of a table is the union of the Row Graphs...”
>>>      - “The Row Graph of a row is ...”
>>>      - ...
>> 
>> PROPOSAL: that the English definition of the direct mapping be defined as:
>> [[
>> The Direct Mapping is a formula for creating an RDF graph from the
>> rows in a table. A base IRI defines a web space for the labels in
>> this graph; all labels are generated by appending to the base.
>> 
>> The functions scalars and references extract, respectively, the scalar
>> attributes and the reference attributes (those participating in a
>> foreign key):
>> 
>> dfn references: the attributes in a table's foreign keys.
>> 
>> dfn scalars: the attributes in a table which are NOT in any foreign
>>  key.
>> 
>> dfn non-unary references: the references for which the table's
>>  foreign key is NOT composed of a single attribute.
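A minimal Python sketch of these three definitions, assuming a hypothetical in-memory `Table` record (name, attribute list, foreign keys); the record and its field names are illustrative and do not come from the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical schema record for one SQL table."""
    name: str
    attributes: list[str]  # column names in declaration order
    # Each foreign key: (local attributes, referenced table, referenced attributes)
    foreign_keys: list[tuple[list[str], str, list[str]]] = field(default_factory=list)


def references(t: Table) -> set[str]:
    """Attributes that appear in any of the table's foreign keys."""
    return {a for cols, _, _ in t.foreign_keys for a in cols}


def scalars(t: Table) -> list[str]:
    """Attributes that are NOT in any foreign key, in declaration order."""
    refs = references(t)
    return [a for a in t.attributes if a not in refs]


def non_unary_references(t: Table) -> list[list[str]]:
    """Foreign-key attribute lists made of more than one attribute."""
    return [cols for cols, _, _ in t.foreign_keys if len(cols) > 1]
```

For a table `Tasks(ID, dept, city, title)` with a composite foreign key `(dept, city)`, this gives scalars `ID` and `title` and one non-unary reference `[dept, city]`.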
>> 
>> SQL table and attribute identifiers compose RDF IRIs in the direct
>> graph. These identifiers are separated by the punctuation characters
>> '#', ',', '/' and '='. All SQL identifiers are escaped following
>> URL-encoding
>> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>,
>> except that only the above punctuation and the characters not
>> permitted in RDF IRIs are escaped.
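A minimal sketch of this restricted escaping, assuming that "characters not permitted in RDF IRIs" means the ASCII controls, space, and `<>"{}|\^` plus backtick (the characters RFC 3987 excludes), and that escaped characters are written as `%XX` rather than form-encoding's `+` for space. `escape_identifier` is a hypothetical name.

```python
# Punctuation the proposal reserves as separators in direct-graph IRIs.
SEPARATORS = set("#,/=")
# Assumption: ASCII characters that may not appear literally in an IRI (cf. RFC 3987).
IRI_FORBIDDEN = set(' <>"{}|\\^`') | {chr(c) for c in range(0x20)} | {chr(0x7F)}


def escape_identifier(ident: str) -> str:
    """Percent-encode only separators and IRI-forbidden characters; keep the rest."""
    out = []
    for ch in ident:
        if ch in SEPARATORS or ch in IRI_FORBIDDEN:
            # Encode the character's UTF-8 bytes as %XX (these are all single-byte ASCII).
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            # Letters, digits and non-ASCII characters are legal in IRIs and stay as-is.
            out.append(ch)
    return "".join(out)
```

Unlike full URL-encoding, non-ASCII characters such as `é` pass through unchanged, since IRIs allow them.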
>> 
>> In the direct graph, there is an identifier for each row in a database
>> table. If the row is in a table with a primary key, this is formed
>> from the table name and the attribute names and values of each attribute
>> in the primary key. If there is no primary key for the table, the row
>> identifier is a fresh blank node:
>> 
>> dfn row identifier:
>> 
>>  if the table has a primary key with attributes, the relative IRI for
>>  the row identifier is the concatenation of the table name, '/', and
>>  a ','-separated concatenation of each attribute name, '=', and the
>>  attribute value.
>> 
>>  if the table has no primary key, the row identifier is a fresh blank
>>  node.
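A minimal sketch of the row identifier rule. Assumptions: `esc` is a simplified stand-in for the escaping rule (it only encodes the four separators and space), attribute values are rendered with `str()` and escaped like identifiers, and blank nodes are labelled `_:b0`, `_:b1`, and so on. All names are hypothetical.

```python
from typing import Mapping, Sequence
import itertools

# Hypothetical counter used to mint fresh blank-node labels.
_fresh = itertools.count()


def esc(s: str) -> str:
    """Simplified escaping: percent-encode the separators and space only."""
    return "".join(f"%{ord(c):02X}" if c in "#,/= " else c for c in s)


def row_identifier(table: str, primary_key: Sequence[str], row: Mapping[str, object]) -> str:
    """Relative IRI 'table/attr=value,attr=value' for keyed tables, else a fresh blank node."""
    if primary_key:
        # Assumption: values are stringified and escaped like identifiers.
        pairs = ",".join(f"{esc(a)}={esc(str(row[a]))}" for a in primary_key)
        return f"{esc(table)}/{pairs}"
    return f"_:b{next(_fresh)}"
```

Per the proposal, the absolute IRI is then obtained by appending this relative IRI to the base IRI.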
>> 
>> A (potentially unary) list of attribute names in a table forms a
>> property IRI:
>> 
>> dfn property IRI: the concatenation of the table name, '/', a
>>  ','-separated concatenation of the attribute names, and a '#' at
>>  the end of the property IRI.
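A minimal sketch that follows the property IRI wording literally (table, '/', comma-joined attribute names, trailing '#'). `esc` is the same simplified, hypothetical escaping helper as above, repeated so the snippet stands alone.

```python
from __future__ import annotations


def esc(s: str) -> str:
    """Simplified escaping: percent-encode the separators and space only."""
    return "".join(f"%{ord(c):02X}" if c in "#,/= " else c for c in s)


def property_iri(table: str, attributes: list[str]) -> str:
    """Relative property IRI: table name, '/', ','-joined attribute names, then '#'."""
    return f"{esc(table)}/{','.join(esc(a) for a in attributes)}#"
```

A unary list gives `People/fname#`; the composite foreign key `(dept, city)` of `Tasks` gives `Tasks/dept,city#`.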
>> 
>> The values in a row are mapped to RDF literals:
>> 
>> dfn literal map: a mapping from an SQL value with a datatype to an RDF
>>  literal with an XML Schema datatype, where the RDF literal has a
>>  lexical value equivalent to the SQL lexical value and the datatype
>>  mapping is found in this table:
>> 
>> SQL          XSD datatype
>> _________    _________________________________________
>> INT          http://www.w3.org/TR/xmlschema-2/#integer
>> FLOAT        http://www.w3.org/TR/xmlschema-2/#float
>> DATE         http://www.w3.org/TR/xmlschema-2/#date
>> TIME         http://www.w3.org/TR/xmlschema-2/#time
>> TIMESTAMP    http://www.w3.org/TR/xmlschema-2/#dateTime
>> CHAR         plain literal
>> VARCHAR      plain literal
>> STRING       plain literal
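A minimal sketch of the literal map. The table's URLs point at sections of the XML Schema spec; the datatype IRIs RDF actually uses live in the `http://www.w3.org/2001/XMLSchema#` namespace, so the sketch uses those. The `Literal` record and `literal_map` name are hypothetical, and the sketch assumes the caller already supplies an XSD-compatible lexical form.

```python
from __future__ import annotations

from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

# The proposal's table; None marks a plain literal.
SQL_TO_XSD = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": None,
    "VARCHAR": None,
    "STRING": None,
}


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str | None = None  # None means a plain literal


def literal_map(sql_type: str, lexical: str) -> Literal:
    """Keep the lexical form and attach the XSD datatype from the table.

    Raises KeyError for SQL types the table does not list.
    """
    return Literal(lexical, SQL_TO_XSD[sql_type.upper()])
```

"Equivalent" lexical value hides real work for some types: many SQL engines print a TIMESTAMP as `2011-08-02 11:44:39`, while `xsd:dateTime` requires `2011-08-02T11:44:39`.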
>> 
>> The Direct Mapping is defined by a set of mapping functions from table
>> rows to RDF triples:
>> 
>> dfn direct mapping: the set of triples produced by invoking the
>>  <table mapping> on each table in a database.
>> 
>> dfn table mapping: the set of RDF triples created by invoking the
>>  <row mapping> on each row in a table.
>> 
>> dfn row mapping: using a row identifier S for the row,
>> the type triple:
>>   (S, rdf:type, <table type>)
>> plus the scalar triples:
>>   for each attribute in the list of <scalars> where the attribute
>>     value is non-NULL:
>>     (S,
>>      the <property IRI> for the attribute,
>>      the <literal map> for the attribute value).
>> plus the reference triples:
>>   for each list of attributes in the <non-unary references> where none
>>     of the attribute values are NULL:
>>     (S,
>>      the <property IRI> for the attributes,
>>      the <row identifier> for the referenced row)
>> ]]
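A minimal end-to-end sketch of the direct, table and row mappings above. Assumptions, none of which the proposal fixes: `<table type>` is taken as the escaped table name; each foreign key references the target table's primary key, so the referenced row's identifier is built straight from the foreign-key values; escaping is simplified; literals are `(lexical, datatype)` pairs; IRIs stay relative to the base. All names are hypothetical.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
# SQL type -> XSD local name; unlisted types become plain literals (datatype None).
SQL_TO_XSD = {"INT": "integer", "FLOAT": "float", "DATE": "date",
              "TIME": "time", "TIMESTAMP": "dateTime"}


@dataclass
class Table:
    """Hypothetical in-memory table: schema plus rows."""
    name: str
    columns: dict[str, str]  # column name -> SQL type, in declaration order
    primary_key: list[str] = field(default_factory=list)
    # (local columns, referenced table, referenced columns = target's primary key)
    foreign_keys: list[tuple[list[str], str, list[str]]] = field(default_factory=list)
    rows: list[dict[str, object]] = field(default_factory=list)


def esc(s: str) -> str:
    """Simplified escaping: percent-encode the separators and space only."""
    return "".join(f"%{ord(c):02X}" if c in "#,/= " else c for c in s)


def node(table: str, cols: list[str], vals: list[object]) -> str:
    """Relative row IRI 'table/col=val,col=val'."""
    return esc(table) + "/" + ",".join(f"{esc(c)}={esc(str(v))}" for c, v in zip(cols, vals))


def property_iri(table: str, cols: list[str]) -> str:
    return esc(table) + "/" + ",".join(esc(c) for c in cols) + "#"


def literal(sql_type: str, value: object) -> tuple[str, str | None]:
    dt = SQL_TO_XSD.get(sql_type.upper())
    return (str(value), XSD + dt if dt else None)


def row_mapping(t: Table, row: dict[str, object], fresh) -> set[tuple]:
    """Type triple + non-NULL scalar triples + non-unary reference triples."""
    if t.primary_key:
        s = node(t.name, t.primary_key, [row[c] for c in t.primary_key])
    else:
        s = f"_:b{next(fresh)}"  # fresh blank node for tables without a primary key
    triples = {(s, RDF_TYPE, esc(t.name))}  # assumption: <table type> = table name
    refs = {c for cols, _, _ in t.foreign_keys for c in cols}
    for col, sql_type in t.columns.items():
        if col not in refs and row.get(col) is not None:
            triples.add((s, property_iri(t.name, [col]), literal(sql_type, row[col])))
    for cols, target, target_cols in t.foreign_keys:
        if len(cols) > 1 and all(row.get(c) is not None for c in cols):
            obj = node(target, target_cols, [row[c] for c in cols])
            triples.add((s, property_iri(t.name, cols), obj))
    return triples


def table_mapping(t: Table, fresh) -> set[tuple]:
    return {tr for row in t.rows for tr in row_mapping(t, row, fresh)}


def direct_mapping(tables: list[Table]) -> set[tuple]:
    fresh = itertools.count()  # shared so blank nodes stay distinct across tables
    return {tr for t in tables for tr in table_mapping(t, fresh)}
```

Read literally, the definitions give a single-column foreign key no triple at all, since its attribute is neither a scalar nor a non-unary reference; the sketch reproduces that behaviour rather than guessing at the intent.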
>> 
>>>   A. Appendix: Formalisms (Informative)
>>>      - should be crisp, short, precise, with only minimum explanation
>>>        and examples
>>>      A.1 Datalog Rules
>>>      A.2 Denotational Semantics
>>>      A.3 Set-Style Direct Mapping
>>> 
>>>   B. Acknowledgements (Informative)
>>> 
>>>   C. References
>>> 
>>> 
>>> I see Juan and Marcelo editing A.1.
>>> 
>>> I see Alexandre editing A.2.
>>> 
>>> I see Eric editing 2 (which he already wrote), 3 (which *mostly* exists), and A.3.
>>> 
>>> I don't know about 1, B, and C.
>>> 
>>> My reasoning is that there is no objective way of picking any of the formalisms over another formalism, so the normative expression should be the lowest common denominator: plain English. By making the formalisms all informative, we free them from the burden of having to explain the direct mapping itself in a generally accessible way. The focus can be totally on presenting the formalisms in all their terseness to an audience that is familiar with datalog/denotational semantics/whatever.
>>> 
>>> I hope this proposal aids discussion.
>>> 
>>> Best,
>>> Richard
>> 
>> -- 
>> -ericP
>> 
> 
> 

Received on Tuesday, 2 August 2011 11:44:39 UTC