Re: Proposal for the Direct Mapping

On 2 Aug 2011, at 15:19, Eric Prud'hommeaux wrote:
>  • DM is for "all the tables in a database"
>    I debated this; I didn't want to alarm folks who would think
>    they'd have to expose everything if they didn't want to. The
>    alternative is to parametrize; neither is terribly attractive. I
>    guess "all tables" is fine.

"all tables and views in the schema"?

>  • s/an SQL/a SQL/
>    This depends on whether you call it "S Q L" or "sequel". The SQL
>    spec uses "an", e.g. "Effects of SQL-statements in an SQL-transaction".

Ah, interesting point. R2RML uses “a SQL” but that's just my personal preference. I guess the spec should be considered authoritative on this.

> [[
> The Direct Mapping is a formula for creating an RDF graph from the
> rows of each table in a database. A base IRI defines a web space for
> the labels in this graph; all labels are generated by appending to the
> base.

There are no “labels” in an RDF graph. Let's please stick to the standard terminology from the specs.

> The functions scalar and reference extract the scalar and reference
> attributes (those participating in a foreign key) respectively:

Why does this have to be formulated as “functions”?

> dfn scalars: the attributes in a table which are NOT in any foreign
>   key.

How about: The non-foreign key columns of a table are the columns which are not in any foreign key.

> dfn references: the attributes in a table's foreign keys.

How about: The foreign key columns of a table are the columns which are in some foreign key.
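As a rough sketch of how these two definitions partition a table's columns (the function names and the list-of-lists representation of foreign keys are assumptions made for illustration, not anything from the draft):

```python
def foreign_key_columns(foreign_keys):
    """Columns that are in some foreign key.

    `foreign_keys` is assumed to be a list of column-name lists,
    one list per foreign key constraint on the table.
    """
    return {column for key in foreign_keys for column in key}


def non_foreign_key_columns(columns, foreign_keys):
    """Columns that are not in any foreign key, in table order."""
    in_some_key = foreign_key_columns(foreign_keys)
    return [column for column in columns if column not in in_some_key]
```

With columns ID, fname, addr and a single foreign key on addr, the foreign key columns are {addr} and the non-foreign key columns are ID and fname.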

> SQL table and attribute identifiers compose RDF IRIs in the direct
> graph. These identifiers are separated by the punctuation characters
> '#', ',', '/' and '='. All SQL identifiers are escaped following URL-
> encoding
> <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> except that only the above punctuation and the characters not
> permitted in RDF IRIs are escaped.

I'd define once: The URL-encoded form of a string is …

And then explicitly state that the so-and-so IRI is the concatenation of base IRI, '/', URL-encoded form of the table name, and so on.

(I recall discussions about using relative IRIs in the direct mapping. It might be easiest to limit that to the examples. “The example omits the base IRI for brevity, and uses relative IRIs. In the actual direct mapping graph, the base IRI would be prepended to all IRIs.”)
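To illustrate the "define once, then concatenate" style, here is a minimal Python sketch. The names url_encoded and table_iri are hypothetical. Note that urllib.parse.quote with safe="" escapes every character outside the unreserved ASCII set, which is stricter than the quoted draft (that one escapes only '#', ',', '/', '=' and characters not permitted in IRIs), so this is an approximation rather than the proposed rule.

```python
from urllib.parse import quote


def url_encoded(text: str) -> str:
    # Assumption: percent-encode everything except unreserved ASCII
    # characters. The draft's rule is narrower than this.
    return quote(text, safe="")


def table_iri(base_iri: str, table_name: str) -> str:
    # Concatenation of base IRI, '/', and the URL-encoded table name.
    return base_iri + "/" + url_encoded(table_name)
```

With the (made-up) base IRI http://example.com/db, a table called "Dept Name" would get the IRI http://example.com/db/Dept%20Name.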

> In the direct graph, there is an identifier for each row in a database
> table. If the row is in a table with a primary key, this is formed
> from the table name and the attribute names and values of each attribute
> in the primary key. If there is no primary key for the table, the row
> identifier is a fresh blank node:
> 
> dfn row identifier:
> 
>   if the table has a primary key with attributes, the relative IRI for
>   the row identifier is the concatenation of the table name, '/', and
>   a ','-separated concatenation of each attribute name, '=', and the
>   attribute value.
> 
>   if the table has no primary key, the row identifier is a fresh blank
>   node.

This doesn't need to be repeated twice. I'd call it row IRI for maximum clarity.
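For what it's worth, the quoted definition comes down to something like the following sketch. The name row_node, the list-of-pairs representation of the primary key, and the "_:rowN" blank node labels are all assumptions; the encoding helper is the same over-strict approximation via urllib.parse.quote.

```python
from itertools import count
from urllib.parse import quote

_blank_node_ids = count(1)  # hypothetical source of fresh blank node labels


def row_node(table_name, primary_key):
    """Relative row IRI, or a fresh blank node if there is no primary key.

    `primary_key` is assumed to be a list of (column name, value) pairs,
    empty when the table has no primary key.
    """
    if not primary_key:
        return f"_:row{next(_blank_node_ids)}"
    pairs = ",".join(
        f"{quote(name, safe='')}={quote(str(value), safe='')}"
        for name, value in primary_key
    )
    return f"{quote(table_name, safe='')}/{pairs}"
```

A row of People with primary key ID = 7 would get the relative IRI People/ID=7; every call for a table without a primary key yields a different blank node.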

> A (potentially unary) list of attribute names in a table form a
> property IRI:
> 
> dfn property IRI: the concatenation of the table name, '/', and a
>   ','-separated concatenation of each attribute name, and a '#' at
>   the end of the property IRI.

This doesn't need to be repeated one-and-a-half times.
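Read literally, the quoted definition amounts to the sketch below. The name property_iri is hypothetical, and the trailing '#' follows the draft text as quoted, not any later revision.

```python
from urllib.parse import quote


def property_iri(table_name, column_names):
    # Table name, '/', ','-separated column names, then a trailing '#'.
    columns = ",".join(quote(name, safe="") for name in column_names)
    return f"{quote(table_name, safe='')}/{columns}#"
```

So a single column fname in People gives People/fname#, and a two-column list gives something like Orders/cust,region#.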

> The values in a row are mapped to RDF literals:
> 
> dfn literal map: a mapping from an SQL value with a datatype to an RDF
>   literal with an XML Schema datatype where the RDF literal has a
>   lexical value equivalent to the SQL lexical value and the datatype
>   mapping is found in this table:
> 
> SQL        XSD datatype
> _________  ___________________________________________
> INT        http://www.w3.org/TR/xmlschema-2/#integer
> FLOAT      http://www.w3.org/TR/xmlschema-2/#float
> DATE       http://www.w3.org/TR/xmlschema-2/#date
> TIME       http://www.w3.org/TR/xmlschema-2/#time
> TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
> CHAR       plain literal
> VARCHAR    plain literal
> STRING     plain literal

This should use the standard SQL 2008 types, including BOOLEAN and BINARY string types. (Probably the Direct Mapping can re-use the outcome of R2RML ISSUE-48 here.)
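A sketch of the literal map as a lookup table, covering only the types in the quoted draft. Two assumptions: it uses the XML Schema namespace (http://www.w3.org/2001/XMLSchema#) for the datatype IRIs instead of the spec-anchor URLs in the quoted table, and it takes the SQL lexical form as an already-formatted string. BOOLEAN and the binary string types are deliberately left out, since their mapping is not settled in this thread. The name literal_for is hypothetical.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"  # assumption: namespace IRIs, not spec anchors

DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
}
PLAIN_LITERAL_TYPES = {"CHAR", "VARCHAR", "STRING"}


def literal_for(sql_type: str, lexical: str) -> str:
    """Render a literal in N-Triples-like syntax from an SQL lexical form."""
    # Escape backslashes first, then double quotes.
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    sql_type = sql_type.upper()
    if sql_type in PLAIN_LITERAL_TYPES:
        return f'"{escaped}"'
    # Types outside the draft table raise KeyError instead of guessing.
    return f'"{escaped}"^^<{DATATYPES[sql_type]}>'
```

An INT value 42 becomes a typed literal with datatype xsd:integer, while a VARCHAR value stays a plain literal.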

> The Direct Mapping is defined by a set of mapping functions from table
> rows to RDF triples:
> 
> dfn direct mapping: the set of RDF triples produced by invoking the
>   <table mapping> on each table in a database.

A minor stylistic point but I'd say: The direct mapping graph is the union of the table graphs for each table.

> dfn table mapping: the set of RDF triples created by invoking the
>   <row mapping> on each row in a table.

I'd say, the table graph of a table is the union of the row graphs for each row.

> dfn row mapping: using a row identifier S for the row,
>  the type triple:
>    (S, rdf:type, <table type>)
>  plus the scalar triples:
>    for each attribute in the list of <scalars> where the attribute
>      value is non-NULL:
>      (S,
>       the <property IRI> for the attribute,
>       the <literal map> for the attribute value).
>  plus the reference triples:
>    for each list of attributes in the <non-unary references> where none
>      of the attribute values are NULL:
>      (S,
>       the <property IRI> for the attributes,
>       the <row identifier> for the referenced row)
> ]]

I'd decompose this a bit: The row graph of a row is a graph consisting of the following triples:
- the row type triple
- a data triple for each non-foreign key column where the data value is non-null
- a reference triple for each foreign key column ...

And then:

The row type triple of a row is an RDF triple with the following components:
- subject: the row IRI of the row
- predicate: rdf:type
- object: the table class IRI of the row's table

et cetera.
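The decomposition above (row graph, then table graph and direct mapping graph as unions) could be sketched roughly like this. Everything here is an assumption made for illustration: triples are plain tuples of strings, the table name stands in for the table class IRI, literals are shown without datatypes, and only single-column foreign keys are handled, with `references` mapping a foreign key column to the row IRI of the referenced row.

```python
def row_graph(table_name, row_iri, row, references):
    """Set of triples for one row; `row` maps column name -> value."""
    graph = {(row_iri, "rdf:type", table_name)}  # the row type triple
    for column, value in row.items():
        if value is None:
            continue  # NULL values produce no triple
        predicate = f"{table_name}/{column}#"
        if column in references:
            # reference triple: object is the referenced row's IRI
            graph.add((row_iri, predicate, references[column]))
        else:
            # data triple: object is a (simplified) literal
            graph.add((row_iri, predicate, f'"{value}"'))
    return graph


def table_graph(row_graphs):
    """Union of the row graphs for each row of a table."""
    return set().union(*row_graphs)


def direct_mapping_graph(table_graphs):
    """Union of the table graphs for each table."""
    return set().union(*table_graphs)
```

For a People row with ID 7, fname Bob, a NULL nickname and addr referencing Addresses/ID=18, this gives one row type triple, two data triples and one reference triple.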

I know this might not be politically correct in RDF circles, but again I'll point out this post that I found very helpful when editing R2RML:
http://ln.hixie.ch/?start=1140242962&count=1

Best,
Richard

Received on Tuesday, 2 August 2011 22:33:20 UTC