Re: Proposal for the Direct Mapping from Eric Prud'hommeaux on 2011-08-03 (public-rdb2rdf-wg@w3.org from August 2011)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 3 Aug 2011 19:46:45 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Juan Sequeda <juanfederico@gmail.com>, Michael Hausenblas <michael.hausenblas@deri.org>, rdb2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <20110803174641.GK21930@w3.org>
* Richard Cyganiak <richard@cyganiak.de> [2011-08-02 23:32+0100]
> On 2 Aug 2011, at 15:19, Eric Prud'hommeaux wrote:
> >  • DM is for "all the tables in a database"
> >    I debated this; I didn't want to alarm folks who would think
> >    they'd have to expose everything if they didn't want to. The
> >    alternative is to parametrize; neither is terribly attractive. I
> >    guess "all tables" is fine.
> 
> "all tables and views in the schema"?

"each table and view in a database schema"?
done in two places (here and the definition below).

> >  • s/an SQL/a SQL/
> >    This depends on whether you call it "S Q L" or "sequel". The SQL
> >    spec uses "an", e.g. "Effects of SQL-statements in an SQL-transaction".
> 
> Ah, interesting point. R2RML uses “a SQL” but that's just my personal preference. I guess the spec should be considered authoritative on this.
> 
> > [[
> > The Direct Mapping is a formula for creating an RDF graph from the
> > rows of each table in a database. A base IRI defines a web space for
> > the labels in this graph; all labels are generated by appending to the
> > base.
> 
> There are no “labels” in an RDF graph. Let's please stick to the standard terminology from the specs.

done
also s/attribute/column/ # ignoring the question of "fields"

> > The functions scalar and reference extract the scalar and reference
> > attributes (those participating in a foreign key) respectively:
> 
> Why does this have to be formulated as “functions”?

Is there a more intuitive way to say that there's an exact mapping from the input onto the outputs?
And isn't that exactly what an implementor wants to know?

> > dfn scalars: the attributes in a table which are NOT in any foreign
> >   key.
> 
> How about: The non-foreign key columns of a table are the columns which are not in any foreign key.

Looking at it in situ <http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP#defn-scalars>, I'm not convinced that the "definition X: X is..." redundancy will be helpful.

> > dfn references: the attributes in a table's foreign keys.
> 
> How about: The foreign key columns of a table are the columns which are in some foreign key.

ditto
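
For concreteness, here is a minimal Python sketch of the two definitions as I read them. The function names, the example columns, and the representation of a foreign key as a tuple of column names are assumptions made for illustration; none of them come from the draft.

```python
def reference_columns(foreign_keys):
    """Columns that take part in at least one foreign key."""
    return {col for fk in foreign_keys for col in fk}


def scalar_columns(columns, foreign_keys):
    """Columns that are in no foreign key, kept in table order."""
    refs = reference_columns(foreign_keys)
    return [col for col in columns if col not in refs]


# Hypothetical table: three columns, one single-column foreign key.
columns = ["ID", "fname", "addr"]
foreign_keys = [("addr",)]

print(scalar_columns(columns, foreign_keys))
print(reference_columns(foreign_keys))
```

Either wording (functions or "the X columns of a table") describes the same partition of the columns; the sketch only shows that the two sets are complementary.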

> > SQL table and attribute identifiers compose RDF IRIs in the direct
> > graph. These identifiers are separated by the punctuation characters
> > '#', ',', '/' and '='. All SQL identifiers are escaped following URL-
> > encoding
> > <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > except that only the above punctuation and the characters not
> > permitted in RDF IRIs are escaped.
> 
> I'd define once: The URL-encoded form of a string is …
> 
> And then explicitly state that the so-and-so IRI is the concatenation of base IRI, '/', URL-encoded form of the table name, and so on.
> 
> (I recall discussions about using relative IRIs in the direct mapping. It might be easiest to limit that to the examples. “The example omits the base IRI for brevity, and uses relative IRIs. In the actual direct mapping graph, the base IRI would be prepended to all IRIs.”)

Didn't attack this yet; stuck it in a todo.

> > In the direct graph, there is an identifier for each row in a database
> > table. If the row is in a table with a primary key, this is formed
> > from the table name and the attribute names and values of each attribute
> > in the primary key. If there is no primary key for the table, the row
> > identifier is a fresh blank node:
> > 
> > dfn row identifier:
> > 
> >   if the table has a primary key with attributes, the relative IRI for
> >   the row identifier is the concatenation of the table name, '/', and
> >   a ','-separated concatenation of each attribute name, '=', and the
> >   attribute value.
> > 
> >   if the table has no primary key, the row identifier is a fresh blank
> >   node.
> 
> This doesn't need to be repeated twice. I'd call it row IRI for maximum clarity.

I'm not sure what's repeated. If you mean that there are two clauses, they deal with different cases.
Re: "row IRI", we could say that "row identifier" is either a "row IRI" or "row blank node". Proposed text?
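
To make the two cases concrete, here is a rough Python sketch of the row identifier. It assumes a blank node can be stood in for by a fresh "_:rowN" string, and it escapes with urllib's quote(), which percent-encodes more characters than the draft text requires. The names esc and row_identifier are mine, not the draft's.

```python
from itertools import count
from urllib.parse import quote

_fresh = count()  # source of fresh blank node labels (an assumption of this sketch)


def esc(value):
    # Assumption: escape everything except unreserved characters. The draft
    # only asks for '#', ',', '/', '=' and characters not allowed in IRIs.
    return quote(str(value), safe="")


def row_identifier(table, primary_key, row):
    """Relative IRI if the table has a primary key, else a fresh blank node."""
    if primary_key:
        pairs = ",".join(f"{esc(col)}={esc(row[col])}" for col in primary_key)
        return f"{esc(table)}/{pairs}"
    return f"_:row{next(_fresh)}"


print(row_identifier("People", ["ID"], {"ID": 7, "fname": "Bob"}))
print(row_identifier("Tweets", [], {"text": "hello"}))
```

The first call takes the primary-key branch and the second the blank-node branch, which is why the definition needs both clauses.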

> > A (potentially unary) list of attribute names in a table form a
> > property IRI:
> > 
> > dfn property IRI: the concatenation of the table name, '/', and a
> >   ','-separated concatenation of each attribute name, and a '#' at
> >   the end of the property IRI.
> 
> This doesn't need to be repeated one-and-a-half times.

The property IRI is simpler than the earlier definition (doesn't include column values).
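
A small Python sketch of the property IRI as currently worded, to show how it differs from the row identifier. The function name and the example table and column names are assumptions, and quote() again escapes more than the draft strictly requires.

```python
from urllib.parse import quote


def property_iri(table, columns):
    """Table name, '/', ','-separated column names, then a trailing '#'."""
    names = ",".join(quote(col, safe="") for col in columns)
    return f"{quote(table, safe='')}/{names}#"


print(property_iri("People", ["fname"]))
print(property_iri("People", ["deptName", "deptCity"]))
```

Only column names appear here, never column values, so one property IRI is shared by every row of the table.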

> > The values in a row are mapped to RDF literals:
> > 
> > dfn literal map: a mapping from an SQL value with a datatype to an RDF
> >   literal with an XML Schema datatype where the RDF literal has a
> >   lexical value equivalent to the SQL lexical value and the datatype
> >   mapping is found in this table:
> > 
> > SQL        XSD datatype
> > _________  _________________________________________
> > INT        http://www.w3.org/TR/xmlschema-2/#integer
> > FLOAT      http://www.w3.org/TR/xmlschema-2/#float
> > DATE       http://www.w3.org/TR/xmlschema-2/#date
> > TIME       http://www.w3.org/TR/xmlschema-2/#time
> > TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
> > CHAR       plain literal
> > VARCHAR    plain literal
> > STRING     plain literal
> 
> This should use the standard SQL 2008 types, including BOOLEAN and BINARY string types. (Probably the Direct Mapping can re-use the outcome of R2RML ISSUE-48 here.)

Labeled as an issue. Have you incorporated that into R2RML (for when there's no rr:datatype) so I can steal the text?
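
In the meantime, the table above amounts to a lookup. Here is a Python sketch, assuming a literal is modelled as a (lexical form, datatype) pair with None meaning a plain literal. The datatype strings are copied verbatim from the table, and anything outside the table (BOOLEAN, the binary types, ...) is deliberately left unhandled because that is the open issue.

```python
XSD = "http://www.w3.org/TR/xmlschema-2/#"  # prefix copied from the table above

DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": None,      # None stands for a plain literal in this sketch
    "VARCHAR": None,
    "STRING": None,
}


def literal_map(lexical, sql_type):
    """Return (lexical form, datatype); raises KeyError for unlisted SQL types."""
    return (lexical, DATATYPES[sql_type.upper()])


print(literal_map("18", "INT"))
print(literal_map("Bob", "VARCHAR"))
```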

> > The Direct Mapping is defined by a set of mapping functions from table
> > rows to RDF triples:
> > 
> > dfn direct mapping: the set of RDF triples produced by invoking the
> >   <table mapping> on each table in a database.
> 
> A minor stylistic point but I'd say: The direct mapping graph is the union of the table graphs for each table.
> 
> > dfn table mapping: the set of RDF triples created by invoking the
> >   <row mapping> on each row in a table.
> 
> I'd say, the table graph of a table is the union of the row graphs for each row.

If I understand this, it implies a definition of table graph, which might then be defined in terms of row graphs. Is this your proposal?

> > dfn row mapping: using a row identifier S for the row,
> >  the type triple:
> >    (S, rdf:type, <table type>)
> >  plus the scalar triples:
> >    for each attribute in the list of <scalars> where the attribute
> >      value is non-NULL:
> >      (S,
> >       the <property IRI> for the attribute,
> >       the <literal map> for the attribute value).
> >  plus the reference triples:
> >    for each list of attributes in the <non-unary references> where none
> >      of the attribute values are NULL:
> >      (S,
> >       the <property IRI> for the attributes,
> >       the <row identifier> for the referenced row)
> > ]]
> 
> I'd decompose this a bit: The row graph of a row is a graph consisting of the following triples:
> - the row type triple
> - a data triple for each non-foreign key column where the data value is non-null
> - a reference triple for each foreign key column ...
> 
> And then:
> 
> The row type triple of a row is an RDF triple with the following components:
> - subject: the row IRI of the row
> - predicate: rdf:type
> - object: the table class IRI of the row's table
> 
> et cetera.

I worked from this angle for a bit, but the challenging thing was ensuring the same subject without introducing some sort of hand-waving about "the current subject" or some such.
Recall that the containing table may not have a primary key (or even any candidate keys).
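
To illustrate the shared-subject point, here is a rough Python sketch of the row mapping in which S is computed once and reused by every triple. It rests on several assumptions that are not in the draft: the table description is a plain dict, the table type is just the escaped table name, literals are reduced to strings, every foreign key (unary or not) yields a reference triple, and each foreign key references the primary key of its target table.

```python
from itertools import count
from urllib.parse import quote

_fresh = count()  # fresh blank node labels (an assumption of this sketch)


def esc(value):
    return quote(str(value), safe="")  # escapes more than the draft requires


def row_identifier(table, key_columns, key_values):
    if key_columns:
        pairs = ",".join(f"{esc(c)}={esc(v)}" for c, v in zip(key_columns, key_values))
        return f"{esc(table)}/{pairs}"
    return f"_:row{next(_fresh)}"


def property_iri(table, columns):
    return f"{esc(table)}/{','.join(esc(c) for c in columns)}#"


def row_mapping(table, row):
    name = table["name"]
    pk = table["primary_key"]
    # S is computed exactly once, so a blank node subject stays the same
    # across the type, scalar and reference triples of this row.
    s = row_identifier(name, pk, [row[c] for c in pk])
    triples = [(s, "rdf:type", esc(name))]
    ref_cols = {c for fk in table["foreign_keys"] for c in fk["columns"]}
    for col, value in row.items():
        if col not in ref_cols and value is not None:  # skip NULLs
            triples.append((s, property_iri(name, [col]), str(value)))
    for fk in table["foreign_keys"]:
        values = [row[c] for c in fk["columns"]]
        if all(v is not None for v in values):
            target = row_identifier(fk["ref_table"], fk["ref_columns"], values)
            triples.append((s, property_iri(name, fk["columns"]), target))
    return triples


people = {
    "name": "People",
    "primary_key": ["ID"],
    "foreign_keys": [{"columns": ["addr"], "ref_table": "Addresses", "ref_columns": ["ID"]}],
}
for triple in row_mapping(people, {"ID": 7, "fname": "Bob", "addr": 18}):
    print(triple)
```

A decomposition into separately defined "row type triple", "data triple" and "reference triple" would have to thread that same S through each definition, which is where the keyless-table case gets awkward.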

> I know this might not be politically correct in RDF circles, but again I'll point out this post that I found very helpful when editing R2RML:
> http://ln.hixie.ch/?start=1140242962&count=1
> 
> Best,
> Richard

-- 
-ericP
Received on Wednesday, 3 August 2011 17:47:05 UTC