Re: Proposal for the Direct Mapping from Juan Sequeda on 2011-08-04 (public-rdb2rdf-wg@w3.org from August 2011)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Wed, 3 Aug 2011 19:22:05 -0500
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, Michael Hausenblas <michael.hausenblas@deri.org>, rdb2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <CAMVTWDxk0kDt4bxMOj_0k9y=WNv74kUrNyjYwGEt39dXbc+Ojg@mail.gmail.com>
On Wed, Aug 3, 2011 at 12:46 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Richard Cyganiak <richard@cyganiak.de> [2011-08-02 23:32+0100]
> > On 2 Aug 2011, at 15:19, Eric Prud'hommeaux wrote:
> > >  • DM is for "all the tables in a database"
> > >    I debated this; I didn't want to alarm folks who would think
> > >    they'd have to expose everything if they didn't want to. The
> > >    alternative is to parametrize; neither is terribly attractive. I
> > >    guess "all tables" is fine.
> >
> > "all tables and views in the schema"?
>
> "each table and view in a database schema"?
> done in two places (here and the definition below).
>
> > >  • s/an SQL/a SQL/
> > >    This depends on whether you call it "S Q L" or "sequel". The SQL
> > >    spec uses "an", e.g. "Effects of SQL-statements in an
> > >    SQL-transaction".
> >
> > Ah, interesting point. R2RML uses “a SQL” but that's just my personal
> > preference. I guess the spec should be considered authoritative on this.
> >
> > > [[
> > > The Direct Mapping is a formula for creating an RDF graph from the
> > > rows of each table in a database. A base IRI defines a web space for
> > > the labels in this graph; all labels are generated by appending to the
> > > base.
> >
> > There are no “labels” in an RDF graph. Let's please stick to the standard
> > terminology from the specs.
>
> done
> also s/attribute/column/ # ignoring the question of "fields"
>
> > > The functions scalar and reference extract the scalar and reference
> > > attributes (those participating in a foreign key) respectively:
> >
> > Why does this have to be formulated as “functions”?
>
> Is there a more intuitive way to say that there's an exact mapping from the
> input onto the outputs?
> And isn't that exactly what an implementor wants to know?
>
> > > dfn scalars: the attributes in a table which are NOT in any foreign
> > >   key.
> >
> > How about: The non-foreign key columns of a table are the columns which
> > are not in any foreign key.
>
> Looking at it in-situ
> <http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP#defn-scalars>, I'm not
> convinced that the "definition X: X is..." redundancy will be helpful.
>
> > > dfn references: the attributes in a table's foreign keys.
> >
> > How about: The foreign key columns of a table are the columns which are
> > in some foreign key.
>
> ditto
>
> > > SQL table and attribute identifiers compose RDF IRIs in the direct
> > > graph. These identifiers are separated by the punctuation characters
> > > '#', ',', '/' and '='. All SQL identifiers are escaped following
> > > URL-encoding
> > > <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > > except that only the above punctuation and the characters not
> > > permitted in RDF IRIs are escaped.
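
The escaping rule quoted above could be sketched roughly as follows. This is only an illustration, not text from the draft: the function name `encode_identifier` is hypothetical, the set standing in for "characters not permitted in RDF IRIs" is an assumed approximation, and escaping '%' itself (so the result can be decoded unambiguously) is also an assumption.

```python
# The four separator characters named in the draft text.
SEPARATORS = "#,/="
# Assumed stand-in for "characters not permitted in RDF IRIs":
# space, '%', and a handful of ASCII delimiters. Not taken from the draft.
ALSO_ESCAPED = ' %<>"{}|\\^`'


def encode_identifier(identifier: str) -> str:
    """Percent-encode only the separators and the assumed forbidden set."""
    out = []
    for ch in identifier:
        if ch in SEPARATORS or ch in ALSO_ESCAPED or ord(ch) < 0x20:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            # Everything else, including non-ASCII letters, passes through.
            out.append(ch)
    return "".join(out)
```

Under these assumptions, an identifier such as `a/b=c` would no longer collide with the '/' and '=' that the mapping itself inserts between table names, column names and values.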
> >
> > I'd define once: The URL-encoded form of a string is …
> >
> > And then explicitly state that the so-and-so IRI is the concatenation of
> > base IRI, '/', URL-encoded form of the table name, and so on.
> >
> > (I recall discussions about using relative IRIs in the direct mapping. It
> > might be easiest to limit that to the examples. “The example omits the base
> > IRI for brevity, and uses relative IRIs. In the actual direct mapping graph,
> > the base IRI would be prepended to all IRIs.”)
>
> Didn't attack yet. stuck in a todo.
>
> > > In the direct graph, there is an identifier for each row in a database
> > > table. If the row is in a table with a primary key, this is formed
> > > from the table name and the attribute names and values of each attribute
> > > in the primary key. If there is no primary key for the table, the row
> > > identifier is a fresh blank node:
> > >
> > > dfn row identifier:
> > >
> > >   if the table has a primary key with attributes, the relative IRI for
> > >   the row identifier is the concatenation of the table name, '/', and
> > >   a ','-separated concatenation of each attribute name, '=', and the
> > >   attribute value.
> > >
> > >   if the table has no primary key, the row identifier is a fresh blank
> > >   node.
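
A minimal sketch of the two cases in the row identifier definition above. The names `row_identifier` and `_fresh` are hypothetical, the `_:rowN` blank node label is just one way to show a fresh node, and `urllib.parse.quote` with `safe=""` is used as a stand-in for the draft's escaping rule (it escapes more characters than the draft asks for).

```python
from itertools import count
from urllib.parse import quote

# Counter used to mint a fresh blank node label on each call (an assumption;
# any source of unique labels would do).
_fresh = count()


def row_identifier(table, primary_key, row):
    """Return a relative IRI for keyed tables, else a fresh blank node label."""
    if primary_key:
        # ','-separated "name=value" pairs, one per primary key column.
        pairs = ",".join(
            f"{quote(col, safe='')}={quote(str(row[col]), safe='')}"
            for col in primary_key
        )
        return f"{quote(table, safe='')}/{pairs}"
    # No primary key: every call yields a new blank node.
    return f"_:row{next(_fresh)}"
```

For a hypothetical `People` table keyed on `ID`, the row with `ID` 7 would get the relative IRI `People/ID=7`; two rows of an unkeyed table get two distinct blank nodes.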
> >
> > This doesn't need to be repeated twice. I'd call it row IRI for maximum
> > clarity.
>
> I'm not sure what's repeated. If you mean that there are two clauses, they
> deal with different cases.
> Re: "row IRI", we could say that "row identifier" is either a "row IRI" or
> "row blank node". Proposed text?
>
> > > A (potentially unary) list of attribute names in a table form a
> > > property IRI:
> > >
> > > dfn property IRI: the concatenation of the table name, '/', and a
> > >   ','-separated concatenation of each attribute name, and a '#' at
> > >   the end of the property IRI.
> >
> > This doesn't need to be repeated one-and-a-half times.
>
> The property IRI is simpler than the earlier definition (doesn't include
> column values).
>
> > > The values in a row are mapped to RDF literals:
> > >
> > > dfn literal map: a mapping from an SQL value with a datatype to an RDF
> > >   literal with an XML Schema datatype where the RDF literal has a
> > >   lexical value equivalent to the SQL lexical value and the datatype
> > >   mapping is found in this table:
> > >
> > > SQL         XSD datatype
> > > _________   _________________________________________
> > > INT         http://www.w3.org/TR/xmlschema-2/#integer
> > > FLOAT       http://www.w3.org/TR/xmlschema-2/#float
> > > DATE        http://www.w3.org/TR/xmlschema-2/#date
> > > TIME        http://www.w3.org/TR/xmlschema-2/#time
> > > TIMESTAMP   http://www.w3.org/TR/xmlschema-2/#dateTime
> > > CHAR        plain literal
> > > VARCHAR     plain literal
> > > STRING      plain literal
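
The table above could be turned into a lookup along these lines. This is a sketch under assumptions: `literal_map` and `DATATYPES` are hypothetical names, the output is written in N-Triples-style syntax for illustration, and the datatype IRIs use the XML Schema namespace `http://www.w3.org/2001/XMLSchema#` rather than the spec-section URLs shown in the table.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# SQL type name -> XSD datatype IRI; None means "plain literal".
DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": None,
    "VARCHAR": None,
    "STRING": None,
}


def literal_map(lexical, sql_type):
    """Render an SQL lexical value as a typed or plain RDF literal."""
    datatype = DATATYPES[sql_type.upper()]
    # Escape backslashes and double quotes for the quoted literal form.
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    if datatype is None:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{datatype}>'
```

One caveat: the sketch passes the lexical value through unchanged, but an SQL TIMESTAMP written with a space between date and time is not a valid xsd:dateTime lexical form (which needs a 'T'), so "equivalent" would need some conversion there.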
> >
> > This should use the standard SQL 2008 types, including BOOLEAN and BINARY
> > string types. (Probably the Direct Mapping can re-use the outcome of R2RML
> > ISSUE-48 here.)
>
> Labeled as an issue. Have you incorporated that into R2RML (when there's
> not rr:datatype) so I can steal the text?
>
> > > The Direct Mapping is defined by a set of mapping functions from table
> > > rows to RDF triples:
> > >
> > > dfn direct mapping: the set of RDF triples produced by invoking the
> > >   <table mapping> on each table in a database.
> >
> > A minor stylistic point but I'd say: The direct mapping graph is the
> > union of the table graphs for each table.
> >
> > > dfn table mapping: the set of RDF triples created by invoking the
> > >   <row mapping> on each row in a table.
> >
> > I'd say, the table graph of a table is the union of the row graphs for
> > each row.
>
> If I understand this, it implies a definition of table graph, which might
> then be defined in terms of row graphs. Is this your proposal?
>
> > > dfn row mapping: using a row identifier S for the row,
> > >  the type triple:
> > >    (S, rdf:type, <table type>)
> > >  plus the scalar triples:
> > >    for each attribute in the list of <scalars> where the attribute
> > >      value is non-NULL:
> > >      (S,
> > >       the <property IRI> for the attribute,
> > >       the <literal map> for the attribute value).
> > >  plus the reference triples:
> > >    for each list of attributes in the <non-unary references> where none
> > >      of the attribute values are NULL:
> > >      (S,
> > >       the <property IRI> for the attributes,
> > >       the <row identifier> for the referenced row)
> > > ]]
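
The row mapping quoted above might be sketched as one function that computes the subject once and reuses it for every triple. Everything named here is an assumption for illustration: `row_mapping`, the `(local_cols, target_table, target_key_cols)` shape for foreign keys, the table name doubling as the `<table type>`, each foreign key pointing at the target's primary key, and plain untyped literals in place of the literal map.

```python
from itertools import count
from urllib.parse import quote

_fresh = count()  # source of fresh blank node labels (assumption)


def enc(value):
    # Stand-in for the draft's escaping; escapes more than the draft requires.
    return quote(str(value), safe="")


def row_id(table, key_cols, values):
    if not key_cols:
        return f"_:row{next(_fresh)}"
    pairs = ",".join(f"{enc(c)}={enc(v)}" for c, v in zip(key_cols, values))
    return f"{enc(table)}/{pairs}"


def prop(table, cols):
    return f"{enc(table)}/{','.join(enc(c) for c in cols)}#"


def row_mapping(table, row, primary_key, foreign_keys):
    """foreign_keys: list of (local_cols, target_table, target_key_cols)."""
    # The subject is computed once, so all triples share it even when it
    # is a blank node.
    s = row_id(table, primary_key, [row[c] for c in primary_key])
    triples = [(s, "rdf:type", enc(table))]
    in_fk = {c for cols, _, _ in foreign_keys for c in cols}
    # Scalar triples: non-foreign-key columns with non-NULL values.
    for col, value in row.items():
        if col not in in_fk and value is not None:
            triples.append((s, prop(table, [col]), f'"{value}"'))
    # Reference triples: foreign keys with no NULL component.
    for cols, target, target_key in foreign_keys:
        values = [row[c] for c in cols]
        if all(v is not None for v in values):
            triples.append((s, prop(table, cols), row_id(target, target_key, values)))
    return triples
```

For example, `row_mapping("People", {"ID": 7, "fname": "Bob", "addr": 18}, ["ID"], [(["addr"], "Addresses", ["ID"])])` would yield a type triple, two scalar triples, and one reference triple whose object is `Addresses/ID=18`.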
> >
> > I'd decompose this a bit: The row graph of a row is a graph consisting of
> > the following triples:
> > - the row type triple
> > - a data triple for each non-foreign key column where the data value is
> >   non-null
> > - a reference triple for each foreign key column ...
> >
> > And then:
> >
> > The row type triple of a row is an RDF triple with the following
> > components:
> > - subject: the row IRI of the row
> > - predicate: rdf:type
> > - object: the table class IRI of the row's table
> >
> > et cetera.
>
> I worked from this angle for a bit, but the challenging thing was ensuring
> the same subject without introducing some sort of hand-waving about "the
> current subject" or some such.
> Recall that the containing table may not have a primary key (or even any
> candidate keys).
>


Eric,

I agree with Richard on this one. In fact, we already have something
practically identical to this:

http://www.w3.org/2001/sw/rdb2rdf/directMapping/#rules



>
> > I know this might not be politically correct in RDF circles, but again
> > I'll point out this post that I found very helpful when editing R2RML:
> > http://ln.hixie.ch/?start=1140242962&count=1
> >
> > Best,
> > Richard
>
> --
> -ericP
>
Received on Thursday, 4 August 2011 00:22:53 UTC