- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 5 Aug 2011 20:09:37 +0200
- To: Juan Sequeda <juanfederico@gmail.com>
- Cc: Richard Cyganiak <richard@cyganiak.de>, Michael Hausenblas <michael.hausenblas@deri.org>, rdb2RDF WG <public-rdb2rdf-wg@w3.org>
* Juan Sequeda <juanfederico@gmail.com> [2011-08-03 19:22-0500]
> On Wed, Aug 3, 2011 at 12:46 PM, Eric Prud'hommeaux <eric@w3.org> wrote:
>
> > * Richard Cyganiak <richard@cyganiak.de> [2011-08-02 23:32+0100]
> > > On 2 Aug 2011, at 15:19, Eric Prud'hommeaux wrote:
> > > > • DM is for "all the tables in a database"
> > > > I debated this; I didn't want to alarm folks who would think
> > > > they'd have to expose everything if they didn't want to. The
> > > > alternative is to parametrize; neither is terribly attractive. I
> > > > guess "all tables" is fine.
> > >
> > > "all tables and views in the schema"?
> >
> > "each table and view in a database schema"?
> > done in two places (here and the definition below).
> >
> > > > • s/an SQL/a SQL/
> > > > This depends on whether you call it "S Q L" or "sequel". The SQL
> > > > spec uses "an", e.g. "Effects of SQL-statements in an
> > > > SQL-transaction".
> > >
> > > Ah, interesting point. R2RML uses “a SQL” but that's just my personal
> > > preference. I guess the spec should be considered authoritative on this.
> > >
> > > > [[
> > > > The Direct Mapping is a formula for creating an RDF graph from the
> > > > rows of each table in a database. A base IRI defines a web space for
> > > > the labels in this graph; all labels are generated by appending to the
> > > > base.
> > >
> > > There are no “labels” in an RDF graph. Let's please stick to the standard
> > > terminology from the specs.
> >
> > done
> > also s/attribute/column/ # ignoring the question of "fields"
> >
> > > > The functions scalar and reference extract the scalar and reference
> > > > attributes (those participating in a foreign key) respectively:
> > >
> > > Why does this have to be formulated as “functions”?
> >
> > Is there a more intuitive way to say that there's an exact mapping from the
> > input onto the outputs?
> > And isn't that exactly what an implementor wants to know?
> >
> > > > dfn scalars: the attributes in a table which are NOT in any foreign
> > > > key.
> > >
> > > How about: The non-foreign key columns of a table are the columns which
> > > are not in any foreign key.
> >
> > Looking at it in-situ <http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP#defn-scalars>, I'm not
> > convinced that the "definition X: X is..." redundancy will be helpful.
> >
> > > > dfn references: the attributes in a table's foreign keys.
> > >
> > > How about: The foreign key columns of a table are the columns which are
> > > in some foreign key.
> >
> > ditto
> >
> > > > SQL table and attribute identifiers compose RDF IRIs in the direct
> > > > graph. These identifiers are separated by the punctuation characters
> > > > '#', ',', '/' and '='. All SQL identifiers are escaped following
> > > > URL-encoding <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > > > except that only the above punctuation and the characters not
> > > > permitted in RDF IRIs are escaped.
> > >
> > > I'd define once: The URL-encoded form of a string is …
> > >
> > > And then explicitly state that the so-and-so IRI is the concatenation of
> > > base IRI, '/', URL-encoded form of the table name, and so on.
> > >
> > > (I recall discussions about using relative IRIs in the direct mapping. It
> > > might be easiest to limit that to the examples. “The example omits the base
> > > IRI for brevity, and uses relative IRIs. In the actual direct mapping graph,
> > > the base IRI would be prepended to all IRIs.”)
> >
> > Didn't attack yet. stuck in a todo.
> >
> > > > In the direct graph, there is an identifier for each row in a database
> > > > table. If the row is in a table with a primary key, this is formed
> > > > from the table name and the attribute names and values of each attribute
> > > > in the primary key.
> > > > If there is no primary key for the table, the row
> > > > identifier is a fresh blank node:
> > > >
> > > > dfn row identifier:
> > > >
> > > > if the table has a primary key with attributes, the relative IRI for
> > > > the row identifier is the concatenation of the table name, '/', and
> > > > a ','-separated concatenation of each attribute name, '=', and the
> > > > attribute value.
> > > >
> > > > if the table has no primary key, the row identifier is a fresh blank
> > > > node.
> > >
> > > This doesn't need to be repeated twice. I'd call it row IRI for maximum
> > > clarity.
> >
> > I'm not sure what's repeated. If you mean that there are two clauses, they
> > deal with different cases.
> > Re: "row IRI", we could say that "row identifier" is either a "row IRI" or
> > "row blank node". Proposed text?
> >
> > > > A (potentially unary) list of attribute names in a table form a
> > > > property IRI:
> > > >
> > > > dfn property IRI: the concatenation of the table name, '/', and a
> > > > ','-separated concatenation of each attribute name, and a '#' at
> > > > the end of the property IRI.
> > >
> > > This doesn't need to be repeated one-and-a-half times.
> >
> > The property IRI is simpler than the earlier definition (doesn't include
> > column values).
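For illustration only, the row identifier and property IRI definitions quoted above can be sketched in Python roughly as follows. This is not text from the draft. The function names (encode, row_identifier, property_iri), the example table and column names, and the "_:bN" blank node labels are all hypothetical. The escaping is also only an approximation: urllib.parse.quote escapes more characters than the quoted rule ("only the above punctuation and the characters not permitted in RDF IRIs") asks for.

```python
import itertools
from urllib.parse import quote

# Hypothetical counter used to mint a fresh blank node label per call.
_blank_ids = itertools.count()


def encode(value):
    """Percent-encode a table name, column name or value.

    Assumption: escaping everything outside the RFC 3986 unreserved set.
    This covers '#', ',', '/' and '=' but over-escapes compared with the
    rule quoted in the thread.
    """
    return quote(str(value), safe="")


def row_identifier(table, primary_key, row):
    """Relative IRI for a row with a primary key, else a fresh blank node."""
    if primary_key:
        pairs = ",".join(
            f"{encode(col)}={encode(row[col])}" for col in primary_key
        )
        return f"{encode(table)}/{pairs}"
    return f"_:b{next(_blank_ids)}"


def property_iri(table, columns):
    """Table name, '/', ','-separated column names, then a trailing '#'."""
    return f"{encode(table)}/{','.join(encode(c) for c in columns)}#"


if __name__ == "__main__":
    print(row_identifier("People", ["ID"], {"ID": 7, "fname": "Bob"}))
    print(property_iri("People", ["fname"]))
    print(row_identifier("Tweets", [], {"text": "hi"}))
```

The two clauses of the row identifier definition map onto the two branches of row_identifier, and property_iri differs only in leaving out the column values. Prepending a base IRI, as discussed earlier in the thread, is left out of the sketch.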
> >
> > > > The values in a row are mapped to RDF literals:
> > > >
> > > > dfn literal map: a mapping from an SQL value with a datatype to an RDF
> > > > literal with an XML Schema datatype where the RDF literal has a
> > > > lexical value equivalent to the SQL lexical value and the datatype
> > > > mapping is found in this table:
> > > >
> > > > SQL        XSD datatype
> > > > ___        ____________
> > > > INT        http://www.w3.org/TR/xmlschema-2/#integer
> > > > FLOAT      http://www.w3.org/TR/xmlschema-2/#float
> > > > DATE       http://www.w3.org/TR/xmlschema-2/#date
> > > > TIME       http://www.w3.org/TR/xmlschema-2/#time
> > > > TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
> > > > CHAR       plain literal
> > > > VARCHAR    plain literal
> > > > STRING     plain literal
> > >
> > > This should use the standard SQL 2008 types, including BOOLEAN and BINARY
> > > string types. (Probably the Direct Mapping can re-use the outcome of R2RML
> > > ISSUE-48 here.)
> >
> > Labeled as an issue. Have you incorporated that into R2RML (when there's
> > not rr:datatype) so I can steal the text?
> >
> > > > The Direct Mapping is defined by a set of mapping functions from table
> > > > rows to RDF triples:
> > > >
> > > > dfn direct mapping: the set of RDF triples produced by invoking the
> > > > <table mapping> on each table in a database.
> > >
> > > A minor stylistic point but I'd say: The direct mapping graph is the
> > > union of the table graphs for each table.
> > >
> > > > dfn table mapping: the set of RDF triples created by invoking the
> > > > <row mapping> on each row in a table.
> > >
> > > I'd say, the table graph of a table is the union of the row graphs for
> > > each row.
> >
> > If I understand this, it implies the definition of table graph which might
> > then be defined row graphs. Is this your proposal?
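As a rough illustration of the literal map table quoted above, here is a minimal Python sketch. It is not part of the draft. The names XSD, DATATYPES and literal_map are hypothetical, the N-Triples-style output syntax is just a convenient way to show the result, and the sketch uses the XML Schema namespace IRIs (http://www.w3.org/2001/XMLSchema#...) as an assumption, where the quoted table links to sections of the XML Schema Part 2 document. Lexical forms are copied unchanged, which is a simplification.

```python
# Assumption: datatype IRIs live in the XML Schema namespace.
XSD = "http://www.w3.org/2001/XMLSchema#"

# SQL type name -> datatype IRI, or None for a plain literal.
# Only the types listed in the quoted table are covered.
DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
    "CHAR": None,
    "VARCHAR": None,
    "STRING": None,
}


def literal_map(lexical, sql_type):
    """Return an N-Triples-style literal for an SQL lexical value."""
    key = sql_type.upper()
    if key not in DATATYPES:
        raise ValueError(f"no mapping listed for SQL type {sql_type!r}")
    # Escape backslashes first, then double quotes.
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    datatype = DATATYPES[key]
    if datatype is None:
        return f'"{escaped}"'
    return f'"{escaped}"^^<{datatype}>'


if __name__ == "__main__":
    print(literal_map("42", "INT"))
    print(literal_map("2011-08-05", "DATE"))
    print(literal_map("Bob", "VARCHAR"))
```

Types outside the quoted table, such as the BOOLEAN and BINARY string types raised in the reply, raise ValueError here because the thread leaves their mapping open.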
> >
> > > > dfn row mapping: using a row identifier S for the row,
> > > >   the type triple:
> > > >     (S, rdf:type, <table type>)
> > > >   plus the scalar triples:
> > > >     for each attribute in the list of <scalars> where the attribute
> > > >     value is non-NULL:
> > > >       (S,
> > > >        the <property IRI> for the attribute,
> > > >        the <literal map> for the attribute value).
> > > >   plus the reference triples:
> > > >     for each list of attributes in the <non-unary references> where none
> > > >     of the attribute values are NULL:
> > > >       (S,
> > > >        the <property IRI> for the attributes,
> > > >        the <row identifier> for the referenced row)
> > > > ]]
> > >
> > > I'd decompose this a bit: The row graph of a row is a graph consisting of
> > > the following triples:
> > > - the row type triple
> > > - a data triple for each non-foreign key column where the data value is
> > >   non-null
> > > - a reference triple for each foreign key column ...
> > >
> > > And then:
> > >
> > > The row type triple of a row is an RDF triple with the following
> > > components:
> > > - subject: the row IRI of the row
> > > - predicate: rdf:type
> > > - object: the table class IRI of the row's table
> > >
> > > et cetera.
> >
> > I worked from this angle for a bit, but the challenging thing was ensuring
> > the same subject without introducing some sort of hand-waving about "the
> > current subject" or some such.
> > Recall that the containing table may not have a primary key (or even any
> > candidate keys).
> >
>
> Eric,
>
> I agree with Richard on this one. Actually, we already have something like
> this (or practically identical)
>
> http://www.w3.org/2001/sw/rdb2rdf/directMapping/#rules

Rev 1.4 is guaranteed to meet everyone's needs or your money back.
http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP#defn-row%20graph

> > > I know this might not be politically correct in RDF circles, but again
> > > I'll point out this post that I found very helpful when editing R2RML:
> > > http://ln.hixie.ch/?start=1140242962&count=1
> > >
> > > Best,
> > > Richard
> >
> > --
> > -ericP
>

--
-ericP
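
The row mapping discussed in the thread (type triple, scalar triples, reference triples, all sharing one subject S) can be sketched in Python as below. This is an illustrative reading of the quoted definition, not the draft's text. All names (enc, row_id, prop, row_graph), the shape of the foreign_keys argument, and the People/Addresses example are hypothetical. The sketch assumes each foreign key references the primary key of its target table, handles every foreign key whatever its arity, writes literals as plain quoted strings without datatypes, and percent-encodes more aggressively than the thread describes.

```python
import itertools
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_blank_ids = itertools.count()  # hypothetical source of fresh blank node labels


def enc(value):
    # Assumption: escape everything outside the RFC 3986 unreserved set.
    return quote(str(value), safe="")


def row_id(table, key_columns, row):
    # Relative IRI when there is a key, otherwise a fresh blank node.
    if key_columns:
        pairs = ",".join(f"{enc(c)}={enc(row[c])}" for c in key_columns)
        return f"{enc(table)}/{pairs}"
    return f"_:b{next(_blank_ids)}"


def prop(table, columns):
    return f"{enc(table)}/{','.join(enc(c) for c in columns)}#"


def row_graph(table, primary_key, foreign_keys, row):
    """Return the triples for one row as (subject, predicate, object) tuples.

    foreign_keys is a list of dicts with keys "columns", "target_table" and
    "target_columns" (assumed to be the target table's primary key).
    """
    # S is computed exactly once, so a table without a primary key gets a
    # single fresh blank node shared by every triple of the row.
    s = row_id(table, primary_key, row)
    triples = [(s, RDF_TYPE, enc(table))]

    fk_columns = {c for fk in foreign_keys for c in fk["columns"]}
    for column, value in row.items():
        # Scalar triples: columns in no foreign key, skipping NULLs.
        if column not in fk_columns and value is not None:
            triples.append((s, prop(table, [column]), f'"{value}"'))

    for fk in foreign_keys:
        values = [row[c] for c in fk["columns"]]
        # Reference triples: only when none of the key values is NULL.
        if all(v is not None for v in values):
            target_row = dict(zip(fk["target_columns"], values))
            target = row_id(fk["target_table"], fk["target_columns"], target_row)
            triples.append((s, prop(table, fk["columns"]), target))
    return triples


if __name__ == "__main__":
    fks = [{"columns": ["addr"], "target_table": "Addresses",
            "target_columns": ["ID"]}]
    for triple in row_graph("People", ["ID"], fks,
                            {"ID": 7, "fname": "Bob", "addr": 18}):
        print(triple)
```

Binding S once at the top of row_graph is one way to address the "same subject" concern raised above without appealing to a notion of "the current subject": the blank node case and the IRI case go through the same code path.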
Received on Friday, 5 August 2011 18:09:36 UTC