- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Wed, 3 Aug 2011 19:22:05 -0500
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: Richard Cyganiak <richard@cyganiak.de>, Michael Hausenblas <michael.hausenblas@deri.org>, rdb2RDF WG <public-rdb2rdf-wg@w3.org>
- Message-ID: <CAMVTWDxk0kDt4bxMOj_0k9y=WNv74kUrNyjYwGEt39dXbc+Ojg@mail.gmail.com>
On Wed, Aug 3, 2011 at 12:46 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Richard Cyganiak <richard@cyganiak.de> [2011-08-02 23:32+0100]
> > On 2 Aug 2011, at 15:19, Eric Prud'hommeaux wrote:
> > > • DM is for "all the tables in a database"
> > > I debated this; I didn't want to alarm folks who would think
> > > they'd have to expose everything if they didn't want to. The
> > > alternative is to parametrize; neither is terribly attractive. I
> > > guess "all tables" is fine.
> >
> > "all tables and views in the schema"?
>
> "each table and view in a database schema"?
> done in two places (here and the definition below).
>
> > > • s/an SQL/a SQL/
> > > This depends on whether you call it "S Q L" or "sequel". The SQL
> > > spec uses "an", e.g. "Effects of SQL-statements in an
> > > SQL-transaction".
> >
> > Ah, interesting point. R2RML uses “a SQL” but that's just my personal
> > preference. I guess the spec should be considered authoritative on this.
> >
> > > [[
> > > The Direct Mapping is a formula for creating an RDF graph from the
> > > rows of each table in a database. A base IRI defines a web space for
> > > the labels in this graph; all labels are generated by appending to the
> > > base.
> >
> > There are no “labels” in an RDF graph. Let's please stick to the
> > standard terminology from the specs.
>
> done
> also s/attribute/column/ # ignoring the question of "fields"
>
> > > The functions scalar and reference extract the scalar and reference
> > > attributes (those participating in a foreign key) respectively:
> >
> > Why does this have to be formulated as “functions”?
>
> Is there a more intuitive way to say that there's an exact mapping from
> the input onto the outputs?
> And isn't that exactly what an implementor wants to know?
>
> > > dfn scalars: the attributes in a table which are NOT in any foreign
> > > key.
> >
> > How about: The non-foreign key columns of a table are the columns which
> > are not in any foreign key.
>
> Looking at it in-situ
> <http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP#defn-scalars>, I'm
> not convinced that the "definition X: X is..." redundancy will be helpful.
>
> > > dfn references: the attributes in a table's foreign keys.
> >
> > How about: The foreign key columns of a table are the columns which are
> > in some foreign key.
>
> ditto
>
> > > SQL table and attribute identifiers compose RDF IRIs in the direct
> > > graph. These identifiers are separated by the punctuation characters
> > > '#', ',', '/' and '='. All SQL identifiers are escaped following
> > > URL-encoding
> > > <http://www.w3.org/TR/html5/association-of-controls-and-forms.html#url-encoded-form-data>
> > > except that only the above punctuation and the characters not
> > > permitted in RDF IRIs are escaped.
> >
> > I'd define once: The URL-encoded form of a string is …
> >
> > And then explicitly state that the so-and-so IRI is the concatenation of
> > base IRI, '/', URL-encoded form of the table name, and so on.
> >
> > (I recall discussions about using relative IRIs in the direct mapping.
> > It might be easiest to limit that to the examples. “The example omits
> > the base IRI for brevity, and uses relative IRIs. In the actual direct
> > mapping graph, the base IRI would be prepended to all IRIs.”)
>
> Didn't attack yet. stuck in a todo.
>
> > > In the direct graph, there is an identifier for each row in a database
> > > table. If the row is in a table with a primary key, this is formed
> > > from the table name and the attribute names and values of each
> > > attribute in the primary key. If there is no primary key for the
> > > table, the row identifier is a fresh blank node:
> > >
> > > dfn row identifier:
> > >
> > > if the table has a primary key with attributes, the relative IRI for
> > > the row identifier is the concatenation of the table name, '/', and
> > > a ','-separated concatenation of each attribute name, '=', and the
> > > attribute value.
> > >
> > > if the table has no primary key, the row identifier is a fresh blank
> > > node.
> >
> > This doesn't need to be repeated twice. I'd call it row IRI for maximum
> > clarity.
>
> I'm not sure what's repeated. If you mean that there are two clauses, they
> deal with different cases.
> Re: "row IRI", we could say that "row identifier" is either a "row IRI" or
> "row blank node". Proposed text?
>
> > > A (potentially unary) list of attribute names in a table forms a
> > > property IRI:
> > >
> > > dfn property IRI: the concatenation of the table name, '/', and a
> > > ','-separated concatenation of each attribute name, and a '#' at
> > > the end of the property IRI.
> >
> > This doesn't need to be repeated one-and-a-half times.
>
> The property IRI is simpler than the earlier definition (doesn't include
> column values).
>
> > > The values in a row are mapped to RDF literals:
> > >
> > > dfn literal map: a mapping from an SQL value with a datatype to an RDF
> > > literal with an XML Schema datatype where the RDF literal has a
> > > lexical value equivalent to the SQL lexical value and the datatype
> > > mapping is found in this table:
> > >
> > > SQL        XSD datatype
> > > ___        ____________
> > > INT        http://www.w3.org/TR/xmlschema-2/#integer
> > > FLOAT      http://www.w3.org/TR/xmlschema-2/#float
> > > DATE       http://www.w3.org/TR/xmlschema-2/#date
> > > TIME       http://www.w3.org/TR/xmlschema-2/#time
> > > TIMESTAMP  http://www.w3.org/TR/xmlschema-2/#dateTime
> > > CHAR       plain literal
> > > VARCHAR    plain literal
> > > STRING     plain literal
> >
> > This should use the standard SQL 2008 types, including BOOLEAN and
> > BINARY string types. (Probably the Direct Mapping can re-use the outcome
> > of R2RML ISSUE-48 here.)
>
> Labeled as an issue. Have you incorporated that into R2RML (when there's
> not rr:datatype) so I can steal the text?
>
> > > The Direct Mapping is defined by a set of mapping functions from table
> > > rows to RDF triples:
> > >
> > > dfn direct mapping: the set of RDF triples produced by invoking the
> > > <table mapping> on each table in a database.
> >
> > A minor stylistic point but I'd say: The direct mapping graph is the
> > union of the table graphs for each table.
> >
> > > dfn table mapping: the set of RDF triples created by invoking the
> > > <row mapping> on each row in a table.
> >
> > I'd say, the table graph of a table is the union of the row graphs for
> > each row.
>
> If I understand this, it implies the definition of table graph which might
> then be defined row graphs. Is this your proposal?
>
> > > dfn row mapping: using a row identifier S for the row,
> > >   the type triple:
> > >     (S, rdf:type, <table type>)
> > >   plus the scalar triples:
> > >     for each attribute in the list of <scalars> where the attribute
> > >     value is non-NULL:
> > >       (S,
> > >        the <property IRI> for the attribute,
> > >        the <literal map> for the attribute value).
> > >   plus the reference triples:
> > >     for each list of attributes in the <non-unary references> where none
> > >     of the attribute values are NULL:
> > >       (S,
> > >        the <property IRI> for the attributes,
> > >        the <row identifier> for the referenced triple)
> > > ]]
> >
> > I'd decompose this a bit: The row graph of a row is a graph consisting of
> > the following triples:
> > - the row type triple
> > - a data triple for each non-foreign key column where the data value is
> >   non-null
> > - a reference triple for each foreign key column ...
> >
> > And then:
> >
> > The row type triple of a row is an RDF triple with the following
> > components:
> > - subject: the row IRI of the row
> > - predicate: rdf:type
> > - object: the table class IRI of the row's table
> >
> > et cetera.
>
> I worked from this angle for a bit, but the challenging thing was ensuring
> the same subject without introducing some sort of hand-waving about "the
> current subject" or some such.
> Recall that the containing table may not have a primary key (or even any
> candidate keys).

Eric, I agree with Richard on this one. Actually, we already have something
like this (or practically identical):
http://www.w3.org/2001/sw/rdb2rdf/directMapping/#rules

> > I know this might not be politically correct in RDF circles, but again
> > I'll point out this post that I found very helpful when editing R2RML:
> > http://ln.hixie.ch/?start=1140242962&count=1
> >
> > Best,
> > Richard
>
> --
> -ericP
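
For readers following the quoted definitions, here is a rough Python sketch of the row identifier, property IRI, literal map and row mapping as worded above. It is only an illustration under stated assumptions, not text from any draft: the function names, the use of the bare table name as the <table type>, the idea that each foreign key targets the referenced table's primary key, and the People/Addresses example are all hypothetical. Escaping is also simplified, since urllib.parse.quote percent-encodes more characters than the quoted rule requires, and every foreign key is treated alike rather than only the non-unary ones.

```python
from itertools import count
from urllib.parse import quote

XSD = "http://www.w3.org/TR/xmlschema-2/#"
# Datatype IRIs as listed in the quoted table; any other type becomes a plain literal.
DATATYPES = {
    "INT": XSD + "integer",
    "FLOAT": XSD + "float",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
}

_blank_ids = count()  # source of fresh blank node labels


def enc(value):
    """Percent-encode a name or value (assumption: stricter than the quoted rule)."""
    return quote(str(value), safe="")


def row_identifier(table, pk_cols, row):
    """Relative row IRI built from the primary key, or a fresh blank node label."""
    if not pk_cols:
        return f"_:b{next(_blank_ids)}"
    pairs = ",".join(f"{enc(c)}={enc(row[c])}" for c in pk_cols)
    return f"{enc(table)}/{pairs}"


def property_iri(table, cols):
    """Table name, '/', ','-separated column names, and a trailing '#'."""
    return f"{enc(table)}/{','.join(enc(c) for c in cols)}#"


def literal_map(value, sql_type):
    """(lexical value, datatype IRI); the datatype is None for plain literals."""
    return (str(value), DATATYPES.get(sql_type))


def row_graph(table, pk_cols, col_types, foreign_keys, row):
    """Triples for one row.

    foreign_keys is a list of (cols, ref_table, ref_cols) tuples; each is
    assumed to target the primary key of ref_table.
    """
    s = row_identifier(table, pk_cols, row)
    triples = [(s, "rdf:type", enc(table))]  # table type IRI is an assumption
    fk_cols = {c for cols, _, _ in foreign_keys for c in cols}
    # Scalar triples: non-foreign-key columns with non-NULL values.
    for col, sql_type in col_types.items():
        if col not in fk_cols and row[col] is not None:
            triples.append(
                (s, property_iri(table, [col]), literal_map(row[col], sql_type))
            )
    # Reference triples: foreign keys in which no column value is NULL.
    for cols, ref_table, ref_cols in foreign_keys:
        if all(row[c] is not None for c in cols):
            ref_row = {rc: row[c] for c, rc in zip(cols, ref_cols)}
            obj = row_identifier(ref_table, ref_cols, ref_row)
            triples.append((s, property_iri(table, cols), obj))
    return triples


# Hypothetical example: People(ID PK, fname, addr -> Addresses.ID)
example = row_graph(
    "People",
    ["ID"],
    {"ID": "INT", "fname": "VARCHAR", "addr": "INT"},
    [(["addr"], "Addresses", ["ID"])],
    {"ID": 7, "fname": "Bob", "addr": 18},
)
```

In this sketch the subject of every triple in `example` is the relative IRI `People/ID=7`. Note that when the referenced table has no primary key, `row_identifier` hands back a new blank node label on each call, so the object of a reference triple would not match the referenced row's own subject. That is one way to see the "same subject" difficulty raised above.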
Received on Thursday, 4 August 2011 00:22:53 UTC