- From: David McNeil <dmcneil@revelytix.com>
- Date: Wed, 10 Aug 2011 10:12:20 -0500
- To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
- Message-ID: <CA+8VvdwQsOqH3MMWQfNXbKdb-Q_rETFad0axgErbSb8-BrumQg@mail.gmail.com>
I read the latest Direct Mapping spec [1] (only skimmed Appendices). Below, identified by section number, are the comments I had while reading it.

-David

[1] http://www.w3.org/2001/sw/rdb2rdf/directMapping/EGP

====

1 - "intended to provide a default behavior for R2RML" - It might be worth reconsidering the wording of this to avoid implying that R2RML prescribes this as default behavior.
1 - "It can be also used" - awkward sentence structure
2 - Wrong URL for RFC3987 link.
2 - I found the sudden transition to talking about FKs to be a bit jarring. Maybe there is a way to make this flow better?
2 - "This graph is composed of relative IRIs" - I know this has been discussed on the mailing list, but this is non-standard, eh? Isn't IRI prefixing a serialization issue? Also, does the user provide the base IRI? As I recall a goal was for the direct mapping to run without any user configurable options beyond pointing it at a database.
2.1 - For clarity, the "People" PRIMARY KEY clause should not be on the same line as the "addr" field.
2.1 - Per standard SQL, I think the string literals in the INSERT statements should have single quotes so they are not interpreted as identifiers.
2.1 - I think using the first row of a table for DB metadata is confusing given the widely understood model of having the column names as the first row. Especially considering that the fonts are the same. Maybe if the metadata were in non-bold italics it would be easier to read?
2.2 - "compound and composite" - At first glance this seems redundant.
2.2 - "People tables's" - It is still early, but that can't be the right apostrophe use?
2.2 - "The referent identifier (object of the above predicate)" - For clarity, I would just say "the object"
2.2 - +1 to Souri's observation that this approach does not handle multiple foreign keys from the same columns.
2.2 - ":(deptName, deptCity) is a multi-column foreign key in the table People which references the multi-column candidate key (name, city) in the table Department." This is awkward to read and is just a repetition of the formal FK definition in the DDL. I would omit it.
2.3 - I realized that I wasn't sure if I was reading the spec, or reading an example. It seems to me that the text needs to be more clearly identified, on a paragraph basis, as to whether it is an informal description of the spec or a concrete example. For example, the R2RML spec highlights examples with an alternate color and a surrounding label/box. Personally I think I would swap the order of sections 3 & 2 or intersperse the examples from section 2 into section 3.
2.3 - "would have been generated" - Seems the text is clearer to read if we can stick to a more active voice.
2.4 - "(for keeping track of tweets in Twitter)" - I would find a way to remove the parens.
2.4 - "It is not possible to dereference blank nodes" - I don't immediately see what the point of this statement is.
2.5 - I suspect this has been discussed at great length in the past, but from my perspective the way blank node identifiers are used in this example seems to create implementation pain. In particular the processing of a row in a table is not simply a function of that row. Rather, it must access the "global" list of what blank node identifier is used for each database value that is used as an FK to a non-PK. For this reason, the way we solved this problem at Revelytix was to use the data value itself to form the identifier. I think this applies whether an IRI or a blank node is used to identify the PK-less row.
3 - At Revelytix we have found it useful to define two base URIs: one for ontology URIs and another for data.
3 - "all labels are generated by appending to a base." - I think someone else mentioned this already, but it seems referring to the IRIs as "labels" is confusing and we should use more precise words here.
3 - "the percent-encoded form of the column value" - This presupposes a text representation of the column value. Is it specified elsewhere how to get a text representation?
3 - "fresh blank node" - Personally, seems ok to me, but do we need more precise words for this?
3 - "A (potentially unary)" - I encountered several places like this where I found the parens distracting.
3 - "Definition property IRI:" At one point I found myself mis-reading this as a definition of the term "definition property IRI". The R2RML spec seems to define terms more clearly with a formatted construct like: "A _data error_ is a condition of the data in the"
A.1 - I think the English Syntax should be shown by default.
A & B - I stopped reading it closely, but (at the risk of stating the obvious and of stirring up previous compromises) it seems like an over-abundance of notations. Truly it is hard to tell how many of them there are and seems it will be challenging to keep them all in synch as the spec evolves. I would remove some of them.

Other thoughts, perhaps these have been addressed in past discussions already and I just don't know the answers:

* do we need to say anything about how a direct mapping generator finds a database?
* do we need to say anything about which schema to map?
* how about synonyms in the database? We have found this to be a pain point in practice.
* does it need a mechanism for omitting the schema tables from the mapping?
* I notice the spec is silent about case sensitivity of database identifiers. I suppose it is implied that the casing used in the database metadata is preserved?
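
To illustrate the 2.5 comment about forming the identifier from the data value itself, here is a minimal Python sketch. It is not the spec's algorithm and not an actual implementation from any product; the names row_identifier and value_text, the use of str() as the text representation of a value, and the SHA-1 hashing step are all assumptions made for the example. The point is only that the identifier is a pure function of the row, so no global table of previously issued blank node identifiers is needed.

```python
import hashlib
from urllib.parse import quote


def value_text(value):
    # Assumption: str() stands in for the text representation of a column
    # value, which the comment on section 3 notes is not pinned down.
    # NULL gets its own marker so it cannot collide with the string 'NULL'.
    return "NULL" if value is None else "v:" + str(value)


def row_identifier(table, row):
    """Return a blank node label derived only from the table name and the
    row's own values (hypothetical helper, for illustration)."""
    # Sort by column name so the result does not depend on dict ordering.
    fields = [
        quote(column, safe="") + "=" + quote(value_text(row[column]), safe="")
        for column in sorted(row)
    ]
    key = quote(table, safe="") + "/" + ";".join(fields)
    # Hash the percent-encoded key to get a short label using only [0-9a-f].
    return "_:r" + hashlib.sha1(key.encode("utf-8")).hexdigest()


# Hypothetical PK-less row; the column names are made up for the example.
print(row_identifier("Tweets", {"tweeter": 7, "text": "hello"}))
```

Any row that refers to this one through a foreign key to a non-PK can recompute the same label from the referenced values alone. One caveat of this sketch: two fully identical rows in a PK-less table map to the same identifier, which may or may not be the desired behavior.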
Received on Wednesday, 10 August 2011 15:12:57 UTC