Re: Review of R2RML working draft from Richard Cyganiak on 2011-07-25 (public-rdb2rdf-wg@w3.org from July 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 25 Jul 2011 14:04:02 +0100
To: Sören Auer <soeren.auer@gmail.com>
Cc: rdb2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <0B886870-64E9-4E21-98FC-C8768C6EB56B@cyganiak.de>

Hi Sören,

Thanks a lot for this careful and very helpful review. Comments below.

On 25 Jul 2011, at 11:15, Sören Auer wrote:
> * "The intended audience of this specification are implementors of ..."
> => "The intended audience of this specification IS implementors of "

Fixed

> * "Blue tables are example input into an R2RML mapping:" => "Blue tables
> contain example input into an R2RML mapping:"

Fixed

> * I would move the "or" in beginning of the last enum item of logical
> tables to the end of the previous one

Fixed

> 2.3 A Mapping for the Example Database
> 
> * it is unclear what is actually meant here with "rr:parentTriplesMap
> <#TriplesMap2>;" and how this referencing works - we should add few more
> sentences as explanation here

I see what you mean. If a reader knows Turtle, then this will make sense to them; if they don't know Turtle, then it will be rather confusing. We state in the Introduction: “A reader's familiarity with … the Turtle syntax is assumed.” But there is a valid question here whether we're setting the bar too high.
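
For readers less familiar with Turtle, the referencing could be illustrated roughly as follows. This is only a sketch: the table and column names (EMP, DEPT, EMPNO, DEPTNO), the example.com IRIs, and the ex: prefix are assumptions for illustration, and the property layout follows the later published R2RML vocabulary, which may differ in detail from the draft under review.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .   # hypothetical vocabulary

# <#TriplesMap1> and <#TriplesMap2> are relative IRIs. They are resolved
# against the IRI of the mapping document itself, so both name resources
# defined in this same file.
<#TriplesMap1>
    rr:logicalTable [ rr:tableName "EMP" ] ;
    rr:subjectMap [ rr:template "http://data.example.com/employee/{EMPNO}" ] ;
    rr:predicateObjectMap [
        rr:predicate ex:department ;
        rr:objectMap [
            # Use the subjects generated by <#TriplesMap2> as objects here,
            # pairing rows where the DEPTNO columns match.
            rr:parentTriplesMap <#TriplesMap2> ;
            rr:joinCondition [ rr:child "DEPTNO" ; rr:parent "DEPTNO" ]
        ]
    ] .

<#TriplesMap2>
    rr:logicalTable [ rr:tableName "DEPT" ] ;
    rr:subjectMap [ rr:template "http://data.example.com/department/{DEPTNO}" ] .
```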

> * "A SQL connection to ..." => "An SQL connection to ..."

This depends on whether you pronounce SQL as "S-Q-L" or as "Sequel". Both pronunciations are common, and I don't think you can say that either of them is incorrect. I'm using the latter, and this explains the spelling throughout the text. It's no big deal, and I'm happy to change it if a majority prefers to have it changed.

> * remove "to" in "without an explicit to catalog or schema reference"

> * concatentation => concatenation

Fixed

> * "The base IRI MUST be a valid IRI." -- base IRI was not yet introduced
> here, maybe we should have an introductory sentence before that like
> "Each mapping has an associated base URI."

The term "base IRI" was actually introduced just three paragraphs earlier. (The term is a hyperlink that takes you to the definition.)

> * "a SQL connection" => "an SQL connection" <- this might have to
> replaced everywhere throughout the doc

See above.

> * "SHOULD NOT include any IRIs that start with the rr: namespace IRI,
> but are not in the R2RML vocabulary." => "SHOULD NOT include any IRIs
> that start with the rr: namespace IRI, but are not DEFINED in the R2RML
> vocabulary."

I could nitpick and point out that "R2RML vocabulary" is defined as a "set of IRIs", so it is strictly correct to talk about "IRIs that are not in the R2RML vocabulary." But I'm fine with your version too, so: fixed.

> * specificaton => specification
> * add full stop behind "(in other words, are “unused”)"

Fixed

> "An R2RML mapping document is any document written in the Turtle RDF
> syntax that encodes an R2RML mapping graph."
> 
> I guess this was discussed already during the telco, but do we really
> require a mapping to be in the Turtle syntax - what's the purpose of
> using RDF then - from my POV all valid RDF serializations should be
> eligible.

All RDF serializations *are* eligible. You can encode an R2RML mapping graph in any syntax you like, and it's still a conforming R2RML mapping graph. (It may just not be a conforming R2RML mapping *document*, which is a different concept and a stronger claim.)

See here for the argument why *just* defining R2RML as a graph, without talking about syntax, would be a Bad Thing:
http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Jun/0165.html

In summary, our goal is interoperability, and for interoperability you need to exchange actual files, and files need to be in a syntax, and if two implementations don't talk the same syntax then they don't interoperate.

> When reading a little further there is even a contradiction with the
> following sentence:
> 
> "A conforming R2RML processor MUST accept R2RML mapping documents in
> Turtle syntax. It MAY accept R2RML mapping graphs encoded in other RDF
> syntaxes."

Where do you see a contradiction?

> I think its good to require all R2RML processors to support Turtle, but
> I would still allow R2RML mapping graphs encoded in other RDF syntaxes
> to be called R2RML mapping documents.
> 
> * "@prefix : <#>" - is not very illustrative for a novice user, maybe
> rather "@prefix : <http://example.com/ns#>"

The point of the Note is to illustrate an easy way of creating *document-local* IRIs. This requires the <#> IRI.

This goes back a bit to your earlier point about Turtle constructs like <#TriplesMap1>.

I'll raise an issue about this, because the way it is currently expressed in the document seems not to work for many readers.
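
A small sketch of what the Note is getting at, assuming (hypothetically) that the mapping document is published at http://example.com/mapping.ttl, and using an illustrative EMP table:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix : <#> .   # relative IRI, resolved against the document's own IRI

# With the assumed document IRI http://example.com/mapping.ttl, both
# :TriplesMap1 and <#TriplesMap1> expand to
#   http://example.com/mapping.ttl#TriplesMap1
:TriplesMap1
    rr:logicalTable [ rr:tableName "EMP" ] ;
    rr:subjectMap [ rr:template "http://data.example.com/employee/{EMPNO}" ] .
```

With "@prefix : <http://example.com/ns#>" instead, the names would be tied to that fixed namespace and would no longer be local to whichever document contains them.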

> * shouldn't "however" here and everywhere else in the document be
> enclosed by commas?!

You are right. Fixed. (It only occurs one other time, and was in commas there.)

> * I would not introduce a new term "SQL query-based table" here, since
> it is nothing else than an SQL query -- why not just writing "SQL query"
> instead and confuse the reader less. After all we just emulate the
> creation of views here in R2RML, in order to not require the privilege
> for the creation of real views.

The way I think about this, a "SQL query" is the string that goes into the rr:sqlQuery property. A "SQL query-based table" is one kind of R2RML logical table, and it is a resource that has multiple properties, including rr:sqlQuery and rr:sqlVersion. I think it would be more confusing (and hard to maintain from a precise specification point of view) to use the same term for both.

Ultimately, the term "SQL query-based table" just exists in order to disambiguate between "SQL query" as understood in a SQL context ("SELECT ... FROM ...") and "SQL query" as understood in an R2RML context (the value of rr:logicalTable, which has a rr:sqlQuery property that contains a "SELECT ... FROM ..." string).
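
To make the distinction concrete, here is a minimal sketch. The triples map name, the query, and the example.com IRI are made up for illustration, and the vocabulary follows the later published R2RML namespace, which may differ in detail from the draft.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .

<#DeptStaffMap>
    # The blank node below is the "SQL query-based table": a resource
    # with several properties.
    rr:logicalTable [
        # The string literal is the "SQL query" in the SQL sense.
        rr:sqlQuery """
            SELECT DEPTNO, COUNT(*) AS STAFF
            FROM EMP
            GROUP BY DEPTNO
        """ ;
        rr:sqlVersion rr:SQL2008
    ] ;
    rr:subjectMap [ rr:template "http://data.example.com/department/{DEPTNO}" ] .
```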

You will find in later sections that the spec often introduces such terms, which are not terribly useful from a didactic point of view, in order to be precise and unambiguous.

Might replacing "SQL query-based table" with another term ("Logical SQL query", "R2RML View", whatever) help?

> * euqivalent => equivalent

Fixed

> * This section reads quite over-complicated and confusing to the casual
> reader. Maybe we should add an explanation here such as "R2RML emulates
> the creation of views, in order to not require the privilege for the
> creation of real views in the underlying DBMS."

I was a bit hesitant to explain this feature by analogy to SQL views, because this might create certain expectations that don't hold. For example, you can use a view in further SQL queries, but you can't do the same with a SQL query-based table.

I've made some changes to the section:

1. Explicitly introduce a term "SQL query" to describe the conditions that the value of rr:sqlQuery must satisfy

2. Add a Note in the section with the following explanation:

[[
SQL query-based tables allow the use of SQL queries as the source of data for a triples map in order to do data transformations or filtering before generating triples from the database.

The same effect can be achieved by creating a “real” SQL view in the input database and referring to it with rr:tableName, but this requires more database privileges, or may not be practical for other reasons.

Note that unlike “real” SQL views, a SQL query-based table can't be used as an input table in further SQL queries.
]]
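
The two alternatives described in the Note could be sketched side by side as follows. The view name EMP_CLERKS, the EMP table and its columns, and the example.com IRIs are hypothetical, and the vocabulary follows the later published R2RML namespace.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .

# Alternative 1: a "real" view. This assumes someone with sufficient
# privileges has already run, in the input database:
#   CREATE VIEW EMP_CLERKS AS SELECT EMPNO, ENAME FROM EMP WHERE JOB = 'CLERK'
<#ClerksViaView>
    rr:logicalTable [ rr:tableName "EMP_CLERKS" ] ;
    rr:subjectMap [ rr:template "http://data.example.com/clerk/{EMPNO}" ] .

# Alternative 2: the same filtering expressed in the mapping itself,
# with no view created in the database.
<#ClerksViaQuery>
    rr:logicalTable [
        rr:sqlQuery "SELECT EMPNO, ENAME FROM EMP WHERE JOB = 'CLERK'"
    ] ;
    rr:subjectMap [ rr:template "http://data.example.com/clerk/{EMPNO}" ] .
```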

Hope that addresses the issue.

Again, thanks for this very helpful review! Looking forward to Part II.

Best,
Richard
Received on Monday, 25 July 2011 13:04:32 UTC