Review of R2RML Document from Harry Halpin on 2010-12-12 (public-rdb2rdf-wg@w3.org from December 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Sun, 12 Dec 2010 21:01:41 -0000 (GMT)
To: public-rdb2rdf-wg@w3.org
Message-ID: <681212e9c98e68d9da7765e8c2959645.squirrel@webmail-mit.w3.org>
So, I sat down and gave a review of R2RML. Overall, very nice work.

I'd like to go through it and give a few comments, and also my opinion
(not W3C's, I'm no longer the staff contact) on the various open issues.

Issue 1) Relationship to "default" mapping. This seems to be mainly a
terminological issue. We can simply say: "In the absence of an R2RML
configuration, R2RML processors SHOULD use the Direct Mapping algorithm as
specified in 'A Direct Mapping of Relational Data to RDF'."
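
As a rough sketch of that fallback rule (the function name, parameter, and return labels are hypothetical and come from neither spec), a processor could pick its mapping like this:

```python
from typing import Optional


def choose_mapping(r2rml_config: Optional[str]) -> str:
    """Return a label for the mapping a processor would apply.

    Assumption: `r2rml_config` holds the text of an R2RML mapping document,
    or None / whitespace when the user supplied none.
    """
    if r2rml_config is None or not r2rml_config.strip():
        # No R2RML configuration present: fall back to the Direct Mapping.
        return "direct-mapping"
    return "r2rml"
```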

The paragraph about it, i.e. "Besides the R2RML language, this working
group will also..", should be fixed to reflect the release of a direct
mapping document.

Issue 2) "Convention over configuration": I think I was the person that
asked for this, i.e. that R2RML would be seen as a way of modifying the
original direct mapping - mainly motivated by the wish to reduce the amount of
syntax needed. Actually, upon consideration, that would complicate things
unnecessarily, as people probably only want to use R2RML to expose "part"
of their database. Therefore, I suggest that we drop the convention
over configuration idea. However, there does need to be a way to pass
parameters to Direct Mapping.

Issue 3) "Instance-level" output. I think we should allow some sort of 
easy way to get simple schemas out of the DB schema, but not overload the
language too much more. I.e. one more class with 10 or so properties at
most.

In detail:

In 2.1.3, I think the output should not be "generated pairs" but just
triples. Attaching them to a subject does not need to be done in a
separate step; it just complicates things.
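
To illustrate what I mean, here is a minimal sketch that emits full triples for a row in one pass instead of producing predicate-object pairs and attaching the subject later. The function name, the column-to-predicate dictionary, and the rule that a NULL column yields no triple are all assumptions made for illustration, not text from the draft.

```python
from typing import Dict, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(
    subject: str,
    row: Dict[str, Optional[str]],
    predicates: Dict[str, str],
) -> Iterator[Triple]:
    """Yield (subject, predicate, object) triples for one row in a single step."""
    for column, predicate in predicates.items():
        value = row.get(column)
        if value is not None:  # assumption: a NULL column produces no triple
            yield (subject, predicate, value)
```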

In 3.2.1, can we have a new property that points to a string stating
which SQL dialect (e.g. MySQL or Oracle) the query is in, and add a
note that this is necessary because implementations have different
constructs in SQL?
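
A sketch of how a processor might use such a property follows. Everything named here is hypothetical: `sql_dialect` stands in for the proposed property (the draft defines no such name), the dialect labels are made up, and treating an undeclared dialect as acceptable is just one possible policy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of dialect labels that this particular processor understands.
SUPPORTED_DIALECTS = {"mysql", "oracle"}


@dataclass
class LogicalTable:
    sql_query: str
    # Hypothetical stand-in for the proposed property; not a name from the draft.
    sql_dialect: Optional[str] = None


def can_execute(table: LogicalTable) -> bool:
    """Accept the query if no dialect is declared or the declared one is supported."""
    if table.sql_dialect is None:
        return True  # assumption: an undeclared dialect means portable SQL
    return table.sql_dialect.lower() in SUPPORTED_DIALECTS
```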

In 3.3.1.3, rr:termtype. I'm a bit confused about exactly what this does -
is it for the default subject? I.e. is this the way to make the default
subject a blank node?  If not specified, the generated subject component
will be an IRI - i.e. where is this IRI given? Are we going to use the same
base-IRI convention as in Direct Mapping? If so, it should be specified. It
seems the graph name is done in rr:graphTemplate, but what if this isn't
there *and* rr:termtype is set to blank?

In 3.3.1.7, I'd like to get some more detail about the difference between
rr:template and rr:inverseExpression. It seems like the
inverseExpression's main restriction is that it can be used in a WHERE
clause of a SQL query. This is an important implementation detail....
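
Here is one reading of that distinction as a sketch, not the draft's actual syntax: a template builds an IRI from column values, while the inverse direction recovers column values from an IRI so that they can be used in a WHERE clause. The `{column}` placeholder form, the example IRI, and both helper names are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

TEMPLATE = "http://example.com/emp/{empno}"  # hypothetical template with one column


def expand(template: str, row: Dict[str, str]) -> str:
    """Forward direction: build an IRI from a row's column values."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def invert(template: str, iri: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Inverse direction: turn an IRI into a parameterized WHERE clause.

    Returns None when the IRI does not match the template.
    """
    columns = re.findall(r"\{(\w+)\}", template)
    # Escape the literal parts of the template and capture each column slot.
    literal_parts = re.split(r"\{\w+\}", template)
    pattern = "(.+?)".join(re.escape(part) for part in literal_parts)
    match = re.fullmatch(pattern, iri)
    if match is None:
        return None
    where = " AND ".join(f"{column} = ?" for column in columns)
    return where, match.groups()
```

With something like `invert`, a lookup for a single IRI can become a filtered query with bound parameters rather than a scan that expands the template for every row, which is why the WHERE-clause restriction matters for implementers.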

In 3.6.1.1 rr:object has a range of rdfs:Resource, which includes both
literals and URIs. In the example below it's clearly a literal. Maybe also
point out to people that aren't RDF experts that a URI could be there.

The Table in 4 points out that most, if not all, properties are shared
between PredicateObjectMaps and SubjectMaps. The explanation repeats many
of the same properties. I'm not sure whether it would be better to have
just a TripleMap section, but that may make exposition more difficult, so
perhaps not.

The examples in part A are great. We need to add one that deals with
foreign keys and RefPredicateObjectMap.

Received on Sunday, 12 December 2010 21:01:43 UTC