Comments on the R2RML Editors' draft

(Tracker, this should close ACTION-86)

Souri, Seema, Richard

It so happened that I had time today, so I reviewed the Editors' draft of R2RML, as promised on the last telco. To be on the safe side, I looked at

dated 2010-12-07 at 19:23

Some of the comments might come from the fact that I am a new kid on the block...

I hope it will be helpful!



Status of the document, second paragraph: this is not a first public working draft any more:-)

Intro, second paragraph on direct mapping: I find the text a little bit 'incomplete' in comparing the two approaches. I changed the last sentence and added one before that:

Besides the R2RML language, this working group will also define a fixed "default mapping" from relational databases to RDF. In the default mapping of a database, the structure of the resulting RDF graph directly reflects the structure of the database, the target RDF vocabulary directly reflects the names of database schema elements, and neither structure nor target vocabulary can be changed. To generate a graph using structures and terms that are more appropriate to the final application, graph transformation tools (e.g., SPARQL, RIF) should be used. With R2RML on the other hand, a mapping author can define highly customized views over the relational data and the full transformation is performed by the R2RML engine itself.

This may be better...

Intro, fourth paragraph on Turtle: the sentence reads as if Turtle is the _only_ RDF syntax that is accepted for R2RML. I see the same comment in the 2nd sentence of 1.2.

I also see that this is still an open issue and is listed in 1.2. Which is fine then; I wonder whether the issue should not be listed in the intro as well, to avoid people asking questions prematurely (I began to write a comment on that until I got later down in the document:-)

(This comment actually came up at a presentation I gave essentially on this version of R2RML; I am relaying it here.)

At the moment, the values of rr:termtype are strings ("BlankNode", etc.). Wouldn't it be more 'Semantic Webish' if some predefined URI-s were used there? Ie,

[] rr:termtype <URI-for-the-concept-of-blank-node>

I do not have strong feelings about this, but I thought it was worth conveying to the group...

Shouldn't section 2 be labelled as informative?

Example in 2.1.2, second PredicateObjectMap: I guess it should say

[ ... ; rr:datatype xsd:positiveInteger ]

and not

[ ... ; rr:datatype "xsd:positiveInteger" ]

ie, the value is a datatype (uri) and not a string.

Section 2.2, EMP table and 2.3, LIKES Table

there are references to the empURI, graphURI, etc, as entries generated for the logical table and used everywhere as URI-s. I think it would be better to use "" everywhere, ie, include the URI scheme, rather than '' as for now. This then repeats itself in the whole of the appendix and various examples

Section 3.1, Figure 1.

I like figures in general, and I am also fine with that one except that... (1) it is way too huge and (2) the color scheme being used has very sharp and contrasted colours, which is very different from the colour schemes used elsewhere in the document. Can we try to make these a bit smaller and a bit more, shall we say, mild? (yes, I know, this is a matter of taste...).

Issue (maybe to be labelled as such in the tracker and added to the text?): what happens exactly when, say, SubjectMapClass is missing for a TriplesMapClass instance? We may not have to answer this in the document, but label that as an issue to be solved (and I guess those are the connection point to the direct mapping!)

Similar question: what happens if the user adds more than one subjectMap? I know there is a table at the end of the document that sets maximum cardinality for things. But there are no statements on what the error response of an R2RML processor should be if those cardinality constraints are breached. Taking into account that an R2RML instance is in RDF, we cannot rely on, say, the order within the specification (ie, something like the second coming wins).

We may just open an issue and label it as such in the document for now, b.t.w.
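
To illustrate the cardinality point, here is a minimal sketch, not a proposal for the spec text. It assumes a mapping is handled as an unordered set of (subject, predicate, object) tuples and that a processor rejects the whole mapping on a breach. The limit table, the error class and the node names are all hypothetical; the real limits are those in the Section 4 table.

```python
from collections import defaultdict

# Hypothetical limits for illustration; the real ones are in the Section 4 table.
MAX_CARDINALITY = {"rr:subjectMap": 1}


class CardinalityError(ValueError):
    """Raised when a mapping node has too many values for a property."""


def check_max_cardinality(triples, limits=MAX_CARDINALITY):
    """Reject a mapping whose (subject, property) pairs exceed their limit.

    `triples` is an unordered set of (subject, predicate, object) tuples,
    so there is no "first" or "last" value that could be picked as the winner.
    """
    values = defaultdict(set)
    for s, p, o in triples:
        if p in limits:
            values[(s, p)].add(o)
    for (s, p), objects in values.items():
        if len(objects) > limits[p]:
            raise CardinalityError(
                f"{s} has {len(objects)} values for {p}; at most {limits[p]} allowed"
            )


# A TriplesMap node with two subject maps (made-up blank node labels).
mapping = {
    ("_:triplesMap1", "rr:subjectMap", "_:subjectMapA"),
    ("_:triplesMap1", "rr:subjectMap", "_:subjectMapB"),
}
```

Calling check_max_cardinality(mapping) raises CardinalityError here. Whether a processor should fail like this, or do something else, is exactly the open question.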

3.3, figure 3: I think the figure is outdated. It uses rr:value and rr:graphValue with ValueMapClass; I guess this was part of an earlier version and rr:template and rr:graphTemplate have replaced these. The same discrepancy holds for Figures 4, 5, 6, 7

--- I am not sure what the role of table owner is. Is it some sort of metadata?

--- It is not clear from the text why one can have a _set_ of IRIs and blank nodes. Does it mean that all the triples in a row are, sort of, multiplied with different subjects? If this is indeed the idea, then it should be stated explicitly and maybe an example should be used in the appendix to show its usage

--- another issue is: what does a blank node mean in this respect? What is the 'scope' of that blank node, ie, which graph does it belong to? I guess it is scoped to the default or named graph where all the triples are put; in which case this should be explained explicitly. But see also my question below on what happens if there are several target graphs. (I guess the warning in the appendix applies...)

I think some more explanatory text is warranted here on this.
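
On the question above about a _set_ of subjects, the "multiplied" reading can be sketched as follows. This is only one possible interpretation, not something the draft states. The function name, the subject identifiers and the emp: terms are made up for illustration.

```python
def triples_for_row(subjects, predicate_objects):
    """Emit every predicate/object pair of a row once per subject.

    This is the "multiplied" reading: N subjects and M predicate/object
    pairs yield N * M triples for the row.
    """
    return {(s, p, o) for s in subjects for (p, o) in predicate_objects}


# Hypothetical row with one IRI subject and one blank node subject.
subjects = {"<emp/7369>", "_:emp7369"}
predicate_objects = [("emp:name", '"SMITH"'), ("emp:job", '"CLERK"')]

triples = triples_for_row(subjects, predicate_objects)
```

With two subjects and two predicate/object pairs, the row yields four triples under this reading.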

--- what happens if I have both an rr:column and an rr:subject structure in the same SubjectMapClass? Will all of them be valid and will I get all the triples, or does rr:column invalidate rr:subject? It should be stated explicitly somewhere

--- see my comment on possibly using URI-s rather than strings here... 

If strings are used: are the strings case-sensitive or case-insensitive? Ie, is "iri" accepted or, God forbid, is "iRi" accepted?
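
The two possible answers can be sketched like this. It is a minimal sketch under assumptions: the canonical spellings "IRI" and "Literal" are guesses on my side (only "BlankNode" is quoted above), and the function name is made up.

```python
# Assumed canonical spellings; only "BlankNode" is taken from the draft.
TERM_TYPES = ("IRI", "BlankNode", "Literal")


def parse_term_type(value, case_sensitive=True):
    """Return the canonical term type for `value`, or raise ValueError."""
    if case_sensitive:
        if value in TERM_TYPES:
            return value
    else:
        for term_type in TERM_TYPES:
            if term_type.lower() == value.lower():
                return term_type
    raise ValueError(f"unknown term type: {value!r}")
```

With case_sensitive=False, "iRi" is accepted and normalised to "IRI"; with the strict default it is rejected. Predefined URI-s would make the question disappear altogether.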

--- same question v.a.v. sets. What does it mean if I give several graph IRI-s here?

--- and will the storage of that triple in a graph happen _additionally to_ or _instead of_ the graph storage defined for an entire row? With the knowledge that there might be no graph definition for the row, ie, the triples just go into a default graph by default...

--- and are these two mutually exclusive, or can I use both? I guess the latter, but it is worth emphasizing it (here or in an introduction somewhere...)
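
The "additionally to" versus "instead of" question above can be made concrete with a small sketch. It assumes the default graph is represented by None; the function, the mode names and the graph names are hypothetical, and the draft does not say which behaviour is intended.

```python
DEFAULT_GRAPH = None  # assumed stand-in for "no graph given for the row"


def target_graphs(row_graph, triple_graph, mode):
    """Return the set of graphs a single triple would be stored in.

    row_graph    -- graph defined for the entire row, or DEFAULT_GRAPH
    triple_graph -- graph defined for this particular triple, or None
    mode         -- "instead" or "additional" (the two readings in question)
    """
    if triple_graph is None:
        return {row_graph}
    if mode == "instead":
        return {triple_graph}
    if mode == "additional":
        return {row_graph, triple_graph}
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, target_graphs(DEFAULT_GRAPH, "<g2>", "additional") puts the triple both in the default graph and in <g2>, whereas the "instead" reading puts it in <g2> only.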

--- I know this is a bit of a mess these days:-) yes, lang can be used for a plain literal, but I think it should also be usable if the type is set to rdf:PlainLiteral, which is different (it is a datatype but has a language tag in it...)

(Note that the RDF WG coming up next year might make some order in this chaos...)

--- section heading is 'rr:graph', and the text says 'this property is similar to rr:graph property'. That is not very informative:-)

Obviously refers to the usage of rr:graph as defined elsewhere, the link is correct:-)

I am surprised that the rr:column and rr:template are not valid for RefPredicateMapClass. Any reason for that? If so, it may be worth explaining...

Section 4, column for cardinality: I do not find a min=0 cardinality very informative... I suggest keeping the max cardinality only everywhere, in which case the column header can say that, and the 'max=' string (that is not used consistently in the table) could also be dropped.

Appendix A.2: it would be nice to have a graph representation (I mean a figure) for the generated graphs. It would help me at least to grasp the results quickly, but I am a visual type...

Appendix A.2.3: I wonder whether this example (which is way simpler than the other two) brings any new aspect to the examples. If not, maybe we can drop it

Ivan Herman, W3C Semantic Web Activity Lead
mobile: +31-641044153
PGP Key:

Received on Thursday, 9 December 2010 14:26:39 UTC