Re: R2RML practicability concerns from Pat Hayes on 2010-10-06 (public-rdb2rdf-wg@w3.org from October 2010)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 6 Oct 2010 15:16:36 -0500
To: RDB2RDF WG <public-rdb2rdf-wg@w3.org>, Souri Das <Souripriya.Das@oracle.com>
Message-Id: <792D92B4-EA77-42ED-A437-BD2077B92C81@ihmc.us>
Dear all

I realize that I am coming to this effort rather late and completely 'cold', as it were, but perhaps my reactions to the text will be useful precisely because of this: I can offer a perspective closer to that of the general reader who wants to find out about RDB2RDF without having lived through the (no doubt onerous) process of helping to invent it. 

My first reaction was almost complete puzzlement. I take it that the basic idea here is to explain how RDB data will be transformed or mapped into RDF, but the draft does not describe any such mapping or transformation. It does not give a single example of what the transformation would look like. (The example at the end gives some RDB tables and some rather strange RDF, but this RDF is not, I presume, the RDF transformation of those tables.) Surely the document should first explain what this RDB-to-RDF mapping actually is, perhaps informally and with a few simple examples, before starting to give the RDF encoding of R2RML. (In case you think this should be obvious: it isn't, because there are many ways to encode RDB tables in RDF, and you have presumably chosen one of them. For example, does the mapping include any way to encode, in the resulting RDF, information about unique name assumptions or information closure, or about keys, in the RDB source?)

My second puzzlement (which I expressed in the telecon) is about the use of RDF to describe the mapping itself. I presume this choice (of RDF as the mapping metadescription) was made by the WG after some careful thought, but it is a very idiosyncratic and (to me, at any rate) surprising decision, and the text could usefully spend a little time explaining that this is what is being done, rather than simply embarking on a detailed account of the RDF meta-vocabulary without any background or introduction. At the very least, it would be helpful to the reader to say that RDF is here playing two rather different roles, which should not be confused: it is the target language of the mapping, and it is also being used to describe the mapping. 

Third, the text is in places extremely unclear, not to say muddled. 

For example (Section 3.1): "The RDFTermMap class represents the description of mapping to any RDF term"  

Questions: 

1. How can an RDF class 'represent' anything? (Do you mean that the elements of the class are these things? If so, say so explicitly. If not, what do you mean?) 
2. Are the elements of this class *descriptions* of mappings? You appear to say so; but if they are descriptions, they must be linguistic in nature, i.e., expressions of some descriptive language. What language? Or did you mean that the elements are the actual mappings? 
3. The mapping is "to any RDF term". What does this mean? A single mapping will presumably not be to *any* RDF term. But ignoring that grammatical issue, what is this mapping from, that its value is a single RDF term? Surely not an RDB table, column, or row, all of which must map to something larger than a single term (unless, possibly, this term is being used in some strange way to encode a large amount of information, as you use plain literals to encode SQL). Overall, this one sentence is so opaque and so puzzling that it endangers one's ability to understand almost all the rest of the document. 

When presenting a new class name in RDF, the appropriate documentation is a clearly stated specification of what the elements of the class are intended to be. Nothing more need, or should, be said about the class. This is not done here, or anywhere in the document, for the central class RDFTermMap. I still have only the vaguest notion of what these things are actually supposed to be. What are they mappings *from*? That is just one vitally important question left unanswered. 

Moving on:

"This has two main components: mapping to an RDF property and mapping to an object value (to be associated with the property)."

Questions. 

1. What exactly has two components? (I am presuming it is the "mapping to any RDF term", though this is not clear from the text.) 
2. Why would a mapping to a single RDF term have two components? Surely there isn't anything to subdivide in a single RDF term.
3. Why would a mapping to an RDF term involve a mapping to a property? (There seems to be a category mistake here. "RDF term" refers to a terminal in the RDF grammar, while "RDF property" refers to a role. So a given IRI is an RDF term, but it can also be an RDF property, or not, depending on where in a triple it occurs.)
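To make the term/role distinction concrete, here is a small illustrative sketch in Python. It is not part of the draft or of R2RML, and the example.com IRIs are hypothetical. Triples are modelled as plain (subject, predicate, object) tuples, and the function reports the positions in which a given IRI occurs: ex:worksFor is an RDF term in both triples, but it is used as a property only where it sits in the predicate position.

```python
# Illustrative only: triples as plain (subject, predicate, object) tuples.
EX = "http://example.com/"  # hypothetical namespace
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

triples = [
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "worksFor", RDFS + "label", '"works for"'),  # literal kept as a quoted string
]


def roles(term, triples):
    """Return the sorted list of triple positions in which `term` occurs."""
    positions = ("subject", "predicate", "object")
    found = set()
    for triple in triples:
        for position, value in zip(positions, triple):
            if value == term:
                found.add(position)
    return sorted(found)


print(roles(EX + "worksFor", triples))  # ['predicate', 'subject']
print(roles(EX + "acme", triples))      # ['object']
```

Being an RDF term is a fact about the IRI itself; being used as a property is a fact about where it occurs in a particular triple.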

I could go on, but almost every line would have similar comments. Sorry to be unhelpful at this stage, but the document as it stands really is not ready for public release in anything like its present form.

Pat Hayes


On Oct 5, 2010, at 6:15 PM, Sören Auer wrote:

> Dear all,
> 
> Unfortunately there was no time during today's telco to raise this concern, so I am raising it by email:
> 
> Looking at the example, I notice that the relational table definitions are very concise (~15 lines), whereas the R2RML mapping is very verbose and probably takes about five times as much space.
> 
> I'm really afraid that R2RML will be very impractical and will have quite a steep learning curve. Even with user interfaces that automate the generation of R2RML, the generated mappings will have to be understood and modified manually as soon as the DB schema changes. From that perspective, the current draft appears to be quite impractical.
> 
> Suggestion: do you think it would be possible to follow a convention-over-configuration approach, requiring the user to configure something only when he wants to alter the default behaviour? For example, an rr:Table2TriplesMap based on an rr:logicalTable could be mapped using reasonable assumptions, perhaps with a default mapping of DB datatypes to XML Schema datatypes, instead of additionally having to configure rr:propertyObjectMaps for every column.
> 
> I think simplifying things is really crucial if we want the standard to be adopted quickly and widely.
> 
> Best,
> 
> Sören
> 
> 
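Sören's convention-over-configuration suggestion can be illustrated with a minimal Python sketch. This is not R2RML and not anything defined in the draft; it assumes a hypothetical base IRI, one subject IRI per row minted from a single-column primary key, one predicate per column, and a small hypothetical SQL-to-XML-Schema datatype table, with SQL NULLs producing no triple. Each of these choices is an assumption, i.e. one of the many possible encodings of RDB tables in RDF.

```python
from urllib.parse import quote

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical default correspondence from SQL column types to XML Schema
# datatypes; unknown types fall back to xsd:string.
SQL_TO_XSD = {"INTEGER": XSD + "integer", "VARCHAR": XSD + "string", "DATE": XSD + "date"}


def default_triples(base, table, columns, key, rows):
    """Map each row of `table` to N-Triples lines with no per-column configuration.

    base    -- hypothetical base IRI, e.g. "http://example.com/"
    columns -- list of (column_name, sql_type) pairs
    key     -- primary-key column used to mint one subject IRI per row
    rows    -- list of dicts keyed by column name
    """
    lines = []
    for row in rows:
        subject = f"<{base}{quote(table)}/{quote(str(row[key]))}>"
        for name, sql_type in columns:
            value = row.get(name)
            if value is None:  # SQL NULL: emit no triple for this column
                continue
            predicate = f"<{base}{quote(table)}#{quote(name)}>"
            datatype = SQL_TO_XSD.get(sql_type, XSD + "string")
            # Escape backslashes and double quotes for an N-Triples literal.
            text = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{text}"^^<{datatype}> .')
    return lines


# Hypothetical sample data, not taken from the draft's example.
emp_columns = [("empno", "INTEGER"), ("ename", "VARCHAR"), ("deptno", "INTEGER")]
emp_rows = [{"empno": 7369, "ename": "SMITH", "deptno": 20}]
for line in default_triples("http://example.com/", "EMP", emp_columns, "empno", emp_rows):
    print(line)
```

With such a default in place, an explicit mapping would only be needed where the output should differ from it, which is the point of the suggestion.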

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 6 October 2010 20:17:13 UTC