Re: R2RML practicability concerns from Harry Halpin on 2010-10-07 (public-rdb2rdf-wg@w3.org from October 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Thu, 7 Oct 2010 16:23:29 +0100 (BST)
To: "Pat Hayes" <phayes@ihmc.us>
Cc: "RDB2RDF WG" <public-rdb2rdf-wg@w3.org>, "Souri Das" <souripriya.das@oracle.com>
Message-ID: <7bbed1391eca27c961a0de7eda439a1a.squirrel@webmail-mit.w3.org>
> Dear all
>
> I realize that I am coming to this effort rather late and completely
> 'cold', as it were, but perhaps my reactions to the text will be of use
> partly because of this, in that I can offer a perspective more like that
> of the general reader who is wanting to find out about RDB2RDF without
> having the advantage of having lived through the (no doubt onerous)
> process of helping invent it.

Pat - give us a bit of time to catch you up :) I think we are in the
process of booking a telecon with you, Marcelo, and Eric.

>
> My first reaction was almost complete puzzlement. I take it that the basic
> idea here is to explain how RDB data will get transformed or mapped into
> RDF; but the draft does not describe any such mapping or transformation.
> It does not give a single example of what the transformation would be
> like. (The example at the end gives some RDB tables and some rather
> strange RDF, but this RDF is not the RDF transformation of those tables
> (I presume?)). Surely, the document should first explain what this RDB to
> RDF mapping actually is, perhaps informally, with a few simple examples,
> before starting on the process of giving the RDF encoding of R2RML. (In
> case you think that it should be obvious: it isn't, because there are
> many ways to encode RDB tables in RDF, and you have presumably chosen one
> of them. For example, does the mapping include any way to encode, in the
> resulting RDF, any information about unique name assumptions or
> information closure, or about keys, in the RDB source?)

Souri has done this in the wiki, and I imagine these examples and
motivating text will be moved into the document:


And I think the answer to the above questions is yes, at least as regards
keys.
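To make Pat's question concrete, here is a minimal Python sketch of one possible default transformation of a table into RDF, written as N-Triples lines. It is not R2RML syntax and not necessarily the mapping the WG has chosen; the base IRI http://example.com/, the subject and property naming scheme, the Table class, and the SQL-to-XSD datatype table are all hypothetical. The key is carried over only implicitly, as part of the subject IRI; nothing about unique names or closure is encoded.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical IRIs and naming conventions, chosen only to illustrate
# one possible default table-to-RDF transformation.
BASE = "http://example.com/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical default mapping from SQL column types to XML Schema datatypes.
SQL_TO_XSD = {
    "INTEGER": XSD + "integer",
    "VARCHAR": XSD + "string",
    "DATE": XSD + "date",
}


@dataclass
class Table:
    name: str
    columns: dict[str, str]        # column name -> SQL type
    primary_key: str               # single-column key, for simplicity
    rows: list[dict[str, object]]  # column name -> value (None means SQL NULL)


def literal(value: object, sql_type: str) -> str:
    """Render a value as an N-Triples typed literal (escaping \\ and ")."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{SQL_TO_XSD[sql_type]}>'


def table_to_ntriples(table: Table) -> list[str]:
    """Emit one rdf:type triple per row plus one triple per non-NULL column."""
    lines = []
    for row in table.rows:
        # The primary-key value identifies the row, so it goes into the subject IRI.
        subject = f"<{BASE}{table.name}/{table.primary_key}={row[table.primary_key]}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}{table.name}> .")
        for column, sql_type in table.columns.items():
            value = row.get(column)
            if value is None:
                continue  # SQL NULL: no triple is emitted for this column
            predicate = f"<{BASE}{table.name}#{column}>"
            lines.append(f"{subject} {predicate} {literal(value, sql_type)} .")
    return lines


if __name__ == "__main__":
    emp = Table(
        name="EMP",
        columns={"EMPNO": "INTEGER", "ENAME": "VARCHAR"},
        primary_key="EMPNO",
        rows=[{"EMPNO": 7369, "ENAME": "SMITH"}],
    )
    print("\n".join(table_to_ntriples(emp)))
```

Even this small sketch has to make several choices (how subjects are named, whether NULLs produce triples, which datatypes are used), which is the kind of thing the document would need to spell out.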

>
> My second puzzlement (which I expressed in the telecon) is about the use
> of RDF to describe the mapping itself. I presume this choice (of RDF as
> the mapping metadescription) was made by the WG after some careful
> thought, but it is a very idiosyncratic and (to me, at any rate)
> surprising decision, and the text could usefully spend a little time
> explaining that this is what is being done, rather than simply embarking
> on a detailed account of the RDF meta-vocabulary without any background
> or introduction. At the very least, it would be helpful to the reader to
> say that RDF is here playing two rather different roles, which should
> not be confused: it is the target language of the mapping, and it is
> also being used to describe the mapping.

Again, that distinction should be made up-front in the spec, but it does
not seem to confuse D2RQ users. Of course, we could make a *custom* syntax
for the mapping language, but that was the least popular alternative
amongst the group members.

It's a choice between RDF and XML, and Turtle was viewed as the lesser of
two evils syntactically, but, time permitting, the group will also make an
XML version of the syntax of the mapping language.

I'd say give the editors more time. I imagine the combination of a default
mapping that can then be modified (so that the creators of the mapping
don't have to explain *every* detail of the mapping) and declaring the type
of every triple should be more readable.
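The "default mapping that can then be modified" idea (Sören's "convention over configuration" suggestion, quoted further down) can be sketched in a few lines of Python. This is a hypothetical illustration, not the R2RML vocabulary: the base IRI and the table#column naming convention are assumptions, and the rr: terms are not modeled. Each column gets a property by convention; the user configures only the columns whose defaults they want to change.

```python
from __future__ import annotations

# Hypothetical base IRI and naming convention, for illustration only.
BASE = "http://example.com/"


def default_property_map(table: str, columns: list[str]) -> dict[str, str]:
    """Convention: each column maps to a property IRI built from table and column names."""
    return {column: f"{BASE}{table}#{column}" for column in columns}


def effective_property_map(
    table: str, columns: list[str], overrides: dict[str, str]
) -> dict[str, str]:
    """Configuration: explicit overrides replace the defaults only for the columns they name."""
    unknown = sorted(set(overrides) - set(columns))
    if unknown:
        # An override for a column the schema no longer has is reported, not silently ignored.
        raise KeyError(f"overrides name unknown columns: {unknown}")
    mapping = default_property_map(table, columns)
    mapping.update(overrides)
    return mapping


if __name__ == "__main__":
    FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
    # Only ENAME is configured; EMPNO and JOB fall back to the convention.
    mapping = effective_property_map("EMP", ["EMPNO", "ENAME", "JOB"], {"ENAME": FOAF_NAME})
    for column, prop in mapping.items():
        print(column, "->", prop)
```

One consequence of this design, relevant to Sören's schema-change worry: a newly added column is mapped by the convention without any edit to the configuration, while an override that refers to a dropped column fails loudly instead of being silently ignored.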

And Pat, you were brought in for semantic mediation, not syntax :) But
thanks for the pointers!

>
> Third, the text is in places extremely unclear, not to say muddled.
>
> For example (Section 3.1): "The RDFTermMap class represents the
> description of mapping to any RDF term"
>
> Questions:
>
> 1. How can an RDF class 'represent' anything? (Do you mean that the
> elements of the class are these things? If so, say so explicitly. If
> not, what do you mean?)
> 2. Are the elements of this class *descriptions* of mappings? You appear
> to say this: but if they are descriptions, they must be linguistic in
> nature, i.e. expressions of some descriptive language. What language?? Or
> did you mean that the elements are the actual mappings?
> 3. The mapping is "to any RDF term". What does this mean? A single
> mapping will not be to *any* RDF term, presumably (?) But ignoring that
> grammatical issue, what is this mapping from, that its value is a single
> RDF term? Surely not an RDB table or column or row, all of which must map
> to something larger than a single term (Unless, possibly, this term is
> being used in some strange way to encode a large amount of information,
> as you use plain literals to encode SQL?). Overall, this one sentence is
> so opaque and so puzzling that it endangers one's ability to understand
> almost all the rest of the document.
>
> When presenting a new class name in RDF, the appropriate documentation is
> a clearly stated specification of what the elements of the class are
> intended to be. Nothing more need, or should, be said about the class.
> This is not done here, or anywhere in the document, for the central class
> RDFTermMap. I still have only the vaguest notion of what these things are
> supposed to actually be. What are they mappings *from*, for just one
> vitally important but unanswered question.
>

We are doing a lot more with text literals than most people.
> Moving on:
>
> "This has two main components: mapping to an RDF property and mapping to
> an object value (to be associated with the property)."
>
> Questions.
>
> 1. What exactly has two components? (I am presuming it is the "mapping to
> any RDF term", though this is not clear from the text.)
> 2. Why would a mapping to a single RDF term have two components? Surely
> there isn't anything to subdivide in a single RDF term.
> 3. Why would a mapping to an RDF term involve a mapping to a property?
> (There seems to be a category mistake here. "RDF term" refers to a
> terminal in the RDF grammar, while "RDF property" refers to a role. So a
> given IRI is an RDF term but it can also be an RDF property, or not,
> depending on where in a triple it occurs.)
>
> I could go on, but almost every line would have similar comments. Sorry to
> be unhelpful at this stage, but the document as it stands really is not
> ready for public release in anything like its present form.
>
> Pat Hayes
>
>
> On Oct 5, 2010, at 6:15 PM, Sören Auer wrote:
>
>> Dear all,
>> unfortunately there was no time today during the telecon to raise this
>> concern, which is why I am raising it now by email:
>> When looking at the example, I notice that the relational table
>> definitions would be very concise (~15 lines). The R2RML mapping,
>> however, is very verbose and probably takes five times as much space.
>> I'm really afraid that R2RML will be very impractical and will have
>> quite a steep learning curve. Even if you have user interfaces which
>> automate the generation of R2RML, the generated mappings will have to be
>> understood and modified manually as soon as the DB schema changes. From
>> that perspective, the current draft appears to be quite impractical.
>> Suggestion: do you think it would be possible to follow a convention
>> over configuration approach and only require the user to configure
>> something in case he wants to alter the default behaviour? For example,
>> an rr:Table2TriplesMap based on an rr:logicalTable could be mapped based
>> on reasonable assumptions and maybe a default mapping of DB datatypes to
>> XML-Schema datatypes, instead of additionally having to configure
>> rr:propertyObjectMaps for every column.
>> I think simplifying things is really crucial if we want the standard to
>> be quickly and widely adopted.
>> Best,
>> Sören
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
Received on Thursday, 7 October 2010 15:23:32 UTC