Re: Comments on Eric's Section 2 from Eric Prud'hommeaux on 2010-11-08 (public-rdb2rdf-wg@w3.org from November 2010)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 8 Nov 2010 12:18:40 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <20101108171839.GC11301@w3.org>
* Richard Cyganiak <richard@cyganiak.de> [2010-11-07 12:13+0800]
> All,
> 
> I'm travelling and a few days behind the latest RDB2RDF news and
> continue to be baffled by events, especially the decision by Ashok
> and Thomas to abandon work on Eric's version of the direct mapping
> document in favour of the Juan/Marcelo version.
> 
> I had a checkout of Eric's version and reviewed it while on the
> plane, which now apparently was a waste of time, but I'll share the
> comments anyway.
> 
> Having read both documents, I think that Eric's is better written,
> gets the same information across in a more concise and accurate way,
> and has just sufficient examples to make everything clear. It deals
> with corner cases that are not addressed in the /alt version.
> Altogether I think that it's superior to the /alt document. I still
> don't understand why Juan and Marcelo have forked the document in
> the first place, but seriously I don't think that their changes have
> led to a superior Section 2 -- their version simply says the same
> things in a generally harder-to-digest style in more words.
> 
> For the record: If the issues that I list below can be addressed,
> along with the three from my other email I sent earlier, then I
> support publication of an FPWD that consists of:
> 
> - Eric's sections 1 and 2
> - followed by Eric's set semantics based formal approach
> - and Juan/Marcelo's datalog based formal approach
> - with an issue box explaining that both of these are
> work-in-progress candidates for the formal semantics.

I wonder if we can get more value from J&M's work by merging in their
expositions of e.g. the created IRIs and justifications for individual
triples. Marcelo and I geeked a bit last Thursday about a way that
would allow folks who want the detail to expand the relevant sections;
I think we could create a proposal pretty quickly.


> And that's the last thing I intend to say about the direct mapping
> thingy until the three editors have managed to present the WG with a
> single version of the document endorsed by all of them.
> 
> Best,
> Richard
> 
> 
> Comments on Eric's draft
> 
> 1. Section 2.1 is IMHO unnecessary and confuses more than it helps.
> I would move its first two sentences into the Introduction, and
> remove the rest, in particular the SPARQL example. The same goes for
> the SPARQL example in 2.4, I would remove it. SPARQL query
> evaluation is a completely different topic and requires a ton of
> knowledge that is not essential for understanding the default
> mapping, so I honestly don't see how this helps the average reader.
> 
> 2. Section 2.2: The predicate for reference triples is described as:
> “an IRI composed of the stem, table name and column name and value
> for each column in the foreign key”. I don't understand why it says
> “and value”? The object is described as: “the subject created for
> the referred triple”. Do you mean “referenced row”?
> 
> 3. Please provide a rationale for the “#_” at the end of generated
> IRIs in the text. In my opinion, this is entirely unnecessary and a
> useless complication. I see there is an issue box for that in the
> document, that's great, but if you want to have the “#_” thing in
> the FPWD then there should be text stating why it is necessary. My
> proposal for FPWD would be to s/#_//g and state in the issue box
> that this is subject to more discussion.
> 
> 4. Inconsistency: Section 2.2 states that predicate IRIs have
> hashes, while all the examples have slashes.
> 
> 5. You should define the terms “row IRI” or “row identifier” and
> “column IRI”, and use them throughout, instead of saying sloppy
> things like “a IRI composed of the stem, table name and column name”
> or “the subject of the referenced row”. I think this is done pretty
> well in the directGraph/alt draft.
> 
> 6. Why a reference to [SQL99]? I thought we had agreed to use SQL
> Core 2008? You can copy the reference from the R2RML draft.
> 
> 7. Both “URI” and “IRI” are used. I suppose it should be “IRI”
> everywhere?
> 
> 8. In order to have an improved narrative in the section titles, I
> propose splitting 2.2 into one section “Identifiers for rows and
> columns” and one section “Row mapping rules”. (Not essential for
> FPWD)
> 
> 9. Section 2.5: “Hierarchies” can refer to many things in an SQL
> context, so it's a bit hard to figure out what the section refers
> to. The first sentence should perhaps talk about “hierarchies of
> tables that represent specializations of the same concept” or
> something similar. The People table should perhaps be removed from
> the example, because it is not relevant to the example and makes
> understanding the relevant parts of the example harder.
> 
> 10. Given that the question of many-to-many table mappings is an
> open issue, there should be at least a section about it that is
> empty except for an issue box. (I have more to say on this topic,
> but don't expect that discussion to be resolved before FPWD)
> 
> 11. See my comments to Juan and Marcelo asking for inclusion of
> table IRIs and of a triple that associates each row to its table.
> I'd really like to see a proposal for this in the FPWD, but at least
> an issue box would be essential. I note that the directGraph/alt
> version already has this.
> 
> 
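
To make the IRI questions in comments 2 through 5 concrete, here is a
minimal Python sketch of how a "row IRI" and a "column IRI" might be
composed from the stem, table name, column name and key value. Everything
specific in it is an assumption for illustration only: the function names
row_iri and column_iri, the example stem http://example.org/db/, the "="
and "." separators, and the percent-encoding choice are not taken from
either draft. The optional "#_" suffix and the hash-versus-slash separator
are parameters precisely because they are the points under discussion.

```python
from urllib.parse import quote


def row_iri(stem, table, key, suffix="#_"):
    """Build an identifier for one row (hypothetical scheme).

    key maps primary-key column names to values. Pairs are joined with
    "." here, which is an assumption, not something the drafts specify.
    Pass suffix="" to see the effect of dropping the "#_" ending.
    """
    pairs = ".".join(
        f"{quote(col, safe='')}={quote(str(val), safe='')}"
        for col, val in key.items()
    )
    return f"{stem}{quote(table, safe='')}/{pairs}{suffix}"


def column_iri(stem, table, column, sep="#"):
    """Build a predicate identifier for one column (hypothetical scheme).

    sep="#" gives the hash form; sep="/" gives the slash form.
    """
    return f"{stem}{quote(table, safe='')}{sep}{quote(column, safe='')}"


STEM = "http://example.org/db/"  # placeholder stem, not from the drafts

print(row_iri(STEM, "People", {"ID": 7}))             # with "#_"
print(row_iri(STEM, "People", {"ID": 7}, suffix=""))  # without "#_"
print(column_iri(STEM, "People", "fname"))            # hash form
print(column_iri(STEM, "People", "fname", sep="/"))   # slash form
```

Naming the two constructions once, and then referring to them by name,
is the kind of terminology comment 5 asks for; the sketch takes no
position on which suffix or separator the document should settle on.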

-- 
-ericP
Received on Monday, 8 November 2010 17:19:17 UTC