Review of Two Alternative Direct/Default graph docs from Harry Halpin on 2010-11-08 (public-rdb2rdf-wg@w3.org from November 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Mon, 8 Nov 2010 23:10:34 -0000 (GMT)
To: public-rdb2rdf-wg@w3.org
Message-ID: <50f58200c8538ad60e4b89234b2ec3ec.squirrel@webmail-mit.w3.org>
Here's a quick review in a purely personal capacity:

1) Looking at this document [1] from Eric primarily.

- It's very concise. I generally feel like I have understood the gist
of the algorithm by looking at the examples in Section 2, but I'm not
always confident of what's going on till I look at Sections 4-5. It is also
not much changed since I last looked at it a few months ago.

- There are, as someone else pointed out, always issues encoding things as
NCNames in IRIs. I know most people usually just use a simple library call
for this, but it would be good to point out precisely where/what this
algorithm is. IMHO, we need a better way to describe URL-encoding of
column/row strings for DB implementers than just referring to this [3].
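
To illustrate the kind of explicit encoding step being asked for, here is a
minimal sketch. It assumes Python's standard urllib.parse.quote and assumes
that every character outside the RFC 3986 unreserved set should be
percent-encoded. The function name encode_iri_segment is hypothetical, and
neither document prescribes this exact rule.

```python
from urllib.parse import quote


def encode_iri_segment(value: str) -> str:
    """Percent-encode a table name, column name, or cell value.

    Assumption: everything outside the RFC 3986 unreserved set
    (letters, digits, '-', '.', '_', '~') is escaped, so that '/',
    '#', ',' and '=' in the data cannot be mistaken for separators.
    """
    return quote(value, safe="")


if __name__ == "__main__":
    print(encode_iri_segment("first name"))
    print(encode_iri_segment("a/b#c"))
```

Spelling the rule out this way would let two implementations agree on the
IRI for a column called "first name" without guessing which library call
the other one used.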

- The real action (and heart of the document) seems to be in the textual
description in 2.2 and then in how to form IRIs immediately thereafter.
Overall, the presentation should probably present the *standard* cases
before presenting the edge cases. In this case, presenting the literal
triple case before the reference triple case is a bit odd. Talk about the
case where the URI is built from a primary key first, then the case with
no primary key. It's just hard to follow the English here.

- Then the IRI formation rules in English seem off from the examples. The
algorithm seems to suggest that a "hash" be added in between column names
and stem in predicate IRIs, where the algorithm has a slash before the
column names. This is weird because the subject/object IRIs use a slash
between their stem and column names, and then add another "#_" to the end
of subject/object URIs. While I see the issue flagged, we should not have
a difference between these two cases, nor a divergence between the English
text and the examples.
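
The inconsistency is easier to see side by side. The sketch below is only an
illustration: the stem, the table People, the columns fname and ID, the '.'
between key column and value, and both helper names are hypothetical and are
not copied from either document. Only the hash-versus-slash choice and the
trailing "#_" come from the description above.

```python
STEM = "http://example.com/db/"  # hypothetical stem IRI


def predicate_iri(table: str, column: str, sep: str) -> str:
    """Join table and column with the given separator ('#' or '/')."""
    return f"{STEM}{table}{sep}{column}"


def subject_iri(table: str, key_column: str, key_value: str) -> str:
    """Subject shape as described: slash before the column name, '#_' at the end.

    Assumption: key column and value are joined with '.'.
    """
    return f"{STEM}{table}/{key_column}.{key_value}#_"


if __name__ == "__main__":
    print(predicate_iri("People", "fname", "#"))  # hash variant
    print(predicate_iri("People", "fname", "/"))  # slash variant
    print(subject_iri("People", "ID", "7"))
```

With the slash variant, predicate and subject IRIs share one path pattern;
with the hash variant they do not, which is the mismatch at issue.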

- I'm going to ignore comments on Sections 3-5 for now, and will post later
once we agree on text.

2) Looking at this document [2] from Juan and Marcelo primarily

- Main impression is that it's more text. For me, section 2.3 of this
document is much easier to follow than section 2.2 of the other
document [1].

- However, there are some major differences. Unlike [1], the IRI
construction rules in English seem to line up with their examples.
However, unlike [1] they don't use "#"s and use ',' rather than '_' between
column names. I tend to say '_' makes more sense, but I still am confused
about the '#' vs. '/' difference. It seems we should just pick the
simplest pattern (i.e. '/' and no fragment ids) and stick with it unless
there's a real reason to use #/fragids. Also, document [2] seems to be
generating rdf:type triples for Table IRIs, which the first document [1]
doesn't do.
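
A sketch of these remaining differences, assuming a composite key rendered
as column=value pairs. The pair syntax, the stem, the table and column names,
and the function names are all hypothetical. Only the joiner (',' versus
'_'), the optional "#_" fragment, and the rdf:type triple come from the
comparison above.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
STEM = "http://example.com/db/"  # hypothetical stem IRI


def row_iri(table, key, joiner, fragment=""):
    """Build a row IRI from the (column, value) pairs of its key.

    Assumption: each pair is written 'column=value'.
    """
    pairs = joiner.join(f"{col}={val}" for col, val in key)
    return f"{STEM}{table}/{pairs}{fragment}"


def type_triple(table, key, joiner):
    """rdf:type triple linking a row IRI to its table IRI."""
    return (row_iri(table, key, joiner), RDF_TYPE, f"{STEM}{table}")


if __name__ == "__main__":
    key = [("deptId", "23"), ("empId", "7")]
    print(row_iri("Emp", key, "_", "#_"))  # '_' joiner with fragment
    print(row_iri("Emp", key, ","))        # ',' joiner, no fragment
    print(type_triple("Emp", key, ","))
```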

- The examples beneath their section 2.3 seem about the same, but there
seems to be one missing, the hierarchical tables approach. I remember
discussing this on telecons but cannot remember the resolution. Why the
discrepancy in examples?

- Ignoring their Section 3 for now, noting that their sections 4-6 are just
a cut and paste of [1]'s sections 3-5.

Both documents:

- Both documents appear to embody the same underlying algorithm, with one
exception, i.e. the example given by the hierarchical tables (which we
need to decide whether to rule out or not), plus some oddness about using
'_' vs ',' and '#' vs '/' in IRI formation. So it's really a matter of
readability and clarity to the implementers and database admins who will
use this, and merging the documents should be no big deal once we get some
agreement on these trivial things. Overall, I prefer the presentation of
the IRI algorithms in document [2]. This is very important: the English
needs to precisely detail the algorithm needed to construct IRIs, choose
among cases, etc. as much as possible, and catch things implementers may
forget, like URL-encoding of text. In document [1] the English doesn't
line up well with the examples. However, I like the layout of the examples
in document [1] more, although I would like to add some more explanatory
text from [2]. I would like the "English rules" followed by examples in
sub-sections in increasing order of complexity.

- In both documents, there is a severe disconnect between the examples and
the formal rules, which makes it hard to connect the examples back to the
IRI construction algorithms. I suggest that in Section 2 (whichever
document) the rules be presented right after their English example, that
each example point out exactly which rules (Scala-style or Datalog) it's
using, and that the rules then be stated in their full formality at the
end if necessary (which for the default document may not be necessary if
there's a separate semantics document). While I understand there is some
desire to separate normative from informative material, in today's world
people will likely ignore all "semantics/formalism" if it's at the end of
the document and just code according to the examples. Having some
shorthand to connect each example to each rule, both in English and in
some formal notation, is necessary.

- While I understand lots has changed, it would be better if we used the
same running example in R2RML [3] and both direct graph documents. I guess
that would mean more work, but it would make the documents flow much
better together.

[1] http://www.w3.org/2001/sw/rdb2rdf/directGraph/
[2] http://www.w3.org/2001/sw/rdb2rdf/directGraph/alt
[3] http://www.w3.org/TR/wsdl#_http:urlEncoded
Received on Monday, 8 November 2010 23:10:36 UTC