Re: Direct Mapping Spec - Comments from Richard Cyganiak on 2011-08-10 (public-rdb2rdf-wg@w3.org from August 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 10 Aug 2011 20:59:01 +0100
To: Eric Prud'hommeaux <eric@w3.org>
Cc: David McNeil <dmcneil@revelytix.com>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-Id: <F7AEC226-0FB8-4E48-A8E2-A05C819EB6CB@cyganiak.de>

On 10 Aug 2011, at 19:52, Eric Prud'hommeaux wrote:
>> 2.3 - I realized that I wasn't sure if I was reading the spec, or reading an
>> example. It seems to me that the text needs to be more clearly identified,
>> on a paragraph basis as to whether it is an informal description of the spec
>> or a concrete example. For example, the R2RML spec highlights examples with
>> an alternate color and a surrounding label/box. Personally I think I would
>> swap the order of sections 3 & 2 or intersperse the examples from section 2
>> into section 3.
> 
> I'm sympathetic to that, but I think it needs a real work-up and
> presentation to the WG. I would work with you on it, but I'm not
> likely to do this on my own.

My proposal would be to cut Section 2 down to a minimum: a short example, followed by a few paragraphs that call attention to some specific corner cases (not necessarily with additional full-blown examples -- one-line examples at most). Then rename Section 2 to “The Direct Mapping by Example” or something similar.

>> 2.4 "It is not possible to dereference blank nodes" - I don't immediately
>> see what the point of this statement is.

+1. Just delete that point. The goal of the document is to specify the behaviour of conforming implementations. That doesn't require this kind of commentary on the ecosystem.

>> 3 - "all labels are generated by appending to a base." - I think someone
>> else mentioned this already, but it seems referring to the IRIs as "labels"
>> is confusing and we should use more precise words here.
> 
> But I'm searching for *less* precise words here.
> 
> <http://www.w3.org/TR/rdf-concepts/#section-data-model> says that RDF
> is a graph. The fact that it has directional arrows with labels means
> it's a Directed, Labeled Graph. The fact that the concepts doc doesn't
> mention the labels on the nodes (it simply says they exist) is a
> minor pain in my butt. Bringing the construction of these nodes into
> the domain of discourse makes it difficult to not discuss the fact
> that they do in fact provide the identifiers which adorn the graph.
> 
> <http://www.w3.org/TR/rdf-concepts/#section-URI-Vocabulary> implies
> that DLG labels are called "names" in the RDF world:
> [[
> A blank node is a node that is not a URI reference or a literal. In
> the RDF abstract syntax, a blank node is just a unique node that can
> be used in one or more RDF statements, but has no intrinsic name.
> ]]
> Should I use that?

I strongly suggest sticking to the normative terminology in Section 6 of RDF Concepts, and forgetting about anything else.

Yes, RDF is, in mathematical terms, a directed labelled graph. But that's totally irrelevant in this context because the DM document doesn't deal with graph theory, it deals with constructing *RDF graphs*, and formally speaking, there are no “labels” or “names” in RDF graphs.

Everyone benefits if language is used consistently throughout the entire “RDF house”.

>> 3 - "the percent-encoded form of the column value" - This presupposes a text
>> representation of the column value. Is it specified elsewhere how to get a
>> text representation?
> 
> Good point, same is true for the column names etc.
> 
> SQL gives us a Unicode version of each table or column name, as well
> as the values (for e.g. equivalence testing). We need to work out the
> related wording for both R2RML and this doc.

This is related to ISSUE-29 in R2RML and I have an open action to write some text for that.
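
To make the gap concrete, here is a minimal Python sketch of what "the percent-encoded form of the column value" has to presuppose. The lexical_form() helper and the choices it makes (lowercase booleans, ISO 8601 dates, str() for everything else) are purely my assumptions for illustration; neither spec currently says this, which is exactly the point.

```python
from datetime import date
from urllib.parse import quote


def lexical_form(value):
    """Hypothetical choice of text representation for a column value."""
    # bool is checked first so that True becomes "true" rather than "True"
    if isinstance(value, bool):
        return "true" if value else "false"
    # dates (and datetimes, which subclass date) use ISO 8601
    if isinstance(value, date):
        return value.isoformat()
    # assumption: everything else uses Python's default string form
    return str(value)


def encode_value(value):
    """Percent-encode the text form; safe="" means "/" is escaped as well."""
    return quote(lexical_form(value), safe="")


print(encode_value("Bob Smith"))
print(encode_value(date(2011, 8, 10)))
```

Every branch in lexical_form() is a decision that changes the resulting IRI, so two implementations that disagree on any of them would produce different graphs from the same database.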

>> 3 - "fresh blank node" - Personally, seems ok to me, but do we need more
>> precise words for this?
> 
> I think "fresh" is the term of art in computer science, but maybe
> there's something closer to the hearts of RDF modelers.

I previously proposed “fresh blank node that is unique to this row”. Feels like further improvement is possible.
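
What I mean by "unique to this row" can be sketched in a few lines of Python. This assumes the case in question is a row that has no key to build an IRI from; the BlankNode class and the "_:row" labels are hypothetical and exist only for this illustration.

```python
import itertools


class BlankNode:
    """A blank node: equal only to itself, never to another instance."""

    _counter = itertools.count()

    def __init__(self):
        # The label is only a serialization aid, not part of the node's identity.
        self.label = f"_:row{next(BlankNode._counter)}"

    def __repr__(self):
        return self.label


def row_nodes(rows):
    """Mint one fresh blank node per row, even when two rows hold equal values."""
    return [BlankNode() for _ in rows]


nodes = row_nodes([("Bob", 18), ("Bob", 18)])
print(nodes[0] is nodes[1])
```

The two rows are value-identical, yet they get distinct nodes; that is the property the wording needs to pin down.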

>> 3 - "A (potentially unary)" - I encountered several places like this where I
>> found the parens distracting.
> 
> Ditto Michael, but I'm pretty sure this is the text that minimizes the
> opportunities for misinterpretation.

A list of one or more column names?

>> A.1 - I think the English Syntax should be shown by default.
> 
> anyone want to second this?

Why are the buttons there in the first place? As a reader, I expect the authors to decide on an appropriate way of presenting the document. You abdicate that responsibility and instead force the reader to figure out which version they want to see.

Perhaps consider removing the English Syntax in Appendix A altogether? The goal is no longer to make this the normative version, so we don't need to set such a high bar on readability. I assume that the target audience of this section can handle the raw notation without additional explanation?

>> * do we need to say anything about how a direct mapping generator finds a
>> database?
> 
> I think protocol and parameters are best left to tools and specs which
> use the direct graph.

+1

>> * I notice the spec is silent about case sensitivity of database
>> identifiers. I suppose it is implied that the casing used in the database
>> metadata is preserved?
> 
> I'd say that every way you can hand a schema to a DM tool will involve
> some serialization and that will have one of ("FNAME", "fname",
> "Fname", "fNaMe"...). I don't see sensitivity being an issue yet.

In some cases, "fname" and "FNAME" might refer to the same thing. In other cases they don't.

Would an implementation that decides to lowercase everything be conforming? If not, then case-sensitivity is an issue, right?
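
A small Python sketch of why I think it matters. The base IRI and the column_iri() helper are assumptions for illustration, and the "table#column" IRI shape is simply how I read the current draft, not a quotation from it.

```python
from urllib.parse import quote

BASE = "http://example.com/base/"  # hypothetical base IRI


def column_iri(table, column, fold_case=False):
    """Build a property IRI from a table name and a column name."""
    if fold_case:
        # what a "lowercase everything" implementation would do
        table, column = table.lower(), column.lower()
    return f"{BASE}{quote(table, safe='')}#{quote(column, safe='')}"


print(column_iri("People", "fname"))
print(column_iri("People", "FNAME"))
print(column_iri("People", "FNAME", fold_case=True))
```

Without folding, the two spellings yield two different IRIs; with folding they collapse into one. If the database holds quoted identifiers that differ only in case, the folding implementation merges two distinct columns, so the spec has to say which behaviour is conforming.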

Best,
Richard
Received on Wednesday, 10 August 2011 19:59:32 UTC