Re: Re: Direct Mapping Spec - Comments

* Richard Cyganiak <richard@cyganiak.de> [2011-08-10 20:59+0100]
> On 10 Aug 2011, at 19:52, Eric Prud'hommeaux wrote:
> >> 2.3 - I realized that I wasn't sure if I was reading the spec, or reading an
> >> example. It seems to me that the text needs to be more clearly identified,
> >> on a paragraph basis as to whether it is an informal description of the spec
> >> or a concrete example. For example, the R2RML spec highlights examples with
> >> an alternate color and a surrounding label/box. Personally I think I would
> >> swap the order of sections 3 & 2 or intersperse the examples from section 2
> >> into section 3.
> > 
> > I'm sympathetic to that, but I think it needs a real work-up and
> > presentation to the WG. I would work with you on it, but I'm not
> > likely to do this on my own.
> 
> My proposal would be to gut Section 2 down to a minimum. A short example, followed by a few paragraphs that call attention to some specific corner cases (not necessarily with additional full-blown examples -- one line examples maximum). Then rename Section 2 to “The Direct Mapping by Example” or something.
> 
> >> 2.4 "It is not possible to dereference blank nodes" - I don't immediately
> >> see what the point of this statement is.
> 
> +1. Just delete that point. The goal of the document is to specify the behaviour of conforming implementations. That doesn't require this kind of commentary on the ecosystem.

That's enough support for me.
gone.

> >> 3 - "all labels are generated by appending to a base." - I think someone
> >> else mentioned this already, but it seems referring to the IRIs as "labels"
> >> is confusing and we should use more precise words here.
> > 
> > But I'm searching for *less* precise words here.
> > 
> > <http://www.w3.org/TR/rdf-concepts/#section-data-model> says that RDF
> > is a graph. The fact that it has directional arrows with labels means
> > it's a Directed, Labeled Graph. The fact that the concepts doc doesn't
> > mention the labels on the nodes (it simply says they exist) is a
> > minor pain in my butt. Bringing the construction of these nodes into
> > the domain of discourse makes it difficult to not discuss the fact
> > that they do in fact provide the identifiers which adorn the graph.
> > 
> > <http://www.w3.org/TR/rdf-concepts/#section-URI-Vocabulary> implies
> > that DLG labels are called "names" in the RDF world:
> > [[
> > A blank node is a node that is not a URI reference or a literal. In
> > the RDF abstract syntax, a blank node is just a unique node that can
> > be used in one or more RDF statements, but has no intrinsic name.
> > ]]
> > Should I use that?
> 
> I strongly suggest sticking to the normative terminology in Section 6 of RDF Concepts, and forgetting about anything else.
> 
> Yes, RDF is, in mathematical terms, a directed labelled graph. But that's totally irrelevant in this context because the DM document doesn't deal with graph theory, it deals with constructing *RDF graphs*, and formally speaking, there are no “labels” or “names” in RDF graphs.
> 
> Everyone benefits if language is used consistently throughout the entire “RDF house”.

Sure, but the conversation has been all about what not to
say rather than what to say. Fortunately, I believe this works:
[[
for the purposes of this specification, all IRIs are generated by
appending to a base
]]
("appending" vs. "resolved against" is the subject of another thread.)
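
To make the "appending" vs. "resolved against" distinction concrete,
here is a rough Python sketch (not spec text). The base IRI and the
helper names are made up for illustration; urljoin is the standard
library's reference resolver.

```python
from urllib.parse import urljoin

# Hypothetical base IRI, chosen only for illustration.
BASE = "http://foo.example/DB/"


def append_to_base(base: str, suffix: str) -> str:
    """Plain string concatenation: the base is always a prefix of the result."""
    return base + suffix


def resolve_against_base(base: str, reference: str) -> str:
    """Relative reference resolution, as implemented by urljoin."""
    return urljoin(base, reference)


if __name__ == "__main__":
    # The two approaches agree on simple relative paths and diverge
    # once the suffix starts with "/" or contains "..".
    for ref in ("People/ID=7", "/People/ID=7", "../People/ID=7"):
        print(ref, append_to_base(BASE, ref), resolve_against_base(BASE, ref))
```

The two agree as long as the generated suffix is a simple relative
path; they only diverge when the suffix starts with "/" or contains
dot segments.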

> >> 3 - "the percent-encoded form of the column value" - This presupposes a text
> >> representation of the column value. Is it specified elsewhere how to get a
> >> text representation?
> > 
> > Good point, same is true for the column names etc.
> > 
> > SQL gives us a unicode version of each table or column name, as well
> > as the values (for e.g. equivalence testing). We need to work out the
> > related wording for both R2RML and this doc.
> 
> This is related to ISSUE-29 in R2RML and I have an open action to write some text for that.
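
A small Python sketch of why the text representation matters before
any percent-encoding happens. Using str() as the text form is purely
an assumption for illustration (the actual wording is what ISSUE-29
has to settle), and the helper names are hypothetical.

```python
from decimal import Decimal
from urllib.parse import quote


def to_text(value) -> str:
    # Assumption: str() stands in for whatever text representation
    # the specs end up defining for SQL values.
    return str(value)


def encode_component(value) -> str:
    # safe="" percent-encodes every reserved character, including "/".
    # Non-ASCII characters are encoded as UTF-8 bytes by default.
    return quote(to_text(value), safe="")


if __name__ == "__main__":
    # Numerically equal values can yield different IRI components.
    for v in ("Smith & Sons", Decimal("2.50"), 2.5):
        print(repr(v), encode_component(v))
```

Numerically equal values (2.50 as a decimal, 2.5 as a float) produce
different strings here, and so would produce different IRIs.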
> 
> >> 3 - "fresh blank node" - Personally, seems ok to me, but do we need more
> >> precise words for this?
> > 
> > I think "fresh" is the term of art in computer science, but maybe
> > there's something closer to the hearts of RDF modelers.
> 
> I previously proposed “fresh blank node that is unique to this row”. Feels like further improvement is possible.

That's the text in there now (Revision 1.4 2011/08/05 14:26:41). The
row scopes it nicely but I believe David is asking if "fresh" does the
job. (Do I create a new one or do I just freshen up an old one?)
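
One reading of "fresh blank node that is unique to this row",
sketched in Python: a new node is minted the first time a row is
seen and never shared with another row. The class, the label format
and the (table, row index) key are all assumptions for illustration,
not anything the draft prescribes.

```python
import itertools


class BlankNodeFactory:
    """Mints one blank node label per row; labels are never reused across rows."""

    def __init__(self):
        self._counter = itertools.count()
        self._by_row = {}

    def for_row(self, table: str, row_index: int) -> str:
        key = (table, row_index)
        if key not in self._by_row:
            # "Fresh": created now, not previously used for any other row.
            self._by_row[key] = f"_:b{next(self._counter)}"
        return self._by_row[key]


if __name__ == "__main__":
    nodes = BlankNodeFactory()
    print(nodes.for_row("Tweets", 0), nodes.for_row("Tweets", 1))
```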


> >> 3 - "A (potentially unary)" - I encountered several places like this where I
> >> found the parens distracting.
> > 
> > Ditto Michael, but I'm pretty sure this is the text that minimizes the
> > opportunities for misinterpretation.
> 
> A list of one or more column names?

some options:
  • A (potentially unary) list of column names in a table form a property IRI
  • A list of column names in a table form a property IRI
  • A potentially unary list of column names in a table form a property IRI
  • A list of one or more column names in a table form a property IRI
  • A <column name list> from a table form a property IRI

Note that SQL uses the term <column name list> in all the places we
need to use it (specifically, in foreign keys).
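
Whichever wording wins, the unary case needs no special treatment.
A Python sketch under loud assumptions: the "#" and "," delimiters,
the base IRI and the function name are all hypothetical, picked only
to show one-column and multi-column lists going through the same
code path.

```python
from urllib.parse import quote


def property_iri(base: str, table: str, column_names: list, sep: str = ",") -> str:
    """Build a property IRI from a non-empty column name list.

    The IRI layout (table, "#", joined column names) is an assumption
    for illustration, not the normative pattern.
    """
    if not column_names:
        raise ValueError("a column name list needs at least one column")
    encoded = sep.join(quote(name, safe="") for name in column_names)
    return f"{base}{quote(table, safe='')}#{encoded}"


if __name__ == "__main__":
    base = "http://foo.example/DB/"  # hypothetical base IRI
    print(property_iri(base, "Addresses", ["city"]))
    print(property_iri(base, "People", ["deptName", "deptCity"]))
```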

> >> A.1 - I think the English Syntax should be shown by default.
> > 
> > anyone want to second this?
> 
> Why are the buttons there in the first place? As a reader, I expect the authors to decide on an appropriate way of presenting the document. You abdicate that responsibility and instead force the reader to figure out which version they want to see.

Many people will find them useful, as many have found the buttons in
<http://www.w3.org/TR/2009/REC-owl2-primer-20091027/#OWL_Syntaxes>
very useful.

Why do you dislike them? Do you propose that the set builder syntax
and the set syntax be in separate sections even though they're
parallel constructions?


> Perhaps consider removing the English Syntax in Appendix A altogether? The goal is no longer to make this the normative version, so we don't need to set such a high bar on readability. I assume that the target audience of this section can handle the raw notation without additional explanation?

Hmm, doesn't exactly look like a second to me.


> >> * do we need to say anything about how a direct mapping generator finds a
> >> database?
> > 
> > I think protocol and parameters are best left to tools and specs which
> > use the direct graph.
> 
> +1
> 
> >> * I notice the spec is silent about case sensitivity of database
> >> identifiers. I suppose it is implied that the casing used in the database
> >> metadata is preserved?
> > 
> > I'd say that every way you can hand a schema to a DM tool will involve
> > some serialization and that will have one of ("FNAME", "fname",
> > "Fname", "fNaMe"...). I don't see sensitivity being an issue yet.
> 
> In some cases, "fname" and "FNAME" might refer to the same thing. In other cases they don't.
> 
> Would an implementation that decides to lowercase everything be conforming? If not, then case-sensitivity is an issue, right?

Well, it's still a direct graph, just for a different
database. Likewise if they rot13'd the names. SQL implementations have
a litany of rules around what can be considered equivalent, meaning
that query-mapping implementations exposing the direct graph may
effectively expose multiple direct graphs ({<PEOPLE/ID=7> ?p ?o}
happen to match the same data as {<People/ID=7> ?p ?o}) but I think
we're best off just defining the one that naturally falls out of
whatever schema you've passed to the DM implementation.

Linked-data-style GET interfaces would be even weirder if we offered
case-sensitivity as the naive implementation would respond to GET
<PEOPLE/ID=7> with a graph about <People/ID=7> and the nuanced
implementation would either rely on, in this case, 2^^5 owl:sameAs's
or 2^^6*(column count + key count) assertions.
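
To give a feel for how quickly the variants multiply, a Python
sketch that enumerates the case variants of an identifier, using the
"fname" example from earlier in this thread. It assumes ASCII names;
the function name is made up.

```python
from itertools import product


def case_variants(name: str) -> set:
    """All upper/lower case spellings of name; non-letters are kept as-is."""
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in name
    ]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    variants = case_variants("fname")
    # Each of the five letters doubles the number of spellings.
    print(len(variants))
```

Every additional letter doubles the number of spellings a
case-insensitive interface would have to answer for.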

R2RML offers some advice that there may be other graphs which match a
database in the form of an SQL description, but doesn't talk about
when you are and aren't allowed to offer those graphs. (All the
delimiter-sensitive case rules go out the window when you answer
SPARQL queries without the capacity to transfer the quotes.)

So far, we have a spec for a graph which doesn't involve
implementations or conformance or error conditions or any of that.
A conformance section would read something like:
[[
See Section 3 for the definition of the Direct Graph.
]]


> Best,
> Richard

-- 
-ericP

Received on Wednesday, 10 August 2011 20:58:23 UTC