Review of Direct Mapping Document from Harry Halpin on 2010-12-12 (public-rdb2rdf-wg@w3.org from December 2010)

From: Harry Halpin <hhalpin@w3.org>
Date: Sun, 12 Dec 2010 22:05:53 -0000 (GMT)
To: public-rdb2rdf-wg@w3.org
Message-ID: <75108df3e118ca2d40de58ef1cda29e2.squirrel@webmail-mit.w3.org>
While I was at it, I also reviewed the Direct Mapping Document. Again,
comments are mine only, not W3C's, as Ivan/Eric are now staff contacts.

Comments, roughly in order of appearance in document

1) Abstract. I'd mention R2RML as one way to do "refinements"

s/more intricate/custom

remove "and a formal" as we don't have that agreed upon yet, but add it
back in when we do.

2) Why do URIs end in "#_"? There is also a problem with how this works in
the IRI construction algorithm, as you can get IRIs like
"baseIRI/table_name#column_name#_". I'm pretty sure that is not a legal
URI, as it's a fragid of a fragid. See "Single-column IRI" in 2.2.
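
To illustrate the fragid-of-a-fragid concern: RFC 3986 does not allow an
unescaped "#" inside a fragment, so a string with two "#" characters cannot
be a valid URI reference. A minimal Python sketch of that check follows. The
function name and the example.org base IRI are assumptions for illustration
only and do not come from the draft.

```python
def has_nested_fragment(iri: str) -> bool:
    """Return True if the string contains more than one '#'.

    RFC 3986 permits at most one '#' in a URI reference, because the
    fragment part may not itself contain an unescaped '#'.
    """
    return iri.count("#") > 1


# Hypothetical base IRI; the shape follows the example in comment 2.
print(has_nested_fragment("http://example.org/ex/table_name#column_name#_"))
print(has_nested_fragment("http://example.org/ex/table_name#column_name"))
```

This only counts "#" characters; it is not a full IRI validator.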

3) If we choose to keep the ending with # (thus making it an OK URI for
RDF by W3C TAG and in common RDF style), why is there an extra "_"
underscore added? While these URIs could resolve to text/html, that's
unlikely at best. The extra '_' only makes things more confusing, as
that's not common practice in RDF.

4) Issue (hash-vs-slash): While I hope we can preserve ending with "#" and
thus put something else between table_names and column_names (maybe put
the '_' there instead?), let's keep it "example.org/ex#" rather than 
"example.org/ex#_"

5) Issue primary-is-candidate-key: Given that we should keep this
algorithm simple and direct, and the proposed changes try to guess some
things about the structure of the intended RDF, I think we should just
ignore the fact that a primary key is also a candidate key and run the
algorithm over the tables as normal. Then, if people want to do more
complex modelling, they can then use RIF or whatever on top of the
resulting RDF.

6) Issue hier-table-at-risk. See 5).

7) Issue fk-pk-order. Not sure how this should be handled, but my
temptation would be to say see 5)

8) Issue many-to-many-as-repeated properties. Again, see 5)

9) Issue formalism-model: This has been quite the debate, and I think we
should de-link the semantics from both Direct Mapping and R2RML, and put
them in a separate document. Second, I think a stringent requirement on
any formalism should be that it can handle both Direct Mapping and R2RML, as
otherwise we have the situation where possibly incompatible semantics
model two different docs. I haven't seen a candidate that does scale to
both R2RML and direct mapping.

More on motherhood and apple-pie, but formal semantics in general has to
involve an interpretation function from some formal definition of syntax
(usually done with a BNF) to a mathematical structure in Tarski-style
semantics (and so constrains inference) or directly specifies allowed
inferences in proof-theoretic (Gentzen) style semantics.

On a high level, it appears both Section 3 and Section 5 are doing the
same thing, and until I see test-cases that show otherwise, I think
they're basically compatible...whether one likes the functional way
z=f(x,y) or one prefers the rule way f(x,y,z) doesn't really matter.
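
As a toy illustration of why the two styles can be treated as equivalent,
here is a minimal Python sketch. The mapping used (joining two strings with
a dot) and both function names are hypothetical and are not taken from
either section of the draft.

```python
def f_functional(x: str, y: str) -> str:
    # Functional style: z = f(x, y). The result is computed from the inputs.
    return x + "." + y


def f_rule(x: str, y: str, z: str) -> bool:
    # Rule style: f(x, y, z) holds exactly when z is the value that the
    # functional form would compute for x and y.
    return z == x + "." + y


z = f_functional("table", "column")
print(f_rule("table", "column", z))        # the computed z satisfies the rule
print(f_rule("table", "column", "other"))  # any other z does not
```

Under this reading the rule is just the graph of the function, so test cases
written against one form can be checked against the other.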

What it appears that we have in Section 3 is a well-defined BNF (i.e. how
people usually define syntax precisely) where the production rules involve
variables whose values are derived from the table. While it definitely
specifies the problem completely, by virtue of directly specifying what one
should code via a set-theoretic take on some working Scala code, we should
not direct implementers at that level of detail.

However, in Section 4 we have some rules in what people would think is
first-order logic (a variant thereof, Datalog). However, without direct
reference to the R2RML semantics, it's not a formal semantics [1].
Therefore, these should be in the same document. Second, functions like
generateColumnIRI and whatnot should be specified at a lower level,
i.e. give a direct rule for "baseIRI+blah blah+#" in the form of a string
concat construction.
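
A minimal sketch of what such a string-concatenation rule could look like,
in Python. Everything specific here is an assumption for illustration: the
function name, the parameters, a base IRI that already ends in "/", and the
"_" separator between table and column names (the option floated in
comment 4). It is not the rule from the draft.

```python
def generate_column_iri(base_iri: str, table_name: str, column_name: str,
                        sep: str = "_") -> str:
    """Hypothetical rule: base IRI + table + separator + column + '#'.

    The separator is deliberately not '#', so the result contains exactly
    one '#', at the end.
    """
    return base_iri + table_name + sep + column_name + "#"


# Assumed base IRI, ending in '/'.
print(generate_column_iri("http://example.org/ex/", "table_name", "column_name"))
```

Stated this way, an implementer in any language only has to check one
concatenation, whatever separator the group finally agrees on.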

So, something that involves more precise instructions than Section 4 but
not as precise as Section 3 is what is needed.

I think that the best reason for a formal semantics should be to help
implementers, and giving the implementers a finite list of functions or
rules they have to check they implement is a good way to do it. These
descriptions should be precise when necessary, but also allow variation in
coding and be usable across different programming paradigms.

10) We should provide some more guidance on what to do with the Direct
Mapping once you've produced a bunch of RDF from some tables directly. The
general story people were telling was to use RIF/some RDF rule
language/SPARQL to transform them. I think we should tell people about
this option in some way, but leave the specifics to another document
(perhaps a Working Group note with some examples).

[1] http://www.w3.org/2001/sw/rdb2rdf/wiki/Semantics_of_R2RML
Received on Sunday, 12 December 2010 22:05:55 UTC