Re: Target audience of the Direct Mapping document? from Alexandre Bertails on 2011-07-10 (public-rdb2rdf-wg@w3.org from July 2011)

From: Alexandre Bertails <bertails@w3.org>
Date: Sat, 09 Jul 2011 20:33:39 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Eric Prud'hommeaux <eric@w3.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <1310258019.10618.10.camel@simplet>
Richard,

first of all, I want to make sure we are both talking about the same
thing, namely Section 3. I am not saying *anything* about the rule
approach.

On Sat, 2011-07-09 at 19:26 +0100, Richard Cyganiak wrote: 
> Hi Alexandre,
> 
> On 8 Jul 2011, at 21:37, Alexandre Bertails wrote:
> > This "formal stuff" is actually very accessible, both the formalism
> > itself and its syntax
> 
> What is the basis for this assertion?

About the formalism: it's only about functions and datatypes. You
don't need to understand any formal logic or anything else. RDB is
fully specified from the ground up as a standalone ADT to be consumed
by the Direct Mapping. RDF is restated as an ADT as well, even if it
just follows the Recommendation.

About the syntax, two separate things:
* syntax for the _dependent types_: this may be the tricky part. I
once proposed to Eric that we erase the dependent part, keep the raw
type and put the extra information in the English text after each
definition. We agreed that we should instead wait for the WG to read it
and make a decision.
* syntax for _functions_: any mathematician or programmer can read
functions. Following several people's advice, we use set notation (à la
Python and other languages with list comprehensions) to generate
values, while the first versions used a monadic notation. As it's all
about generating values, I refused to use any iterator-based approach
to define this part.
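
To illustrate the difference between the two styles, here is a minimal
Python sketch. It is not taken from the Direct Mapping document: the
rows, the "row7"-style identifiers and the triple shape are all
hypothetical and only serve to contrast a comprehension with an
iterator-based loop.

```python
# Hypothetical data: rows of a table as dictionaries (not from the spec).
rows = [
    {"ID": 7, "name": "Bob"},
    {"ID": 8, "name": "Sue"},
]

# Set-comprehension style: the result is described directly as a value.
triples = {
    (f"row{row['ID']}", column, value)
    for row in rows
    for column, value in row.items()
}

# Iterator style: the same set, built by mutating an accumulator.
triples_loop = set()
for row in rows:
    for column, value in row.items():
        triples_loop.add((f"row{row['ID']}", column, value))
```

Both forms produce the same set; the comprehension simply reads closer
to the set notation used in the document.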

Note that the syntax was already improved after Yvan's review.

> 
> > *All requirements* are in the document (at least for section 3.)
> 
> No, Alexandre, they are not.
> 
> What is a "common SQL datatype"?

I'm not sure where you found this text. Currently, the "common SQL
datatype"s are defined in [1]:
[[
Datatype    ::=   Int  |  Float  |  Date  |  …
]]

You started an interesting thread on the subject. I don't think that
being exhaustive is achievable for this question (because of all the
different implementations). I actually don't think that we want that
either, but I'll be happy to update the related text once the WG has
decided what to do there.

My position is to keep the "…".

> 
> What is a "lexical value"?

Where do you find this text? I don't understand the exact context.

> 
> What is a "candidate key"?

Formally defined at [2]:
[[
CandidateKey    ::=   List(ColumnName)
]]

The corresponding English text is:
[[
A candidate key is made of a list of columns (their order matters).
]]
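
A small Python sketch of what "their order matters" implies, assuming
column names are modelled as plain strings (the names firstName and
lastName are made up for illustration):

```python
from typing import List

ColumnName = str                 # assumption: a column name is just a string
CandidateKey = List[ColumnName]  # mirrors CandidateKey ::= List(ColumnName)

key_a: CandidateKey = ["firstName", "lastName"]
key_b: CandidateKey = ["lastName", "firstName"]

# Same columns, different order: as lists they are different keys.
same_as_lists = key_a == key_b
# Compared as sets, the order is lost and they would look identical.
same_as_sets = set(key_a) == set(key_b)
```

This is why the definition uses a list rather than a set of columns.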

> 
> > and you clearly don't need a PhD to understand them.
> 
> Alexandre, I didn't ask about PhDs.
> 
> I asked about first-year students

It's for anybody who can read English. I believe it's enough to
understand the whole section, without reading the maths at all.

The "maths" just makes it easier to proofread and is for people who
understand what a function is. I believe this is the case for most
"first-year students".

> and about domain experts without maths background or DB background. They are part of the intended audience, according to Eric, or would you disagree with that? Is the style of Section 3 appropriate for such an audience?

Yes it is, and that has always been our goal. And I had several
occasions to check this was ok when I talked about this stuff.

> 
> > The credibility of this definition was ensured by the work of this Working Group as well as the researchers who proof-read our K-CAP paper, used to that kind of stuff.
> 
> I don't know how the opinion of these researchers is relevant to this discussion, as they are not the target audience, according to Eric.

I'm not sure why Eric would say that. The knowledge community is
definitely on the expert side anyway.

> 
> > So I can safely say that we made sure that the technical barrier was
> > very low.
> 
> You made sure that the technical barrier is very low by having researchers proof-read a paper about the direct mapping?

My statement was a conclusion for my entire email, and was not really
referring to these "researchers proof-read a paper". We just "used" them
to check our mistakes, and they did an amazing job during the review :-)

What I wanted to say is that the Direct Mapping was really designed
and written with simplicity in mind. But it still has to be: robust,
correct, exhaustive, understandable, practical, usable.

> 
> I think you may be going about this the wrong way.
> 
> >> but the couple examples should allow them to predict the graph well enough to use it.
> >> 
> > 
> > I'd say that the English version coming along each definition/rule is
> > enough.
> 
> I find the English version *very* helpful for parts 3.1 to 3.3.
> 
> I find the English version useless for 3.4, except in the parts where it states things that are not stated in the maths. In 3.4, it is merely commentary on the maths, and *not* an alternative plain English expression of the maths. This should be fixed.

The math and the English text are made equivalent *on purpose*
everywhere. I'd rather keep it this way for consistency.

I'd prefer that we had a discussion about showing the English version by
default or not.

> 
> > For example, we know formally that the mapping goes through any RDB instance, thanks to the
> > static checking in the Scala implementation. This is a requirement for a so-called mapping, and we have it for free (at least for section 3.).
> 
> Static type checking didn't save you from writing [42] and [48] which are incapable of producing IRIs that are compatible with [33], meaning that your mapping cannot *ever* produce a correct RDF graph, except for an empty database. So I hesitate to put much trust in static type checking.

This is wrong.

Static type checking *with dependent types* can give you that. Scala's
type system is just not powerful enough. The thing is I personally don't
have time to write the implementation in Coq, especially if the spec is
likely to change, even slightly.

So the following is true: Eric and I didn't provide a proof that [42]
does not break the type-safety for [33]. afaik, this wasn't a
requirement and I can say that we added as much confidence in the
mapping as we could.

> 
> 
> Could you please add to the document:
> 
> 1. A normative instruction that states that clicking the "Show English Syntax" button is normative
> 
> 2. A normative instruction that states that clicking the "Hide Set-Style Syntax" button results in an incomplete rendering of the Direct Mapping
> 
> 3. A normative instruction that states that neither the "English Syntax" nor the "Set-Style Syntax" are complete on their own

Both versions are intended to be normative, with the same level of
importance.

Eric and I still disagree on what to display by default and we hope that
the WG will take the action to decide what to do there.

> 
> 4. Something to explain that the squiggle is pronounced "Phi"

Agreed.

> 
> 5. A proper reference for "IWD 9075-14:2011(E)", Google can't find it

Eric told me this wasn't that easy. Eric, the ball is in your court :-)

> 
> 6. An account of how row IRIs and row blank nodes are created (maybe I'm just stupid but I can't find it anywhere in Section 3)

This is left unsaid on purpose, but I agree that we should say why.

Depending on who you are, you can do different things with this
function. For example: 1. a database vendor has access to more
information (e.g. the row id) and can do a more effective mapping;
2. some SQL implementations provide functions to do the same; 3. some
SQL implementations don't give access to this information.

It doesn't change the spirit of the Direct Mapping.

> 
> 7. An account of what the syntax "let abc = xyz in" means. I can't figure it out. Is this Scala syntax? If so, can you please add a normative reference to Scala and state in the Introduction that knowledge of Scala syntax is required?

I hope I have removed any mention of Scala in the document.

But I don't understand what you want me to do here. Do you think that
not everybody will be able to read this construct? What if I write "abc
= xyz" or "abc := xyz" or "abc ← xyz" instead? I believe we can take
"variable binding" for granted at this point, and "let ... = ... in" has
enough meaning in English and mathematics to go with it.
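
For readers who prefer code, "let abc = xyz in body" can be read as
"evaluate body with the name abc bound to xyz". A minimal Python sketch
(the function names and the body "abc + abc" are hypothetical, chosen
only to have something to evaluate):

```python
def with_local_binding(xyz):
    abc = xyz          # the "let abc = xyz" part
    return abc + abc   # the "in ..." part: an expression that uses abc


def with_lambda(xyz):
    # The same binding without assignment: abc is bound by applying
    # a one-argument function to xyz.
    return (lambda abc: abc + abc)(xyz)
```

Both functions compute the same result for any xyz that supports "+".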

> 
> 8. Explain to me why there's a "table(r)" in "⟦table(r), c⟧col" and "⟦table(r), fk⟧col". Shouldn't the static type checking catch errors of this kind?

I guess you wanted to point at c and fk, as they appear to be of
different types.

This was done on purpose in [3]. "*" stands here for "one or more
elements of this type". This is safe because a ForeignKey is a List of
Columns and a single Column is isomorphic to a List of Columns of
size 1.

But I agree that this use of '*' is not standard notation. So if you
demonstrate that this is really misleading, I agree to sacrifice
simplicity for 2 separate and redundant functions. Or we can make the
function take a list and inject the single column c into a unary list.
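
The second option could be sketched as follows in Python. The helper
names and the IRI shape (table, "#", comma-separated columns) are
assumptions made for this illustration; the real shape is whatever the
column-semantics rule in [3] defines.

```python
from typing import List, Union


def as_column_list(c_or_fk: Union[str, List[str]]) -> List[str]:
    """Inject a single column name into a unary list; copy lists as-is."""
    if isinstance(c_or_fk, str):
        return [c_or_fk]
    return list(c_or_fk)


def column_iri(table: str, c_or_fk: Union[str, List[str]]) -> str:
    # Hypothetical IRI shape, for illustration only.
    return table + "#" + ",".join(as_column_list(c_or_fk))
```

With this, a column c and the one-element foreign key [c] are handled
by the same function and yield the same result.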

> 
> 9. State that Datatype in [9] includes String (you explicitly check for String later)

This is not true in the current version [4]. I'll be happy to add String
once the WG has made a general decision about the SQL Datatype
question.

> 
> 10. Do something about the fact that String in [9] and String in [10] are something different.

TableName and ColumnName are sub-types of String, but are not
compatible.

Be my guest if you want to propose a refinement of this notion of String
for this ADT, but I'm already happy enough with the current definition.
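
One way to picture "sub-types of String that are not compatible" is the
following Python sketch. It is only an analogy under the assumption
that subclassing str is an acceptable stand-in for the ADT's sub-types;
it is not how the document or the Scala implementation defines them.

```python
class TableName(str):
    """A string that names a table."""


class ColumnName(str):
    """A string that names a column."""


t = TableName("People")
c = ColumnName("People")

# Both values are strings...
both_strings = isinstance(t, str) and isinstance(c, str)
# ...but neither type can stand in for the other.
interchangeable = isinstance(t, ColumnName) or isinstance(c, TableName)
```

Note that in this sketch t == c still holds, because Python compares
the underlying strings; the distinction lives in the types only.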

> 
> 11. Replace the reference to WSDL with a reference that explains how to percent-encode strings (it only explains, VERY badly, how to encode sets of name-value pairs)
> 
> 12. Use a reference for percent-encoding that is Unicode compatible (SQL column and table names are Unicode strings)

Eric?

> 
> 13. Clarify whether tables includes views or not

It's an implementation detail and we don't want to go further than the
ADT. For example, there is no notion of a parser there.

> 14. Clarify whether a Database is the set of tables in a Schema or the set of tables in a Catalog (or some other set of tables)

Not sure what you mean here.

> 
> 15. Make this a grammatical sentence: "A statement between a row and a foreign key, telling if the absence of NULL values"

Will try to come up with something better.

> 
> 16. Explain what the notation "(columnNames, _, _)" means

Hrmmmm. This function should be with the other accessor functions. And
there is no need to give an implementation as it can be just part of the
dependent type definition... I will fix it this way.
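
For reference, the "(columnNames, _, _)" notation is a pattern on a
triple that names the first component and ignores the other two. A
Python sketch of the same idea (the contents of the triple are
placeholders, not the actual components used in the document):

```python
# Hypothetical triple; only its shape matters here.
header = (["ID", "name"], "second component", "third component")


def column_names(triple):
    # "_" marks components that are deliberately ignored.
    names, _, _ = triple
    return names
```

Calling column_names(header) gives back the first component only, which
is exactly what an accessor function would do.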

> 
> 17. Explain the significance of difference between the "foo(bar)" notation that is sometimes used, and the "[[bar]]foo" notation that is used at other times

To be honest, I'm not sure I'm able to explain this convention, which is
used in all the semantics definitions I have read so far. It's just
like asking why people write "a + b" instead of "plus(a,b)". Do you have
something better to propose? Like "databaseSemantics(bar, foo) = ..."?
That would be very ugly :-) and so unnatural to so many people!

> 
> 18. Change the column IRIs so that it produces valid RDF IRIs, rather than relative IRIs

This would require us to change the definition of the Direct Mapping
itself so that it depends on a "stem URI" that is passed around
everywhere, for no added value.
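
As a sketch of the alternative, the relative IRIs can be resolved
against a stem after the fact, outside the mapping itself. The stem and
the relative IRIs below are made up for illustration and are not quoted
from the Direct Mapping document; urljoin is the standard-library
function for reference resolution.

```python
from urllib.parse import urljoin

# Assumption: hypothetical stem and relative IRIs, for illustration only.
stem = "http://example.org/db/"
relative_iris = ["People#name", "People/ID=7#_"]

# Resolve every relative IRI against the stem in one pass.
absolute_iris = [urljoin(stem, iri) for iri in relative_iris]
```

The mapping functions stay free of any stem parameter; resolution is a
separate, final step.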

I wonder if there is a discussion in the RDF community about adding
relative URIs. At least, we have a perfect use case here with the Direct
Mapping to consider this option, and I suggest that the RDB2RDF WG speak
with the RDF WG on this subject.

> 
> 19. Fix the typo in "langageTag" and make it link somewhere proper

Sorry, it's gonna be fixed.

But note that it's not a typo if you're French ;-)

> 
> 20. Remove candidateKeys which is defined but never used

Right, it became obsolete the day we simplified the mapping by
introducing the "let ... = ... in" construct. Very good catch!

(note to myself: remove that in the Scala implementation)

> 
> I'll stop here and withdraw my earlier assertion that Section 3 may be ready for Last Call.

I was so desperate to have someone else look at that stuff that I
used the "ready for Last Call" trick. Richard, I *highly* appreciate
your review and the time you spent on it. Rest assured that I will fix
that stuff when I am back from vacation (unless Eric does it before me).

<rant>
Anyway, I'm really disappointed that it took so many months to have
another pair of eyes reading this document (more than a year actually,
if you consider Eric's first proposal). I'll be even more disappointed
if the WG decides to change fundamental things like the definition of
RDB (I'm still amazed how it was possible for a group called RDB2RDF to
not define RDB once and for all, before doing anything else), the
dependent types notation or even the syntax, as this has been in place
for a long time, with repeated calls to read this stuff in order to move
forward.
I just hope that Richard won't be the only one, and that others will get
their hands (and eyes) dirty as well.
</rant>

Alexandre. 

[1] http://www.w3.org/2001/sw/rdb2rdf/directMapping/?english#RDB-Datatype
[2] http://www.w3.org/2001/sw/rdb2rdf/directMapping/?english#RDB-CandidateKey
[3] http://www.w3.org/2001/sw/rdb2rdf/directMapping/?english#column-semantics
[4] http://www.w3.org/2001/sw/rdb2rdf/directMapping/?english#datatype-semantics
Received on Sunday, 10 July 2011 00:33:53 UTC