Re: Target audience of the Direct Mapping document?

Hi Juan,

On 11 Jul 2011, at 14:00, Juan Sequeda wrote:
> The target audience is for implementors: anybody who works on a relational database engine or wants to create a rdb2rdf engine that supports R2RML and wants to create a default R2RML mapping file with the direct mapping.

Ok. I'd suggest that the target audience also includes users who write queries or mapping rules against the direct graph.

The document should state the target audience explicitly in the introduction.

> we could rewrite the english part and mix it with IF-THEN
> 
> IF 
>     given a table r with attributes a1, .., am, and a1, .., an is a primary key of r where 1 ≤ n ≤ m
> THEN
>    Create a Triple of the form:
>     s = generate an IRI for the tuple by concatenating base_uri+"a1="+value of a1+", ... 
>     p = rdf:type
>     o = generate an IRI for the table by concatenating base_uri+"r"

My problem with this style is that it is hard for a reader to work out which rules to apply when. A computer can do this, but it's not an easy task for a human. In the denotational semantics, the reader can find their way through the definition just by following from function invocation to function definition.

> Do you think the existing english explanation is too formal?

I find the THEN part fairly understandable, although it would get more complex once you handle the required percent-encoding and other details.

My problem is with the IF part, because if I have ten or twenty such rules, then it is hard to figure out in which order they are applied, and how they build on each other in order to define the entire mapping.

I would much prefer a style that builds up the mapping from simple definitions, such as this definition for tuple IRI (or rather “row IRI” since we want to use SQL terms and not relational algebra terms):

[[
The row IRI of a row is a concatenation of the following:
* the table IRI of the row's table
* a slash character '/'
* for each column pkc of the table's primary key, in order:
  * the percent-encoded column name of pkc
  * an equals character '='
  * the field value of the column pkc in the row after SQL type conversion to string
  * a comma character ',' if it is not the last primary key column
]]
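
To illustrate how directly such a definition translates into code, here is a rough Python sketch. The names row_iri and percent_encode, the dict representation of a row, and the use of str() as a stand-in for SQL type conversion to string are all assumptions made for this illustration and are not taken from the document. As in the definition above, only the column name is percent-encoded.

```python
from urllib.parse import quote


def percent_encode(text):
    # Assumption: encode every character outside the unreserved set.
    return quote(text, safe="")


def row_iri(table_iri, primary_key, row):
    """Build the row IRI for one row.

    table_iri   -- the table IRI of the row's table (string)
    primary_key -- primary key column names, in order (list of strings)
    row         -- mapping from column name to field value (dict)
    """
    parts = []
    for pkc in primary_key:
        # Assumption: str() stands in for "SQL type conversion to string".
        value = str(row[pkc])
        parts.append(percent_encode(pkc) + "=" + value)
    # Joining with ',' places a comma after every key column except the last.
    return table_iri + "/" + ",".join(parts)
```

For example, row_iri("http://example.com/base/People", ["ID"], {"ID": 7}) yields "http://example.com/base/People/ID=7".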

Terms like "table IRI", "percent-encoded", "SQL type conversion to string" would be hyperlinked to their respective definitions.

The definition for "row IRI" would then be used in the definition of "row type triple", which would roughly correspond to the rule you showed above.

The definition for "row type triple" would then be used in the definition of "row graph", which I sketched elsewhere in this thread in a response to Alexandre.

The definition of "row graph" would be used to define "table graph", which would be used to define "direct graph".
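
The chain of definitions could be sketched in Python roughly as follows, with each function calling only the one defined just before it. This is only an illustration under assumptions: the function names, the dict layout for tables, and the use of str() for SQL string conversion are made up here, and the row graph is reduced to the row type triple alone, since the other triples it would contain are not spelled out in this message.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_iri(base_iri, table):
    # Assumption: the table IRI is the base IRI plus the encoded table name.
    return base_iri + quote(table["name"], safe="")


def row_iri(base_iri, table, row):
    # Assumption: str() stands in for SQL type conversion to string.
    key = ",".join(
        quote(col, safe="") + "=" + str(row[col])
        for col in table["primary_key"]
    )
    return table_iri(base_iri, table) + "/" + key


def row_type_triple(base_iri, table, row):
    # Uses "row IRI" and "table IRI".
    return (row_iri(base_iri, table, row), RDF_TYPE, table_iri(base_iri, table))


def row_graph(base_iri, table, row):
    # Uses "row type triple". Placeholder: triples for the row's
    # column values are deliberately left out of this sketch.
    return {row_type_triple(base_iri, table, row)}


def table_graph(base_iri, table):
    # Uses "row graph": the union of the row graphs of all rows.
    graph = set()
    for row in table["rows"]:
        graph |= row_graph(base_iri, table, row)
    return graph


def direct_graph(base_iri, tables):
    # Uses "table graph": the union of the table graphs of all tables.
    graph = set()
    for table in tables:
        graph |= table_graph(base_iri, table)
    return graph
```

A reader can start at direct_graph and follow each call down to its definition, which is the property I am after.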

> Do you think if we add this type of IF-THEN would make this section better?

Not really, unfortunately, for the reasons above. I'd prefer it if the different parts of the mapping were connected explicitly via named definitions (my preferred style) or at least via something like function invocations (Alexandre's style). I feel that in the rule style, the different parts of the mapping (the different rules) are not explicitly connected and do not refer to each other, making it hard to understand how to build up the entire structure from its parts.

Best,
Richard

Received on Monday, 11 July 2011 18:25:31 UTC