Re: Comments from Julien Homo - TCs

Hi Boris,

(adding Julien to cc)

On 11 Apr 2012, at 15:29, Boris Villazon-Terrazas wrote:
> J: My R2RMLTC0009c and R2RMLTC0009d tests fail because mapping was successfull with unamed column on the two dbms : why any columns in the SELECT list derived by projecting an expression like an expression with keyword COUNT must be named ? Is it to conform to Core SQL 2008 ?
> B: Because we need to refer to those columns, otherwise there won't be a way to reference those, right?  it's no related to the Core SQL 2008

It has nothing to do with Core SQL 2008. If a SELECT expression is unnamed, then we can't refer to it from a rr:column or rr:template expression, because, well, we don't know its name. So adding unnamed columns in the SELECT clause is useless.

But yeah, I think it was a mistake to write in the spec that the SQL query MUST NOT have unnamed columns. That's because we cannot actually validate this without parsing the SQL query. A better choice would have been to write that the SQL query MAY have unnamed columns, but such columns can't be referenced in rr:column or rr:template.

Side note: The SQL spec says that such columns get an implementation-dependent name. So the DB engine picks a name. In theory we could specify that it's ok to refer to the column by that implementation-dependent name, but such mappings couldn't be validated and wouldn't be portable. So I think this would be a bad idea.

> Issue/question B
> J: R2RMLTC00016e and DirectGraphTC0016 tests fail :  I have to modify SQL input file in order to postgreSQL does not raise a syntax error. Indeed, on the one hand BINARY VARYING is an unknown datatype for postgreSQL (bytea data type allows storage of binary strings, 
> B: We have to check how to deal with this. AFAIR Nuno has the same problem, does anyone of the group is working with this matter?

R2RML is a mapping language for SQL 2008. DBs that don't support the standard SQL 2008 datatypes can't pass the tests.

Section 10 already discusses how you're supposed to deal with vendor-specific types.

People who are interested in supporting R2RML over non-SQL-compatible databases like PostgreSQL essentially need to define a slightly modified flavour of R2MRL that can handle the vendor-specific aspects. To validate an implementation of such a modified R2RML flavour, you obviously need a slightly modified test suite.

It's not the responsibility of the WG to produce such R2RML flavours or modified test suites.

On the other hand, several WG members will be interested in producing some of those, and of course it would be good if they collaborate and share results, and the WG mailing lists are IMO a good place to discuss such work.

> J: On the other hand, encoding characters like "\ux2F" instead of "/" are not recognized. Is necessarilystrictly adhere to the syntax of the SQL query to validate this test or these requests can be adapted ?
> B: I had to use "\ux2F" because hsqldb (I'm using hsqldb for parsing the SQL scripts) does not support "/" within BINARY VARYING. Does anyone of the group have the same problem with other DBMS?

Well, the INSERT statements in D016-1table1primarykey10columns3rowsSQLdatatypes don't work like this. CAST doesn't know anything about base64-encoding! Don't use a CAST but use a binary literal.

The standard SQL way of encoding binary literals is this:


The X marks the following string as a binary literal (that is, a hexadecimal string that can be stored in a BINARY/VARBINARY/BLOB column). Note that the 0123456789ABCDEF part is a simple hexadecimal string and has nothing to do with base64 encoding. Base64 encoding is only introduced at the R2RML output layer; SQL doesn't know *anything* about it.

HSQLDB, MySQL, DB2 and Firebird all support this standard SQL syntax.

PostgreSQL doesn't, it requires some weird fucked-up syntax, see here:


SQL Server requires a C style literal:


Oracle expects the standard literal syntax and does the Right Thing based on the column datatype:


(Also note: BINARY VARYING and VARBINARY are synonyms in SQL 2008, and the latter is slightly more portable, so it's probably better to use VARBINARY(200) in the table creation statement.)

> Issue/question C
> J: R2RMLTC0016b and DirectGraphTC0016 : You use canonical RDF lexical form for double datatypes like"80.25E0" but it's "8.025E1" that appears in DirectGraphTC0016. R2RML CR indicates the choice of lexical form is implementation-dependent but my test fails because these results are not homogeneous. Can you confirm this expected result ?
> B: It was my mistake, now both (DirectGraph and R2RML) are the same "8.025E1" . is that ok?

IMO the tests should always use the canonical form, even in cases where it's not strictly required. 8.025E1 is canonical. Ideally, the test suite would recognize that "80.25E0"^^xsd:double and "8.025E1"^^xsd:double are equal, and would pass the test (maybe with a warning/notice).

> Issue/question D
> J: R2RMLTC00016e : IRI built from binary data seem to be not base64 encoded ("<>") contrary to the DirectGraphTC0016 test.
> B: Actually the IRIs of the R2RML are the good ones, because the idea of create them was to have data URIs  thoughts? I need to fix the DirectGraph ones.

Here's the binary literal that you need to put into the table to get the data URI above:


(I just googled for "base64 to hex" to find an online tool for converting between hex and base64.)


> Thanks in advance and regards
> Boris
> P.S. I can include the issues/questions on the wiki page for gathering the implementations experiences, what is the URL?
> [1]
> [2]

Received on Wednesday, 11 April 2012 17:03:46 UTC