DM+R2RML implementation feedback: XSD mapping for binary columns

I propose that the binary datatypes like BLOB and VARBINARY should be mapped to xsd:hexBinary instead of xsd:base64Binary, because the SQL standard and most SQL implementations provide excellent support for hexadecimal strings, but no support whatsoever for base64-encoded strings, preventing efficient implementations where it would be easy with xsd:hexBinary.


1. XSD defines two datatypes for binary data: xsd:hexBinary and xsd:base64Binary. The SQL-to-XSD mapping in ISO/IEC 9075:14 says that either can be used to map binary SQL datatypes (BLOB, BINARY, VARBINARY).

2. The Natural Datatype Mapping used in the Direct Mapping and in R2RML maps  them to xsd:base64Binary. This was a more or less arbitrary decision, based on the fact that xsd:base64Binary yields a more compact representation and is (according to Google hit counts) more popular. But I now believe that this decision was wrong and that the Natural Mapping  should map these types to xsd:hexBinary.

3. The SQL standard way of representing binary literals is based on the same hexadecimal string representation as xsd:hexBinary. For example, this is a four-byte binary literal in standard SQL: X'00FF00FF'; and the equivalent xsd:hexBinary value: "00FF00FF"^^xsd:hexBinary. This syntax or some variation is supported by *all* database implementations. Most also provide built-in SQL functions for converting between character strings and hex strings. I have not found *any* support for decoding or encoding base64 strings in any database implementation. Hence, implementing the Direct Mapping and R2RML would be considerably easier if xsd:hexBinary is required.

4. This issue becomes worse when evaluating SPARQL queries that involve FILTERs and other functions over binary data. If binary data is handled as xsd:hexBinary, then these functions can be translated to SQL and executed in the database. If xsd:base64Binary is used, then this will be impossible (unless stored procedures are used, which is impossible in some deployment scenarios). Any operation that cannot be pushed down to the DB will suffer from disastrous performance in some cases.

5. The downsides of this change would be: The xsd:hexBinary representation is more verbose. Some clever tricks, like constructing data: URIs (which rely on base64 encoding) using URI templates won't work anymore.


Received on Sunday, 22 April 2012 22:40:22 UTC