Re: PLUS SIGN character in value of a pkey column

On 1 Nov 2011, at 18:33, Souripriya Das wrote:
> In Section 3 [1] of Direct Mapping LC Working draft (reproduced below), do we need to replace PLUS SIGN character in the value of a key column with its percent encoding?
> 
> ---------------
> Definition percent-encode: (a subset of HTML5 form dataset encoding):
> 
>    * Replace each PERCENT SIGN character ('%', U+0025) with the string "%25".
>    * For table names, replace each NUMBER SIGN character ('#', U+0023) with the string "%23".
>    * For table names, replace each SOLIDUS character ('/', U+002f) with the string "%2f".
>    * For attribute names, replace each HYPHEN-MINUS character ('-', U+003d) with the string "%3D".
>    * For attribute values, replace each FULL STOP character ('.', U+002e) with the string "%2E".
>    * Replace each SPACE character (U+0020) with the PLUS SIGN character (+, U+002B).
> -----------------
> 
> [1] http://www.w3.org/TR/rdb-direct-mapping/#definition

Hm, this definition has at least two problems:

1. It is lossy, because – as you correctly note – the strings “ ” and “+” end up with the same encoded representation, making it impossible to reconstruct the original string.

2. It doesn't escape many characters that are forbidden in IRIs, so the results may violate the IRI (and RDF) specs.
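
To make both problems concrete, here is a minimal Python sketch of the quoted definition as it would apply to attribute values. The function name draft_percent_encode is my own, and I am assuming the rules are applied in the order listed; this is an illustration, not code from the draft.

```python
def draft_percent_encode(value: str) -> str:
    """Sketch of the quoted draft rules that apply to attribute values."""
    # PERCENT SIGN first, so the escapes added below are not escaped again.
    value = value.replace("%", "%25")
    # FULL STOP rule for attribute values.
    value = value.replace(".", "%2E")
    # SPACE becomes PLUS SIGN; an existing PLUS SIGN is left untouched.
    value = value.replace(" ", "+")
    return value


# Problem 1: two different key values collapse to the same encoded string.
print(draft_percent_encode("a b"))   # a+b
print(draft_percent_encode("a+b"))   # a+b

# Problem 2: characters that may not appear in an IRI pass through unchanged.
print(draft_percent_encode("x<y>"))  # x<y>
```

Since "a b" and "a+b" both come out as "a+b", a decoder cannot tell which value was in the key column.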

It's also not clear to me why HTML form-encoding is used here instead of the %-encoding defined in RFC 3986. In other words, why are space characters encoded as “+” and not as “%20”? Form-encoding would clearly be appropriate if the URIs had the form ...?foo=this&bar=that, but since that is not the case here, normal %-encoding seems to make more sense.
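
For comparison, a small sketch using Python's standard urllib.parse module, whose quote() function does RFC 3986 style %-encoding and whose quote_plus() does the form-style variant. Passing safe="" is my choice here so that "/" is escaped too; the sample value is made up.

```python
from urllib.parse import quote, quote_plus, unquote

value = "a b+c"

# RFC 3986 style: SPACE and PLUS SIGN get distinct escapes.
encoded = quote(value, safe="")
print(encoded)                    # a%20b%2Bc

# The original value can be recovered exactly.
print(unquote(encoded) == value)  # True

# Form-style encoding writes SPACE as "+", which is the convention
# the draft borrows.
print(quote_plus(value))          # a+b%2Bc
```

Note that quote_plus() still escapes a literal "+" as "%2B", so full form-encoding stays reversible; it is the subset in the draft, which leaves "+" alone, that loses information.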

Best,
Richard

Received on Wednesday, 2 November 2011 14:55:01 UTC