Re: SPARQL Protocol and Unicode characters

Eric Prud'hommeaux wrote:
> Clarification and notes -- this response was not considered by the DAWG:
> 
> On Thu, Feb 03, 2005 at 04:10:58PM +0100, Arjohn Kampman wrote:
> 
>>Dear all,
>>
>>The SPARQL Protocol as described at [1] suggests that SPARQL queries are
>>going to be sent over the wire as simple www-urlencoded strings. I would
>>like to point out that we have tried this approach in Sesame and that it
>>fails to handle multi-byte characters properly [2]. The main reason for
>>this is that each %xx pattern encodes a single byte, so it cannot
>>directly encode character values larger than 255.
>>
>>In Sesame, we "solved" this issue by switching to multipart/form-data
>>encoded POST requests.
> 
> 
> I presume you are using the charset parameter
> [[ [2388]
>    Each part of a multipart/form-data is supposed to have a content-
>    type.  In the case where a field element is text, the charset
>    parameter for the text indicates the character encoding used.
> ]]
> and that the clients tend to encode the characters in charsets that
> the servers tend to understand.

Currently, the protocol fixes the character encoding to UTF-8. Use of
the charset parameter might actually be a good idea for this protocol.
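A minimal sketch, in Python using only the standard library, of what a multipart/form-data request with a declared charset could look like. The field name `query`, the boundary string, and the helpers `build_multipart` and `read_single_part` are hypothetical. The parser is deliberately naive (one part, CRLF line endings) and is not how Sesame or any SPARQL server actually parses requests.

```python
import re
from typing import Tuple

BOUNDARY = "sparql-example-boundary"  # hypothetical boundary string


def build_multipart(field: str, value: str, charset: str = "utf-8") -> Tuple[bytes, str]:
    """Encode one text field as multipart/form-data, declaring its charset (RFC 2388)."""
    head = (
        f"--{BOUNDARY}\r\n"
        f'Content-Disposition: form-data; name="{field}"\r\n'
        f"Content-Type: text/plain; charset={charset}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{BOUNDARY}--\r\n".encode("ascii")
    body = head + value.encode(charset) + tail
    return body, f"multipart/form-data; boundary={BOUNDARY}"


def read_single_part(body: bytes) -> str:
    """Naively decode a one-part body using the charset its part header declares."""
    part = body.split(b"--" + BOUNDARY.encode("ascii"))[1]
    headers, _, value = part.partition(b"\r\n\r\n")
    match = re.search(rb"charset=([\w-]+)", headers)
    charset = match.group(1).decode("ascii") if match else "utf-8"  # assumed fallback
    return value[:-2].decode(charset)  # drop the CRLF that precedes the closing delimiter


query = 'SELECT ?x WHERE { ?x ?p "caf\u00e9 \u65e5\u672c" }'
body, content_type = build_multipart("query", query)
print(content_type)
print(read_single_part(body) == query)  # True
```

The server never has to guess: the charset travels with the part, which is the property the charset parameter buys over a fixed-encoding scheme.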

> I phrase it this way because I'm looking at the trade-offs between:
>   - transaction-specified encoding.
>   - transaction-specified encoding with mandatory support for at
>     least one common encoding.
>   - fixed encoding (e.g. UTF-8), the only one used by the protocol.
> What encodings do your RDQL servers support?

Just UTF-8 at the moment, as explained above.
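For comparison, a minimal sketch of the fixed-encoding option with plain URL-encoded parameters, using Python's standard `urllib.parse`. Each %xx escape carries exactly one byte, so a non-ASCII character is sent as a run of escapes (two for "é", three for "日" in UTF-8). The query round-trips as long as client and server agree on UTF-8, and is silently garbled when the server decodes the same bytes as Latin-1. The parameter name `query` is an assumption for illustration.

```python
from urllib.parse import urlencode, parse_qs

query = 'SELECT ?x WHERE { ?x ?p "caf\u00e9 \u65e5\u672c" }'

# Client: encode the query as UTF-8 bytes, then percent-encode every non-safe byte.
encoded = urlencode({"query": query}, encoding="utf-8")
print("%C3%A9" in encoded)     # True: U+00E9 is the two bytes C3 A9
print("%E6%97%A5" in encoded)  # True: U+65E5 is the three bytes E6 97 A5

# Server using the same fixed encoding: the query round-trips unchanged.
same = parse_qs(encoded, encoding="utf-8")["query"][0]
print(same == query)  # True

# Server assuming Latin-1 instead: every byte decodes, but to the wrong characters.
wrong = parse_qs(encoded, encoding="latin-1")["query"][0]
print("caf\u00c3\u00a9" in wrong)  # True: "é" has become "Ã©"
```

The escapes themselves are charset-neutral; what has to be fixed by the protocol, or declared by the client, is which charset the escaped bytes are in.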

Regards,

Arjohn

-- 
arjohn.kampman@aduna.biz
Aduna BV - http://aduna.biz/
Prinses Julianaplein 14-b, 3817 CS Amersfoort, The Netherlands
tel. +31-(0)33-4659987  fax. +31-(0)33-4659987

Received on Tuesday, 15 March 2005 11:12:48 UTC