Re: Proposal to NOT address I18N-ISSUE-186: Encoding of document vs. form of document? from Andy Seaborne on 2013-01-27 (public-rdf-wg@w3.org from January 2013)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Sun, 27 Jan 2013 13:30:40 +0000
To: public-rdf-wg@w3.org
Message-ID: <51052C00.7090405@epimorphics.com>

On 25/01/13 21:20, Eric Prud'hommeaux wrote:
> Proposal to address I18N-ISSUE-186: Encoding of document vs. form of document?
> ===============================================================
>
> * Martin J. Dürst <duerst@it.aoyama.ac.jp> [2012-09-08 13:27:56+0900]
>> On 2012/09/08 0:41, Internationalization Core Working Group Issue
>> Tracker wrote:
>>> I18N-ISSUE-186: Encoding of document vs. form of document?
>>>
>>> http://www.w3.org/International/track/issues/186
>>>
>>> Raised by: Addison Phillips
>>> On product:
>>>
>>> Section 6. Refers to TURTLE documents as being encoded as UTF-8. In practice, UTF-8 is a serialization. The actual document should just be "a sequence of Unicode characters". This allows TURTLE processors to use whatever native Unicode processing scheme is most suitable. Cf. XML.
>>
>> I slightly disagree here. Making documents "a sequence of Unicode
>> characters" is important e.g. for XML and HTML, where many different
>> character encodings are possible and used in practice. For TURTLE, UTF-8
>> is *the only* character encoding.
>
> I note that this comment never went to the www-rdf-comments list. Perhaps Martin's comment put this to rest? (I hope so, 'cause we finished the other i18n comments an age ago and we were about to go to LC when someone noticed the gap between 185 and 187.) At any rate, our decisions for Turtle, N3 and SPARQL were to allow exactly one encoding. I believe the HTML5 WG made a similar decision for Polyglot docs.
>
>
>> In case the spec mandates that UTF-8 has to be used even internally when
>> processing TURTLE, then that would need to be changed, but the way it's
>> proposed here is going too far.
>
> I have seen both strategies work for Turtle and SPARQL implementations and am confident that the spec in no way favors an internal representation.

The Jena implementations (Turtle, SPARQL) do not use UTF-8 internally.
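
To illustrate the distinction being discussed (UTF-8 on the wire, any representation internally), here is a minimal sketch in Python. It is not Jena's code; the function name `decode_turtle` and the example triple are assumptions made up for this illustration. The document is decoded from UTF-8 once at the boundary, and everything after that works on characters.

```python
def decode_turtle(data: bytes) -> str:
    """Decode a Turtle document from its UTF-8 wire form into characters.

    Hypothetical helper for illustration only. Decoding is strict, so
    malformed UTF-8 raises UnicodeDecodeError.
    """
    return data.decode("utf-8")


# A one-triple document as it would arrive on the wire (UTF-8 bytes).
wire = '<http://example.org/s> <http://example.org/p> "Dürst" .\n'.encode("utf-8")

# After decoding, a parser would see characters, not UTF-8 bytes.
text = decode_turtle(wire)

# "ü" is one character but two bytes in UTF-8, so the lengths differ by one.
byte_count = len(wire)
char_count = len(text)

# The same text held as UTF-16 code units, as a processor on a UTF-16
# platform might store it. Every character here is in the BMP, so there is
# one code unit per character.
utf16_units = len(text.encode("utf-16-le")) // 2
```

The choice of internal form only affects how the processor stores the text; the characters it parses are the same either way.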

 Andy

>
>
>> Regards,    Martin.

Received on Sunday, 27 January 2013 13:31:14 UTC