Re: I18N-ISSUE-186: Encoding of document vs. form of document? from Martin J. Dürst on 2012-09-08 (public-i18n-core@w3.org from July to September 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sat, 08 Sep 2012 13:27:56 +0900
To: Internationalization Core Working Group <public-i18n-core@w3.org>
CC: Internationalization Core Working Group Issue Tracker <sysbot+tracker@w3.org>
Message-ID: <504AC94C.2080702@it.aoyama.ac.jp>

On 2012/09/08 0:41, Internationalization Core Working Group Issue 
Tracker wrote:
> I18N-ISSUE-186: Encoding of document vs. form of document?
>
> http://www.w3.org/International/track/issues/186
>
> Raised by: Addison Phillips
> On product:
>
> Section 6. Refers to TURTLE documents as being encoded as UTF-8. In practice, UTF-8 is a serialization. The actual document should just be "a sequence of Unicode characters". This allows TURTLE processors to use whatever native Unicode processing scheme is most suitable. Cf. XML.

I slightly disagree here. Making documents "a sequence of Unicode 
characters" is important e.g. for XML and HTML, where many different 
character encodings are possible and used in practice. For TURTLE, UTF-8 
is *the only* character encoding.

If the spec mandates that UTF-8 be used even internally when 
processing TURTLE, then that would need to be changed, but the change 
proposed here goes too far.
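
To illustrate the distinction, here is a minimal sketch (not taken from 
the TURTLE spec; the function names read_turtle and write_turtle are 
hypothetical). It assumes UTF-8 is required only at the document 
boundary, while the processor is free to work on its native Unicode 
string type internally:

```python
def read_turtle(data: bytes) -> str:
    """Decode a TURTLE document from its UTF-8 wire form.

    Hypothetical helper: strict decoding, so any byte sequence that is
    not valid UTF-8 raises UnicodeDecodeError instead of being guessed at.
    """
    return data.decode("utf-8")


def write_turtle(text: str) -> bytes:
    """Serialize the internal string back to UTF-8 for storage or transfer."""
    return text.encode("utf-8")


# The document as it exists on disk or on the wire: UTF-8 bytes.
wire = '<http://example.org/s> <http://example.org/p> "Dürst" .\n'.encode("utf-8")

# Internally the processor sees code points, not bytes.
doc = read_turtle(wire)

# "ü" is one code point but two bytes in UTF-8, so the lengths differ.
assert len(wire) == len(doc) + 1

# Round-tripping through the internal form preserves the wire form.
assert write_turtle(doc) == wire
```

In this sketch, nothing forces the internal representation to be UTF-8; 
only the input and output of the processor are.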

Regards,    Martin.

Received on Saturday, 8 September 2012 04:28:33 UTC