Re: I18N-ISSUE-191: Various nits in Appendix B [TURTLE] from Martin J. Dürst on 2012-09-08 (public-rdf-comments@w3.org from September 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sat, 08 Sep 2012 13:39:11 +0900
To: Internationalization Core Working Group <www-international@w3.org>, public-rdf-comments@w3.org
CC: Internationalization Core Working Group Issue Tracker <sysbot+tracker@w3.org>
Message-ID: <504ACBEF.7040505@it.aoyama.ac.jp>

On 2012/09/08 1:01, Internationalization Core Working Group Issue 
Tracker wrote:
> I18N-ISSUE-191: Various nits in Appendix B [TURTLE]
>
> http://www.w3.org/International/track/issues/191
>
> Raised by: Addison Phillips
> On product: TURTLE
>
> Appendix B contains this note:
>
> Encoding considerations:
>      The syntax of Turtle is expressed over code points in Unicode [UNICODE]. The encoding is always UTF-8 [UTF-8].
>      Unicode code points may also be expressed using an \uXXXX (U+0 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-Fa-f]
>
> As mentioned in other comments:
>
> - The encoding refers to the serialization of a TURTLE document, not necessarily its in-memory representation (which should just be a sequence of Unicode code points)

I already mentioned this in another issue, but my guess is that what's 
standardized is the "on the wire" (or "on disk") format, because that's 
the only one that is relevant. The in-memory representation is pretty 
much each implementation's private business; an implementation could use 
floating point values with the reciprocal or the square of the Unicode 
code point value, and as far as the spec is concerned, we wouldn't care.

That's different if and when there is an API standard, but I don't think 
that's what is being worked on.
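
As a rough illustration of the distinction (a Python sketch only, not
anything taken from the Turtle spec; the function names are made up for
this example, and it ignores where in a document escapes are actually
allowed): the bytes on the wire are UTF-8, numeric escapes are resolved
after decoding, and what the implementation holds in memory afterwards
is its own choice.

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits), as in the quoted note.
_NUMERIC_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_numeric(text: str) -> str:
    """Replace numeric escapes with the code points they denote.

    Simplification: applied to the whole text, with no grammar context.
    chr() raises ValueError for values above U+10FFFF.
    """
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _NUMERIC_ESCAPE.sub(repl, text)


def read_wire(data: bytes) -> str:
    """Decode the on-the-wire form (UTF-8), then resolve numeric escapes."""
    return unescape_numeric(data.decode("utf-8"))


# Two different wire forms of the same four code points.
escaped = read_wire(b"caf\\u00E9")
literal = read_wire("caf\u00e9".encode("utf-8"))

# In memory, a str or a list of integers would both do; the spec only
# needs the two wire forms to denote the same code points.
print([ord(c) for c in escaped])  # → [99, 97, 102, 233]
print(escaped == literal)         # → True
```

Whether the result is kept as a string, a list of integers, or something
stranger does not change what has to go over the wire.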

Regards,   Martin.

> - The reference to U+0 should read U+0000
> - We recommend a different escape syntax altogether
> - We recommend six-digit rather than eight-digit \U representation
>

Received on Saturday, 8 September 2012 04:39:48 UTC