Re: EARL Report for Green Turtle from Alex Milowski on 2013-05-21 (public-rdf-comments@w3.org from May 2013)

From: Alex Milowski <alex@milowski.com>
Date: Tue, 21 May 2013 07:37:36 -0700
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <CABp3FNLwaxmmLkg5WwCcAP+TtPx=JA3NX7t6KfjARMC7cfU+hw@mail.gmail.com>

On Tue, May 21, 2013 at 4:06 AM, Eric Prud'hommeaux <eric@w3.org> wrote:

>
>
> Alex, I see that you pass the surrogate tests, e.g. *_with_UTF8_boundaries
> [SURT]. Do you use UTF-16 internally, i.e. parse \U00010000 as 0xD800
> 0xDC00 ?
>
>
>
In all cases, I generate surrogate pairs for U+10000 and above, which
allows the tests and their comparisons to pass.  In the browser's
JavaScript environment, this is exactly how JavaScript sees data that
is loaded via the browser.
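
As a rough illustration of that conversion (a minimal sketch, not Green
Turtle's actual code; the helper names codePointToString and
decodeLongEscape are made up for this example), a \UXXXXXXXX escape can be
turned into UTF-16 code units like this:

```javascript
// Hypothetical helpers for illustration only; not taken from Green Turtle.

// Convert a Unicode code point to a JavaScript string.
// Code points at or above U+10000 become a high/low surrogate pair.
function codePointToString(codePoint) {
  if (codePoint < 0x10000) {
    return String.fromCharCode(codePoint);
  }
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);   // top 10 bits of the offset
  var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits of the offset
  return String.fromCharCode(high, low);
}

// Decode a Turtle-style long escape such as "\U00010000"
// (a backslash, "U", then eight hex digits).
function decodeLongEscape(escape) {
  return codePointToString(parseInt(escape.slice(2), 16));
}
```

With this, decodeLongEscape("\\U00010000") yields a two-unit string whose
code units are 0xD800 and 0xDC00, matching what you describe.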

Unfortunately, users will have a hard time constructing these strings, as I
did, because you can't directly represent these characters in string literals
in JavaScript.  That is how JavaScript currently works: the \u escape takes
exactly four hex digits, so it only covers the BMP.

It is possible to handle U+10000 and above in JavaScript, but it requires
understanding what surrogate pairs are and how to encode and decode them.
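
For the decoding direction, something along these lines should work (again
only a sketch under my own assumptions; codePointsOf is a hypothetical
name, and unpaired surrogates are simply passed through as they are):

```javascript
// Hypothetical helper for illustration only.
// Walk a string's UTF-16 code units and return an array of code points,
// combining each valid high/low surrogate pair into one code point.
function codePointsOf(str) {
  var result = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    var isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    if (isHigh && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        result.push(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    result.push(unit); // BMP character or unpaired surrogate
  }
  return result;
}

// The test string is written as two \u escapes, one per surrogate.
var points = codePointsOf("a\uD800\uDC00b");
```

Here points holds three entries (0x61, 0x10000, 0x62) even though the
string's length property reports 4.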

-- 
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics

Received on Tuesday, 21 May 2013 14:38:03 UTC