- From: Alex Milowski <alex@milowski.com>
- Date: Tue, 21 May 2013 07:37:36 -0700
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Received on Tuesday, 21 May 2013 14:38:03 UTC
On Tue, May 21, 2013 at 4:06 AM, Eric Prud'hommeaux <eric@w3.org> wrote:

> Alex, I see that you pass the surrogate tests, e.g. *_with_UTF8_boundaries
> [SURT]. Do you use UTF-16 internally, i.e. parse \U00010000 as 0xD800
> 0xDC00 ?

In all cases, I generate surrogate pairs for U+10000 and above, and this allows the tests and their comparisons to pass. For the browser's JavaScript environment, this is exactly how JavaScript will see data that is loaded via the browser.

Unfortunately, users will have a hard time constructing these strings, as I did, because you can't directly represent these characters in literals in JavaScript. While that is unfortunate, it is how JavaScript currently works, as the \u escape only supports the BMP. It is possible to handle U+10000 and above in JavaScript, but it requires understanding what surrogate pairs are and how to encode and decode them.

--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
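For readers who want to see the surrogate-pair arithmetic referred to in the message, here is a minimal sketch in JavaScript. The function names `toSurrogatePair` and `fromSurrogatePair` are hypothetical and do not come from the parser under discussion; the sketch only illustrates the standard UTF-16 mapping (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves) and is not a description of how that parser is implemented.

```javascript
// Hypothetical helpers illustrating UTF-16 surrogate pairs.
// These names are assumptions for illustration, not part of any library.

// Encode a supplementary code point (U+10000..U+10FFFF) as a two-unit string.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("code point outside U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits -> low surrogate
  return String.fromCharCode(high, low);
}

// Decode the surrogate pair starting at `index` back into a code point.
function fromSurrogatePair(str, index) {
  const high = str.charCodeAt(index);
  const low = str.charCodeAt(index + 1);
  const isPair =
    high >= 0xD800 && high <= 0xDBFF &&
    low >= 0xDC00 && low <= 0xDFFF;
  if (!isPair) {
    throw new RangeError("no surrogate pair at index " + index);
  }
  return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
}
```

With these helpers, `toSurrogatePair(0x10000)` yields the two code units 0xD800 0xDC00 from the question above, and the resulting string has a `length` of 2 even though it represents a single character.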