Re: [WebIDL] Bugs in DOMString conversion to Unicode characters (was Re: "send data using the Web Socket" and UCS-2) from Cameron McCormack on 2009-07-01 (public-webapps@w3.org from July to September 2009)

From: Cameron McCormack <cam@mcc.id.au>
Date: Wed, 1 Jul 2009 13:02:15 +1000
To: "L. David Baron" <dbaron@dbaron.org>
Cc: public-webapps@w3.org, jwalden@mit.edu, jonas@sicking.cc, annevk@opera.com
Message-ID: <20090701030215.GD24402@arc.mcc.id.au>

Hi David.

L. David Baron:
> This algorithm seems incorrect in two ways:
> 
>  * It gets the ranges for high and low surrogates backwards.  (High
>    surrogates are U+D800 - U+DBFF, low surrogates are U+DC00 -
>    U+DFFF, and in UTF-16 a surrogate pair is a high surrogate
>    followed by a low surrogate.  So swapping the ranges in the
>    headings should make the algorithm correct, modulo the next
>    point.  But you should definitely double-check this. :-)

Ouch, you’re right.

>  * It incorrectly handles unpaired high surrogates by eating the
>    next character.  Instead, I would prefer that the unpaired high
>    surrogate is replaced by U+FFFD, and the following character is
>    interpreted normally.  (That's what Gecko does, anyway.
>    Furthermore, I think it makes sense to match the handling of
>    unpaired low surrogates.)

I meant to do that initially, dunno what went wrong.  Should be fixed
now.

  http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode
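
For illustration, the behaviour discussed above could be sketched roughly
as follows.  This is not the spec text: the function name and the
list-of-integers representation of a DOMString are assumptions made for
the example.  It pairs a high surrogate (U+D800 - U+DBFF) with a following
low surrogate (U+DC00 - U+DFFF), and replaces any unpaired surrogate with
U+FFFD without consuming the code unit that follows it.

```python
REPLACEMENT_CHARACTER = 0xFFFD


def obtain_unicode(code_units):
    """Convert a sequence of 16-bit code units to a list of code points.

    Hypothetical helper: `code_units` is any sequence of integers in the
    range 0x0000..0xFFFF standing in for a DOMString.
    """
    code_points = []
    i = 0
    n = len(code_units)
    while i < n:
        c = code_units[i]
        if 0xD800 <= c <= 0xDBFF:
            # High surrogate: only valid when a low surrogate follows.
            if i + 1 < n and 0xDC00 <= code_units[i + 1] <= 0xDFFF:
                d = code_units[i + 1]
                code_points.append(
                    0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00)
                )
                i += 2
                continue
            # Unpaired high surrogate: emit U+FFFD and leave the next
            # code unit to be interpreted normally.
            code_points.append(REPLACEMENT_CHARACTER)
        elif 0xDC00 <= c <= 0xDFFF:
            # Unpaired low surrogate: emit U+FFFD.
            code_points.append(REPLACEMENT_CHARACTER)
        else:
            # Any other code unit maps to the same code point.
            code_points.append(c)
        i += 1
    return code_points
```

With this sketch, an unpaired high surrogate followed by "A" yields
U+FFFD followed by U+0041, rather than swallowing the "A".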

Thanks,

Cameron

-- 
Cameron McCormack ≝ http://mcc.id.au/

Received on Wednesday, 1 July 2009 03:02:59 UTC