- From: Cameron McCormack <cam@mcc.id.au>
- Date: Wed, 1 Jul 2009 13:02:15 +1000
- To: "L. David Baron" <dbaron@dbaron.org>
- Cc: public-webapps@w3.org, jwalden@mit.edu, jonas@sicking.cc, annevk@opera.com
Hi David.

L. David Baron:
> This algorithm seems incorrect in two ways:
>
>  * It gets the ranges for high and low surrogates backwards. (High
>    surrogates are U+D800 - U+DBFF, low surrogates are U+DC00 -
>    U+DFFF, and in UTF-16 a surrogate pair is a high surrogate
>    followed by a low surrogate. So swapping the ranges in the
>    headings should make the algorithm correct, modulo the next
>    point. But you should definitely double-check this. :-)

Ouch, you’re right.

>  * It incorrectly handles unpaired high surrogates by eating the
>    next character. Instead, I would prefer that the unpaired high
>    surrogate is replaced by U+FFFD, and the following character is
>    interpreted normally. (That's what Gecko does, anyway.
>    Furthermore, I think it makes sense to match the handling of
>    unpaired low surrogates.)

I meant to do that initially, dunno what went wrong. Should be fixed
now.

  http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode

Thanks,

Cameron

-- 
Cameron McCormack ≝ http://mcc.id.au/
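
A rough sketch of the behaviour discussed above, for illustration only. It is not the text of the WebIDL algorithm; the function name `obtain_unicode` and the list-of-integers input are assumptions made for this example. It pairs a high surrogate (U+D800 - U+DBFF) with a following low surrogate (U+DC00 - U+DFFF), and replaces any unpaired surrogate with U+FFFD without consuming the code unit after it.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def obtain_unicode(code_units):
    """Convert a sequence of 16-bit code units to a list of code points.

    Hypothetical helper: name and signature are illustrative only.
    """
    out = []
    i = 0
    n = len(code_units)
    while i < n:
        c = code_units[i]
        if 0xD800 <= c <= 0xDBFF:
            # High surrogate: only valid if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= code_units[i + 1] <= 0xDFFF:
                d = code_units[i + 1]
                out.append(0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00))
                i += 2
                continue
            # Unpaired high surrogate: replace it, but do not eat the
            # next code unit; it is interpreted normally on the next pass.
            out.append(REPLACEMENT)
        elif 0xDC00 <= c <= 0xDFFF:
            # Unpaired low surrogate: handled the same way.
            out.append(REPLACEMENT)
        else:
            out.append(c)
        i += 1
    return out
```

With this sketch, an unpaired high surrogate followed by "A" (0x0041) yields U+FFFD followed by U+0041, rather than swallowing the "A".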
Received on Wednesday, 1 July 2009 03:02:59 UTC