- From: Wes Garland <wes@page.ca>
- Date: Mon, 20 Feb 2012 07:45:38 -0500
- To: Brendan Eich <brendan@mozilla.com>
- Cc: es-discuss <es-discuss@mozilla.org>, "public-script-coord@w3.org" <public-script-coord@w3.org>, mranney@voxer.com
- Message-ID: <CAHB0tE7_kdXSidT+fWEP8gUeb=i-putFogKuyfMJyUAPiSwksA@mail.gmail.com>
On 19 February 2012 16:34, Brendan Eich <brendan@mozilla.com> wrote:

> Wes Garland wrote:
>
>> Is there a proposal for interaction with JSON?
>
> From http://www.ietf.org/rfc/rfc4627, 2.5
>

*snip* - so the proposal is to keep encoding JSON in UTF-16. What happens if the BRS is set to Unicode and we want to encode the string "\uD834\uDD1E" -- the Unicode string which contains two reserved code points? We do not want to deserialize this as U+1D11E.

I think we should consider that BRS-on should mean six-character escapes in JSON for non-BMP characters. It might even be possible to add matching support for JSON.parse() when BRS-off. The one caveat is that might make JSON interchange fragile between BRS-on systems and ES5 engines.

> Yes, sharing the uint16 vector is good. But string methods would have to
> index and .length differently (if I can verb .length ;-).
>

.lengthing is easy; cost is about the same as strlen() and can be cached. Indexed access is something I have thought about from the implementor's POV for a while [but not heavily]. I haven't come up with a ground-breaking technique, I keep coming up with something that looks like a lookup table for surrogate pairs, degrading to an extra uint32[] when there are many of them. Anyhow, implementation detail.

> Of course, strings with the same characters are == and ===. Strings appear
> to be values. If you think of them as immutable reference types there's
> still an obligation to compare characters for strings because computed
> strings are not intern'ed.
>

What about strings with the same sequence of code units but different code points? They would have identical backing stores if the backing store were either UTF-8 or uint32. This can happen if we have BRS-on Strings which contain non-BMP code points. (Actually, does BRS-on mean that we have to abandon UTF-16 to store Unicode strings containing invalid code points? Mark Davis, are you reading?)

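To make the "lookup table for surrogate pairs" idea above concrete, here is a rough sketch in plain JavaScript over an ordinary UTF-16 string. The names CodePointView, pairIndexes, pairsBefore and isPairAt are all hypothetical, made up for illustration; nothing here comes from a proposal or an engine. It assumes that only well-formed lead/trail pairs count as one code point and that a lone surrogate counts as a code point by itself.

```javascript
// Hypothetical helper: true if a well-formed surrogate pair starts at code unit i.
function isPairAt(str, i) {
  const lead = str.charCodeAt(i);
  const trail = str.charCodeAt(i + 1); // NaN past the end, so the test fails
  return lead >= 0xD800 && lead <= 0xDBFF && trail >= 0xDC00 && trail <= 0xDFFF;
}

// Hypothetical code-point view over a UTF-16 string (illustration only).
class CodePointView {
  constructor(str) {
    this.str = str;
    // pairIndexes[k] = code-point index of the k-th surrogate pair (ascending).
    this.pairIndexes = [];
    let cp = 0;
    for (let i = 0; i < str.length; i++, cp++) {
      if (isPairAt(str, i)) {
        this.pairIndexes.push(cp);
        i++; // skip the trail surrogate
      }
    }
    this.length = cp; // one linear pass, then cached
  }

  // Count the pairs that start before code-point index `index` (binary search).
  pairsBefore(index) {
    let lo = 0;
    let hi = this.pairIndexes.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.pairIndexes[mid] < index) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Indexed access by code point: each earlier pair shifts the offset by one unit.
  codePointAt(index) {
    if (index < 0 || index >= this.length) return undefined;
    const unit = index + this.pairsBefore(index);
    const lead = this.str.charCodeAt(unit);
    if (isPairAt(this.str, unit)) {
      const trail = this.str.charCodeAt(unit + 1);
      return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
    }
    return lead;
  }
}
```

For "a\uD834\uDD1Eb", the ordinary .length is 4, while new CodePointView("a\uD834\uDD1Eb").length is 3 and codePointAt(1) is 0x1D11E. The table costs nothing for BMP-only strings and one entry per pair otherwise; a string that is mostly non-BMP would be better served by a flat uint32 array, which is the degradation case mentioned above.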
How about strings which are considered equal by Unicode but which do not share the same representation? Will Unicode normalization be performed when Strings are created/parsed? On comparison? If on compare, would we skip normalization for ===?

I assume normalizing to NFC form, similar to what W3C does, is the target?

http://www.macchiato.com/unicode/nfc-faq (Mark Davis)
http://unicode.org/faq/normalization.html

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
Received on Monday, 20 February 2012 12:46:10 UTC