- From: Wes Garland <wes@page.ca>
- Date: Wed, 22 Feb 2012 09:09:55 -0500
- To: Norbert Lindenberg <ecmascript@norbertlindenberg.com>
- Cc: Brendan Eich <brendan@mozilla.com>, "public-script-coord@w3.org" <public-script-coord@w3.org>, mranney@voxer.com, es-discuss <es-discuss@mozilla.org>
- Message-ID: <CAHB0tE6FwucSUR_62NcThTm40Lfhf+8W=N=X6=vHVkRq0dsJRw@mail.gmail.com>
Interesting scenarios, Norbert -- well-thought-through.

The final goal (for me, at least) is to be able to tell my developers to "Just write code" and forget the details of how the characters in strings are encoded.

Your point about the bidi library is an important one, but I think if we could somehow survey the web, we would find that the vast majority of applications do The Wrong Thing now and that flipping the BRS would magically fix a lot of them. I think any group that is "with it" w.r.t. Unicode in JS today will find a way to embrace BRS-on as long as there is a reasonable path to follow.

Some day, I hope developers will simply start all documents with something like <!DOCTYPE HTML UNICODE> and never worry about character encoding details again. That is when we will start to see benefits, and these benefits will snowball as organizations start to do this.

Of course, to get there, we have to somehow manage the transition.

I think your point about the static rejection of four-byte Unicode escapes is really important. During the transitional period, we need a way to write JS libraries that can run with BRS on or off. If four-byte escapes are statically rejected in BRS-on, we have a problem -- we should be able to use old code that runs in either mode unchanged when said code only uses characters in the BMP.

Accepting both 4- and 6-byte escapes is a problem, though -- what is "\u123456".length? 1 or 3?

If we accept "\u1234" in BRS-on as a string with length 5 -- as we do today in ES5 with "\u123".length===4 -- we give developers a way to feature-test and conditionally execute code, allowing libraries to run with BRS-on and BRS-off. It's awkward, though: there is no way to recover static strings programmatically since the \ has been eaten by the JS compiler.
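
The feature test described above can be sketched roughly as follows. This is an illustration only, not part of the proposal: the names isBrsOn and bmpChar are hypothetical, "BRS-on" behavior does not exist in any shipping engine, and the sketch assumes String.fromCharCode would keep working for BMP code units in either mode.

```javascript
// Sketch only: "BRS-on" is the proposed mode discussed above, not a shipping feature.

// Hypothetical feature test. With BRS off (ES5), "\u1234" is a single character.
// Under the idea above, BRS-on would not recognize the four-digit escape and
// would keep the five characters u, 1, 2, 3, 4.
function isBrsOn() {
  return "\u1234".length === 5;
}

// The ambiguity mentioned above: with BRS off, "\u123456" is read as the
// escape \u1234 followed by the literal characters "5" and "6".
var ambiguous = "\u123456";

// Hypothetical helper: build a BMP character at run time, so that library
// source does not rely on how an escape inside a literal is parsed.
function bmpChar(code) {
  return String.fromCharCode(code);
}

console.log("BRS-on detected: " + isBrsOn());
console.log("ambiguous.length = " + ambiguous.length);
console.log("bmpChar(0x1234).length = " + bmpChar(0x1234).length);
```

In a present-day ES5 engine this reports that BRS-on is not detected, that ambiguous.length is 3, and that the run-time character has length 1. A run-time helper like bmpChar sidesteps the literal problem for single characters, but it does nothing for existing tables of static strings.
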
And users *will* want to programmatically convert arrays of strings (think gettext).

So, it seems that for a good migration path we somehow need to mark string literals so that the parser knows how to deal with them. And we need to do it in a way that "just works" in ES5 while preserving natural syntax with BRS-on.

*Idea*: can we add a per-script attribute which allows a transitional parsing scheme for string literals when BRS-on? This transitional scheme would parse string literals like BRS-off, *unless* the string literal had a leading U.

Having a per-script attribute lets module system developers deal with the problem easily when using DOM SCRIPT tag injection to load modules. It also allows users switching BRS-on to load old content from foreign sites, which I believe is necessary for widespread BRS-on adoption.

Sample program demonstrating how this might work:

<!DOCTYPE HTML UNICODE>
<html>
<script>
var i;
var a = [];
a.push("\u1234");
</script>
<script parser="unicodeTransitional">
a.push("\u1234");
a.push(U"\u1234");
a.push(U"\u123456");
</script>
<script>
a.push("\u123456");
for (i=0; i < a.length; i++) {
  console.log(i + " -> " + a[i].length);
}
</script>
</html>

Output:

0 -> 5
1 -> 1
2 -> 5
3 -> 1
4 -> 1

I think this is a sustainable solution that gives developers just enough tools to retrofit without going off in lala-land by adding a bunch of extra types and helper methods.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
Received on Wednesday, 22 February 2012 14:10:27 UTC