- From: John Tamplin <jat@google.com>
- Date: Tue, 13 Mar 2012 22:47:44 -0400
On Tue, Mar 13, 2012 at 8:19 PM, Glenn Maynard <glenn at zewt.org> wrote:

> Using Views instead of specifying the offset and length sounds good.
>
> On Tue, Mar 13, 2012 at 6:28 PM, Ian Hickson <ian at hixie.ch> wrote:
>
> > - What's the use case for supporting anything but UTF-8?
> >
>
> Other Unicode encodings may be useful, to decode existing file formats
> containing (most likely at a minimum) UTF-16. I don't feel strongly about
> that, though; we're stuck with UTF-16 as an internal representation in the
> platform, but that doesn't necessarily mean we need to support it as a
> transfer encoding.
>
> For non-Unicode legacy encodings, I think that even if use cases exist,
> they should be given more than the usual amount of scrutiny before being
> supported.

The whole idea is to be able to extract textual data out of some packed
binary format. If you don't support the character sets people want to use,
they will simply do what they have to do now and hand-code the character set
conversion, which will be slow and inaccurate. In particular, I think you
have to include various ISO-8859-* character sets (especially Latin1) and the
non-Unicode character sets still frequently used by Japanese and Chinese
users. I am fine with strongly suggesting that only UTF-8 be used for new
things, but leaving out legacy support will severely limit the utility of
this library.

-- 
John A. Tamplin
Software Engineer (GWT), Google
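The use case described above, pulling text out of a packed binary format in a caller-chosen character set, can be sketched with the `TextDecoder` interface that the WHATWG Encoding Standard later defined. This is not the API being discussed in this thread. The record layout (a big-endian uint16 byte count followed by the encoded bytes) and the `readStringField` helper are hypothetical, chosen only for illustration. The decoder is handed a `Uint8Array` view over the field rather than a separate offset and length, in the spirit of Glenn's suggestion.

```typescript
// Hypothetical record layout, for illustration only:
//   [uint16 big-endian byte length][encoded text bytes]
interface Field {
  text: string; // decoded field contents
  next: number; // offset of the first byte after this field
}

// Hypothetical helper: decodes one length-prefixed string field at `offset`
// using a caller-chosen encoding label (e.g. "utf-8", "latin1", "shift_jis").
function readStringField(buffer: ArrayBuffer, offset: number, encoding: string): Field {
  const view = new DataView(buffer);
  const length = view.getUint16(offset); // DataView reads big-endian by default
  // A view over just this field's bytes; the underlying buffer is not copied.
  const bytes = new Uint8Array(buffer, offset + 2, length);
  // fatal: true makes decode() throw on malformed input instead of
  // substituting U+FFFD replacement characters.
  const decoder = new TextDecoder(encoding, { fatal: true });
  return { text: decoder.decode(bytes), next: offset + 2 + length };
}
```

Under the Encoding Standard, the label `latin1` actually selects windows-1252, and legacy labels such as `shift_jis`, `gbk`, and `big5` are also recognized; the `TextDecoder` constructor throws a `RangeError` for a label it does not support. A runtime built without full ICU data (some Node.js builds, for example) may support only the Unicode encodings.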
Received on Tuesday, 13 March 2012 19:47:44 UTC