- From: Jonas Sicking <jonas@sicking.cc>
- Date: Tue, 7 Aug 2012 10:51:34 -0700
- To: Joshua Bell <jsbell@chromium.org>
- Cc: WHAT Working Group <whatwg@whatwg.org>, Glenn Maynard <glenn@zewt.org>
On Tue, Aug 7, 2012 at 9:48 AM, Joshua Bell <jsbell@chromium.org> wrote:
>> Not an objection, but where does XHR limit sent data to those encodings?
>> send(FormData) forces UTF-8 (which is even more restrictive);
>> send(Document) seems to allow any encoding *except* for UTF-16 (presumably
>> web compat since that's a weird criteria).
>>
>> I'm not sure that staying in sync with XHR--which has its own pile of
>> legacy code to support--is worthwhile here anyway, but limiting to Unicode
>> seems fine in its own right, especially since the restriction can always
>> be lifted later if real needs come up.
>>
>> However I currently can't find any restrictions on which target
>> encodings are supported in the current drafts.
>
> When Anne's spec appeared I gutted mine and deferred wherever possible to
> his. One consequence of that was getting the other encodings "for free" as
> far as the spec writing goes.
>
> If we achieve consensus that we only want to support UTF encodings we can
> add the restrictions. There are use cases for supporting other encodings
> (parsing legacy data file formats, for example), but that could be deferred.

I don't mind supporting *decoding* from basically any encoding that Anne's spec enumerates. I don't see a downside to that, since I suspect most implementations will just call into a generic decoding backend anyway, so supporting the same set of encodings as other parts of the platform should be relatively easy. That also means that we don't have to figure out which encodings we need in order to support reading legacy file formats etc.

However, I think we should consider restricting support to a smaller set of encodings when *encoding*. There should be little reason for people today to produce text in non-UTF formats. We might even be able to get away with only supporting UTF-8, though I wouldn't be surprised if there are reasonably modern file formats which use UTF-16.
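As a rough illustration of that asymmetry, here is a minimal sketch in Python rather than in the API under discussion. Python's built-in codecs stand in for a generic decoding backend; the sample bytes and the choice of windows-1252 are assumptions made only for the example.

```python
# Illustrative sketch only: Python's codecs stand in for the proposed API.

# Sample legacy data (assumed): "café" stored as windows-1252 bytes.
legacy_bytes = b"caf\xe9"

# Decoding side: accept a legacy encoding when reading old data.
text = legacy_bytes.decode("windows-1252")

# Encoding side: only UTF targets. UTF-8 and UTF-16 can represent any
# well-formed string, so the round trip loses nothing.
sample = text + " \u20ac \U0001F600"  # add a euro sign and an emoji
utf8_bytes = sample.encode("utf-8")
utf16_bytes = sample.encode("utf-16-le")

round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_utf16 = utf16_bytes.decode("utf-16-le")
```

Both round trips return the original string, which is the property a UTF-only encoder could rely on.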
Restricting the encoding formats has the advantage that we can rely on the target encoding to support a consistent feature set. For example, we don't need to define what to do if we receive a perfectly well-formed string but the target encoding doesn't support all the characters in that string. Likewise, we don't have to deal with target encodings which don't support the replacement character.

/ Jonas
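The two problems mentioned above can be reproduced with a small Python sketch. This is only an analogy to the API being discussed: ISO-8859-1 is picked as an assumed example of a legacy target encoding, and the `"replace"` error handler is Python's own behaviour, not anything the drafts define.

```python
# Illustrative sketch only, using Python's codecs as a stand-in.

text = "price: 10 \u20ac"  # well-formed string containing the euro sign

# Problem 1: ISO-8859-1 has no euro sign, so strict encoding fails.
try:
    text.encode("iso-8859-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Problem 2: the target cannot represent U+FFFD REPLACEMENT CHARACTER
# either, so Python's "replace" handler substitutes "?" instead.
lossy = text.encode("iso-8859-1", errors="replace")

# A UTF target has neither problem: every character is representable.
utf8 = text.encode("utf-8")
```

Here `strict_ok` ends up `False` and `lossy` ends with `?`, while the UTF-8 bytes decode back to the original string.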
Received on Tuesday, 7 August 2012 17:52:35 UTC