- From: Jonathan Ballard <dzonatas@gmail.com>
- Date: Fri, 3 Aug 2012 08:30:37 -0700
- To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Mike Belshe <mike@belshe.com>, James M Snell <jasnell@gmail.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
- Message-ID: <CAAPAK-5uO7imx-ynVU1K-RPpv0CvCDU-kd18hFgRYgXJn-WrMA@mail.gmail.com>
ASCII is not "trivially UTF8." UTF8 lacks the available flow control of ASCII. Any conversion between ASCII and EBCDIC is best done in hardware. We already know the security issue of conversions from Unicode to EBCDIC, and I doubt that is something we can scheme here on-topic.

On Friday, August 3, 2012, "Martin J. Dürst" wrote:

> On 2012/08/02 17:27, Poul-Henning Kamp wrote:
>
>> In message <CABaLYCv7U7iLBu5+8Nb9Wa1VeQguoMLJw4VOCbDBQK3WoE-sFg@mail.gmail.com>,
>> Mike Belshe writes:
>>
>>>> * I don't think we need utf-8 encoded headers. Not sure how you'd pass
>>>> them off to HTTP anyway?
>>>
>>> I just don't see any problem being solved by adding this? If there is no
>>> benefit, we should not do it, right?
>>
>> If this would solve any major problems inside a 20 year horizon, we
>> should do it.
>
> It will solve quite a few problems, some of them major, maybe not for HTTP
> itself, but for the applications on top. It will actually solve some
> problems that have been around for at least the last 15 years.
>
> HTML and HTTP were created when the breakthrough of iso-8859-1 (Latin-1)
> in Western Europe was predictable (the nascent Web helped to unify the
> Western Europe 'national' 7-bit and 8-bit encodings quite a bit).
>
> At least as early as 1995 (RFC 2070) or 1996 (RFC 2130, RFC 2277), it was
> clear to those concerned that Unicode and UTF-8 was the way of the future.
> As everybody should be able to confirm when thinking about US-ASCII, using
> a single character encoding (rather than e.g. ASCII and EBCDIC or some such
> alternatively) brings HUGE benefits. The same is true when streamlining
> from a zoo of character encodings to UTF-8.
>
> These days, over 60% of the Web is already in UTF-8, and if you add in the
> 20% of pure ASCII which is trivially also UTF-8, it's 80%. All other
> encodings are in serious decline. (see p. 52 of the July IEEE Spectrum).
> And efforts such as HTML5 are strongly pushing to get more UTF-8. I think
> lots of HTTP users would appreciate a better commitment from HTTP with
> respect to character encoding in headers and the like. What's currently
> there is really just a mess, and should be cleaned up.
>
> Regards, Martin.
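
For reference, the byte-level part of the quoted claim (that pure US-ASCII data is also valid UTF-8, while iso-8859-1 data generally is not) can be checked with a short sketch. This is only an illustration: the helper name is_valid_utf8 and the sample header values are assumptions made for the example, not anything from the thread, and it says nothing about flow control or EBCDIC conversion.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample values, chosen only for illustration.
ascii_header = b"Content-Type: text/html"       # pure US-ASCII bytes
latin1_value = "Dürst".encode("iso-8859-1")     # 'ü' becomes the single byte 0xFC
utf8_value = "Dürst".encode("utf-8")            # 'ü' becomes the two bytes 0xC3 0xBC

# Pure ASCII bytes decode to the same text under either codec.
same_text = ascii_header.decode("ascii") == ascii_header.decode("utf-8")

print(same_text)
print(is_valid_utf8(ascii_header))
print(is_valid_utf8(latin1_value))
print(is_valid_utf8(utf8_value))
```

The Latin-1 value fails the check because a lone 0xFC byte is not a legal UTF-8 sequence, which is the kind of ambiguity in header values that the quoted text describes as a mess.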
Received on Friday, 3 August 2012 15:31:08 UTC