
RE: Factoring out Content-Disposition (i123), was: Content-Disposition (new issue?)

From: Brian Smith <brian@briansmith.org>
Date: Fri, 15 Aug 2008 14:54:07 -0500
To: "'Julian Reschke'" <julian.reschke@gmx.de>, <ietf-http-wg@w3.org>
Message-ID: <04321DCFF4674D59B25064AC0E9721B6@T60>

Julian Reschke wrote:
> By default, message header parameters in Hypertext 
> Transfer Protocol  (HTTP) messages can not carry characters 
> outside the ISO-8859-1 character set. RFC 2231 defines an 
> escaping mechanism for use in Multipurpose Internet Mail 
> Extensions (MIME) headers.  This document specifies a 
> profile of that encoding suitable for use in HTTP.
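For concreteness, here is a minimal Python sketch of the kind of escaping being profiled. It assumes RFC 2231's charset'language'value form with UTF-8 and an empty language tag. The set of octets left unescaped and the names encode_ext_value/decode_ext_value are illustrative assumptions, not text from the draft.

```python
# Sketch of an RFC 2231-style extended parameter value for use in an HTTP header.
# Assumptions (not from the draft): UTF-8 charset, empty language tag, and the
# set of octets left unescaped below.

# Octets passed through as-is; every other octet becomes %XX (assumed set).
UNESCAPED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~"
)


def encode_ext_value(text: str) -> str:
    """Return "UTF-8''" followed by the percent-encoded UTF-8 octets of text."""
    out = []
    for octet in text.encode("utf-8"):
        if octet in UNESCAPED:
            out.append(chr(octet))
        else:
            out.append("%{:02X}".format(octet))
    return "UTF-8''" + "".join(out)


def decode_ext_value(value: str) -> str:
    """Inverse of encode_ext_value; the language tag is parsed but ignored."""
    charset, _language, encoded = value.split("'", 2)
    raw = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "%":
            raw.append(int(encoded[i + 1:i + 3], 16))  # one escaped octet
            i += 3
        else:
            raw.append(ord(encoded[i]))  # unescaped ASCII octet
            i += 1
    return raw.decode(charset)


print(encode_ext_value("naïve.txt"))  # UTF-8''na%C3%AFve.txt
```

With this sketch, a header such as Content-Disposition: attachment; filename*=UTF-8''na%C3%AFve.txt would carry the file name "naïve.txt".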

What was the outcome of the discussions about Unicode support in HTTP at the
IETF meeting? Looking at the IRC log, the discussion seemed to be leaning
towards allowing raw, unencoded UTF-8 in headers (applications should start
accepting unencoded UTF-8 but should avoid sending it for now). If that is
the way things are going to go, a general RFC 2231 profile for HTTP seems
counterproductive.
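One possible reading of "accept unencoded UTF-8 but avoid sending it" is sketched below in Python: a receiver tries UTF-8 first and falls back to ISO-8859-1, while a cautious sender only emits ASCII unencoded. This policy and the helper names are assumptions for illustration, not a decision recorded from the meeting.

```python
def decode_header_value(raw: bytes) -> str:
    """Decode a raw header value, preferring UTF-8 (assumed receiver policy)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 assigns a character to every octet, so this cannot fail.
        return raw.decode("iso-8859-1")


def can_send_unencoded(text: str) -> bool:
    """Assumed conservative sender policy: only US-ASCII goes out unencoded,
    since ASCII reads the same whether the receiver assumes UTF-8 or ISO-8859-1."""
    return all(ord(ch) < 128 for ch in text)


print(decode_header_value("café".encode("utf-8")))       # café
print(decode_header_value("café".encode("iso-8859-1")))  # café
```

The fallback is heuristic: ISO-8859-1 text that happens to form valid UTF-8 (for example the octets C3 A9, which read as "Ã©" in ISO-8859-1) is decoded as UTF-8 ("é"). Restricting senders to ASCII avoids that ambiguity.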

RFC 2231 + UTF-8 is an especially bad interchange format for text, since it
requires 6 to 9 bytes per letter for the vast majority of people's native
languages (each UTF-8 octet becomes a three-byte %XX escape). Plus, there are
no features for language tagging (needed for CJK languages), BIDI (needed for
Middle Eastern languages), or accessibility (for users of screen readers).
IMO, the best thing to do is to keep language-sensitive text out of HTTP
headers by recommending that applications transfer it in entity bodies
whenever possible. Really, this encoding is only suitable for short,
language-neutral strings such as (file and IRI) path fragments.
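A short Python sketch of where the per-letter cost comes from: in RFC 2231's extended notation each non-ASCII UTF-8 octet is written as a three-byte %XX escape. The sample characters are illustrative choices, and urllib.parse.quote with an empty safe set stands in for the escaping rule (it also escapes some ASCII punctuation that RFC 2231 would leave alone, which does not matter for single letters).

```python
from urllib.parse import quote

# Sample characters (chosen for illustration) and the script each comes from.
SAMPLES = {
    "a": "Latin (ASCII)",
    "é": "Latin-1 supplement",
    "я": "Cyrillic",
    "ש": "Hebrew",
    "中": "CJK ideograph",
    "क": "Devanagari",
    "😀": "emoji (outside the BMP)",
}


def percent_encoded_size(text: str) -> int:
    """Bytes needed once every non-unreserved UTF-8 octet is written as %XX."""
    return len(quote(text, safe=""))


for ch, script in SAMPLES.items():
    raw = len(ch.encode("utf-8"))
    print(f"{script:25} UTF-8: {raw} octet(s)  escaped: {percent_encoded_size(ch)}")
```

Cyrillic and Hebrew letters take two UTF-8 octets (6 bytes escaped), CJK and Devanagari take three (9 bytes), and characters outside the Basic Multilingual Plane take four (12 bytes).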

Nitpicks:

The draft references Unicode 4.0 indirectly through RFC 3629. It would be
better to allow implementations to use any later version, or at least the
current version, 5.1.

I don't see the point of requiring ISO-8859-1. ISO-8859-1 can only encode a
very small number of languages that are used by a small minority of people
(who just happen to be over-represented in standards committees). Advocating
ISO-8859-1 also seems to be the opposite of what was discussed at the IETF
meeting (AFAICT from the logs).
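To illustrate how narrow ISO-8859-1 is, this Python sketch checks whether a string can be represented in it at all (ISO-8859-1 covers only code points U+0000 to U+00FF). The helper name and sample strings are illustrative.

```python
def fits_iso_8859_1(text: str) -> bool:
    """True if text can be encoded as ISO-8859-1 (code points U+0000..U+00FF)."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


for sample in ["Größe", "œuvre", "€", "Привет", "東京"]:
    print(sample, fits_iso_8859_1(sample))
```

Even some Western European text falls outside it: the euro sign and the French ligature œ are not in ISO-8859-1.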

Regards,
Brian
Received on Friday, 15 August 2008 19:54:46 GMT
