Re: PROPOSAL: i74: Encoding for non-ASCII headers from Henrik Nordstrom on 2008-03-31 (ietf-http-wg@w3.org from January to March 2008)

From: Henrik Nordstrom <henrik@henriknordstrom.net>
Date: Mon, 31 Mar 2008 19:30:17 +0200
To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
Cc: ietf-http-wg@w3.org
Message-Id: <1206984617.4921.30.camel@HenrikLaptop>

On Fri, 2008-03-28 at 16:33 +0100, Frank Ellermann wrote:
>   
> If we'd work on HTTP/1.2 proposing to ignore RFC 5198 would
> be madness, but we are supposed to improve the HTTP/1.1 spec.

Actually there would be no real difference here if we worked on HTTP/1.2.
In HTTP, headers are defined to have the same meaning for as long as the
major version number is the same. If changing TEXT from ISO-8859-1 to
UTF-8 is a problem for HTTP/1.1, it's likewise a problem for HTTP/1.2 as
HTTP/1.2 still needs to deal with how the message is understood if
downgraded to earlier protocol versions by an intermediary.

My gut feeling is that the best long term move would be to move to UTF-8
and forget about 2047 and accept that some existing things MAY break,
BUT as you say it can not be proved to be completely without problems
for existing implementations. In fact it's very likely to cause problems
in some areas:
  - Authentication (RFC2617, here ISO-8859-1 is actively supported
today, but not really sufficient)
  - Cookie, for applications using/setting cookies in both client and
server contexts (not just echoing what you got)
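
To illustrate the authentication point, here is a minimal Python sketch
(an illustration only, not taken from any implementation) of how the same
RFC2617 Basic credentials turn into different octets, and so different
base64 tokens, depending on whether the sender assumes ISO-8859-1 or
UTF-8. The helper name basic_credentials and the sample user/password are
hypothetical.

```python
import base64


def basic_credentials(user: str, password: str, charset: str) -> str:
    """Return the base64 token for Basic auth, encoding user:password in charset."""
    raw = f"{user}:{password}".encode(charset)
    return base64.b64encode(raw).decode("ascii")


# Same user-visible credentials, two charset assumptions (sample values are made up).
latin1_token = basic_credentials("henrik", "pässword", "iso-8859-1")
utf8_token = basic_credentials("henrik", "pässword", "utf-8")

# The tokens differ because "ä" is one octet (0xE4) in ISO-8859-1
# and two octets (0xC3 0xA4) in UTF-8.
print(latin1_token == utf8_token)
```

A server comparing against credentials stored under the other charset
assumption would see a mismatch, even though the user typed the same
password.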

The business of relying on RFC2047 encoding or similar "obfuscation" is
quite likely to produce more bad implementations than UTF-8 would, and the risk of
security implications at the protocol level due to mismatches between
ISO-8859-1 / UTF-8 expectations is pretty minimal.

Related to Cookie it may be worth mentioning that RFC2965
(Cookie/Set-Cookie2) defines that the human visible attribute (Comment)
must have its value encoded in UTF-8, within HTTP...

For now I think the only possible outcome is to keep what we have:
ISO-8859-1 as default, but clarifying that intermediaries should handle
header values as 8-bit ASCII strings and consider the C1 set (0x80-0x9F)
as just another set of octets of the string (not as control characters or
invalid), plus a note that future headers MAY be seen using UTF-8
encoding.
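
As a rough sketch of what "handle as 8-bit strings" could mean for an
intermediary, the Python below passes a header value through unchanged,
including octets in the 0x80-0x9F range. It relies on the fact that
Python's ISO-8859-1 codec maps every octet 0x00-0xFF to exactly one code
point, so decode followed by encode is lossless. The function name
forward_header_value and the sample values are hypothetical.

```python
def forward_header_value(raw: bytes) -> bytes:
    """Pass a header value through without interpreting or rejecting any octet."""
    # ISO-8859-1 maps each octet 0x00-0xFF to one code point, so this
    # round trip restores the exact octets that were received.
    text = raw.decode("iso-8859-1")
    return text.encode("iso-8859-1")


# A value that a sender actually encoded as UTF-8 (sample data).
utf8_value = "Björn".encode("utf-8")

# A value containing octets in the 0x80-0x9F range (sample data).
high_control_value = bytes([0x41, 0x85, 0x9F, 0x42])

print(forward_header_value(utf8_value) == utf8_value)
print(forward_header_value(high_control_value) == high_control_value)
```

Because nothing is dropped or rewritten, a recipient that expects UTF-8
can still decode the forwarded value correctly.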

Switching to UTF-8 in general is possible, but may require a new header
declaring that this message is sent using UTF-8 and is outside the scope
of RFC2616bis until there is a strong requirement to address I18N to
advance as standard. But switching to UTF-8 is imho most likely the
sanest way of addressing I18N in both 2616 and 2617, and the least likely
to cause long-term interop issues.

Regards
Henrik

Received on Monday, 31 March 2008 17:31:13 UTC