Re: Warnings, RFC 1522, and ISO-8859-1 from Martin J. Duerst on 1996-12-16 (ietf-http-wg@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Mon, 16 Dec 1996 22:09:38 +0100 (MET)
To: Koen Holtman <koen@win.tue.nl>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <Pine.SUN.3.95.961216213450.242W-100000@enoshima>
Hello Koen,

Many thanks for your information.

> Martin J. Duerst:
> >
> [...]
> >Now back to the MAIN POINT: Can anybody explain to me why
> >ISO-8859-1 was chosen as a default for TEXT in headers
> >and warnings? 
> 
> The TEXT encoding was US-ASCII in HTTP/1.0 (RFC1945),

Not true. RFC1945 explicitly allows octets from character sets
other than US-ASCII (which means octets with the 8th bit set).
It allows the recipient to assume that these represent
ISO-8859-1 characters, but it leaves the possibility open for
it to be something else. Here is the full quote:

   The TEXT rule is only used for descriptive field contents and values
   that are not intended to be interpreted by the message parser. Words
   of *TEXT may contain octets from character sets other than US-ASCII.

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

   Recipients of header field TEXT containing octets outside the US-
   ASCII character set may assume that they represent ISO-8859-1
   characters.
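As a rough illustration of what the quoted rule permits, here is a minimal Python sketch that checks an octet string against the TEXT production and then applies the optional ISO-8859-1 interpretation. The function names `is_text` and `decode_text` are hypothetical. CTL is taken as octets 0-31 plus DEL (127), and LWS as SP, HT, or CRLF followed by SP or HT, following the RFC 1945 grammar.

```python
# Hypothetical helpers sketching the RFC 1945 TEXT rule; not taken from any library.
CTLS = frozenset(range(32)) | {127}  # US-ASCII control characters and DEL

def is_text(value: bytes) -> bool:
    """Return True if value is a run of TEXT octets (CTLs allowed only as LWS)."""
    i = 0
    while i < len(value):
        octet = value[i]
        if octet == 0x0D:
            # A CR is only acceptable as the start of folded LWS: CRLF then SP or HT.
            if value[i + 1:i + 2] == b"\n" and value[i + 2:i + 3] in (b" ", b"\t"):
                i += 3
                continue
            return False
        if octet in CTLS and octet != 0x09:  # HT is a CTL but is allowed as LWS
            return False
        i += 1  # SP, HT, printable ASCII, or any octet with the 8th bit set
    return True

def decode_text(value: bytes) -> str:
    """Interpret TEXT octets as ISO-8859-1, as RFC 1945 says recipients *may*."""
    if not is_text(value):
        raise ValueError("not valid TEXT")
    return value.decode("iso-8859-1")  # every octet maps to exactly one character
```

For example, `decode_text(b"Caf\xe9")` returns `"Café"`. The 8-bit octet is accepted because TEXT allows it; reading it as ISO-8859-1 is only the recipient's permitted assumption, not something the message itself states.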

> but it got changed
> into ISO-8859-1 for HTTP/1.1 because HTML uses ISO-8859-1.  

HTML 2.0, as of Nov. 1995 (RFC1866), already contained very
clear language that HTML would move to ISO-10646. Also, there
is a big difference between entity bodies (where the agreement
is that "charset" should be labelled wherever legacy browsers
don't prevent it) and headers (where labeling only makes
sense for 7-bit email, but is not necessary with UTF-8).
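To make the header contrast concrete: in 7-bit mail, non-ASCII header text has to be wrapped in an RFC 1522 encoded-word that names its charset. A header defined to carry UTF-8 could simply contain the raw octets. The sketch below uses Python's standard `email.header` module, which implements the encoded-word syntax (later revised as RFC 2047). The sample word and the function name `both_forms` are illustrative only.

```python
from email.header import Header, decode_header, make_header

def both_forms(text: str) -> tuple[str, bytes]:
    """Return (encoded-word form, raw UTF-8 octets) for a header value."""
    # Labelled 7-bit form: the charset name travels inside the header itself.
    encoded_word = Header(text, "utf-8").encode()
    # Unlabelled 8-bit form: only unambiguous if the protocol fixes UTF-8.
    raw_utf8 = text.encode("utf-8")
    return encoded_word, raw_utf8

encoded_word, raw_utf8 = both_forms("Grüße")
print(encoded_word)  # a =?utf-8?...?= token made only of ASCII characters
print(raw_utf8)      # b'Gr\xc3\xbc\xc3\x9fe'
# The encoded-word decodes back to the original string.
print(str(make_header(decode_header(encoded_word))))
```

The labelled form is longer and needs a decoding step, which is the overhead that becomes unnecessary once a header is defined to carry UTF-8 over an 8-bit-clean transport.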


> >Given the recommendations of the IAB
> >charset workshop (draft-weider-iab-char-wrkshop-00.txt),
> >which repeatedly mentions UTF-8, this seems like a
> >rather antiquated choice.
> 
> The basic choice to replace US-ASCII by ISO-8859-1 was made in April.  See
> http://www.ics.uci.edu/pub/ietf/http/hypermail/1996q2/0062.html .

Sorry, but 1996q2/0062.html is about Accept-Charset. The question there
was whether it could be assumed that clients in general would be able
to render ISO-8859-1, so that clients would not be required to
include ISO-8859-1 in their Accept-Charset line. There was more
discussion on this later, but I think the decision taken was reasonable.

The above message does not mention TEXT at all.


>The idea
> was to sync HTTP with the defaults in HTML, we did not have any i18n
> considerations in mind.

This sounds to me as if somebody were saying "We were discussing
passwords - We did not have any security considerations in mind".


> As for the Warning header: we did not spend days discussing how to
> internationalise the warning text field, this was just a micro-decision made
> by one of the editors along the way. Maybe it was not an optimal decision,
> but we did not have the time to spend days optimising every micro-decision.

There is really no need to discuss such things for days. The only
requirement is to make the right decision.


> > On the other side, UTF-8
> >is extremely suited for the purpose: It covers all the
> >characters of the world, is reasonably compact, and
> >works smoothly together with ASCII.
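The point about UTF-8 coexisting with ASCII can be checked directly. The following Python sketch uses arbitrary sample strings. It shows three things. ASCII text is byte-for-byte identical in UTF-8. Every octet of a multi-octet UTF-8 character has the 8th bit set, so it can never be mistaken for an ASCII delimiter. A recipient applying the ISO-8859-1 default to UTF-8 octets sees garbled but recoverable text.

```python
ascii_text = "Response is stale"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")  # identical octets

word = "Grüße"
octets = word.encode("utf-8")
# Octets belonging to non-ASCII characters are all >= 0x80, so ASCII delimiters
# such as ':', SP or CR can never appear inside a multi-octet character.
non_ascii_octets = [b for ch in word if ord(ch) > 0x7F for b in ch.encode("utf-8")]
assert all(b >= 0x80 for b in non_ascii_octets)

# A recipient that assumes ISO-8859-1 gets garbled text rather than an error...
misread = octets.decode("iso-8859-1")
print(repr(misread))  # 'GrÃ¼Ã\x9fe' (the \x9f is an invisible C1 control)

# ...but the octets were never altered, so the original text is recoverable.
recovered = misread.encode("iso-8859-1").decode("utf-8")
assert recovered == word
```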
> 
> Sounds good,

I can assure you: It is as good as it sounds.


>but you should have told us in April/May, when we were
> finishing the draft.  Maybe we could use UTF-8 in HTTP/1.2 or HTTP/2.0.
> There is always a next version.

The sooner we change, the better. The longer we wait, the more
programs there will be that depend on it, and the more difficult
it will be to change it.


> [...]
> 
> >Procedural Concerns
> >-------------------
> >The current HTTP 1.1 draft is beyond last call, waiting for
> >becoming an RFC. I do not know whether last-minute changes
> >can or should be made, 
> 
> If I understand the IETF process correctly, only very serious bugs can be
> fixed at this point.
> 
> We cannot reverse decisions like the default charset without doing a last
> call again, which would delay the draft by many months.  And we don't want
> any delay, it is generally thought that 1.1 is dangerously late already.

I definitely don't want to delay the draft. But if we agree on
the direction to take on this issue, we can issue a small draft
(e.g. Encoding of Headers in HTTP) to settle the matter.
This would neither delay the IETF process nor force
implementations to wait for HTTP 1.2 to do the right thing.


Regards,	Martin.
Received on Monday, 16 December 1996 13:11:35 UTC