Re: Unknown text/* subtypes from Frank Ellermann on 2007-12-18 (www-archive@w3.org from December 2007)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Tue, 18 Dec 2007 19:17:01 +0100
To: www-archive@w3.org
Cc: ietf-types@alvestrand.no
Message-ID: <fk92rh$65s$1@ger.gmane.org>

Julian Reschke wrote:

> It would be nice if somebody could provide some insight why this ever 
> made it into HTTP. Was that just an attempt to allow text/html encoded 
> in latin1 to be served without charset parameter?

Some parts of this puzzle:  RFC 2070 introduced an "ideally anything
is Unicode" concept, later adopted by HTML 4+, XHTML 1+, and XML 1+.
AFAIK HTML 3.2 and maybe also HTML 3 still didn't have this feature.

As far as RFC numbers mean something, RFC 2070 was published "after"
RFC 2068; both say January 1997, and "the law", RFC 2277 (January
1998), was clearly a year later.

RFC 2068 (HTTP/1.1) was the successor of RFC 1945 (HTTP/1.0, May
1996), and RFC 2070 (HTML i18n) was the successor of RFC 1866 (HTML 2,
November 1995).

Tim Berners-Lee, a co-author of RFC 1866 and RFC 1945, wrote in
RFC 1866:

| NOTE - To support non-western writing systems, a larger character
| repertoire will be specified in a future version of HTML. The
| document character set will be [ISO-10646], or some subset that
| agrees with [ISO-10646]; in particular, all numeric character
| references must use code positions assigned by [ISO-10646].

Speculation: in May 1996 it made sense that HTTP/1.0 could transport
HTML 2 "as is", with default Latin-1. It took Harald and Martin some
months to fix this in RFC 2070 and RFC 2277, too late for RFC 2068,
and RFC 2616 simply inherited "default Latin-1" wholesale.

 Frank

Received on Tuesday, 18 December 2007 19:20:31 UTC