- From: Albert Lunde <Albert-Lunde@nwu.edu>
- Date: Fri, 9 Feb 1996 09:03:11 -0600 (CST)
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> > Roy:
> >
> >> In addition, if the text media is represented in a character
> >> set which does not use octets 13 and 10 for CR and LF respectively, as
> >> is the case for some multi-byte character sets, HTTP allows the use
> >> of whatever octet sequences are defined by that character set to
> >> represent the equivalent of CR and LF for line breaks. It is
> >> assumed that any recipient capable of using such a character set
> >> will know the appropriate octet sequence for representing line
> >> breaks within that character set.
> >
> > which is contentious and does not represent current practice, as far
> > as I can see. I've found sites that do UTF-8, Shift-JIS, EUC, etc.
> > but have yet to find a site that does UCS-2; I've found a browser that
> > does UCS-2 but it hardly represents a feature that is consistently
> > implemented.
>
> Well, it is certainly contentious. The problem is that the new MIME
> drafts specifically forbid the use of those character sets in e-mail,
> whereas we have no intention (so far) of forbidding the use of UCS-2
> in HTTP text media -- that was made quite clear ages ago on the list.
[...]
> > While I think this is an important point to deal with, I'd like to see
> > the HTTP/1.0 draft proceed without trying to untie this particular
> > knot. So, I would like to leave this out.
>
> I consider removing it to be more controversial, as it unnecessarily
> restricts current practice if we follow the new MIME drafts. However,
> I'll let it go if the others don't care -- my primary concern was that
> people could not see what was being removed.

I haven't read the new MIME drafts, but it looks to me as though we can't leave it out entirely. The flexible end-of-line treatment is one of the established features of HTTP. I agree that we don't want to take the course of forbidding encodings that don't look enough like US-ASCII.
How about adding a sentence: "Support for such character encodings is not yet widely implemented in WWW software, but this specification should be understood to allow their use."

> > ================================================================
> > draft:
> >> Media types of "text/*" are defined to have a default charset parameter of
> >> "US-ASCII", and that other charset parameters should be labelled. In
> >> practice, HTTP servers frequently send text data without a charset
> >> parameter, and expect clients to guess the character set of the result.
> >> This has caused a great deal of confusion and lack of interoperability in
> >> HTTP 1.0 clients and servers.
> >
> > Roy:
> >> This is incorrect and not representative of current practice OR recommended
> >> practice.
> >
> > I will stand by the assertion that as far as I can tell, the first two
> > sentences correctly describe current practice.
>
> I don't see how. All WWW software defaults to ISO-8859-1 as per the
> original design of the Web. That is true of libwww, libwww-perl, the
> Python libraries, Mosaic, NCSA httpd, Apache httpd, Spyglass Mosaic,
> MS Internet Explorer, and Netscape Navigator. Only recently (within
> the past six months) have people started adding config options, and
> even those default to ISO-8859-1. It has been in the HTTP spec since
> TimBL's original version.

I agree. It would be a major rewriting of history to claim that the default was anything but ISO-8859-1 (even though this is a deviation from MIME e-mail practice). It's also true that sending other character sets unlabeled has caused confusion that current practice has not yet resolved (else my copy of Netscape wouldn't need frequent shifting between the Latin1 and Japanese "autodetect" modes. ;) Part of the problem is the matter of practice catching up with even the most drafty specs.
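A minimal Python sketch of the two client behaviors discussed above, assuming a client that decodes a text/* body before looking for line breaks. The names charset_of and text_lines are hypothetical helpers, the ISO-8859-1 fallback follows the practice Roy describes (not MIME's US-ASCII default), and UTF-16BE stands in for big-endian UCS-2, with which it agrees for characters in the Basic Multilingual Plane.

```python
def charset_of(content_type: str) -> str:
    """Return the charset parameter of a text/* Content-Type value.

    Assumption: an unlabeled text body is treated as ISO-8859-1, the
    de facto Web default, not the US-ASCII default of MIME e-mail.
    """
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "charset" and value.strip():
            # Drop optional surrounding quotes, e.g. charset="Shift_JIS"
            return value.strip().strip('"')
    return "iso-8859-1"


def text_lines(body: bytes, content_type: str) -> list[str]:
    """Decode the body in its charset, then split on CR LF.

    Decoding first means CR and LF are recognized as characters of the
    charset, whatever octets that charset uses to represent them.
    """
    text = body.decode(charset_of(content_type))  # LookupError for unknown charsets
    return text.split("\r\n")
```

For example, splitting "line one\r\nline two".encode("utf-16-be") on b"\r\n" finds nothing, because octets 13 and 10 are each preceded by a zero octet, while text_lines(body, "text/plain; charset=UTF-16BE") returns the two lines. A real client would also have to handle the LookupError raised for charset names it does not know.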
Received on Friday, 9 February 1996 07:05:32 UTC