- From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
- Date: Thu, 01 Dec 1994 17:58:13 -0800
- To: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Marc VanHeyningen writes:

>[I wrote:]
>>The specified behavior will be "no canonical encoding of the object-body
>>is required before network transfer via HTTP, though gateways may need
>>to perform such canonical encoding before forwarding a message via a
>>different protocol.  However, servers may wish to perform such encoding
>>(i.e. to compensate for unusual document structures), and
>>may do so at their discretion."
>
> I must not be understanding what you're saying correctly.  Why is
> canonical encoding unnecessary?  Do you really mean that any server,
> on any architecture, can (for example) transmit text files using
> whatever its local system convention for line breaks might happen to
> be (CR, LF, CRLF, whatever) without standardizing it?  How can we be
> passing local forms around between different machines and expect it to
> work reliably?

Yes.  Because (except in very few circumstances) it does work reliably.
I do not know of any server that does canonicalization.  Requiring ALL
servers to parse-and-replace, character-by-character, all text/*
content types is hideously inefficient and not appropriate for HTTP.
Instead, that decision (of whether or not it's needed) should be left
up to the individual platform implementation.

> Yes, I know that pretty much all existing servers run under UNIX and
> just blindly send the UNIX line break without making any effort to
> normalize it, but the spec should document correct behavior, with
> existing behavior mentioned, as it currently is, in the appendix.  The
> current document is a little strange, in that the appendix recommends
> assuming any newline is a line break to tolerate bad servers/clients,
> but nowhere in the document does it seem to say what the *correct*
> behavior is, or why those programs are bad.  I believe strongly that
> the correct behavior is to send things only in canonical form.

The alternative is to specify that lines end in LF, and I don't like
that any better.
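[Archive annotation, not part of the original message: the canonicalization
Marc asks about, and whose cost Roy objects to, amounts to rewriting every
local line-break convention to CRLF before transfer.  A minimal Python
sketch of that pass, with the function name being this annotation's own
invention:]

```python
import re

def canonicalize_text(body: bytes) -> bytes:
    """Rewrite any local line-break convention (CR, LF, or CRLF)
    to the canonical CRLF form before network transfer.

    The alternation tries CRLF first, so already-canonical pairs
    are left as a single break rather than being doubled.
    """
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)
```

[This is exactly the per-character scan over every text/* body that Roy
calls hideously inefficient for a server to perform unconditionally; the
appendix's recipient-side tolerance (treat any bare CR or LF as a break)
avoids it.]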
However, I agree that something should be said in the spec regarding
canonicalization.

> Actually, after thinking about this a little more, I realized the MIME
> encoding model isn't adequate, because HTTP adds a new layer of
> encoding ("Content-Encoding: x-gzip" or the like) and the spec needs
> to explicitly state when that encoding gets done and undone relative
> to canonicalization (if we include that) and CTE.  I think the model
> should specify that content-encoding happens after canonicalization
> but before the CTE, if any (there should be none, of course, normally)
> is applied.

Yes, that should be clarified.

> (Actually, I wouldn't object to outright prohibiting any CTE other
> than clear ones like 7bit, 8bit and binary, but maybe there are
> reasons to allow q-p and base64.)

Clients may wish to support others in order to post newsgroup messages
through a proxy, but that is the only case I can think of.

>>> As near as I can tell, the spec constrains all header values to be
>>> US-ASCII, meaning nothing that is not US-ASCII may be contained in
>>> them.  We might consider permitting non-US-ASCII information in at
>>> least some headers, probably using RFC 1522's model.
>>
>>I'd rather not.  If there is a perceived need for non-US-ASCII
>>information in header field value text and comments (I don't see any),
>>then I think they should only be encoded by gateways during export.
>
> I don't see an immediate overwhelming need, but it's there.  Plenty of
> people who have names with non-ASCII characters in them like to
> include those names in a From: header, for example.  It's not
> necessarily urgent, but I think it will get used anyway in areas like
> From: that are only used for logging and shouldn't break anything.
> It's not as though it would be legal in URLs or Content-Types or
> anything heavily interpreted, and I don't think it would break any
> existing software.
>
> I don't see how having them only be encoded by gateways will suffice.
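[Archive annotation, not part of the original message: the layering Marc
proposes, and which Roy agrees should be clarified, is canonicalization
first, then content-encoding, then any content-transfer-encoding last.
A Python sketch of a sender applying the three stages in that order; the
function name is this annotation's invention, and base64 stands in for a
CTE only to make the ordering concrete (normally the CTE is binary, a
no-op):]

```python
import base64
import gzip

def prepare_body(text: str) -> bytes:
    # Stage 1: canonicalization (local line breaks -> CRLF).
    canonical = text.replace("\n", "\r\n").encode("ascii")
    # Stage 2: content-encoding ("Content-Encoding: x-gzip").
    content_encoded = gzip.compress(canonical)
    # Stage 3: CTE, applied last if present at all; base64 here
    # purely to illustrate that it wraps the content-encoding.
    return base64.b64encode(content_encoded)
```

[A recipient undoes the stages in reverse: decode the CTE, then decode
the content-encoding, and only then interpret the canonical text.]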
> How would one represent non-US-ASCII information in a header?
> Specifically, how would one indicate what character set is being
> employed, if not by using MIME part 2?  (Yes, it's kind of an ugly
> wheel, but it works and it's backward-compatible.)

I suppose it can be allowed for *text and *ctext.

> Oh, minor nit: In the date section, the grammar makes the
> Day-of-the-week component mandatory.  I believe it should be made
> optional, at least in 822/1123 style, since that's how it is in 822
> (not to mention there's no good reason for it to be there, since it
> doesn't provide any machine-useful information and won't normally be
> viewed directly by a human.)

I do not believe in optional portions of fixed-length fields -- they
make parsing things an absolute nightmare.  Besides, this format has
been in practice for a year now and appears to be the best for
interfacing with SMTP and NNTP gateways.


......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                     <fielding@ics.uci.edu>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
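[Archive annotation, not part of the original message: Roy's point about
fixed-length fields is that with the weekday mandatory, the 822/1123-style
date is a single fixed template, so every field sits at a constant offset
and one pattern parses it.  A Python sketch, with the function name being
this annotation's invention:]

```python
from datetime import datetime, timezone

# One fixed template parses the whole rfc1123-style date; an optional
# weekday would force a parser to try two layouts instead.
def parse_http_date(value: str) -> datetime:
    parsed = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
    return parsed.replace(tzinfo=timezone.utc)
```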
Received on Thursday, 1 December 1994 18:06:41 UTC