- From: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
- Date: Wed, 30 Nov 1994 09:05:05 -0500
- To: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Thus wrote: "Roy T. Fielding"
>Marc VanHeyningen writes:
>> Rather egregiously missing is a reference to transmitting network
>> objects in canonical form. Section 3.2 should mention this; a
>> reference to the canonical encoding model in Appendix G of RFC 1521
>> (specifically step 2) probably should suffice. The only place this is
>> hinted at is in the tolerance section of the appendices on tolerance
>> of broken implementations, but the spec should explicitly say what the
>> proper behavior is, just in case any servers ever actually do that. :-)
>
>The specified behavior will be "no canonical encoding of the object-body
>is required before network transfer via HTTP, though gateways may need
>to perform such canonical encoding before forwarding a message via a
>different protocol. However, servers may wish to perform such encoding
>(i.e. to compensate for unusual document structures), and
>may do so at their discretion."

I must not be understanding what you're saying correctly. Why is canonical encoding unnecessary? Do you really mean that any server, on any architecture, can (for example) transmit text files using whatever its local system convention for line breaks might happen to be (CR, LF, CRLF, whatever) without standardizing it? How can we be passing local forms around between different machines and expect it to work reliably?

Yes, I know that pretty much all existing servers run under UNIX and just blindly send the UNIX line break without making any effort to normalize it, but the spec should document correct behavior, with existing behavior mentioned, as it currently is, in the appendix. The current document is a little strange, in that the appendix recommends assuming any newline is a line break to tolerate bad servers/clients, but nowhere in the document does it seem to say what the *correct* behavior is, or why those programs are bad. I believe strongly that the correct behavior is to send things only in canonical form.
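To make the canonical-form point concrete, here is a minimal sketch of what normalizing local line breaks before transmission might look like. It assumes the object is text, that CR, LF, and CRLF are the only local conventions in play, and that CRLF is the canonical line break (as in the RFC 1521 model). The helper name `to_canonical_text` is hypothetical and does not come from the draft.

```python
import re

# Matches any of the three common line-break conventions. CRLF is listed
# first so that a CR LF pair counts as one line break, not two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def to_canonical_text(local_bytes: bytes) -> bytes:
    """Rewrite every local line break as CRLF (hypothetical helper)."""
    return _LINE_BREAK.sub(b"\r\n", local_bytes)


if __name__ == "__main__":
    # A file mixing UNIX (LF), old Macintosh (CR), and CRLF line breaks.
    mixed = b"first\nsecond\rthird\r\nfourth"
    print(to_canonical_text(mixed))
```

A sender that always does this lets the recipient convert CRLF to its own local form without guessing which convention the origin system used.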
Actually, after thinking about this a little more, I realized the MIME encoding model isn't adequate, because HTTP adds a new layer of encoding ("Content-Encoding: x-gzip" or the like) and the spec needs to explicitly state when that encoding gets done and undone relative to canonicalization (if we include that) and CTE. I think the model should specify that content-encoding happens after canonicalization but before the CTE, if any (there should be none, of course, normally) is applied. (Actually, I wouldn't object to outright prohibiting any CTE other than clear ones like 7bit, 8bit and binary, but maybe there are reasons to allow q-p and base64.)

>> As near as I can tell, the spec constrains all header values to be
>> US-ASCII, meaning nothing that is not US-ASCII may be contained in
>> them. We might consider permitting non-US-ASCII information in at
>> least some headers, probably using RFC 1522's model.
>
>I'd rather not. If there is a perceived need for non-US-ASCII information
>in header field value text and comments (I don't see any), then I think
>they should only be encoded by gateways during export.

I don't see an immediate overwhelming need, but it's there. Plenty of people who have names with non-ASCII characters in them like to include those names in a From: header, for example. It's not necessarily urgent, but I think it will get used anyway in areas like From: that are only used for logging and shouldn't break anything. It's not as though it would be legal in URLs or Content-Types or anything heavily interpreted, and I don't think it would break any existing software.

I don't see how having them only be encoded by gateways will suffice. How would one represent non-US-ASCII information in a header? Specifically, how would one indicate what character set is being employed, if not by using MIME part 2? (Yes, it's kind of an ugly wheel, but it works and it's backward-compatible.)
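As an illustration of the RFC 1522 model, the sketch below encodes a display name containing a non-ASCII character into an encoded-word that is pure US-ASCII and carries its own character-set label, then decodes it again. It uses Python's standard `email.header` module; the sample name, the choice of ISO-8859-1, and the two helper function names are assumptions made for the example only.

```python
from email.header import Header, decode_header


def encode_display_name(name: str, charset: str = "iso-8859-1") -> str:
    """Return the name as an RFC 1522 style encoded-word (US-ASCII only)."""
    return Header(name, charset=charset).encode()


def decode_display_name(value: str) -> str:
    """Reverse the encoding, using the charset named inside the encoded-word."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded parts come back as bytes plus the labeled charset.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            # Plain unencoded text comes back as str.
            pieces.append(chunk)
    return "".join(pieces)


if __name__ == "__main__":
    sample = "Jürgen Example"  # hypothetical name, not from the thread
    wire_form = encode_display_name(sample)
    print(wire_form)                       # safe to carry in a US-ASCII header
    print(decode_display_name(wire_form))  # recovers the original name
```

Software that knows nothing about encoded-words still sees only US-ASCII in the field, which is the backward-compatibility property mentioned above.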
Oh, minor nit: In the date section, the grammar makes the Day-of-the-week component mandatory. I believe it should be made optional, at least in 822/1123 style, since that's how it is in 822 (not to mention there's no good reason for it to be there, since it doesn't provide any machine-useful information and won't normally be viewed directly by a human).

--
Marc VanHeyningen <URL:http://www.cs.indiana.edu/hyplan/mvanheyn.html>
Received on Wednesday, 30 November 1994 06:07:14 UTC