From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Thu, 01 Dec 1994 17:58:13 -0800
To: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Marc VanHeyningen writes:
>[I wrote:]
>>The specified behavior will be "no canonical encoding of the object-body
>>is required before network transfer via HTTP, though gateways may need
>>to perform such canonical encoding before forwarding a message via a
>>different protocol. However, servers may wish to perform such encoding
>>(e.g. to compensate for unusual document structures), and
>>may do so at their discretion."
>
> I must not be understanding what you're saying correctly. Why is
> canonical encoding unnecessary? Do you really mean that any server,
> on any architecture, can (for example) transmit text files using
> whatever its local system convention for line breaks might happen to
> be (CR, LF, CRLF, whatever) without standardizing it? How can we be
> passing local forms around between different machines and expect it to
> work reliably?
Yes. Because (except in very few circumstances) it does work reliably.
I do not know of any server that does canonicalization. Requiring ALL
servers to parse-and-replace, character-by-character, all text/* content
types is hideously inefficient and not appropriate for HTTP. Instead,
that decision (of whether or not it's needed) should be left up to the
individual platform implementation.
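For illustration, this is the kind of per-character scan that such
canonicalization would force on every text/* body. A purely
hypothetical C sketch, not code from any actual server:

    #include <stdio.h>

    /* Rewrite local line endings (bare LF, bare CR, or CRLF) as
     * canonical CRLF while copying a text body to the client.
     * Every byte must be inspected -- the cost objected to above. */
    static void send_canonical(FILE *in, FILE *out)
    {
        int c, prev = 0;

        while ((c = getc(in)) != EOF) {
            if (prev == '\r' && c != '\n')
                putc('\n', out);        /* bare CR -> CRLF */
            if (c == '\n' && prev != '\r')
                putc('\r', out);        /* bare LF -> CRLF */
            putc(c, out);
            prev = c;
        }
        if (prev == '\r')
            putc('\n', out);            /* trailing bare CR */
    }

    int main(void)
    {
        send_canonical(stdin, stdout);  /* e.g. ./a.out < file.txt */
        return 0;
    }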
> Yes, I know that pretty much all existing servers run under UNIX and
> just blindly send the UNIX line break without making any effort to
> normalize it, but the spec should document correct behavior, with
> existing behavior mentioned, as it currently is, in the appendix. The
> current document is a little strange, in that the appendix recommends
> assuming any newline is a line break to tolerate bad servers/clients,
> but nowhere in the document does it seem to say what the *correct*
> behavior is, or why those programs are bad. I believe strongly that
> the correct behavior is to send things only in canonical form.
The alternative is to specify that lines end in LF, and I don't like
that any better. However, I agree that something should be said in the
spec regarding canonicalization.
> Actually, after thinking about this a little more, I realized the MIME
> encoding model isn't adequate, because HTTP adds a new layer of
> encoding ("Content-Encoding: x-gzip" or the like) and the spec needs
> to explicitly state when that encoding gets done and undone relative
> to canonicalization (if we include that) and CTE. I think the model
> should specify that content-encoding happens after canonicalization
> but before the CTE, if any, is applied (normally, of course, there
> should be none).
Yes, that should be clarified.
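To make the intended ordering concrete, here is a hypothetical C
sketch; every function name is a placeholder, and the gzip stage is
faked, since only the order of the three steps matters:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Step 1: canonical form -- bare LF becomes CRLF. */
    static char *canonicalize(const char *in)
    {
        char *out = malloc(2 * strlen(in) + 1), *p = out;
        for (; *in; in++) {
            if (*in == '\n')
                *p++ = '\r';
            *p++ = *in;
        }
        *p = '\0';
        return out;
    }

    /* Step 2: Content-Encoding -- stand-in for x-gzip. */
    static char *content_encode(char *in) { return in; }

    /* Step 3: CTE -- "binary" means no transformation at all. */
    static char *apply_cte(char *in) { return in; }

    int main(void)
    {
        /* The order is the whole point: 1, then 2, then 3. */
        char *body = apply_cte(content_encode(canonicalize("a\nb\n")));
        fputs(body, stdout);
        free(body);
        return 0;
    }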
> (Actually, I wouldn't object to outright prohibiting any CTE other
> than clear ones like 7bit, 8bit and binary, but maybe there are
> reasons to allow q-p and base64.)
Clients may wish to support others in order to post newsgroup messages
through a proxy, but that is the only case I can think of.
>>> As near as I can tell, the spec constrains all header values to be
>>> US-ASCII, meaning nothing that is not US-ASCII may be contained in
>>> them. We might consider permitting non-US-ASCII information in at
>>> least some headers, probably using RFC 1522's model.
>>
>>I'd rather not. If there is a perceived need for non-US-ASCII information
>>in header field value text and comments (I don't see any), then I think
>>they should only be encoded by gateways during export.
>
> I don't see an immediate overwhelming need, but it's there. Plenty of
> people who have names with non-ASCII characters in them like to
> include those names in a From: header, for example. It's not
> necessarily urgent, but I think it will get used anyway in areas like
> From: that are only used for logging and shouldn't break anything.
> It's not as though it would be legal in URLs or Content-Types or
> anything heavily interpreted, and I don't think it would break any
> existing software.
>
> I don't see how having them only be encoded by gateways will suffice.
> How would one represent non-US-ASCII information in a header?
> Specifically, how would one indicate what character set is being
> employed, if not by using MIME part 2? (Yes, it's kind of an ugly
> wheel, but it works and it's backward-compatible.)
I suppose it can be allowed for *text and *ctext.
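For instance, the RFC 1522 example header

    From: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>

can be produced by a "Q" encoder along the following lines. This is a
minimal sketch of my own (not code from RFC 1522), and it ignores the
75-character limit on each encoded-word:

    #include <stdio.h>

    /* Emit one RFC 1522 "Q" encoded-word for a Latin-1 string. */
    static void print_encoded_word(const unsigned char *s)
    {
        printf("=?ISO-8859-1?Q?");
        for (; *s; s++) {
            if (*s == ' ')
                putchar('_');           /* space -> underscore */
            else if (*s < 32 || *s > 126 ||
                     *s == '=' || *s == '?' || *s == '_')
                printf("=%02X", *s);    /* unsafe byte -> =XX  */
            else
                putchar(*s);
        }
        printf("?=");
    }

    int main(void)
    {
        /* "Keld Joern Simonsen" in Latin-1; 0xF8 is o-slash. */
        const unsigned char name[] = "Keld J\xF8rn Simonsen";
        printf("From: ");
        print_encoded_word(name);
        printf(" <keld@dkuug.dk>\n");
        return 0;
    }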
> Oh, minor nit: In the date section, the grammar makes the
> Day-of-the-week component mandatory. I believe it should be made
> optional, at least in 822/1123 style, since that's how it is in 822
> (not to mention there's no good reason for it to be there, since it
> doesn't provide any machine-useful information and won't normally be
> viewed directly by a human.)
I do not believe in optional portions of fixed-length fields -- they
make parsing things an absolute nightmare. Besides, this format has
been in use for a year now and appears to be the best for
interfacing with SMTP and NNTP gateways.
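As an illustration of that point: with the weekday mandatory, every
field of a date like "Sun, 06 Nov 1994 08:49:37 GMT" sits at a fixed
offset, and a single sscanf() suffices. A hypothetical sketch:

    #include <stdio.h>

    /* Parse a fixed-format rfc1123-style date in one pass. */
    int main(void)
    {
        const char *date = "Sun, 06 Nov 1994 08:49:37 GMT";
        char wkday[4], month[4];
        int day, year, hh, mm, ss;

        if (sscanf(date, "%3s, %2d %3s %4d %2d:%2d:%2d GMT",
                   wkday, &day, month, &year, &hh, &mm, &ss) == 7)
            printf("parsed: %s %02d %s %d %02d:%02d:%02d\n",
                   wkday, day, month, year, hh, mm, ss);
        return 0;
    }

Make the weekday optional and the parser must instead probe for the
comma before it knows where anything else begins.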
......Roy Fielding ICS Grad Student, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>