- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Tue, 1 May 2007 14:07:41 -0700
- To: John C Klensin <klensin@jck.com>
- Cc: Thomas Roessler <tlr@w3.org>, public-ietf-w3c@w3.org
On May 1, 2007, at 9:15 AM, John C Klensin wrote:
> The one specific issue that "folks on the W3C side" should be aware
> of is that this specifies a strong requirement for CRLF line-
> endings. It has been suggested that, since HTML can accept
> variations on that theme, this would should as well. So far, the
> response has been that the flexibility for what goes over the wire
> causes trouble that is all out of proportion to its convenience.
>
> Additional insights on that subject (or anything else in the draft)
> would be welcome.

Well, it really depends on whether you expect the standard to be
followed to the letter, or simply followed with a general expectation
that any line ending is handled in a robust way.

   At a very high level, the concept was that a system could use
   whatever character coding and line representations were appropriate
   locally, but text transmitted over the network as text must conform
   to the single "network virtual terminal" convention. Virtually all
   early Internet protocols that presume transfer of "text" assume this
   virtual terminal model, although different ones assume or limit it in
   different ways. Telnet, the command stream and ASCII Type in FTP
   [RFC0542], the message stream in SMTP transfer [RFC2821], and the
   strings passed to finger [RFC0742] and whois [RFC0954] are the
   classic examples. More recently, HTTP [RFC2068] follows the same
   general model but permits 8 bit data and leaves the line end sequence
   unspecified (the latter has been the source of a significant number
   of problems).

HTTP presents a fundamentally different application design problem,
particularly on the server side. The notion that accepting CR, LF, and
CRLF as indicating line endings "has been the source of a significant
number of problems" is misleading. What problems? The choice made by
FTP was shown to be terribly wrong in practice -- the ascii/binary mode
switch was an interoperability disaster. In contrast, Web browsers have
been significantly more successful in deployed practice.

The choice for HTTP was very simple. We could require every server to
post-process the content of every response to enforce line endings or
we could support stored metadata like digital signatures. It is not
possible to support both outside of the Microsoft platform. Since every
single client is compelled to accept all three major line endings
anyway, just to deal with variations on the operating systems on which
they are deployed, and it is far more efficient for servers to assume
that the publisher provides the text in the format that they intend to
distribute, the right choice was clear.

We chose to specify in HTTP the way that interoperable clients and
servers actually worked, rather than demanding a single standard be
recognized and then ignoring that standard in practice. Not a single
HTTP developer has requested that standard be changed since 1994 or so,
so I have no idea what "problems" are being described.

It is trivial for text formats to be transformed to the native
filesystem line endings when they are exported by the recipient. That
has to be done anyway for non-Microsoft platforms even if the CRLF line
ending is used. HTTP's rules apply to all text types that are
transferred via HTTP, regardless of what it says in the media type
spec. Email should continue to use CRLF as its one standard, at least
until SMTP is replaced by some future protocol.

I think that the net-utf8 standard should describe the format as it
will be used in practice, which means that it should specify being
conservative in what is sent (CRLF) and robust in what is received
(CR|CRLF|LF).

These days, media types are used to define the format as used on the
local filesystem, not just as it is transmitted on the Internet, so the
reality is that line endings will conform to the local operating
system's editing standards and not to any Internet specification.

Cheers,

Roy T. Fielding <http://roy.gbiv.com/>
Chief Scientist, Day Software <http://www.day.com/>
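
For concreteness, the "conservative in what is sent (CRLF), robust in
what is received (CR|CRLF|LF)" rule suggested above might be sketched
roughly as follows. This is only an illustration in Python, not text
from the net-utf8 draft or from any HTTP implementation; the function
names split_lines_robust, to_wire, and to_local are hypothetical and
chosen purely for this example.

```python
import re

# CRLF is listed first so that a CR LF pair is treated as a single
# line ending rather than as a CR followed by a separate LF.
_LINE_END = re.compile(r"\r\n|\r|\n")


def split_lines_robust(text: str) -> list[str]:
    """Receiver side: accept CR, LF, or CRLF as a line ending."""
    return _LINE_END.split(text)


def to_wire(text: str) -> str:
    """Sender side: rewrite every line ending as CRLF."""
    return _LINE_END.sub("\r\n", text)


def to_local(text: str, newline: str = "\n") -> str:
    """Export side: rewrite every line ending to a local convention.

    The default of LF is an assumption made for this sketch; a caller
    would pass whatever the local platform expects.
    """
    # A function is used as the replacement so that `newline` is
    # inserted literally, without regex escape processing.
    return _LINE_END.sub(lambda _match: newline, text)


if __name__ == "__main__":
    mixed = "alpha\r\nbeta\rgamma\ndelta"
    print(split_lines_robust(mixed))
    print(repr(to_wire(mixed)))
    print(repr(to_local(mixed)))
```

Note that to_wire changes the bytes of any text that was not already
in CRLF form, which is the kind of server-side post-processing that
would invalidate a signature stored alongside the original content.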
Received on Tuesday, 1 May 2007 21:08:31 UTC