Re: draft-klensin-net-utf8-03.txt from Roy T. Fielding on 2007-05-01 (public-ietf-w3c@w3.org from May 2007)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 1 May 2007 14:07:41 -0700
To: John C Klensin <klensin@jck.com>
Cc: Thomas Roessler <tlr@w3.org>, public-ietf-w3c@w3.org
Message-Id: <0D6A4C90-9BF7-4293-BBBE-1677FC4D8C3B@gbiv.com>

On May 1, 2007, at 9:15 AM, John C Klensin wrote:
> The one specific issue that "folks on the W3C side" should be aware
> of is that this specifies a strong requirement for CRLF line
> endings.  It has been suggested that, since HTML can accept
> variations on that theme, this should as well.  So far, the
> response has been that the flexibility for what goes over the wire
> causes trouble that is all out of proportion to its convenience.
>
> Additional insights on that subject (or anything else in the draft)  
> would be welcome.

Well, it really depends on whether you expect the standard to be
followed to the letter, or simply followed with a general expectation
that any line ending is handled in a robust way.

    At a very high level, the concept was that a system could use
    whatever character coding and line representations were appropriate
    locally, but text transmitted over the network as text must conform
    to the single "network virtual terminal" convention.  Virtually all
    early Internet protocols that presume transfer of "text" assume this
    virtual terminal model, although different ones assume or limit it in
    different ways.  Telnet, the command stream and ASCII Type in FTP
    [RFC0542], the message stream in SMTP transfer [RFC2821], and the
    strings passed to finger [RFC0742] and whois [RFC0954] are the
    classic examples.  More recently, HTTP [RFC2068] follows the same
    general model but permits 8 bit data and leaves the line end sequence
    unspecified (the latter has been the source of a significant number
    of problems).

HTTP presents a fundamentally different application design problem,
particularly on the server side.  The claim that accepting CR, LF,
and CRLF as line endings "has been the source of a significant
number of problems" is misleading.  What problems?  The choice made
by FTP was shown to be terribly wrong in practice: the ascii/binary
mode switch was an interoperability disaster.  In contrast, Web
browsers, which accept all three line endings, have been far more
successful in deployed practice.

The choice for HTTP was very simple.  We could either require every
server to post-process the content of every response to enforce line
endings, or we could support stored metadata such as digital
signatures.  It is not possible to support both except on Microsoft
platforms, where CRLF is already the native line ending.
Since every single client is compelled to accept all three major
line endings anyway, just to cope with the differences among the
operating systems on which clients are deployed, and since it is far
more efficient for servers to assume that the publisher provides the
text in the format they intend to distribute, the right choice was
clear.  We chose to specify in HTTP the way that interoperable clients
and servers actually worked, rather than demanding that a single
standard be recognized and then ignoring that standard in practice.

Not a single HTTP developer has requested that the standard be
changed since 1994 or so, so I have no idea what "problems" are
being described.  It is trivial for text formats to be transformed
to the native filesystem line endings when they are exported by the
recipient.  That has to be done anyway for non-Microsoft platforms
even if the CRLF line ending is used.

HTTP's rules apply to all text types that are transferred via HTTP,
regardless of what the media type specification says.  Email should
continue to use CRLF as its one standard, at least until SMTP
is replaced by some future protocol.  I think that the net-utf8
standard should describe the format as it will be used in practice,
which means that it should specify being conservative in what is
sent (CRLF) and robust in what is received (CR|CRLF|LF).  These days,
media types are used to define the format as used on the local
filesystem, not just as it is transmitted on the Internet, so the
reality is that line endings will conform to the local operating
system's editing standards and not to any Internet specification.
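The "conservative in what is sent, robust in what is received" rule above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not code from the draft or from any HTTP implementation; the helper names split_lines, to_wire and to_native are hypothetical.  Receiving treats CR, CRLF and LF alike, sending emits CRLF, and exporting rewrites endings to a local convention (LF is assumed as the default here).

```python
import re

# Any of the three common line terminators.  CRLF is listed first so that
# "\r\n" is consumed as one terminator rather than as a CR followed by an LF.
_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list[str]:
    """Robust receive: treat CR, CRLF and LF all as line endings."""
    lines = _EOL.split(text)
    # A trailing terminator leaves an empty final element; drop it so that
    # "a\r\n" yields ["a"] rather than ["a", ""].
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def to_wire(text: str) -> str:
    """Conservative send: normalize every line ending to CRLF."""
    return _EOL.sub("\r\n", text)


def to_native(text: str, eol: str = "\n") -> str:
    """Export: rewrite line endings to a local convention (LF by default)."""
    return _EOL.sub(eol, text)


if __name__ == "__main__":
    received = "one\rtwo\r\nthree\n"
    print(split_lines(received))     # ['one', 'two', 'three']
    print(repr(to_wire(received)))   # 'one\r\ntwo\r\nthree\r\n'
    print(repr(to_native(received))) # 'one\ntwo\nthree\n'
```

The sketch works on complete strings.  A streaming reader would also have to hold back a trailing CR until the next character arrives, since it may be the first half of a CRLF pair.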


Cheers,

Roy T. Fielding                            <http://roy.gbiv.com/>
Chief Scientist, Day Software              <http://www.day.com/>
Received on Tuesday, 1 May 2007 21:08:31 UTC