Re: Round 3: moving HTTP 1.0 to informational from Roy T. Fielding on 1996-02-09 (ietf-http-wg@w3.org from January to March 1996)

From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Fri, 09 Feb 1996 02:09:45 -0800
To: Larry Masinter <masinter@parc.xerox.com>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9602090209.aa29763@paris.ics.uci.edu>
> How about:
> 
> # This specification describes those features that seem to be
> # consistently implemented in most HTTP/1.0 clients and servers.
> 
> This removes the word 'approximate', replaces the requirement
> that the feature be 'found' with the more appropriate constraint of
> 'consistent implementation', and restricts the domain to clients and
> servers (i.e., omitting proxies).

That would be fine.

> ================================================================
> Roy:
>> We should also add:
>>                               Recipients must ignore any media type
>> parameters whose names they do not recognize.
> 
> Could you explain what you mean by "ignore"?  If a recipient merely
> stores the entity and then regurgitates it later, for example, it
> should not discard media type parameters that it does not recognize.
> On the other hand, if you mean 'do not process' by 'ignore' then
> perhaps you want another word?  "Leave unmolested", "must not modify"?

I couldn't think of a better word for "treat the media type as if the
unrecognized parameter and its value were not present" -- maybe we
should just add that.
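
As a minimal sketch of that reading (in Python, with a hypothetical
KNOWN_PARAMS set and a hypothetical parse_media_type helper, neither of
which comes from any draft): the recipient interprets the media type
without the unrecognized parameter, but the original header value is
left untouched, so a store-and-forward recipient can still pass it on
unmodified. The naive split on ";" assumes no quoted parameter value
contains a semicolon.

```python
# Hypothetical set of parameter names this recipient understands.
KNOWN_PARAMS = {"charset"}

def parse_media_type(value, known=KNOWN_PARAMS):
    """Return (type/subtype, params) for interpretation purposes only.

    Parameters whose names are not in `known` are treated as if they
    were not present. The input string itself is never modified.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        name = name.strip().lower()
        if sep and name in known:
            params[name] = val.strip().strip('"')
    return media_type, params

original = "text/html; charset=ISO-8859-1; foo=bar"
media_type, params = parse_media_type(original)
```

Here media_type and params describe how the entity is processed, while
original remains available for forwarding as received.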

> Aren't we just better off not adding this?

I think it would help -- I got it from Ned's additions to MIME-IMB.

> ================================================================
> Roy:
> 
>>   In addition, if the text media is represented in a character
>>   set which does not use octets 13 and 10 for CR and LF respectively, as
>>   is the case for some multi-byte character sets, HTTP allows the use
>>   of whatever octet sequences are defined by that character set to
>>   represent the equivalent of CR and LF for line breaks.  It is
>>   assumed that any recipient capable of using such a character set
>>   will know the appropriate octet sequence for representing line
>>   breaks within that character set. 
> 
> which is contentious and does not represent current practice, as far
> as I can see. I've found sites that do UTF-8, Shift-JIS, EUC, etc.
> but have yet to find a site that does UCS-2; I've found a browser that
> does UCS-2 but it hardly represents a feature that is consistently
> implemented.

Well, it is certainly contentious.  The problem is that the new MIME
drafts specifically forbid the use of those character sets in e-mail,
whereas we have no intention (so far) of forbidding the use of UCS-2
in HTTP text media -- that was made quite clear ages ago on the list.
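
To illustrate why octets 13 and 10 alone do not mark a line break in
such a character set, here is a small Python sketch. It assumes
Python's "utf-16-be" codec as a stand-in for big-endian UCS-2, which
matches UCS-2 for characters such as CR and LF.

```python
# CR LF as encoded in US-ASCII: the two octets 13 and 10.
crlf_ascii = "\r\n".encode("us-ascii")

# CR LF as encoded in big-endian UCS-2 (approximated here by UTF-16BE):
# each character occupies two octets, so a zero octet precedes 13 and 10.
crlf_ucs2 = "\r\n".encode("utf-16-be")

# A recipient scanning for the adjacent octet pair 13, 10 would not
# find it in the UCS-2 form.
found_ascii_pair = crlf_ascii in crlf_ucs2
```

A recipient therefore has to know the character set before it can
locate line breaks at all.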

I thought that both Gavin Nicol and Glenn Adams had UCS-2 capable
servers or clients.  Perhaps they could enlighten us?

Personally, I think the only safe way around this issue is to define
a new major type called "itext" which does not have the rigid interoperability
requirements of "text", and yet is not as useless as "application".
However, that is out of scope.

> While I think this is an important point to deal with, I'd like to see
> the HTTP/1.0 draft proceed without trying to untie this particular
> knot. So, I would like to leave this out.

I consider removing it to be more controversial, as it unnecessarily
restricts current practice if we follow the new MIME drafts.  However,
I'll let it go if the others don't care -- my primary concern was that
people could not see what was being removed.

> ================================================================
> draft:
>> Media types of "text/*" are defined to have a default charset parameter of
>> "US-ASCII", and that other charset parameters should be labelled. In
>> practice, HTTP servers frequently send text data without a charset
>> parameter, and expect clients to guess the character set of the result.
>> This has caused a great deal of confusion and lack of interoperability in
>> HTTP 1.0 clients and servers.
> 
> Roy:
>> This is incorrect and not representative of current practice OR recommended
>> practice.
> 
> I will stand by the assertion that as far as I can tell, the first two
> sentences correctly describe current practice.

I don't see how.  All WWW software defaults to ISO-8859-1 as per the
original design of the Web.  That is true of libwww, libwww-perl, the
Python libraries, Mosaic, NCSA httpd, Apache httpd, Spyglass Mosaic,
MS Internet Explorer, and Netscape Navigator.  Only recently (within
the past six months) have people started adding config options, and
even those default to ISO-8859-1.  It has been in the HTTP spec since
TimBL's original version.

> I'm not sure about
> "great deal" in the third sentence, though. I agree completely that it
> is not recommended practice.  I'm not sure what (else) you mean by
> "this is incorrect". I suppose that the wording leaves out the
> recommendation that "ISO-8859-1" is a good first guess, and that as
> such your revised wording gives more advice.

Because the default is ISO-8859-1 -- there is no "guessing" involved.

> Roy:
>>   The "charset" parameter is used with some media types to define the
>>   character set (Section 3.4) of the data.  When no explicit charset
>>   parameter is provided by the sender, media subtypes of the "text"
>>   type are defined to have a default charset value of "ISO-8859-1"
>>   when received via HTTP.  
> 
>>      Note: Some HTTP user agents provide a configuration option to
>>      allow the user to change the default interpretation of the media
>>      type character set when no charset parameter is given.  However,
>>      use of such options is not consistent and leads to poor
>>      interoperability across open systems.
> 
> Even though they're defined to have this default charset value,
> current practice is that most servers just send what they have. The
> use of the client options isn't what leads to poor interoperability;
> the clients were just trying to cope with the inconsistent servers!

I disagree -- all current practice that is not known to be broken
is saying charset="ISO-8859-1" if no charset is present.  That is
in the definition of the HTTP protocol.
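
The defaulting rule can be sketched as follows (Python; the
effective_charset helper and the constant name are hypothetical, and
the parsing assumes no quoted parameter value contains a semicolon).
An explicit charset parameter always wins; only "text/*" without one
falls back to ISO-8859-1.

```python
# Default for text/* received via HTTP when no charset parameter is given.
HTTP_TEXT_DEFAULT_CHARSET = "ISO-8859-1"

def effective_charset(content_type):
    """Return the charset to assume for a Content-Type value, or None."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep and name.strip().lower() == "charset":
            # An explicit label from the sender is used as given.
            return val.strip().strip('"')
    if media_type.startswith("text/"):
        return HTTP_TEXT_DEFAULT_CHARSET
    # Non-text media types have no default charset in this sketch.
    return None
```

For example, effective_charset("text/html") yields "ISO-8859-1", and
no guessing step is involved.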

> If you want to try to wordsmith this, be sure to write what you think
> current practice is as well as recommended practice. Personally, I
> still prefer it the way it was, with perhaps an addition that it is
> recommended that servers supply ISO-8859-1 and that clients 'guess'
> that format first, even though many other character encodings seem to
> be used.

Broken applications do not have equal footing, even in an Informational RFC.
The purpose of the spec is still to define the HTTP/1.0 protocol, even if it
isn't to be a standard.

> ================================================================
> Roy:
>> That should be "Implementors of HTTP origin servers should ..."
> 
> If you have a server that's both a proxy and an origin,
> should you not also restrict it? If you have a server that's meant to be
> a proxy but can also be used as an origin etc. etc.?

Yes, but any proxy capable of being an origin *is* an origin server,
by definition, so the above covers them.

> ================================================================
> draft:
>> example, Unix, Microsoft Windows, and other operating systems use ".."
> 
> Roy:
>> Ummm, unless we want to include the TM disclaimer, that should be
>> "example, some operating systems". [It is okay by me to include the disclaimer]
> 
> I've scanned current RFCs for instances of Unix and Windows and found
> several without TM disclaimers. Why do you believe this is necessary?

Only if the owners of those trademarks insist that we do, which is
one of the requirements for holding a trademark.  I just want to avoid
a last-minute change (or, worse, a post-publication change).

> ================================================================
>>> In RFC 1521, the header fields in multipart body-parts are generally
>>> ignored
> 
>> ... unless the field-name begins with "Content-".
> 
> Could we just say 'most header fields' instead of 'the header fields'?

I think people should know that Content-whatever is safe -- perhaps
that will encourage them to choose a Content-whatever name when it is
appropriate.
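
A minimal sketch of that rule in Python (the function name is
hypothetical, and headers are assumed to arrive as a list of
(name, value) pairs): within a multipart body-part, only fields whose
names begin with "Content-" are kept as significant.

```python
def significant_part_headers(headers):
    """Keep only body-part header fields whose names begin with "Content-".

    Field names are compared case-insensitively.
    """
    return [(name, value) for name, value in headers
            if name.lower().startswith("content-")]

part_headers = [
    ("Content-Type", "text/plain"),
    ("X-Mailer", "foo"),
    ("content-id", "<1@example>"),
]
kept = significant_part_headers(part_headers)
```

In this example the X-Mailer field is dropped and both Content-* fields
survive.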


 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
Received on Friday, 9 February 1996 02:15:59 UTC