Re: Round 3: moving HTTP 1.0 to informational

> Are we converging?

As far as I know.

> I don't like 'treat the media type as if the unrecognized parameter
> and its value were not present' any better, for the same reason: a
> forwarding agent probably shouldn't toss unrecognized parameters.  I
> don't think it's really clear what MIME-IMB means by it, and I don't
> see what it adds.

It is a "you should've figured this one out already, but too many
implementors have already screwed up on it" kind of sentence.  Given
that Mosaic was the worst culprit, I'd like to see it in the spec. 
How about:

    Upon receipt of a media type with an unrecognized parameter,
    a user agent should treat the media type as if the unrecognized
    parameter and its value were not present.
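
For illustration only, here is a minimal sketch of what that wording
asks of a user agent. The function name, the KNOWN_PARAMS set, and the
"x-foo" parameter are hypothetical and not taken from any existing
library, and the parsing is deliberately naive.

```python
# Hypothetical set of media type parameters this user agent understands.
KNOWN_PARAMS = {"charset"}


def parse_media_type(value, known=KNOWN_PARAMS):
    """Return (media_type, params), silently dropping unrecognized params.

    Naive sketch: splits on ';' and so does not handle quoted values
    that themselves contain a semicolon.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if not sep:
            continue  # malformed parameter, skip it
        name = name.strip().lower()
        if name in known:
            params[name] = val.strip().strip('"')
        # Unrecognized parameter: treat the media type as if the
        # parameter and its value were not present.
    return media_type, params
```

A forwarding agent would pass the original header value along unchanged
and not use a filter like this; the sketch covers only how a user agent
interprets what it receives.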

> ================================================================
> Re the ASCII vs 8859-1 default: I think we've just been browsing
> different pages. It seems the problem was the escape of Mosaic-l10n,
> and the use of different font code sets for Russian, Greek, etc.

Yes, I know about the problem, but we went to great lengths to
convince the Mosaic-l10n crowd and all other developers that an
explicit charset parameter is the right way to do what they were
doing. They agreed, and now current practice is that no charset again
implies ISO-8859-1 and, aside from closed systems where some other
default is known to the user, a charset is needed to switch character sets.

>> I don't see how.  All WWW software defaults to ISO-8859-1 as per the
>> original design of the Web.
> 
> ?? Mosaic-l10n was pretty popular.

Yes, but not as popular as all the systems that assume ISO-8859-1 and
do not allow any changing of that default.

>> That is true of libwww, libwww-perl, the
>> Python libraries, Mosaic, NCSA httpd, Apache httpd, Spyglass Mosaic,
>> MS Internet Explorer, and Netscape Navigator.
> 
> I was thinking of 'servers' not 'clients'. Yes, most ISO-8859-1
> clients assume ISO-8859-1.

When the Apache server sends text/html, it is intending to say
text/html;charset="iso-8859-1".  Users are capable of changing the
charset parameter on documents served by Apache.  What spawned Mosaic-l10n
was the belief that it was easier to hack the clients to guess
a charset than it was to both fix the clients to recognize charset and
fix the older NCSA server (which had no knowledge of media type parameters
of any kind) to send the charset parameter.  That reasoning is no longer
valid.  Since we have fixed both the clients and the servers to do the
right thing, there's no point in saying that doing the wrong thing is
current practice.

>> I disagree -- all current practice that is not known to be broken
>> is saying charset="ISO-8859-1" if no charset is present.  That is
>> in the definition of the HTTP protocol.
> 
> It's really hard to find a web server there that *doesn't* use
> whatever-ya-got as the character encoding.  If it were just a few here
> and there, I'd go along with you, but when it's all over, it's hard to
> swallow saying "oh, they're just broken" and have it apply to
> www.*.ru, www.*.jp, www.*.gr, www.*.kr, www.*.cn etc.
> 
> How about:
> 
> #   The "charset" parameter is used with some media types to define
> #   the character set (Section 3.4) of the data.  When no explicit
> #   charset parameter is provided by the sender, media subtypes of the
> #   "text" subtype are defined to have a default charset value of
> #   "ISO-8859-1" when received via HTTP. However, currently many web
> #   servers have ignored this specification, and provide data
> #   using other charsets but without proper labelling. To compensate
> #   for this, some HTTP user agents provide a configuration option to
> #   allow the user to change the default interpretation of the media
> #   type character set when no charset parameter is given. This
> #   situation reduces interoperability. It is recommended that servers
> #   that provide text in character streams other than ISO-8859-1
> #   label the data appropriately.
> 
> This both promotes the 'recommended' behavior and also tells the
> situation. 

No, it mixes things up.  If you want to say that, the correct thing to
do is:

   The "charset" parameter is used with some media types to define
   the character set (Section 3.4) of the data.  When no explicit
   charset parameter is provided by the sender, media subtypes of the
   "text" type are defined to have a default charset value of
   "ISO-8859-1" when received via HTTP.  Data in character sets other
   than "ISO-8859-1" or its subsets must be labelled with an appropriate
   charset value in order to be consistently interpreted by user agents.

      Note: Many current HTTP servers provide data using charsets
      other than "ISO-8859-1" without proper labelling.  This
      situation reduces interoperability and is not recommended.
      To compensate for this, some HTTP user agents provide a
      configuration option to allow the user to change the default
      interpretation of the media type character set when no
      charset parameter is given.

Which makes it clear what the protocol is versus what implementation
kludges are used.
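
To make the split concrete, here is a rough sketch of the receiving
side under the proposed wording. The function name and the user_default
argument are hypothetical; user_default stands in for the
configuration option mentioned in the note, and is not part of the
protocol.

```python
# Protocol default for "text" media types received via HTTP.
DEFAULT_TEXT_CHARSET = "ISO-8859-1"


def effective_charset(content_type, user_default=None):
    """Return the charset a user agent should assume for a Content-Type.

    An explicit charset parameter always wins. Without one, "text"
    types fall back to ISO-8859-1, unless the user has configured a
    different default (the implementation kludge, not the protocol).
    Naive parsing: does not handle ';' inside quoted values.
    """
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep and name.strip().lower() == "charset":
            # Charset names are case-insensitive; normalize to upper case.
            return val.strip().strip('"').upper()
    if media_type.startswith("text/"):
        return user_default or DEFAULT_TEXT_CHARSET
    return None  # no default is defined here for non-text types
```

So an unlabelled text/html is read as ISO-8859-1, and a server with
data in some other charset has to say so explicitly, for example
text/plain; charset=KOI8-R.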

> ================================================================
> 
> I don't really care about most of the rest of the issues. I still
> don't know why you want to say "origin server" when "server" will do,

Because the security note doesn't apply to all servers -- only origins.

> or [tm] on Unix and Windows when no one else does, but I don't care
> much. 

Well, I don't care about that either -- I just wanted to bring it to
your attention.

> And I'll go along with calling out "content-" headers specially.

> Are we converging?

Yep.

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/

Received on Saturday, 10 February 1996 22:18:11 UTC