- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 14 Feb 2008 14:48:20 +0100
- To: "Roy T. Fielding" <fielding@gbiv.com>
- CC: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Roy T. Fielding wrote:
>
> On Feb 12, 2008, at 8:59 PM, Mark Nottingham wrote:
>> Roy, if you disagree with consensus on this issue, please suggest
>> specific text to replace Julian's work.
>
> It isn't consensus until the people who have to change their
> implementations agree to do so. The change was applied in a way
> that I did not anticipate, which made it a new requirement on
> previously conforming implementations rather than a relaxation
> of the existing requirements. The issue did not require that much.

And I don't think we realized that we may be *adding* requirements; the
goals were (IMHO) to reduce inconsistency of specs and to allow what UAs
indeed do today. It seems the conclusion is that in this case these goals
are contradictory.

> http://www3.tools.ietf.org/wg/httpbis/trac/ticket/20
>
> Here is the change that Julian made according to the issue:
>
> http://www3.tools.ietf.org/wg/httpbis/trac/changeset/209

...which was what the issue resolution asked for; so the mistake happened
earlier.

In the meantime I have backed out the change
(<http://www3.tools.ietf.org/wg/httpbis/trac/changeset/211>), and I'd
propose to re-open the issue.

> ...
> And here is what I suggest for a rewrite, merging both of the above
> sections under Media Types and inverting the "fantasy island"
> requirements of the original text to what is permitted in HTTP
> beyond the registration defaults of MIME.
>
> 2.3.1. Canonicalization and Text Media Types
>
> Internet media types are registered with a canonical form and
> defaults for the optional parameter values. An ideal HTTP
> entity-body would contain data formatted strictly according to that
> canonical form. However, HTTP does not require the sender to verify
> that an entity-body is in canonical form prior to transfer. Instead,
> an HTTP recipient MUST be prepared to accept and properly interpret
> several variances in the format of textual types, as described below,
> and treat other variances as errors.

Good introduction.

> The "charset" parameter (Section 2.1) is used with some media types
> to indicate the character encoding of the data. When a media type is
> registered with a default charset value of "US-ASCII", it MAY be used
> to label data transmitted via HTTP in the "iso-8859-1" charset (a
> superset of US-ASCII) without including an explicit charset parameter
> on the media type. In addition, when a media type registered with a
> default charset value of "US-ASCII" is received via HTTP without a
> charset parameter or with a charset value of "iso-8859-1", the
> recipient MAY inspect the data for indications of a different
> character encoding and interpret the data accordingly if the encoding
> is a superset of US-ASCII or if the encoding can be determined within
> the first 16 octets of data and interpreted consistently thereafter.

Q: so if a text type defines a different default, such as "UTF-8", we
disallow defaulting to ISO-8859-1 and sniffing?

> Note: The first variance is due to a significant portion of early
> HTTP user agents not parsing media type parameters and instead
> relying on a then-common default encoding of iso-8859-1. As a
> result, early server implementations avoided the use of charset
> parameters and user agents evolved to "sniff" for new character
> encodings as the Web expanded beyond iso-8859-1 content. The
> second variance is due to a certain popular user agent that
> employed an unsafe encoding detection and switching algorithm
> within documents that might contain user-provided data (see
> Section security.sniffing), the most common workaround for which
> is to supply a specific charset parameter even when the actual
> character encoding is unknown.

Q: so specifying ISO-8859-1 was a countermeasure to broken content
sniffing? Is that version of that UA still common enough to have this
exception?

> When in canonical form, media subtypes of the "text" type use CRLF as
> the text line break.
> However, it is also commonplace for such types
> to be transmitted in HTTP with CR or LF alone indicating a line
> break and occasional for such types to be transmitted with a
> character encoding that requires some other set of octet sequence(s)
> to indicate a line break. HTTP recipients MUST accept and properly
> interpret CRLF, bare CR, and bare LF as indicating a line break when
> encountered within an entity-body received via HTTP that is labeled
> as a text type and provided in a character encoding that allows CRLF
> to indicate a line break.
>
> Note: Line breaks are specified in MIME with the expectation that
> they are enforced during email message composition, when it is
> scalable to ensure that every octet is placed in canonical form,
> and with the anticipation that a message may be transmitted or
> processed using line-oriented protocols. HTTP message generation,
> in contrast, is usually performed at high speed, encloses data
> that cannot be modified without also altering its metadata, and
> is processed using length-delimited protocols.

Ok.

Sounds good to me, except for:

- Q: do we need to state that we overrule RFC3023 (WRT default charset
  for text/xml?)
- we still need the security consideration WRT sniffing.

> ...

BR, Julian
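
For illustration only, here is a minimal sketch of what the proposed line-break requirement could look like on the recipient side. It assumes the body has already been identified as a text type and that its charset is known. The function name `split_text_lines` and the iso-8859-1 default argument are hypothetical choices for this example and are not taken from the draft text.

```python
import re
from typing import List

# Hypothetical helper, not part of any HTTP library.
# CRLF is listed first so that it is consumed as ONE line break
# rather than as a bare CR followed by a bare LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def split_text_lines(body: bytes, charset: str = "iso-8859-1") -> List[str]:
    """Split a text entity-body into lines, accepting CRLF, CR, or LF.

    Decoding happens before splitting, so the rule is applied to
    characters rather than raw octets (an assumption of this sketch).
    """
    text = body.decode(charset)
    return _LINE_BREAK.split(text)


if __name__ == "__main__":
    # Mixed line endings in a single body are all treated as line breaks.
    print(split_text_lines(b"one\r\ntwo\rthree\nfour"))
```

Splitting after decoding is a design choice: it keeps the sketch usable for multi-octet encodings, where the octets 0x0D and 0x0A on their own do not necessarily indicate a line break.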
Received on Thursday, 14 February 2008 13:48:38 UTC