- From: Mark Nottingham <mnot@mnot.net>
- Date: Tue, 11 Mar 2008 16:50:45 +1100
- To: Roy T. Fielding <fielding@gbiv.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
My .02 -

Overall, I'm not intensely happy with this, but it does seem like the most practical way forward. My biggest concern is that it places some fairly wide-reaching MUST-level requirements. Would downgrading them to SHOULD be workable? Can we refine their targets, e.g., instead of applying them to all recipients, target them at user agents (and maybe origin servers too) as recipients?

We also still need the security considerations text WRT UTF-7, correct?

A few more editorial comments inline -

On 14/02/2008, at 2:39 PM, Roy T. Fielding wrote:

> 2.3.1. Canonicalization and Text Media Types
>
> Internet media types are registered with a canonical form and
> defaults for the optional parameter values. An ideal HTTP
> entity-body would contain data formatted strictly according to that
> canonical form. However, HTTP does not require the sender to verify
> that an entity-body is in canonical form prior to transfer. Instead,
> an HTTP recipient MUST be prepared to accept and properly interpret
> several variances in the format of textual types, as described below,
> and treat other variances as errors.

This is a MUST, but the requirements about encoding below are MAYs, which is a bit odd...

> The "charset" parameter (Section 2.1) is used with some media types
> to indicate the character encoding of the data. When a media type is
> registered with a default charset value of "US-ASCII", it MAY be used
> to label data transmitted via HTTP in the "iso-8859-1" charset (a
> superset of US-ASCII) without including an explicit charset parameter
> on the media type.

This sentence doesn't read well; what is 'it'? 'label' is also not quite right; I'd suggest 'indicate'. Also, who does the MAY apply to?
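For illustration only, here is how a recipient might apply the charset default described in the quoted paragraph. This is a minimal Python sketch, not text from the draft: the helpers `parse_content_type` and `effective_charset` and the set `US_ASCII_DEFAULT_TYPES` are hypothetical, and limiting that set to text/plain is an assumption made to keep the example small. The parameter parsing is deliberately naive and does not handle semicolons inside quoted values.

```python
from __future__ import annotations

# Hypothetical: media types assumed (for this sketch only) to be registered
# with a default charset value of "US-ASCII".
US_ASCII_DEFAULT_TYPES = {"text/plain"}


def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into a lowercase media type and parameters.

    Naive: splits on every ';', so quoted values containing ';' break it.
    """
    parts = value.split(";")
    media_type = parts[0].strip().lower()
    params: dict[str, str] = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def effective_charset(content_type: str) -> str | None:
    """Return the charset this sketch would use to decode the entity-body.

    An explicit charset parameter always wins. For a type with a US-ASCII
    default and no charset parameter, the body is read as iso-8859-1, a
    superset of US-ASCII, so pure US-ASCII data decodes unchanged.
    Returns None when the sketch has no rule for the type.
    """
    media_type, params = parse_content_type(content_type)
    charset = params.get("charset")
    if charset is not None:
        return charset.lower()
    if media_type in US_ASCII_DEFAULT_TYPES:
        return "iso-8859-1"
    return None
```

With these assumptions, `effective_charset("text/plain")` yields `"iso-8859-1"`, `effective_charset('text/plain; charset="UTF-8"')` yields `"utf-8"`, and `effective_charset("application/json")` yields `None`. Reading unlabeled data as iso-8859-1 rather than US-ASCII costs nothing for US-ASCII content, which is the point of the quoted variance.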
> In addition, when a media type registered with a default charset
> value of "US-ASCII" is received via HTTP without a charset parameter
> or with a charset value of "iso-8859-1", the recipient MAY inspect
> the data for indications of a different character encoding and
> interpret the data accordingly if the encoding is a superset of
> US-ASCII or if the encoding can be determined within the first 16
> octets of data and interpreted consistently thereafter.

This sentence is also very difficult to read. It may help to insert another MAY between 'and' and 'interpret'.

> Note: The first variance is due to a significant portion of early
> HTTP user agents not parsing media type parameters and instead
> relying on a then-common default encoding of iso-8859-1. As a
> result, early server implementations avoided the use of charset
> parameters and user agents evolved to "sniff" for new character
> encodings as the Web expanded beyond iso-8859-1 content. The
> second variance is due to a certain popular user agent that
> employed an unsafe encoding detection and switching algorithm
> within documents that might contain user-provided data (see
> Section security.sniffing), the most common workaround for which
> is to supply a specific charset parameter even when the actual
> character encoding is unknown.
>
> When in canonical form, media subtypes of the "text" type use CRLF as
> the text line break. However, it is also commonplace for such types
> to be transmitted in HTTP with CR or LF alone indicating a line
> break and occasional for such types to be transmitted with a
> character encoding that requires some other set of octet sequence(s)
> to indicate a line break. HTTP recipients MUST accept and properly
> interpret CRLF, bare CR, and bare LF as indicating a line break when
> encountered within an entity-body received via HTTP that is labeled
> as a text type and provided in a character encoding that allows CRLF
> to indicate a line break.
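The line-break requirement in the quoted paragraph can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the draft's or any library's implementation: the helpers `text_lines` and `normalize_line_breaks` are hypothetical, and the body is assumed to be already decoded with a charset in which CRLF marks a line break (such as iso-8859-1 or UTF-8).

```python
from __future__ import annotations

import re

# Alternatives are tried left to right, so "\r\n" is matched as a single
# break before a bare "\r" or bare "\n" can match on its own.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def text_lines(body: str) -> list[str]:
    """Split a decoded text body into lines on CRLF, bare CR, or bare LF.

    Unlike str.splitlines(), this does not also split on other separators
    such as form feed or U+2028, which the quoted text does not mention.
    """
    lines = _LINE_BREAK.split(body)
    # A trailing line break leaves a final empty string; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def normalize_line_breaks(body: str) -> str:
    """Rewrite every CRLF, bare CR, or bare LF as CRLF (canonical form)."""
    return _LINE_BREAK.sub("\r\n", body)
```

For example, `text_lines("a\r\nb\rc\nd")` gives `["a", "b", "c", "d"]`, so all three conventions are treated as the same line break. Matching CRLF first matters: a pattern tried as `\r|\n|\r\n` would count one CRLF as two breaks and insert spurious empty lines.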
>
> Note: Line breaks are specified in MIME with the expectation that
> they are enforced during email message composition, when it is
> scalable to ensure that every octet is placed in canonical form,
> and with the anticipation that a message may be transmitted or
> processed using line-oriented protocols. HTTP message generation,
> in contrast, is usually performed at high speed, encloses data
> that cannot be modified without also altering its metadata, and
> is processed using length-delimited protocols.

--
Mark Nottingham     http://www.mnot.net/
Received on Tuesday, 11 March 2008 05:51:08 UTC