Re: Unknown text/* subtypes [i20] from Julian Reschke on 2008-02-12 (ietf-http-wg@w3.org from January to March 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Tue, 12 Feb 2008 15:03:52 +0100
To: ietf-http-wg@w3.org
CC: Mark Nottingham <mnot@mnot.net>
Message-ID: <47B1A748.2020602@gmx.de>

Hi.

With change <http://www3.tools.ietf.org/wg/httpbis/trac/changeset/209>, 
I have removed the character set defaulting, as proposed in 
<http://www3.tools.ietf.org/wg/httpbis/trac/ticket/20#comment:4>.

Section 3.1.1 (formerly 2.1.1) is gone, Section 3.3.1 (formerly 2.3.1) 
now says:

"3.3.1 Canonicalization and Text Defaults

Internet media types are registered with a canonical form. An 
entity-body transferred via HTTP messages MUST be represented in the 
appropriate canonical form prior to its transmission except for "text" 
types, as defined in the next paragraph.

When in canonical form, media subtypes of the "text" type use CRLF as 
the text line break. HTTP relaxes this requirement and allows the 
transport of text media with plain CR or LF alone representing a line 
break when it is done consistently for an entire entity-body. HTTP 
applications MUST accept CRLF, bare CR, and bare LF as being 
representative of a line break in text media received via HTTP. In 
addition, if the text is represented in a character set that does not 
use octets 13 and 10 for CR and LF respectively, as is the case for some 
multi-byte character sets, HTTP allows the use of whatever octet 
sequences are defined by that character set to represent the equivalent 
of CR and LF for line breaks. This flexibility regarding line breaks 
applies only to text media in the entity-body; a bare CR or LF MUST NOT 
be substituted for CRLF within any of the HTTP control structures (such 
as header fields and multipart boundaries).

If an entity-body is encoded with a content-coding, the underlying data 
MUST be in a form defined above prior to being encoded.

HTTP/1.1 recipients MUST respect the charset label provided by the 
sender; and those user agents that have a provision to "guess" a charset 
MUST use the charset from the content-type field if they support that 
charset, rather than the recipient's preference, when initially 
displaying a document." -- 
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-latest.html#canonicalization.and.text.defaults>
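To make the quoted requirements concrete, here is a rough recipient-side sketch in Python. It is not from the draft. The helper name `decode_text_body`, the gzip-only content-coding support, using Python's codec registry as the set of "supported" charsets, and the `utf-8` fallback standing in for a user agent's guessing are all illustrative assumptions. The ordering is the point: undo the content-coding first, then decode using the sender's charset label if it is supported, and only then treat CRLF, bare CR and bare LF as line breaks. Because normalization happens after decoding, it also works for multi-byte charsets such as UTF-16.

```python
import codecs
import gzip
import re
from typing import Optional

# CRLF is tried first so it counts as one line break; then bare CR or bare LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def charset_is_supported(label: str) -> bool:
    """Assumption: 'supported' means Python has a codec for the label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def decode_text_body(body: bytes,
                     content_coding: Optional[str],
                     declared_charset: Optional[str],
                     fallback_charset: str = "utf-8") -> str:
    """Hypothetical helper: turn a text/* entity-body into normalized text."""
    # 1. Remove the content-coding; the canonical form rules apply to the
    #    underlying data. Only gzip is handled in this sketch.
    if content_coding == "gzip":
        body = gzip.decompress(body)
    elif content_coding not in (None, "identity"):
        raise ValueError(f"unsupported content-coding: {content_coding}")

    # 2. Respect the sender's charset label when it is supported; the
    #    fallback (an arbitrary assumption here) stands in for guessing.
    if declared_charset and charset_is_supported(declared_charset):
        charset = declared_charset
    else:
        charset = fallback_charset
    text = body.decode(charset, errors="replace")

    # 3. Accept CRLF, bare CR and bare LF as line breaks. This applies to
    #    the entity-body only, never to header fields or multipart boundaries.
    return _LINE_BREAK.sub("\n", text)
```

For example, `decode_text_body(b"a\r\nb\rc\nd", None, "us-ascii")` yields `"a\nb\nc\nd"`. A UTF-16 body behaves the same way, even though its line-break octets are not plain 13 and 10.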

BTW: is the subsection title still correct?

I also added a reminder to the Security Considerations to talk about the 
implications of character set sniffing (proposals welcome), and noted 
the change in Appendix C.2:

"Remove character set defaulting for text media types. (Section 3.3.1)" 
-- 
<http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-latest.html#rfc.section.C.2>

Feedback appreciated, Julian

Received on Tuesday, 12 February 2008 14:11:53 UTC