Re: Unknown text/* subtypes

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 04 Jan 2008 17:33:36 +0100
Message-ID: <477E5FE0.3080804@gmx.de>
To: Martin Duerst <duerst@it.aoyama.ac.jp>
CC: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>, ietf-types@alvestrand.no, ietf-http-wg@w3.org

Martin Duerst wrote:
> At 00:45 07/12/19, Frank Ellermann wrote:
>> Julian Reschke wrote:
>>
>>> there's also RFC2616
>> Yes, that's an ugly legacy exception...  
>>
>>> <http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i20>
>> ...maybe 2616bis can drop this oddity in favour of a simple
>> "unknown text is ASCII" rule.
> 
> The new version of the HTTP spec, 2616bis, should definitely
> drop the iso-8859-1 default, but NOT in favor of "unknown text is ASCII".
> It should just say that there is no default.

As far as I understand, we currently have RFC2046, RFC2616 and RFC3023
imposing conflicting requirements:

RFC2046: the default for text/* is US-ASCII 
(<http://tools.ietf.org/html/rfc2046#section-4.1.2>)

RFC2616: the default for text/* received over HTTP is ISO-8859-1
(<http://tools.ietf.org/html/rfc2616#section-3.7.1>)

RFC3023: the default for text/xml is US-ASCII, even when received over 
HTTP (<http://tools.ietf.org/html/rfc3023#section-3.1>)
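To make the conflict concrete, here is a minimal Python sketch of the three defaults as summarized above. The function name default_charset and the rule labels are purely illustrative (they appear in none of the RFCs), and the sketch only covers the case where no charset parameter was sent.

```python
from typing import Optional


def default_charset(media_type: str, rule: str) -> Optional[str]:
    """Charset assumed for a body sent WITHOUT a charset parameter.

    `rule` names the spec whose default is applied. The labels are
    hypothetical; they just mirror the three summaries above.
    """
    media_type = media_type.strip().lower()
    if not media_type.startswith("text/"):
        return None  # the defaults discussed here only concern text/*
    if rule == "rfc2046":
        return "US-ASCII"       # MIME default for text/*
    if rule == "rfc2616":
        return "ISO-8859-1"     # HTTP/1.1 default for text/* over HTTP
    if rule == "rfc3023":
        # Only the text/xml case from the summary above is modelled here.
        return "US-ASCII" if media_type == "text/xml" else None
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    for rule in ("rfc2046", "rfc2616", "rfc3023"):
        print(rule, default_charset("text/xml", rule))
```

For text/xml received over HTTP, the RFC2616 rule and the RFC3023 rule give different answers, which is exactly the conflict.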

This is a mess, and as far as I can tell, it would be good if at least
HTTP got out of it.

So it seems that we need to decide on two separate questions:

1) Do we want HTTP to override RFC2046's defaults at all?

2) If we do want to continue that, what do we want to mandate?

Right now, browsers (just tested Opera/Safari/Mozilla/IE7) ignore all 
three RFCs for at least text/xml (they all look at the content).

If our answer to 1) is "no", the content will still be broken, but at 
least it's not HTTP's fault anymore.

Otherwise we can state "in the absence of a charset parameter, recipients
MAY do charset sniffing (BOM, XML declaration, HTML meta tag, ...)", which
would probably match what's actually implemented.

> There is a big difference between these two, especially
> for document formats that contain internal 'charset' information.
> A default of US-ASCII makes document-internal 'charset' information
> useless (because the external information wins). No default means
> that the recipient will look at the internal information.

Yep.
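
The difference can be sketched in a few lines of Python. The function effective_charset and its parameter names are hypothetical; the precedence order (header parameter, then protocol default, then document-internal declaration) is my reading of the argument above, not text from any of the RFCs.

```python
from typing import Optional


def effective_charset(
    header_charset: Optional[str],
    protocol_default: Optional[str],
    internal_charset: Optional[str],
) -> Optional[str]:
    """Pick the charset a recipient ends up using.

    External information wins over internal information, so any
    protocol-level default hides the document's own declaration.
    """
    if header_charset is not None:
        return header_charset
    if protocol_default is not None:
        return protocol_default
    return internal_charset


if __name__ == "__main__":
    # Default of US-ASCII: the internal declaration is never consulted.
    print(effective_charset(None, "US-ASCII", "UTF-8"))
    # No default: the recipient falls back to the internal declaration.
    print(effective_charset(None, None, "UTF-8"))
```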

>> HTTP oddities shouldn't affect
>> MIME registrations, there's no string "2616" in BCP13.
> 
> One reason for the problems with text/xml was that the
> original MIME default of US-ASCII was enforced. This made
> it impossible to serve XML documents with internal 'charset'
> information only as text/xml.

Hmm. So why did RFC3023 then mandate US-ASCII again? In general, is it 
acceptable *at all* to override charset defaults defined in RFC2046?

BR, Julian
Received on Friday, 4 January 2008 16:33:59 GMT
