Re: Unknown text/* subtypes

From: Geoffrey Sneddon <foolistbar@googlemail.com>
Date: Thu, 10 Jan 2008 16:26:14 +0000
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <FDDE0671-DF8C-44BD-8F7D-6B0F94043DD3@googlemail.com>
To: Julian Reschke <julian.reschke@gmx.de>


On 4 Jan 2008, at 16:33, Julian Reschke wrote:

> RFC2616: the default for text/* received over HTTP is ISO8859-1
> (<http://tools.ietf.org/html/rfc2616#section-3.7.1>)

Of note is the fact that real implementations tend to use Windows-1252  
(a superset of ISO-8859-1's graphical characters) in place of  
ISO-8859-1, as content relies upon this (though they mostly try some  
sort of encoding sniffing first).
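
To illustrate the behaviour described above, here is a minimal Python sketch, not taken from any real implementation. The function name `decode_text_body`, the list of BOMs checked, and the set of labels remapped to Windows-1252 are all assumptions chosen for the example.

```python
import codecs
from typing import Optional

# Byte order marks to sniff before trusting any label (illustrative subset).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

# Labels that this sketch silently treats as Windows-1252 (an assumption).
_LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1"}


def decode_text_body(body: bytes, declared: Optional[str] = None) -> str:
    """Decode a text/* body: sniff a BOM first, then fall back to a label."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return body[len(bom):].decode(name, errors="replace")
    # No BOM: a missing label gets the RFC 2616 default of ISO-8859-1 ...
    label = (declared or "iso-8859-1").strip().lower()
    # ... but ISO-8859-1 is decoded as Windows-1252, as content expects.
    if label in _LATIN1_LABELS:
        label = "cp1252"
    # Any other label is passed through; an unknown one raises LookupError.
    return body.decode(label, errors="replace")
```

With this sketch, the bytes 0x93 and 0x94 in a body labelled ISO-8859-1 come out as curly quotation marks rather than C1 control characters, while bytes 0xA0 to 0xFF decode identically under both encodings.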

> Otherwise we can state "in absence of charset parameter recipient  
> MAY do charset sniffing (BOM, XML decl, HTML meta tag, ...)", which  
> would probably match what's actually implemented.


It isn't only "in absence of a charset parameter" though, at least in the  
case of text/xml: I've come across feeds served as  
"text/xml;charset=ANSI" that have a charset of GB2312. I strongly doubt  
that text/xml is the only MIME type to be affected like this.
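
A rough Python sketch of how a recipient might cope with such a feed follows. It is an illustration under stated assumptions, not a description of any particular implementation: the `KNOWN_LABELS` allowlist, the function names, and the choice to fall back to the XML declaration and then to Windows-1252 are all hypothetical.

```python
import re
from email.message import Message
from typing import Optional

# Hypothetical allowlist mapping recognised charset labels to Python codecs.
KNOWN_LABELS = {
    "utf-8": "utf-8",
    "iso-8859-1": "cp1252",
    "windows-1252": "cp1252",
    "gb2312": "gb2312",
}

# Matches the encoding pseudo-attribute of a leading XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def header_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def pick_encoding(content_type: str, body: bytes) -> str:
    """Choose a codec, sniffing when the header label is missing or unknown."""
    label = header_charset(content_type)
    if label in KNOWN_LABELS:
        return KNOWN_LABELS[label]
    # Label absent or unrecognised (e.g. "ANSI"): try the XML declaration.
    match = _XML_DECL.match(body)
    if match:
        sniffed = match.group(1).decode("ascii").lower()
        if sniffed in KNOWN_LABELS:
            return KNOWN_LABELS[sniffed]
    # Last resort in this sketch: Windows-1252.
    return "cp1252"
```

For a body whose XML declaration says encoding="GB2312", `pick_encoding("text/xml;charset=ANSI", body)` selects the GB2312 codec instead of failing on the meaningless "ANSI" label.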

--
Geoffrey Sneddon
<http://gsnedders.com/>
Received on Thursday, 10 January 2008 16:26:34 GMT