Re: Unknown text/* subtypes

On 4 Jan 2008, at 16:33, Julian Reschke wrote:

> RFC2616: the default for text/* received over HTTP is ISO8859-1
> (<http://tools.ietf.org/html/rfc2616#section-3.7.1>)

Of note is the fact that real implementations tend to use Windows-1252  
(a superset of ISO-8859-1's graphical characters) in place of  
ISO-8859-1, as content relies upon this (though they mostly try some  
sort of encoding sniffing first).
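
To illustrate the difference, here is a minimal Python sketch. It relies
only on Python's built-in "iso-8859-1" and "cp1252" codecs; the sample
bytes are made up for the example.

```python
# Bytes 0x93 and 0x94 are curly quotes in Windows-1252 but C1 control
# characters in ISO-8859-1. The sample bytes are invented for illustration.
data = b"\x93quoted\x94"

as_latin1 = data.decode("iso-8859-1")  # yields U+0093 and U+0094 (controls)
as_cp1252 = data.decode("cp1252")      # yields U+201C and U+201D (quotes)

# The two encodings agree on every byte from 0xA0 to 0xFF.
high_bytes = bytes(range(0xA0, 0x100))
same_high_range = high_bytes.decode("iso-8859-1") == high_bytes.decode("cp1252")
```

Content written on Windows often contains bytes in the 0x80-0x9F range,
which is why decoding it strictly as ISO-8859-1 loses the punctuation.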

> Otherwise we can state "in absence of charset parameter recipient
> MAY do charset sniffing (BOM, XML decl, HTML meta tag, ...)", which
> would probably match what's actually implemented.


It isn't just "in absence of a charset parameter" though, at least in
the case of text/xml: I've come across feeds served as
"text/xml;charset=ANSI" that are actually in GB2312. I strongly doubt
that text/xml is the only MIME type to be affected like this.
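
One possible recipient policy for that case, sketched in Python. This is
an illustration under stated assumptions, not a description of what any
particular implementation does: pick_encoding, KNOWN_LABELS, BOMS and
XML_DECL are hypothetical names, the label allowlist is deliberately
tiny, and the example assumes the feed's XML declaration names GB2312,
which is not stated above.

```python
import re

# Illustrative allowlist only; a real recipient would recognise far more labels.
KNOWN_LABELS = {"utf-8", "utf-16", "iso-8859-1", "windows-1252", "gb2312"}

# Byte order marks checked before looking at the XML declaration.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
)

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of an ASCII-compatible body.
XML_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']"""
)


def pick_encoding(declared, body):
    """Hypothetical helper: choose an encoding label for `body`.

    `declared` is the charset parameter from Content-Type, or None.
    """
    label = (declared or "").strip().lower()
    if label in KNOWN_LABELS:
        return label
    # The declared label is missing or unrecognised (e.g. "ANSI"): sniff.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    match = XML_DECL.match(body)
    if match:
        sniffed = match.group(1).decode("ascii").lower()
        if sniffed in KNOWN_LABELS:
            return sniffed
    # Last resort: Windows-1252 rather than strict ISO-8859-1.
    return "windows-1252"


feed = b'<?xml version="1.0" encoding="GB2312"?><rss/>'
result = pick_encoding("ANSI", feed)
```

With this policy an unrecognised charset parameter is treated the same
way as a missing one, so the sniffing step still runs.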

--
Geoffrey Sneddon
<http://gsnedders.com/>

Received on Thursday, 10 January 2008 16:26:34 UTC