- From: Geoffrey Sneddon <foolistbar@googlemail.com>
- Date: Tue, 12 Feb 2008 23:23:21 +0000
- To: Roy T. Fielding <fielding@gbiv.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>, Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>, Robert Sayre <rsayre@mozilla.com>
On 12 Feb 2008, at 21:12, Roy T. Fielding wrote:

> The answer is that iso-8859-1 is still the most interoperable default
> *with* the addition of safe sniffing only when the charset is left
> unlabeled or when charset="iso-8859-1". By safe sniffing, I mean
> specifically excluding any charset-switching in mid-content
> for which the text media type's delimiter set (e.g., <"':> in HTML)
> would be mapped to different octets than they are in US-ASCII.
> In other words, it is safe to sniff for charsets in the first ten
> or so characters, and also to switch to other US-ASCII supersets
> after reading something like the <meta http-equiv="content-type" ...>,
> but it is definitely unsafe to continue sniffing for charset changes
> after that point unless they are restricted to US-ASCII supersets.

ISO-8859-1 isn't actually the most interoperable default: a huge number of documents (not just HTML documents, but also a large number of feeds) rely on ISO-8859-1 being treated as Windows-1252. The best default is probably windows-1252 with sniffing (the exact details of which inevitably depend on the format being used, as what is used in HTML obviously isn't suitable for XML, for example).

I don't think it's worthwhile attempting to define what type of sniffing (e.g., your "safe" sniffing) can be used, as it is very much context dependent (and if there's one thing we've learnt from this, let it be that context is very important), and in some cases it may be ideal to throw out what you already have. However, in defence of "safe" sniffing, HTML5 requires a partial US-ASCII superset (to sniff it from meta), and XML 1.0 implicitly requires a superset of the encoding being used in Appendix F (when there is no BOM).

--
Geoffrey Sneddon
<http://gsnedders.com/>
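
As a rough illustration of the default argued for in the message above (treat unlabelled or ISO-8859-1-labelled content as windows-1252, after a minimal sniff), here is a sketch in Python. The function name `decode_text`, the set of Latin-1 label aliases, and the choice to sniff only for a byte order mark are assumptions made for this example; they are not taken from the message or from any specification, and real formats such as HTML or XML need their own format-specific sniffing.

```python
import codecs
from typing import Optional

# Byte order marks checked during sniffing (an illustrative subset).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Labels treated as "really windows-1252" (assumed alias list).
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1"}


def decode_text(data: bytes, label: Optional[str] = None) -> str:
    """Decode bytes, defaulting to windows-1252 with BOM sniffing.

    Sniffing happens only when the charset is unlabelled or labelled
    as ISO-8859-1; any other label is trusted as given.
    """
    normalized = label.strip().lower() if label else None

    # An explicit label other than ISO-8859-1 is used directly.
    # (An unknown label raises LookupError in this sketch.)
    if normalized and normalized not in LATIN1_LABELS:
        return data.decode(normalized, errors="replace")

    # Minimal sniff: look for a byte order mark at the very start.
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name, errors="replace")

    # Fallback: windows-1252 ("cp1252" is Python's name for it).
    return data.decode("cp1252", errors="replace")
```

For instance, the bytes 0x93 and 0x94 are control characters in ISO-8859-1 but curly quotation marks in windows-1252, so `decode_text(b"\x93hi\x94", "ISO-8859-1")` yields the text wrapped in curly quotes, which is the behaviour the documents mentioned above depend on.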
Received on Tuesday, 12 February 2008 23:23:38 UTC