- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 11 Dec 2002 06:36:28 +0900
- To: Terje Bless <link@pobox.com>, W3C Validator <www-validator@w3.org>
- Cc: Elliotte Rusty Harold <elharo@metalab.unc.edu>
At 07:54 02/12/09 +0100, Terje Bless wrote:

>Elliotte Rusty Harold <elharo@metalab.unc.edu> wrote:
>
> >I believe that in this case for XHTML, the fallback should be UTF-8. It
> >certainly is for XML, and I don't think there's any reason XHTML should
> >be different. If everything else fails, assume UTF-8.
>
>Hmmm. I must admit I'm somewhat fuzzy on the details here, but IIRC, for
>XML to be transported without explicit encoding information it must contain
>an XML Declaration. At least, the autodetect algorithm in Appendix F of the
>XML 1.0 Recommendation relies on there being either an XML Declaration or a
>Unicode Byte-Order Mark in the absence of encoding information from a
>higher-level protocol (i.e. HTTP).

That's not exactly correct: Appendix F lists 'everything else' as being UTF-8. But please note that the absence of a 'charset' parameter on a Content-Type header and the absence of charset information are not exactly the same thing.

In practical terms, assuming UTF-8 has the advantage that there is a high chance that a mistake (i.e. something that is actually not UTF-8) will be caught; for many other encodings, that chance is much lower. (Two short sketches of these points follow below.)

>Which Content-Type the document was served/uploaded as will also affect the
>character encoding determination, as the different types have different
>defaults and defaulting behaviour in this regard.

Yes indeed.

Regards,   Martin.
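A minimal Python sketch of the detection order Appendix F describes (BOM first, then the XML declaration, then UTF-8 for everything else). This is a simplification: it skips the UCS-4, UTF-16-without-BOM, and EBCDIC cases, and the function name is mine, not anything from the Recommendation:

    # Sketch of XML 1.0 Appendix F-style encoding detection:
    # BOM first, then the XML declaration, then UTF-8 as the fallback.
    # Simplified: real Appendix F also covers UCS-4 byte orders, EBCDIC, etc.

    import re

    def detect_xml_encoding(data: bytes) -> str:
        """Guess the encoding of an XML document from its first bytes."""
        # 1. A Byte-Order Mark, if present, is authoritative.
        if data.startswith(b'\xef\xbb\xbf'):
            return 'utf-8'
        if data.startswith(b'\xff\xfe'):
            return 'utf-16-le'
        if data.startswith(b'\xfe\xff'):
            return 'utf-16-be'
        # 2. No BOM: '<?xml' in ASCII-compatible bytes lets us read the
        #    encoding pseudo-attribute of the XML declaration.
        if data.startswith(b'<?xml'):
            m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
            if m:
                return m.group(1).decode('ascii').lower()
        # 3. 'Everything else' is UTF-8, exactly as Appendix F lists it.
        return 'utf-8'

For example, detect_xml_encoding(b'<?xml version="1.0" encoding="iso-8859-1"?><p/>') yields 'iso-8859-1', while a bare b'<p/>' falls through to the UTF-8 default.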
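And a sketch of the "mistakes get caught" point: bytes that are not actually UTF-8 usually fail UTF-8 decoding outright, while a permissive fallback such as ISO-8859-1 accepts any byte stream and silently produces the wrong characters (the example strings here are mine):

    # Why UTF-8 is a comparatively safe fallback: wrongly labelled bytes
    # usually fail UTF-8 validation, so the error surfaces immediately.

    latin1_bytes = 'Düsseldorf'.encode('iso-8859-1')   # b'D\xfcsseldorf'
    try:
        latin1_bytes.decode('utf-8')                   # 0xFC is not valid UTF-8
    except UnicodeDecodeError as err:
        print('mistake caught:', err)

    # A Latin-1 fallback, by contrast, never rejects anything: every byte
    # maps to some character, so a wrong guess silently yields mojibake.
    utf8_bytes = 'Düsseldorf'.encode('utf-8')          # b'D\xc3\xbcsseldorf'
    print(utf8_bytes.decode('iso-8859-1'))             # 'DÃ¼sseldorf' -- no error

So a wrong UTF-8 assumption tends to be visible the moment the document is parsed, whereas a wrong Latin-1 assumption goes unnoticed until a human spots the garbled characters.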
Received on Tuesday, 10 December 2002 16:37:09 UTC