- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 1 Mar 2007 11:12:28 +0200
On Mar 1, 2007, at 03:58, Ian Hickson wrote:

> On Sat, 9 Apr 2005, Lachlan Hunt wrote:
>>
>> In the current draft, for specifying the character encoding [1], it is
>> stated:
>>
>> | In XHTML, the XML declaration should be used for inline character
>> | encoding information.
>> |
>> | Authors should avoid including inline character encoding information.
>> | Character encoding information should instead be included at the
>> | transport level (e.g. using the HTTP Content-Type header).
>>
>> The second paragraph should only apply to HTML using the meta element,
>> not XHTML using the XML declaration.
>
> I don't understand why it would be ok for one and not the other.
...
> I could see an argument for removing the advice from the HTML5 spec
> altogether, though. What do you think?

I think that encoding information should be included in the HTTP payload. In my opinion, the spec should not advise against this. Preferably, it would encourage putting the encoding information in the payload. (The BOM or, in the case of XML, the UTF-8 defaulting of the XML sniffing algorithm are fine.)

Rationale:

1) Ruby's Postulate.
2) It is just uncool that I have to add the charset meta to the WA 1.0 spec if I download it to disk and typeset it for printing using Prince, which does not see the original HTTP headers. Real documents do get detached from HTTP.

For application/xml and application/xhtml+xml, HTTP-level charset is harmful, because the internal info is reliable and efficiently sniffed, so the HTTP-level stuff is either redundant or wrong.

For text/html, also providing HTTP-level charset makes sense, because internal encoding info sniffing is inefficient.

The text/xml type is considered harmful.

I think it should be a conformance requirement that the HTTP-level encoding info and the internal payload info agree if both are supplied.
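To illustrate the proposed agreement requirement, here is a minimal Python sketch of such a check. It is an assumption-laden simplification, not the XML or HTML5 sniffing algorithm: it only recognizes UTF-8 and UTF-16 BOMs and an XML declaration in an ASCII-compatible encoding, and the function names (`http_charset`, `internal_charset`, `encodings_agree`) are hypothetical.

```python
import codecs
import re
from email.message import Message

# Simplified pattern: XML declaration with an encoding pseudo-attribute,
# readable only when the payload is in an ASCII-compatible encoding.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def http_charset(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def internal_charset(payload):
    """Return the encoding the payload declares about itself, or None."""
    if payload.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if payload.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _XML_DECL.match(payload)
    if match:
        return match.group(1).decode("ascii").lower()
    return None


def encodings_agree(content_type, payload):
    """True unless both levels supply an encoding and the two differ."""
    declared = http_charset(content_type)
    internal = internal_charset(payload)
    if declared is None or internal is None:
        return True  # only one level supplied: nothing to compare
    try:
        # codecs.lookup() normalizes aliases such as "latin1" and "iso-8859-1".
        return codecs.lookup(declared).name == codecs.lookup(internal).name
    except LookupError:
        return False  # unknown label: treat as disagreement in this sketch


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><html/>'
print(encodings_agree("application/xhtml+xml; charset=utf-8", doc))
print(encodings_agree("application/xhtml+xml", doc))
```

The first call has conflicting labels at the two levels; the second has no HTTP-level charset, so there is nothing to contradict the payload.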
On Mar 1, 2007, at 09:13, Julian Reschke wrote:

> If a proxy transcodes xhtml today, and does not modify the XML
> declaration (when present), it will break the content, right?

 * A transcoding proxy that does not modify the XML declaration and tampers with application/* is broken.
 * Before basing advice on conjecture about transcoding proxies, it should be shown that transcoding proxies exist and are deployed (and for a good reason), and their true nature should be researched. (For now, I am treating non-reverse transcoding proxies as an urban legend.)
 * Distributed UAs (where the proxy and the client are more tightly coupled than in an HTTP client/proxy case, such as Opera Mini) do not count.
 * Russian Apache is not a transcoding proxy. It is a transcoding origin server.
 * Reverse proxies (e.g. http://apache.webthing.com/mod_proxy_html/) are origin servers as far as browsers are concerned. Whatever reverse proxies break is within the control of the reverse proxy operator.
 * When pages follow the best practice of being encoded as UTF-8, there is no legitimate reason to transcode.
 * Browsers have supported all the relevant Cyrillic and Japanese encodings for years, so the argumentation about Russian and Japanese transcoding proxy needs falls flat today.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 1 March 2007 01:12:28 UTC