- From: Henri Sivonen <hsivonen@hsivonen.fi>
- Date: Fri, 28 Feb 2014 17:03:45 +0200
- To: "www-international@w3.org" <www-international@w3.org>
As written, the Quick Answer is misleading if you only read that part and skip the Details. The Quick Answer says "If you have access to the server settings, you should also consider whether it makes sense to use the HTTP header." Instead, it should emphasize that HTTP overrides <meta>, so if you don't have access to the server settings and the server is sending a charset parameter in the Content-Type header, the Quick Answer won't work for you.

The document links to http://www.w3.org/International/O-HTTP-charset which doesn't cover nginx configuration. nginx behavior is worth mentioning, since nginx configuration is a bit surprising: You have to use the charset directive and can't use add_header, because the latter appends *another* Content-Type header and, therefore, must not be used in attempts to refine headers that nginx already adds by other means.

Back to qa-html-encoding-declarations-new: The document says: "Intermediate servers that transcode the data (ie. convert to a different encoding) sometimes take advantage of this to change the encoding of a document before sending it on to small devices that only recognize a few encodings. Because the HTTP header information has precedence over any in-document declaration, transcoders typically do not change the internal encoding declarations, just the document encoding and the declaration in the HTTP headers." Is there documented proof that that's actually true?

"User agents can easily find the character encoding information when it is sent in the HTTP header." I suggest saying that they find it sooner. Any non-bogus user agent has to be able to handle the level of difficulty of finding it in <meta>.

I think the section "Working with polyglot and XML formats", if retained at all, should go under "Obscure details you should not need to know".

Please delete "It is possible to invent your own encoding names preceded by x-, but this is not usually a good idea since it limits interoperability."
It has no relevance to authoring documents that will be viewed in Web browsers.

The section "The charset attribute on a link" fails to mention that if browsers supported the attribute (without special additional rules), it would be an XSS attack vector, which is a good reason not to support it.

The document also links to http://www.w3.org/International/questions/qa-choosing-encodings . While that document correctly advises against the use of ISO-2022-*, HZ, etc., it fails to warn about interoperability problems among EUC-JP implementations on the one hand and among Big5 implementations on the other. I.e. authors are safer also avoiding EUC-JP and Big5 (including and especially Big5-HKSCS).

-- 
Henri Sivonen
hsivonen@hsivonen.fi
https://hsivonen.fi/
Received on Friday, 28 February 2014 15:04:14 UTC