Re: Feedback on http://www.w3.org/International/questions/qa-html-encoding-declarations-new

On 28/02/2014 15:03, Henri Sivonen wrote:
> As written, the Quick Answer is misleading if you only read that part
> and skip the Details. The Quick Answer says "If you have access to the
> server settings, you should also consider whether it makes sense to
> use the HTTP header." Instead, it should emphasize that HTTP overrides
> <meta>, so if you don't have access to the server settings and the
> server is sending a charset parameter in the Content-Type header, the
> Quick Answer won't work for you.

Good point.  I added something to that effect.
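
A minimal sketch of the precedence rule described above, assuming a
simplified model in which only the HTTP Content-Type header and an
in-document <meta> declaration are considered (byte order marks and
browser heuristics are deliberately ignored). The function names and the
1024-byte scan limit are illustrative choices, not part of the article:

```python
import re
from email.message import Message

# Simplified pattern: finds charset=... inside a <meta> tag. Real browsers
# use a more elaborate prescan; this is only an approximation.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def meta_charset(body, scan_limit=1024):
    """Return the charset declared in a <meta> near the start of the body."""
    match = META_CHARSET.search(body[:scan_limit])
    return match.group(1).decode("ascii").lower() if match else None


def effective_encoding(content_type, body):
    """HTTP wins over <meta>; report which source supplied the encoding."""
    charset = http_charset(content_type)
    if charset:
        return charset, "http"
    charset = meta_charset(body)
    if charset:
        return charset, "meta"
    return None, "undeclared"


page = b'<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
print(effective_encoding("text/html; charset=ISO-8859-1", page))
print(effective_encoding("text/html", page))
```

In the first call the server's charset parameter is used even though the
page declares UTF-8, which is the situation in which the Quick Answer
alone would not help an author without access to the server settings.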

>
> The document links to http://www.w3.org/International/O-HTTP-charset
> which doesn't cover nginx configuration. nginx behavior is worth
> mentioning, since nginx configuration is a bit surprising: You have to
> use the charset directive and can't use add_header, because the latter
> appends *another* Content-Type header and, therefore, must not be used
> to attempt to refine headers that nginx already adds by other means.
>
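
One way to spot the duplicated header described above is to look for
repeated names in a response's header list. The following is a small
sketch; the sample header values are made up for illustration and are
not captured from a real nginx server:

```python
from collections import Counter


def duplicated_headers(headers):
    """Return the lowercased names of headers that appear more than once.

    `headers` is a list of (name, value) tuples, the shape returned by
    http.client.HTTPResponse.getheaders().
    """
    counts = Counter(name.lower() for name, _ in headers)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical response in which a second Content-Type header was
# appended instead of the existing one being refined.
response_headers = [
    ("Content-Type", "text/html"),
    ("Content-Type", "text/html; charset=utf-8"),
    ("Content-Length", "1234"),
]
print(duplicated_headers(response_headers))
```
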
> Back to qa-html-encoding-declarations-new:
> The document says: "Intermediate servers that transcode the data (ie.
> convert to a different encoding) sometimes take advantage of this to
> change the encoding of a document before sending it on to small
> devices that only recognize a few encodings. Because the HTTP header
> information has precedence over any in-document declaration,
> transcoders typically do not change the internal encoding
> declarations, just the document encoding and the declaration in the
> HTTP headers."
>
> Is there documented proof that that's actually true?

I will look into this further. Certainly that text is very old.


> "User agents can easily find the character encoding information when
> it is sent in the HTTP header."
>
> I suggest saying that they find it sooner. Any non-bogus user agent
> has to be able to handle the level of difficulty of finding it in
> <meta>.

Done.

> I think the section "Working with polyglot and XML formats", if
> retained at all, should go under "Obscure details you should not need
> to know".

Noted.

> Please delete "It is possible to invent your own encoding names
> preceded by x-, but this is not usually a good idea since it limits
> interoperability." It has no relevance to authoring documents that
> will be viewed in Web browsers.

I put this in because I have several times come across people who
wanted to do this, and I want to tell them not to. I've reworded it as
a strong prohibition.

>
> The section "The charset attribute on a link" fails to mention that if
> browsers supported the attribute (without special additional rules),
> it would be an XSS attack vector, which is a good reason not to
> support it.

Added.

> The document also links to
> http://www.w3.org/International/questions/qa-choosing-encodings .
> While that document correctly advises against the use of ISO-2022-*,
> HZ, etc., it fails to warn about interoperability problems among
> EUC-JP implementations on one hand and among Big5 implementations on
> the other. I.e., authors are safer also avoiding EUC-JP and Big5
> (including and especially Big5-HKSCS).


Yes, that's another article you'll see an update for soon. I plan to try
to incorporate the information you sent in another email recently.

Thanks for the comments.
RI

Received on Friday, 28 February 2014 16:27:35 UTC