[whatwg] Web Encodings from Anne van Kesteren on 2009-08-19 (public-whatwg-archive@w3.org from August 2009)

From: Anne van Kesteren <annevk@opera.com>
Date: Wed, 19 Aug 2009 22:47:57 +0200
Message-ID: <op.uyxf17l264w2qv@annevk-t60>

Today every browser implements its own encoding label matching algorithm, supports its own list of encodings and its own list of encoding label aliases, and everything sort of works, but not really.

HTML5 solves part of this problem by defining exactly how to identify an encoding label alias in a text/html stream. It also defines which encoding label matching algorithm to use, UTS22, but we found out that this is incompatible with (existing) sites that specify EUC_JP at the HTTP level and actually want to be decoded as UTF-8 according to a <meta> in the text/html stream. This works fine with a strict encoding label matching algorithm, because EUC_JP is then not a recognized label and the <meta> declaration is used instead. With UTS22, however, EUC_JP and EUC-JP become the same thing, while only the latter is the actual encoding label.
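
To make the EUC_JP case concrete, here is a rough sketch in Python. It is not taken from any browser. uts22_key follows the UTS22 charset alias matching rule as I understand it (keep only ASCII letters and digits, lowercase, drop each 0 that is not preceded by a digit). strict_match, KNOWN_LABELS and pick_encoding are hypothetical names made up for illustration; the "strict" rule here is assumed to be a plain case-insensitive comparison, and the HTTP-before-<meta> precedence is a simplification.

```python
import re


def uts22_key(label):
    """Reduce a label to its UTS22 loose-matching key."""
    # Steps 1 and 2: keep only ASCII letters and digits, then lowercase.
    kept = re.sub(r"[^A-Za-z0-9]", "", label).lower()
    # Step 3: left to right, delete each "0" not preceded by a digit.
    out = []
    prev_is_digit = False
    for ch in kept:
        if ch == "0" and not prev_is_digit:
            continue  # dropped; the previous kept character is unchanged
        out.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(out)


def loose_match(a, b):
    """UTS22-style matching: compare the reduced keys."""
    return uts22_key(a) == uts22_key(b)


def strict_match(a, b):
    """Assumed strict rule: case-insensitive, otherwise exact."""
    return a.strip().lower() == b.strip().lower()


# Hypothetical, deliberately tiny label table for this example.
KNOWN_LABELS = ["euc-jp", "utf-8"]


def pick_encoding(http_label, meta_label, match):
    """Try the HTTP-level label first, then the <meta> label."""
    for candidate in (http_label, meta_label):
        for known in KNOWN_LABELS:
            if match(candidate, known):
                return known
    return None


print(pick_encoding("EUC_JP", "UTF-8", strict_match))  # utf-8
print(pick_encoding("EUC_JP", "UTF-8", loose_match))   # euc-jp
```

With the strict comparison the HTTP-level EUC_JP is ignored and the <meta> declaration decides; with the UTS22 key it collapses to the same key as EUC-JP and overrides the <meta>.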

Another problem HTML5 does not solve is giving a definitive list of encodings clients have to implement to be compatible with a large body of Web content. This means new clients will have to reverse engineer that list from existing clients, which I think is bad.


-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Wednesday, 19 August 2009 13:47:57 UTC