- From: Anne van Kesteren <annevk@opera.com>
- Date: Tue, 20 Dec 2011 12:01:15 +0100
Hi,

When doing research into encodings as implemented by popular user agents I have found the current standards lacking. In particular:

* More encodings in the registry than needed for the web
* Error handling for encodings is undefined (can lead to XSS exploits, also gives interoperability problems)
* Often encodings are implemented differently from the standard

A year ago I did some research into encodings[1], and in more detail into single-octet encodings[2], and I have now taken that further by starting to define a standard[3] for encodings as they are to be implemented by user agents. The current scope is roughly defining the encodings, their labels and name, and how you match a label. The goal is to unify encoding handling across user agents for the web so legacy pages can be interpreted "correctly" (i.e. as expected by users).

If you are interested in helping out with testing (and reverse engineering) multi-octet encodings please let me know. Any other input is much appreciated as well.

(I emailed this separately to ietf-charsets.)

Kind regards,

[1] <http://wiki.whatwg.org/wiki/Web_Encodings>
[2] <http://annevankesteren.nl/2010/12/encodings-labels-tested>
[3] <http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html>

-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Tuesday, 20 December 2011 03:01:15 UTC