I18N-ISSUE-376 (BUG21057): [survey needed] create a replacement encoding [encoding]


Raised by: Addison Phillips
On product: encoding


This issue tracks the bug listed above and was created as part of the WG LC process. The bug was created prior to the WG LC.


Problem statement:

1) The Encoding Standard removes the ISO-2022-CN encoding. Sites that rely on browser support for that encoding will become vulnerable to XSS, the same way Yahoo search became vulnerable in Chrome when Chrome removed ISO-2022-KR. See https://code.google.com/p/chromium/issues/detail?id=15701

2) There exist ASCII-incompatible encodings in the world outside the Encoding Standard, and support for those encodings might be exposed in server-side libraries. Sites that are naïve enough to let the user specify the output encoding, and that pass the user-supplied encoding name to a server-side library without whitelisting ASCII-compatible encodings, are vulnerable to EBCDIC attacks: an attacker can request that the site use an EBCDIC-based encoding, and the site responds with EBCDIC, which isn't recognized by non-IE browsers. Those browsers fall back on an ASCII-compatible encoding, so the EBCDIC bytes are interpreted in a dangerous way. See http://zaynar.co.uk/docs/charset-encoding-xss.html for a reference to an actual search engine that was vulnerable to this attack.
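
For illustration only, the server-side whitelisting that item 2 says naïve sites omit might look roughly like the Python sketch below. The function name choose_output_encoding and the contents of the allowlist are assumptions made for this example, not something taken from the bug; a real site would pick its own set of ASCII-compatible encodings.

```python
import codecs

# Hypothetical allowlist, keyed by Python's canonical codec names.
# Which encodings a site offers is a site-specific decision.
ASCII_COMPATIBLE_OUTPUT_ENCODINGS = {"utf-8", "iso8859-1", "cp1252"}


def choose_output_encoding(requested, default="utf-8"):
    """Return a safe output encoding for a user-supplied encoding name."""
    try:
        # Resolve aliases (e.g. "latin-1") to the library's canonical name.
        canonical = codecs.lookup(requested).name
    except LookupError:
        # Unknown label: never pass it through, use the default instead.
        return default
    if canonical in ASCII_COMPATIBLE_OUTPUT_ENCODINGS:
        return canonical
    # Known to the library but not on the allowlist (e.g. an EBCDIC
    # code page such as cp037): refuse and fall back to the default.
    return default
```

With this sketch, a request for cp037 (an EBCDIC code page that Python's codecs module supports) yields the default rather than an EBCDIC response.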

Proposed solution:
Define a replacement encoding that decodes all possible byte values to the REPLACEMENT CHARACTER. Make the known labels for ASCII-incompatible encodings that exist but aren't part of the Encoding Standard aliases for the replacement encoding.
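
A minimal sketch of the proposed behaviour, assuming the literal wording above (every byte decodes to U+FFFD). The names replacement_decode, decode and REPLACEMENT_LABELS are hypothetical, and the label set is only a placeholder; the actual list of labels, and the exact number of replacement characters emitted, would be for the Encoding Standard to define.

```python
REPLACEMENT_CHARACTER = "\ufffd"

# Placeholder label set for illustration; the real list of aliases
# would be defined by the Encoding Standard, not by this sketch.
REPLACEMENT_LABELS = {"iso-2022-cn"}


def replacement_decode(data: bytes) -> str:
    """Decode every byte value to U+FFFD REPLACEMENT CHARACTER."""
    return REPLACEMENT_CHARACTER * len(data)


def decode(data: bytes, label: str) -> str:
    """Decode data, routing replacement labels to the replacement decoder."""
    if label.strip().lower() in REPLACEMENT_LABELS:
        return replacement_decode(data)
    # Any other label is handed to the library's normal decoder.
    return data.decode(label, errors="replace")
```

Because no byte survives as an ASCII character, markup smuggled in under one of these labels cannot be reinterpreted as script by a fallback decoder.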

Additional info:
This solution would pave the way for safe removal of ISO-2022-KR and hz-gb-2312 from the set of encodings supported by the Encoding Standard.

Received on Thursday, 10 July 2014 04:13:06 UTC