
[Bug 27256] revamp iso-2022-jp decoder/encoder

From: <bugzilla@jessica.w3.org>
Date: Thu, 06 Nov 2014 17:40:17 +0000
To: www-international@w3.org
Message-ID: <bug-27256-4285-1gu5iaUU0Z@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27256

--- Comment #4 from Anne <annevk@annevk.nl> ---
It seems we do not want to follow the RFC exactly. I found numerous mismatches
between browsers and the RFC:

* Starting with an ESC sequence is not an error in browsers.
* EOF in two-byte mode is not an error in browsers.
* EOF after ESC sequence is not an error in browsers.

Here is an outline of how I plan to rewrite this:

* Add a Roman state
* Turn SI, SO, and an invalid ESC sequence into U+FFFD (for an invalid ESC
  sequence, replace only the ESC byte)
* An invalid ESC sequence switches the decoder to the ASCII state
* An ESC sequence right after another ESC sequence is an invalid ESC sequence
  (triggers ASCII)
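
One possible reading of this outline, as a rough Python sketch rather than the
eventual spec text. Everything here is an assumption made for illustration: the
names (decode_sketch, decode_two_byte, ESCAPES), the choice to recognize only
four escape sequences, the handling of bytes >= 0x80 and of a lead byte with no
valid trail byte (both become U+FFFD), and the use of Python's euc_jp codec as
a stand-in for a real JIS X 0208 index.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
REPLACEMENT = "\uFFFD"

# The two bytes following ESC, and the decoder state they select.
# Only these four are recognized in this sketch (an assumption).
ESCAPES = {b"(B": "ascii", b"(J": "roman", b"$@": "two-byte", b"$B": "two-byte"}

# JIS X 0201 Roman differs from ASCII at these two code points.
ROMAN_OVERRIDES = {0x5C: "\u00A5", 0x7E: "\u203E"}


def in_range(byte):
    return 0x21 <= byte <= 0x7E


def decode_two_byte(lead, trail):
    # Stand-in for a JIS X 0208 lookup: setting the high bit on both bytes
    # yields the EUC-JP form, which Python's euc_jp codec can decode.
    try:
        return bytes([lead | 0x80, trail | 0x80]).decode("euc_jp")
    except UnicodeDecodeError:
        return REPLACEMENT


def decode_sketch(data):
    out = []
    state = "ascii"
    after_escape = False  # True only right after a valid ESC sequence
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ESC:
            target = ESCAPES.get(data[i + 1:i + 3])
            if target is not None and not after_escape:
                state, after_escape = target, True
                i += 3
            else:
                # Invalid ESC sequence (including ESC sequence after ESC
                # sequence): replace only ESC and fall back to ASCII. The
                # bytes that follow are then decoded as ordinary input.
                out.append(REPLACEMENT)
                state, after_escape = "ascii", False
                i += 1
            continue
        after_escape = False
        if byte in (SO, SI) or byte >= 0x80:
            out.append(REPLACEMENT)
            i += 1
        elif state == "two-byte":
            has_trail = i + 1 < len(data) and in_range(data[i + 1])
            if in_range(byte) and has_trail:
                out.append(decode_two_byte(byte, data[i + 1]))
                i += 2
            else:
                # Assumption: an unusable lead byte becomes U+FFFD.
                out.append(REPLACEMENT)
                i += 1
        elif state == "roman":
            out.append(ROMAN_OVERRIDES.get(byte, chr(byte)))
            i += 1
        else:
            out.append(chr(byte))
            i += 1
    # Reaching the end of input in any state, or right after an ESC
    # sequence, is deliberately not an error, matching the browser behavior.
    return "".join(out)
```

With this sketch, input that starts with ESC $ B and ends while still in
two-byte mode decodes without any U+FFFD, and a second ESC sequence directly
after a first one yields a single U+FFFD followed by the remaining bytes read
as ASCII.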

This is also based in part on great research from a duplicate bug:
http://upokecenter.dreamhosters.com/articles/2013/04/differences-in-the-iso-2022-jp-encoding-between-browsers/

Received on Thursday, 6 November 2014 17:40:18 UTC
