[encoding] Big5 encoder treats a code point as error when both an HKSCS and non-HKSCS pointer exists for the code point (#9) from Henri Sivonen on 2015-09-02 (public-webapps-github@w3.org from September 2015)

From: Henri Sivonen <notifications@github.com>
Date: Wed, 02 Sep 2015 00:57:58 -0700
To: whatwg/encoding <encoding@noreply.github.com>
Message-ID: <whatwg/encoding/issues/9@github.com>

The Big5 encoder first does an index lookup and then treats the code point as an error if the Big5 lead computed from the pointer is less than 0xA1. As a result, the encoder rejects code points that have two mappings: one whose Big5 lead is less than 0xA1 and another whose Big5 lead is greater than or equal to 0xA1.
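
For illustration, here is a rough Python sketch of the lookup-then-reject order described above. It is not the spec text. `TOY_INDEX` is a two-entry stand-in for index big5: pointer 9081 is the one that corresponds to the bytes 0xBA 0xE6 mentioned further down, while pointer 1000 is a made-up placeholder for the HKSCS-range mapping, not the real index value. The function names are hypothetical.

```python
# Toy stand-in for index big5: (pointer, code point) pairs in pointer order.
# Pointer 1000 is a hypothetical HKSCS-range entry (NOT the real index value);
# pointer 9081 corresponds to the byte pair 0xBA 0xE6.
TOY_INDEX = [(1000, 0x7BB8), (9081, 0x7BB8)]


def index_pointer(code_point, index=TOY_INDEX):
    """Return the first pointer that maps to code_point, or None."""
    for pointer, cp in index:
        if cp == code_point:
            return pointer
    return None


def encode_current(code_point):
    """Look up the first pointer, then reject it if its lead is below 0xA1."""
    pointer = index_pointer(code_point)
    if pointer is None:
        return None  # error: no mapping at all
    lead = pointer // 157 + 0x81
    if lead < 0xA1:
        return None  # error, even though a second, usable pointer exists
    trail = pointer % 157
    offset = 0x40 if trail < 0x3F else 0x62
    return bytes([lead, trail + offset])
```

With this toy index, `encode_current(0x7BB8)` returns `None`: the lookup stops at the low pointer and never reaches the second mapping.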

The following code points have such double pointer mappings:
7BB8
7C06
7CCE
7DD2
7E1D
8005
8028
83C1
84A8
840F
89A6
8D77
90FD
92B9
96B6
975C
97FF
9F16
5159
515B
515D
515E
7479
6D67
799B
9097
5B28
732A
7201
77D7
7E87
99D6
91D4
60DE
6FB6
8F36
4FBB
71DF
9104
9DF0
83CF
5C10
79E3
5A67
8F0B
7B51
62D0
5605
5ED0
6062
75F9
6C4A
9B2E
50ED
62CE
60A4
7162

In testing with Gecko's old Big5 encoder, at least the first of these is not an error: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3610 (the test case violates the Same Origin Policy in Blink because of its different treatment of data: origins). That is, Gecko's old encoder encodes U+7BB8 as 0xBA, 0xE6.
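
As a quick check of the arithmetic, the byte pair 0xBA, 0xE6 can be mapped back to a pointer. The sketch below assumes the pointer arithmetic of the Encoding Standard's Big5 decoder; `big5_pointer` is a hypothetical helper name.

```python
def big5_pointer(lead, trail):
    """Compute the index pointer for a Big5 byte pair."""
    # Trail bytes below 0x7F use offset 0x40; the rest use 0x62.
    offset = 0x40 if trail < 0x7F else 0x62
    return (lead - 0x81) * 157 + (trail - offset)


pointer = big5_pointer(0xBA, 0xE6)
threshold = (0xA1 - 0x81) * 157  # first pointer whose lead is 0xA1
```

Here `pointer` is 9081 and `threshold` is 5024, so the mapping Gecko uses lies in the part of the index whose lead is at least 0xA1.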

I believe that instead of checking whether the lead is less than 0xA1 after the lead computation, the spec should say that when looking up pointers in the index for encoding, pointers below (0xA1 - 0x81) * 157 (that is, 5024) are ignored, i.e. the index is searched from pointer (0xA1 - 0x81) * 157 onwards.
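
A minimal sketch of that proposal in Python, under the same caveats as any toy example: `TOY_INDEX` is a two-entry stand-in for index big5, pointer 1000 is a made-up placeholder for the HKSCS-range mapping, and the function and constant names are hypothetical.

```python
# First pointer whose computed lead is 0xA1; lower pointers are skipped.
FIRST_ENCODABLE_POINTER = (0xA1 - 0x81) * 157  # 5024

# Toy stand-in for index big5: (pointer, code point) pairs in pointer order.
# Pointer 1000 is a hypothetical HKSCS-range entry (NOT the real index value);
# pointer 9081 corresponds to the byte pair 0xBA 0xE6.
TOY_INDEX = [(1000, 0x7BB8), (9081, 0x7BB8)]


def index_pointer_for_encoding(code_point, index=TOY_INDEX):
    """Return the first pointer >= FIRST_ENCODABLE_POINTER for code_point."""
    for pointer, cp in index:
        if pointer < FIRST_ENCODABLE_POINTER:
            continue  # ignore the HKSCS-range part of the index
        if cp == code_point:
            return pointer
    return None


def encode_proposed(code_point):
    """Encode using only pointers at or above FIRST_ENCODABLE_POINTER."""
    pointer = index_pointer_for_encoding(code_point)
    if pointer is None:
        return None  # error: no encodable mapping
    lead = pointer // 157 + 0x81  # at least 0xA1, so no later check is needed
    trail = pointer % 157
    offset = 0x40 if trail < 0x3F else 0x62
    return bytes([lead, trail + offset])
```

With this toy index, `encode_proposed(0x7BB8)` yields the bytes 0xBA, 0xE6, matching what Gecko's old encoder produces.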



Received on Wednesday, 2 September 2015 07:58:28 UTC