- From: Henri Sivonen <notifications@github.com>
- Date: Wed, 02 Sep 2015 00:57:58 -0700
- To: whatwg/encoding <encoding@noreply.github.com>
- Message-ID: <whatwg/encoding/issues/9@github.com>
The Big5 encoder first does an index lookup and then discards the code point as an error if the Big5 lead for the pointer is less than 0xA1. This makes the encoder discard code points that have two mappings: one whose Big5 lead is less than 0xA1 and another whose Big5 lead is greater than or equal to 0xA1.

The following code points have such double pointer mappings:

7BB8 7C06 7CCE 7DD2 7E1D 8005 8028 83C1 84A8 840F 89A6 8D77 90FD 92B9 96B6 975C 97FF 9F16 5159 515B 515D 515E 7479 6D67 799B 9097 5B28 732A 7201 77D7 7E87 99D6 91D4 60DE 6FB6 8F36 4FBB 71DF 9104 9DF0 83CF 5C10 79E3 5A67 8F0B 7B51 62D0 5605 5ED0 6062 75F9 6C4A 9B2E 50ED 62CE 60A4 7162

When testing Gecko's old Big5 encoder, at least the first of these is a non-error:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3610
(the test case violates the Same Origin Policy in Blink due to different treatment of data: origins). That is, Gecko's old encoder encodes U+7BB8 as 0xBA, 0xE6.

I believe that, instead of checking whether the lead is less than 0xA1 after the lead computation, the spec should say that when looking up pointers from the index when encoding, pointers below (0xA1 - 0x81) * 157 should be ignored, i.e. search the index from pointer (0xA1 - 0x81) * 157 onwards.

---
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/encoding/issues/9
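
For illustration, the difference between the current behavior and the proposed one can be sketched in Python. This is only a rough sketch under stated assumptions, not spec text or Gecko code: `TOY_INDEX` is a hypothetical two-entry stand-in for the real Big5 index, pointer 9081 is derived from the bytes 0xBA, 0xE6 mentioned above, and pointer 1000 is a made-up low pointer chosen only so that its lead falls below 0xA1. The function names are likewise invented for this example.

```python
# Hypothetical two-entry stand-in for the Big5 index, as (pointer, code point)
# pairs in ascending pointer order. Pointer 9081 corresponds to 0xBA 0xE6;
# pointer 1000 is invented for illustration and is NOT taken from the real index.
TOY_INDEX = [(1000, 0x7BB8), (9081, 0x7BB8)]

# Lowest pointer whose lead byte is 0xA1 or greater.
MIN_ENCODER_POINTER = (0xA1 - 0x81) * 157


def pointer_to_bytes(pointer):
    """Convert an index pointer to its Big5 lead and trail bytes."""
    lead = pointer // 157 + 0x81
    trail = pointer % 157
    offset = 0x40 if trail < 0x3F else 0x62
    return bytes([lead, trail + offset])


def encode_current(code_point, index=TOY_INDEX):
    """Take the first matching pointer, then reject it if the lead is below 0xA1."""
    for pointer, cp in index:
        if cp == code_point:
            encoded = pointer_to_bytes(pointer)
            return None if encoded[0] < 0xA1 else encoded
    return None


def encode_proposed(code_point, index=TOY_INDEX):
    """Ignore pointers below MIN_ENCODER_POINTER while searching the index."""
    for pointer, cp in index:
        if pointer >= MIN_ENCODER_POINTER and cp == code_point:
            return pointer_to_bytes(pointer)
    return None
```

With this toy index, `encode_current(0x7BB8)` finds the low pointer first and reports an error (`None`), while `encode_proposed(0x7BB8)` skips it and returns the bytes 0xBA, 0xE6.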
Received on Wednesday, 2 September 2015 07:58:28 UTC