- From: Philip Jägenstedt <philipj@opera.com>
- Date: Fri, 06 Apr 2012 15:42:26 +0200
On Fri, 06 Apr 2012 12:54:53 +0200, Philip Jägenstedt <philipj at opera.com> wrote:

> As a starting point for the spec, I suggest taking the intersection of
> opera-hk, firefox-hk and chrome-hk.

I've written a script in <https://gitorious.org/whatwg/big5> to generate the mapping that I think makes sense. This is the logic used:

1. If all 3 *-hk mappings agree, use that.
2. If 2 of the *-hk mappings agree on something that is not in the PUA and not U+FFFD, use that.
3. If HKSCS-2008 [1] defines a mapping, verify that at least 1 *-hk mapping agrees and use that.

Finally, check that the resulting spec does not use the PUA or U+FFFD, and does not contradict a Big5 mapping that everybody agrees on.

This yields a mapping for 18583 of 19782 combinations, which I propose as a starting point. To this I would add these 4 mappings from HKSCS-2008, which uses multiple code points to represent what was previously a single code point in the PUA in some browsers:

8862 => <U+00CA,U+0304> Ê̄
8864 => <U+00CA,U+030C> Ê̌
88A3 => <U+00EA,U+0304> ê̄
88A5 => <U+00EA,U+030C> ê̌

Also, a single mapping fails the Big5-contradiction test:

F9FE =>
  opera-hk: U+FFED ￭
  firefox: U+2593 ▓
  chrome: U+2593 ▓
  firefox-hk: U+2593 ▓
  opera: U+2593 ▓
  chrome-hk: U+FFED ￭
  internetexplorer: U+2593 ▓
  hkscs-2008: <U+FFED> ￭

I'd say that we should go with U+FFED here, since that's what the spec says and it's visually close anyway.

These are the ranges that need more investigation:
8140-817F, 81A2-81FE, 8240-827F, 82A2-82FE, 8340-837F, 83A2-83FE, 8440-847F, 84A2-84FE, 8540-857F, 85A2-85FE, 8640-867F, 86A2-86FE, 8766, 87E0-87FE, 8862, 8864, 88A3, 88A5, 88AB-88FE, 8942, 8944-8945, 894A-894B, 89A7-89AA, 89AF, 89B3-89B4, 89C0, 89C4, 8A42, 8A63, 8A75, 8AAB, 8AB1, 8ABA, 8AC8, 8ACD, 8ADD-8ADE, 8AF5, 8B54, 8BDD, 8BFE, 8CA6, 8CC6-8CC8, 8CCD, 8CE5, 8D41, 9B61, 9EAC, 9EC4, 9EF4, 9F4E, 9FAD, 9FB1, 9FC0, 9FC8, 9FDA, 9FE6, 9FEA, 9FEF, A054, A057, A05A, A062, A072, A0A5, A0AD, A0AF, A0D3, A0E1, A3E2-A3FE, C6CF, C6D3, C6D5, C6D7, C6DE-C6DF, C8A5-C8CC, C8F2-C8F4

They all map to U+FFFD in opera-hk and mostly to PUA code points in other mappings. A lot of them should probably be U+FFFD, but not all of them. Is someone (Simon?) able to do a search for existing content labeled as Big5 or Big5-HKSCS that uses any of these bytes?

[1] http://www.ogcio.gov.hk/en/business/tech_promotion/ccli/download_area/mapping_table_2008.htm

-- 
Philip Jägenstedt
Core Developer
Opera Software
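
For readers following along, the three selection rules and the final check described in the message can be sketched roughly as below. This is not the script from the gitorious repository, only an illustration of the stated logic. The function names (`is_pua`, `is_usable`, `pick_mapping`, `passes_final_check`) are hypothetical, and two details are assumptions: an unmapped byte pair is represented as `None`, and the supplementary private-use planes are treated as PUA alongside U+E000-U+F8FF.

```python
from collections import Counter

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_pua(cp):
    """True for the BMP Private Use Area and (by assumption) planes 15 and 16."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def is_usable(cp):
    """A code point that is mapped, not U+FFFD, and not in the PUA."""
    return cp is not None and cp != REPLACEMENT and not is_pua(cp)


def pick_mapping(hk, hkscs=None):
    """Apply rules 1-3 to a single Big5 byte pair.

    hk    -- code points from opera-hk, firefox-hk, chrome-hk (None = unmapped)
    hkscs -- code point from HKSCS-2008, or None if it defines no mapping
    Returns the chosen code point, or None if no rule applies.
    """
    # Rule 1: all three *-hk mappings agree.
    if hk[0] is not None and hk[0] == hk[1] == hk[2]:
        return hk[0]
    # Rule 2: two *-hk mappings agree on a non-PUA, non-U+FFFD code point.
    counts = Counter(cp for cp in hk if is_usable(cp))
    for cp, n in counts.items():
        if n >= 2:
            return cp
    # Rule 3: HKSCS-2008 defines a mapping and at least one *-hk mapping agrees.
    if hkscs is not None and hkscs in hk:
        return hkscs
    return None


def passes_final_check(cp, big5_values):
    """Final check: no PUA, no U+FFFD, no contradiction of a unanimous Big5 mapping.

    big5_values -- code points from the plain Big5 tables (assumed here to be
                   firefox, chrome, opera and internetexplorer)
    """
    if not is_usable(cp):
        return False
    agreed = set(big5_values)
    if len(agreed) == 1 and None not in agreed and cp not in agreed:
        return False  # every plain Big5 table agrees on a different code point
    return True


# F9FE from the message: opera-hk, firefox-hk, chrome-hk, then HKSCS-2008.
f9fe = pick_mapping([0xFFED, 0x2593, 0xFFED], hkscs=0xFFED)
print(f"F9FE candidate: U+{f9fe:04X}")
print("passes final check:", passes_final_check(f9fe, [0x2593] * 4))
```

With the F9FE values quoted in the message, rule 2 selects U+FFED (opera-hk and chrome-hk agree), and the final check then rejects it because all four plain Big5 tables agree on U+2593. That matches the single failure reported above.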
Received on Friday, 6 April 2012 06:42:26 UTC