- From: Anne van Kesteren <annevk@opera.com>
- Date: Thu, 29 Mar 2012 11:16:42 +0200
On Wed, 28 Mar 2012 17:40:58 +0200, Philip Jägenstedt
<philipj at opera.com> wrote:
> Making big5 and big5-hkscs aliases sounds like a good idea, on the
> assumption that big5-hkscs is a pure extension of Big5.

I believe it is not, but given that a) Windows treats them identically
and b) reportedly has no different default setup for Hong Kong and
Taiwan users (and no longer offers a HKSCS download), they can probably
be considered the same. For more details on Windows and Internet
Explorer, see:

http://lists.w3.org/Archives/Public/www-archive/2012Mar/thread.html#msg46

> To make this more concrete, here are a few fairly common characters that
> I think are in big5-hkscs but not in big5, their unicode point and byte
> representation in big5-hkscs when converted using Python:
>
> 啫 U+556B '\x94\xdc'
> 嗰 U+55F0 '\x9d\xf5'
> 嘅 U+5605 '\x9d\xef'
>
> I'm not sure how to use big5.json, so perhaps you can tell me what these
> map to in various browsers? If they're all the same, examples of byte
> sequences that don't would be interesting.

big5.json is the result of outputting all possible lead/trail byte
combinations and then running charCodeAt over the resulting string,
while accounting for surrogates and working around a minor problem in
Opera. Running the following (Python):

  import json

  # big5.json maps each browser/encoding label to a flat list of code
  # points, one entry per lead/trail byte combination.
  with open("big5.json", "r") as f:
      data = json.load(f)

  lead = 0x9D
  trail = 0xF5
  # Trail bytes span 0x40-0x7E and 0xA1-0xFE: 157 cells per lead byte.
  row = 0xFE-0xA1 + 0x7E-0x40 + 2
  cell = (trail-0xA1 + 0x7E-0x40 + 1) if trail > 0x7E else trail - 0x40
  index = (lead-0x81) * row + cell
  for x in data:
      print(x, hex(data[x][index]))

I get

  opera-hk 0x55f0
  firefox 0x9c1f
  chrome 0xecd7
  firefox-hk 0x55f0
  opera 0xfffd
  chrome-hk 0x55f0
  internetexplorer 0xecd7

indicating browsers agree for big5-hkscs but not for big5 (only Chrome
and Internet Explorer return the same value there). Similar results for
your other examples.

> It seems fairly obvious that the most sane solution would be to just use
> a more correct mapping that doesn't involve the PUA, but:
>
> 1. What is the compatible subset of all browsers?
> 2. Does that subset include anything mapping to the PUA?

This depends on whether or not you include big5-hkscs results. Opera
never maps to the PUA, but whether that is compatible enough is unclear.

> 3. Do Hong Kong or Taiwan sites depend on charCodeAt returning values in
> the PUA?
>
> 4. Would hacks be needed on the font-loading side if browsers started
> using a more correct mapping?

Don't know. Mozilla has done a number of interesting things here nobody
else does, but that was all big in '05 or earlier.

https://bugzilla.mozilla.org/show_bug.cgi?id=9686
https://bugzilla.mozilla.org/show_bug.cgi?id=310299

How relevant that is today, given that they are not the market leader
there, is unclear.

Given the information from Microsoft indicated at the start of this
email, I think just following Internet Explorer here is probably the
best way forward, combined with strongly discouraging the usage of big5.


-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Thursday, 29 March 2012 02:16:42 UTC
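
The index arithmetic in the message can be sketched as a pair of small helpers. This is only an illustration of the layout implied by the snippet in the email (157 cells per lead byte, table starting at lead byte 0x81); the names big5_index and big5_bytes are hypothetical, and the upper bound on the lead byte is an assumption, since the message does not state where big5.json ends.

```python
# Valid trail bytes fall in two runs: 0x40-0x7E and 0xA1-0xFE.
TRAIL_LOW = range(0x40, 0x7F)    # 63 values
TRAIL_HIGH = range(0xA1, 0xFF)   # 94 values
CELLS_PER_ROW = len(TRAIL_LOW) + len(TRAIL_HIGH)  # 157, the "row" in the email
FIRST_LEAD = 0x81  # taken from the (lead-0x81) term in the email
LAST_LEAD = 0xFE   # assumption: not stated in the email


def big5_index(lead: int, trail: int) -> int:
    """Flat table index for a lead/trail byte pair (hypothetical helper)."""
    if not FIRST_LEAD <= lead <= LAST_LEAD:
        raise ValueError(f"lead byte out of range: {lead:#x}")
    if trail in TRAIL_LOW:
        cell = trail - TRAIL_LOW.start
    elif trail in TRAIL_HIGH:
        cell = len(TRAIL_LOW) + (trail - TRAIL_HIGH.start)
    else:
        raise ValueError(f"invalid trail byte: {trail:#x}")
    return (lead - FIRST_LEAD) * CELLS_PER_ROW + cell


def big5_bytes(index: int) -> tuple[int, int]:
    """Inverse of big5_index: recover (lead, trail) from a flat index."""
    row, cell = divmod(index, CELLS_PER_ROW)
    if cell < len(TRAIL_LOW):
        trail = TRAIL_LOW.start + cell
    else:
        trail = TRAIL_HIGH.start + (cell - len(TRAIL_LOW))
    return FIRST_LEAD + row, trail


# The three byte pairs quoted in the message.
for lead, trail in [(0x94, 0xDC), (0x9D, 0xF5), (0x9D, 0xEF)]:
    print(f"{lead:#04x} {trail:#04x} -> index {big5_index(lead, trail)}")
```

For the pair used in the message (lead 0x9D, trail 0xF5) this gives index 4543, the position looked up in each browser's list.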