
RE: big5 and big5-hkscs

From: Shawn Steele <Shawn.Steele@microsoft.com>
Date: Wed, 28 Mar 2012 16:36:19 +0000
To: Anne van Kesteren <annevk@opera.com>
CC: "www-archive@w3.org" <www-archive@w3.org>
Message-ID: <E14011F8737B524BB564B05FF748464A5B1B5CCB@TK5EX14MBXC139.redmond.corp.microsoft.com>
PUA == "Private Use Area", so people can show whatever glyphs they want for whatever PUA code point they want.  It's more like per-font or something than per-locale.  Different documents could use different fonts to show different things.

We map those to the Unicode PUA because there's no better Unicode code point.

FWIW: We have a mechanism where we allow "EUDC" characters to be mapped.  The net result is that people can cause a specific font, of their own creation, to be used as the system fallback for those unknown PUA characters.  For a web site, that'd mean that if they wanted to use the PUA, they'd either have to use a common convention, or provide a font.  In either case I'd strongly recommend that the web site developer use Unicode because, particularly in these edge cases, the differences between implementations make it really hard to be cross-platform.
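
As a rough illustration of the advice above, a site author could scan text for private-use code points before publishing, since those are the characters whose appearance depends on a shared convention or a supplied font. This is only a sketch: the helper names and the sample string are made up for the example, and it relies on Python's standard unicodedata module, where private-use characters have general category "Co".

```python
import unicodedata

def is_private_use(ch):
    """True if ch has Unicode general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

def private_use_chars(text):
    """List the private-use code points found in text, in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]

# Hypothetical sample: two BMP PUA code points mixed with ordinary text.
sample = "A\ue000\u4e2d\uf8ff"
print(private_use_chars(sample))  # ['U+E000', 'U+F8FF']
```

An empty result means the text does not depend on any PUA assignment.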

-Shawn

-----Original Message-----
From: Anne van Kesteren [mailto:annevk@opera.com] 
Sent: Wednesday, March 28, 2012 9:19 AM
To: Shawn Steele
Cc: www-archive@w3.org
Subject: big5 and big5-hkscs

Hi Shawn,

I was hoping you could clear something up for me regarding big5. As far as I can tell Internet Explorer treats big5 and big5-hkscs the same. When generating all the possible multi-byte sequences (0x81 to 0xFE as lead, 0x40 to 0x7E and 0xA1 to 0xFE as trail) I get 19782 code points of which 6217 are in PUA in Internet Explorer. Is it still the case (as suggested by http://www.microsoft.com/hk/hkscs/ and elsewhere) that these map to different glyphs depending on the user's locale?
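
The enumeration described above can be sketched roughly as follows. The byte ranges come from the message; everything else is an assumption for illustration. In particular, the sketch decodes with Python's built-in big5hkscs codec, which is not Internet Explorer's mapping table, so its PUA count should not be expected to match the 6217 figure. The function names are hypothetical.

```python
# Lead and trail byte ranges as quoted in the message.
LEAD_BYTES = range(0x81, 0xFE + 1)
TRAIL_BYTES = list(range(0x40, 0x7E + 1)) + list(range(0xA1, 0xFE + 1))

def all_sequences():
    """Yield every two-byte lead/trail sequence as a bytes object."""
    for lead in LEAD_BYTES:
        for trail in TRAIL_BYTES:
            yield bytes([lead, trail])

def is_bmp_pua(text):
    """True if text is a single code point in U+E000..U+F8FF."""
    return len(text) == 1 and 0xE000 <= ord(text) <= 0xF8FF

def count_pua(codec="big5hkscs"):
    """Count sequences that the given Python codec decodes into the BMP PUA."""
    total = 0
    for seq in all_sequences():
        try:
            decoded = seq.decode(codec)
        except UnicodeDecodeError:
            continue  # sequence is unmapped in this codec
        if is_bmp_pua(decoded):
            total += 1
    return total

sequences = list(all_sequences())
print(len(sequences))  # 126 leads * 157 trails = 19782
print(count_pua())     # depends on the codec's table; no value asserted here
```

Passing a different codec name, such as "big5", gives a quick way to compare how two mapping tables treat the same byte sequences.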

Other questions you could really help me with:

1. If they do indeed map differently, is there a way to get more information as to how they map differently?
2. Is there information available on what the best Unicode code points for these PUA code points are?

I did not email this to the charset list as it seemed off-topic. I did however cc www-archive as this information might be relevant to other people. Hope that's okay.

Kind regards,


--
Anne van Kesteren
http://annevankesteren.nl/


Received on Wednesday, 28 March 2012 16:36:59 GMT
