- From: John Daggett <jdaggett@mozilla.com>
- Date: Tue, 2 Jun 2009 22:44:21 -0700 (PDT)
- To: www-style@w3.org
>> I think that either using hex ranges (as originally designed) or
>> language script values (such as Latin, Greek, Cyrillic, Arabic,
>> etc., as defined by Unicode UAX#24) is a better approach. Even
>> then, it requires some skill to create font content that
>> adequately covers writing systems, because of the shared characters
>> (typically classified as 'Common' or 'Inherited' in terms of script
>> values).
>
> Using script values sounds like a great idea! That's a lot easier for
> the user than specifying a dozen Unicode blocks, and it also handles
> the common and inherited characters, which unicode-range currently
> can't do.
>
> (It's also nice from an efficiency point of view, as we already have to
> perform script processing in order to correctly apply OpenType
> features, so it is no extra work at all.)
>
> So, how about allowing unicode-range to accept Unicode script names?
> And should these be strings or keywords? :)

While I appreciate that the script names in the Scripts data file are a
better representation of the characters used for a given script, I think
using this data makes shorthand like 'Latin' *more* complex, rather than
simpler. Simply adding 'Latin' to a unicode-range list, for example, would
not include the space character or digits, since these are classified as
'Common' in the Scripts file. And the length and complexity of the data in
the Scripts file make it difficult for authors to determine the set of
names to use for a given range.

Given that we're considering shorthand for a set of Unicode ranges, I think
using the Block names is a better approach. The data in the Scripts file
would definitely serve as a good guide to authors constructing range lists
that carefully distinguish the fonts used for different scripts.
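The point about 'Common' characters can be checked mechanically. Below is a minimal Python sketch (3.9+), assuming a hand-copied four-line excerpt in Scripts.txt format rather than the full file; the names `parse_script_ranges`, `SCRIPTS_EXCERPT` and `BASIC_LATIN_BLOCK` are illustrative only. It shows that a script-based 'Latin' set leaves out the space and the digits, while the Basic Latin block range covers them.

```python
# Illustrative excerpt in Scripts.txt format (NOT the full data file).
SCRIPTS_EXCERPT = """\
0020          ; Common # Zs       SPACE
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
"""

# Basic Latin block as listed in Blocks.txt: 0000..007F
BASIC_LATIN_BLOCK = range(0x0000, 0x007F + 1)


def parse_script_ranges(text: str) -> dict[str, set[int]]:
    """Map each script name to the set of code points assigned to it."""
    scripts: dict[str, set[int]] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, name = (field.strip() for field in data.split(";"))
        if ".." in cps:
            lo, hi = (int(x, 16) for x in cps.split(".."))
        else:
            lo = hi = int(cps, 16)
        scripts.setdefault(name, set()).update(range(lo, hi + 1))
    return scripts


scripts = parse_script_ranges(SCRIPTS_EXCERPT)
latin = scripts["Latin"]

# Space and digits are in the block but not in the 'Latin' script set.
for ch in " 7A":
    cp = ord(ch)
    print(f"U+{cp:04X} {ch!r}: Latin script={cp in latin}, "
          f"Basic Latin block={cp in BASIC_LATIN_BLOCK}")
```

An author relying on the script name alone would have to remember to add 'Common' as well, which in the full data file pulls in punctuation and symbols from many blocks.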
Cheers,

John

Simple example for Telugu:

  unicode-range: Telugu;

Telugu in Blocks.txt:

  0C00..0C7F; Telugu

Telugu in Scripts.txt:

  0C01..0C03 ; Telugu # Mc [3] TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA
  0C05..0C0C ; Telugu # Lo [8] TELUGU LETTER A..TELUGU LETTER VOCALIC L
  0C0E..0C10 ; Telugu # Lo [3] TELUGU LETTER E..TELUGU LETTER AI
  0C12..0C28 ; Telugu # Lo [23] TELUGU LETTER O..TELUGU LETTER NA
  0C2A..0C33 ; Telugu # Lo [10] TELUGU LETTER PA..TELUGU LETTER LLA
  0C35..0C39 ; Telugu # Lo [5] TELUGU LETTER VA..TELUGU LETTER HA
  0C3D ; Telugu # Lo TELUGU SIGN AVAGRAHA
  0C3E..0C40 ; Telugu # Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II
  0C41..0C44 ; Telugu # Mc [4] TELUGU VOWEL SIGN U..TELUGU VOWEL SIGN VOCALIC RR
  0C46..0C48 ; Telugu # Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
  0C4A..0C4D ; Telugu # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
  0C55..0C56 ; Telugu # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
  0C58..0C59 ; Telugu # Lo [2] TELUGU LETTER TSA..TELUGU LETTER DZA
  0C60..0C61 ; Telugu # Lo [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
  0C62..0C63 ; Telugu # Mn [2] TELUGU VOWEL SIGN VOCALIC L..TELUGU VOWEL SIGN VOCALIC LL
  0C66..0C6F ; Telugu # Nd [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
  0C78..0C7E ; Telugu # No [7] TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR..TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
  0C7F ; Telugu # So TELUGU SIGN TUUMU
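To make the Telugu comparison concrete, here is a Python sketch that expands both definitions quoted above and writes each out as a CSS unicode-range value. It is illustrative only: the helpers `merge` and `to_unicode_range` are hypothetical, not part of any CSS or Unicode tooling, and the script ranges are copied from the Scripts.txt excerpt above, so the counts reflect that version of the data.

```python
# Telugu block from Blocks.txt, as quoted above.
TELUGU_BLOCK = (0x0C00, 0x0C7F)

# Telugu script ranges copied from the Scripts.txt excerpt above.
TELUGU_SCRIPT_RANGES = [
    (0x0C01, 0x0C03), (0x0C05, 0x0C0C), (0x0C0E, 0x0C10), (0x0C12, 0x0C28),
    (0x0C2A, 0x0C33), (0x0C35, 0x0C39), (0x0C3D, 0x0C3D), (0x0C3E, 0x0C40),
    (0x0C41, 0x0C44), (0x0C46, 0x0C48), (0x0C4A, 0x0C4D), (0x0C55, 0x0C56),
    (0x0C58, 0x0C59), (0x0C60, 0x0C61), (0x0C62, 0x0C63), (0x0C66, 0x0C6F),
    (0x0C78, 0x0C7E), (0x0C7F, 0x0C7F),
]


def merge(ranges):
    """Coalesce overlapping or adjacent (lo, hi) ranges into sorted runs."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def to_unicode_range(ranges):
    """Format (lo, hi) pairs as a CSS unicode-range descriptor value."""
    return ", ".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}"
        for lo, hi in ranges
    )


block_points = set(range(TELUGU_BLOCK[0], TELUGU_BLOCK[1] + 1))
script_points = set()
for lo, hi in TELUGU_SCRIPT_RANGES:
    script_points.update(range(lo, hi + 1))
uncovered = block_points - script_points

print(f"Block:  unicode-range: {to_unicode_range([TELUGU_BLOCK])};"
      f"  /* {len(block_points)} code points */")
print(f"Script: unicode-range: {to_unicode_range(merge(TELUGU_SCRIPT_RANGES))};"
      f"  /* {len(script_points)} code points */")
print(f"In the block but not assigned to Telugu here: {len(uncovered)}")
```

With this data the block is a single range of 128 code points, while the script definition still needs 14 separate ranges after merging adjacent ones and covers 93 code points. That is the difference in authoring effort the Block-name shorthand is meant to avoid.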
Received on Wednesday, 3 June 2009 05:45:00 UTC