- From: Michael Day <mikeday@yeslogic.com>
- Date: Wed, 03 Jun 2009 16:12:44 +1000
- To: John Daggett <jdaggett@mozilla.com>
- CC: www-style@w3.org
Hi John,

> While I appreciate that the script names in the Script data file are a
> better representation of characters used for a given script, I think
> using this data makes the usage of shorthand like 'Latin' *more*
> complex, rather than simpler. Simply adding 'Latin' to a unicode-range
> list for example would not include the space character or numbers, since
> these are classified as 'Common' in the Script file. And the length and
> complexity of the data in the Script file makes it difficult for authors
> to determine the set of names to use for a given range.

As in my message to Michel, I was thinking that specifying "Latin" would
also cover any characters whose script value computed to Latin, which
would include punctuation surrounding Latin text. However, this still
might not be enough, depending upon the author intent.

One very clear advantage of the script approach is for CJK, consider:

    unicode-range: Han

compared with:

    unicode-range: U+4E00-9FFF,   /* CJK Unified Ideographs */
                   U+F900-FAFF,   /* CJK Compatibility Ideographs */
                   U+FE30-FE4F,   /* CJK Compatibility Forms */
                   U+20000-2A6DF, /* CJK Unified Ideographs Extension B */
                   U+2F800-2FA1F  /* CJK Compatibility Ideographs Supplement */

and that doesn't include the CJK symbols and punctuation block and the
various CJK radical and stroke blocks.

But with the uncertainty about this feature it might be best to stick
with codepoint ranges, allow implementations to experiment with
vendor-specific extensions for blocks and scripts and wait for feedback
from authors.

Best regards,

Michael

--
Print XML with Prince!
http://www.princexml.com
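For reference, the codepoint-range form discussed in the message could sit inside an @font-face rule roughly as follows. This is only a sketch under stated assumptions: the family name "ExampleHan" and the font URL are hypothetical placeholders, and the ranges are exactly those listed in the message, so the CJK symbols and punctuation block and the radical and stroke blocks are still not covered.

```css
/* Hypothetical family name and font URL, for illustration only. */
@font-face {
  font-family: "ExampleHan";
  src: url("fonts/example-han.woff");
  /* Ranges copied from the message; not a complete list of Han-related blocks. */
  unicode-range: U+4E00-9FFF,   /* CJK Unified Ideographs */
                 U+F900-FAFF,   /* CJK Compatibility Ideographs */
                 U+FE30-FE4F,   /* CJK Compatibility Forms */
                 U+20000-2A6DF, /* CJK Unified Ideographs Extension B */
                 U+2F800-2FA1F; /* CJK Compatibility Ideographs Supplement */
}

/* Characters outside the listed ranges fall through to the next family. */
body {
  font-family: "ExampleHan", serif;
}
```

With a rule like this, a user agent would be expected to use the font only for characters inside the listed ranges, which is why any omitted block has to be added by hand.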
Received on Wednesday, 3 June 2009 06:13:26 UTC