- From: Asmus Freytag <asmusf@ix.netcom.com>
- Date: Wed, 04 May 2011 12:13:20 -0700
- To: John Daggett <jdaggett@mozilla.com>
- CC: fantasai <fantasai.lists@inkedblade.net>, www-style@w3.org, WWW International <www-international@w3.org>, Addison Phillips <addison@lab126.com>
On 5/4/2011 12:18 AM, John Daggett wrote:
> fantasai wrote:
>
>> The EastAsianWidth.txt file is referenced from UAX11. UAX11 gives
>> the explanation of what it means, how to use it, etc. So I think
>> that referring to UAX11 is the correct thing to do here. I'll let
>> Addison correct me if I'm wrong.
>>
>> I really don't think it's at all ambiguous what the spec means by
>> "do this with characters classified as fullwidth (F), see UAX11".
>> You think it's ambiguous?
> It's not ambiguous, it just buries the underlying reference. Unless immediately
> obvious, we should be defining CSS properties with respect to specific
> properties in the Unicode database and consistently referring to the location
> of that database, including other references that explain the handling of
> those properties in more detail.
>
> For example:
>
>   The East_Asian_Width property of the Unicode database [UAX44] can
>   be used to ... (see [UAX11] for more background on this property).
>
> Regards,
>

UAX#11 EAW is a good example of why it's important not to bypass the documentation for Unicode Property data. The UAX makes clear that there are two levels of character classification, one of which takes into account context other than character properties.

For many applications, the main issue is whether some characters behave like "wide" characters (i.e. similar to ideographic characters) or not. UAX#11 prescribes how to make this determination based on a number of basic character properties.

However, one important class of characters in the data is "A" (for ambiguous). According to UAX#11, the intent for these characters is to use context to determine whether they need to be handled like ideographs or like "regular" characters. Without applying this resolution step, characters of class "A" cannot be handled correctly.
On a system that maps these characters to legacy character sets and uses legacy fonts, they would be displayed as "wide" characters, while on systems that no longer follow these legacy practices, some or all of these characters would be displayed as ordinary (aka "narrow") characters. Therefore, to handle these characters, one must know whether the environment treats (all or some of) them as a legacy system would.

This contingent quality of the classification is something that's not apparent from the raw values in the database, and rises above mere "background" information.

A./

PS: the same ambiguity in character handling carries through to UAX#14, based on the same issue: legacy and non-legacy systems, fonts, etc. differ fundamentally in how they represent certain characters, hence it is necessary to supply context.
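
To make the resolution step for class "A" concrete, here is a minimal sketch in Python. It relies on the standard library's unicodedata.east_asian_width(), which returns the raw East_Asian_Width value. The function name resolved_is_wide and the flag legacy_east_asian_context are hypothetical, introduced only for illustration. The sketch also assumes the simplest case, where an environment treats all ambiguous characters the same way; as noted above, a real environment may treat only some of them as wide.

```python
import unicodedata

# Raw East_Asian_Width values that count as "wide" without any context:
# W (Wide) and F (Fullwidth).
ALWAYS_WIDE = {"W", "F"}


def resolved_is_wide(ch: str, legacy_east_asian_context: bool) -> bool:
    """Resolve one character to wide (True) or narrow (False).

    legacy_east_asian_context is a hypothetical flag standing in for
    whatever the caller knows about its environment (legacy character
    sets, legacy fonts). The raw property value alone cannot supply it.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ALWAYS_WIDE:
        return True
    if eaw == "A":
        # Ambiguous: the answer depends on context, not on the database.
        return legacy_east_asian_context
    # N (Neutral), Na (Narrow), H (Halfwidth) are treated as narrow here.
    return False


if __name__ == "__main__":
    alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA, East_Asian_Width = A
    print(unicodedata.east_asian_width(alpha))
    print(resolved_is_wide(alpha, legacy_east_asian_context=True))
    print(resolved_is_wide(alpha, legacy_east_asian_context=False))
```

The same character resolves differently depending on the flag, which is the point: a reference to the raw data file alone gives an implementer no hint that this second input is required.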
Received on Wednesday, 4 May 2011 19:14:01 UTC