- From: Koji Ishii <kojiishi@gluesoft.co.jp>
- Date: Sun, 8 May 2011 13:23:58 -0400
- To: Asmus Freytag <asmusf@ix.netcom.com>, John Daggett <jdaggett@mozilla.com>
- CC: fantasai <fantasai.lists@inkedblade.net>, "www-style@w3.org" <www-style@w3.org>, WWW International <www-international@w3.org>, Addison Phillips <addison@lab126.com>
I re-read East_Asian_Width part of UAX44 to understand suggested wordings better.

I'm fine with wordings John suggested (replaced "backgrounds" with "details" given Asmus's suggestion):

    The East_Asian_Width property of the Unicode database [UAX44] can
    be used to ... (see [UAX11] for more details on this property).

But if you read UAX44, it merely says:

    See Unicode Standard Annex #11, "East Asian Width" [UAX11] and
    DerivedEastAsianWidth.txt for more details.

And

    Property values are described in Unicode Standard Annex #11,
    "East Asian Width" [UAX11].

So, I suppose implementers have to read UAX11 anyway, and the link to the data file is in 6.3 Data File section in UAX11[1]. I couldn't find the link in UAX44.

Either works for me, but this makes me feel that referring to UAX44 for East_Asian_Width property adds one unnecessary route to the data file. I hope we're okay to use UAX11 directly, as I think it makes everyone's life a little easier.

[1] http://unicode.org/reports/tr11/#DataFile

Regards,
Koji

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Asmus Freytag
Sent: Thursday, May 05, 2011 4:13 AM
To: John Daggett
Cc: fantasai; www-style@w3.org; WWW International; Addison Phillips
Subject: Re: [css3-writing-modes] referring to Unicode

On 5/4/2011 12:18 AM, John Daggett wrote:
> fantasai wrote:
>
>> The EastAsianWidth.txt file is referenced from UAX11. UAX11 gives
>> the explanation of what it means, how to use it, etc. So I think
>> that referring to UAX11 is the correct thing to do here. I'll let
>> Addison correct me if I'm wrong.
>>
>> I really don't think it's at all ambiguous what the spec means by
>> "do this with characters classified as fullwidth (F), see UAX11".
>> You think it's ambiguous?
> It's not ambiguous, it just buries the underlying reference. Unless immediately
> obvious, we should be defining CSS properties with respect to specific
> properties in the Unicode database and consistently referring to the location
> of that database, including other references that explain the handling of
> those properties in more detail.
>
> For example:
>
>    The East_Asian_Width property of the Unicode database [UAX44] can
>    be used to ... (see [UAX11] for more background on this property).
>
> Regards,
>
>

UAX#11 EAW is a good example of why it's important not to bypass the documentation for Unicode Property data. The UAX makes clear that there are two levels of character classification, one of which takes into account context other than character properties.

For many applications, the main issue is whether some characters behave like "wide" characters (i.e. similar to ideographic characters) or not. UAX#11 gives a prescription how to make this determination based on a number of basic character properties.

However, one important class of characters in the data are "A" (for ambiguous). According to UAX#11, the intent for these characters is to use context to determine whether they need to be handled like ideographs or like "regular" characters - without applying this resolution step, characters of class "A" cannot be handled correctly.

On a system that maps these characters to legacy character sets and uses legacy fonts, they would be displayed as "wide" characters, while, on systems that no longer follow these legacy practices some or all of these characters would be displayed as ordinary (aka narrow characters).

Therefore, to handle these characters, one must know whether the environment treats (all or some of) them as a legacy system would. This contingent quality of the classification is something that's not apparent from the raw values in the database, and rises above mere "background" information.
A./

PS: the same ambiguity in character handling carries through to UAX#14, based on the same issue: legacy and non-legacy systems, fonts, etc. differ fundamentally in how they represent certain characters, hence it is necessary to supply context.
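
[Editorial illustration, not part of the original message.] The resolution step for class "A" characters described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes Python's standard `unicodedata.east_asian_width()` as the source of the raw property value, and both the function name `is_wide` and the `legacy_east_asian_context` flag are hypothetical stand-ins for whatever the environment actually knows about its fonts and character-set mappings. Treating every ambiguous character the same way is also a simplification, since the message notes that only some of them may behave as wide.

```python
import unicodedata


def is_wide(ch: str, legacy_east_asian_context: bool) -> bool:
    """Decide whether a single character should be treated as "wide".

    `legacy_east_asian_context` is a hypothetical flag (an assumption of
    this sketch): True when the environment handles ambiguous characters
    the way a legacy East Asian system would.
    """
    eaw = unicodedata.east_asian_width(ch)  # raw value: F, W, H, Na, A or N

    if eaw in ("F", "W"):
        # Fullwidth and Wide are wide regardless of context.
        return True
    if eaw == "A":
        # Ambiguous: the raw database value alone cannot answer this,
        # so the caller-supplied context decides.
        return legacy_east_asian_context
    # Halfwidth, Narrow and Neutral are treated as not wide here.
    return False


if __name__ == "__main__":
    for ch in ("\u4e2d", "\u03b1", "a"):
        print(
            f"U+{ord(ch):04X}",
            unicodedata.east_asian_width(ch),
            is_wide(ch, legacy_east_asian_context=True),
            is_wide(ch, legacy_east_asian_context=False),
        )
```

The point of passing the context in explicitly is that the same code point (for example U+03B1 GREEK SMALL LETTER ALPHA) gives a different answer depending on the environment, which is exactly what the raw property value alone does not convey.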
Received on Sunday, 8 May 2011 17:27:04 UTC