Re: [css3-writing-modes] referring to Unicode from Asmus Freytag on 2011-05-04 (www-international@w3.org from April to June 2011)

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Wed, 04 May 2011 12:13:20 -0700
To: John Daggett <jdaggett@mozilla.com>
CC: fantasai <fantasai.lists@inkedblade.net>, www-style@w3.org, WWW International <www-international@w3.org>, Addison Phillips <addison@lab126.com>
Message-ID: <4DC1A550.2040005@ix.netcom.com>

On 5/4/2011 12:18 AM, John Daggett wrote:
> fantasai wrote:
>
>> The EastAsianWidth.txt file is referenced from UAX11. UAX11 gives
>> the explanation of what it means, how to use it, etc. So I think
>> that referring to UAX11 is the correct thing to do here. I'll let
>> Addison correct me if I'm wrong.
>>
>> I really don't think it's at all ambiguous what the spec means by
>> "do this with characters classified as fullwidth (F), see UAX11".
>> You think it's ambiguous?
> It's not ambiguous, it just buries the underlying reference.  Unless immediately
> obvious, we should be defining CSS properties with respect to specific
> properties in the Unicode database and consistently referring to the location
> of that database, including other references that explain the handling of
> those properties in more detail.
>
> For example:
>
>      The East_Asian_Width property of the Unicode database [UAX44] can
>      be used to ... (see [UAX11] for more background on this property).
>
> Regards,
>
>
UAX#11 EAW is a good example of why it's important not to bypass the 
documentation for Unicode Property data. The UAX makes clear that there 
are two levels of character classification, one of which takes into 
account context other than character properties.

For many applications, the main issue is whether some characters behave
like "wide" characters (i.e. similar to ideographic characters) or not.
UAX#11 prescribes how to make this determination based on a number of
basic character properties. However, one important class of characters
in the data is "A" (for ambiguous).
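
As a rough illustration of what the raw data looks like, here is a
minimal Python sketch using the standard library's unicodedata module.
The helper name raw_eaw and the sample characters are chosen only for
illustration; they are not taken from UAX#11 or from the CSS draft.

```python
import unicodedata

def raw_eaw(ch: str) -> str:
    """Return the raw East_Asian_Width value: F, H, W, Na, A or N."""
    return unicodedata.east_asian_width(ch)

# Illustrative samples: an ASCII letter, its fullwidth form,
# a CJK ideograph, and a Greek letter.
samples = ["A", "\uFF21", "\u4E2D", "\u03B1"]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {raw_eaw(ch)}")
```

The Greek letter comes back as "A": the raw value alone does not say
whether it should be treated as wide or narrow.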

According to UAX#11, the intent for these characters is to use context
to determine whether they need to be handled like ideographs or like
"regular" characters. Without this resolution step, characters of
class "A" cannot be handled correctly.

On a system that maps these characters to legacy character sets and uses
legacy fonts, they would be displayed as "wide" characters, while on
systems that no longer follow these legacy practices, some or all of
these characters would be displayed as ordinary (aka narrow) characters.
Therefore, to handle these characters, one must know whether the
environment treats (all or some of) them as a legacy system would.
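
The resolution step can be sketched as follows. This is only a
simplified reading of UAX#11, not a definitive implementation: the
function name resolved_width and the boolean east_asian_legacy_context
are assumptions made for this example, and a real environment might
treat only some of the class "A" characters as wide.

```python
import unicodedata

def resolved_width(ch: str, east_asian_legacy_context: bool) -> str:
    """Resolve a character to "wide" or "narrow".

    east_asian_legacy_context is a hypothetical flag standing in for
    whatever the environment knows about legacy character sets and fonts.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("F", "W"):
        return "wide"
    if eaw == "A":
        # Ambiguous: the answer depends on context, not on the data file.
        return "wide" if east_asian_legacy_context else "narrow"
    # H, Na and N are treated as narrow in this sketch.
    return "narrow"

# The same character resolves differently in different environments.
print(resolved_width("\u03B1", east_asian_legacy_context=True))
print(resolved_width("\u03B1", east_asian_legacy_context=False))
```

The point of the flag is that it has to come from outside the Unicode
database; nothing in EastAsianWidth.txt can supply it.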

This contingent quality of the classification is something that's not 
apparent from the raw values in the database, and rises above mere 
"background" information.

A./

PS: the same ambiguity in character handling carries through to UAX#14, 
based on the same issue: legacy and non-legacy systems, fonts, etc. 
differ fundamentally in how they represent certain characters, hence it 
is necessary to supply context.

Received on Wednesday, 4 May 2011 19:14:01 UTC