RE: [css3-writing-modes] referring to Unicode from Koji Ishii on 2011-05-08 (www-style@w3.org from May 2011)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Sun, 8 May 2011 13:23:58 -0400
To: Asmus Freytag <asmusf@ix.netcom.com>, John Daggett <jdaggett@mozilla.com>
CC: fantasai <fantasai.lists@inkedblade.net>, "www-style@w3.org" <www-style@w3.org>, WWW International <www-international@w3.org>, Addison Phillips <addison@lab126.com>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AC2875701@MAILR001.mail.lan>

I re-read the East_Asian_Width part of UAX44 to better understand the suggested wording.

I'm fine with the wording John suggested (with "background" replaced by "details", following Asmus's suggestion):
     The East_Asian_Width property of the Unicode database [UAX44] can
     be used to ... (see [UAX11] for more details on this property).

But if you read UAX44, it merely says:
  See Unicode Standard Annex #11, "East Asian Width" [UAX11] and
  DerivedEastAsianWidth.txt for more details.
And
  Property values are described in Unicode Standard Annex #11,
  "East Asian Width" [UAX11].

So I suppose implementers have to read UAX11 anyway, and the link to the data file is in the "6.3 Data File" section of UAX11 [1]. I couldn't find that link in UAX44.

Either works for me, but this makes me feel that referring to UAX44 for the East_Asian_Width property adds one unnecessary step on the route to the data file.

I hope we're okay with referring to UAX11 directly, as I think it makes everyone's life a little easier.

[1] http://unicode.org/reports/tr11/#DataFile
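
To illustrate what an implementer does once they reach that data file, here is a minimal sketch of reading it. It assumes the usual Unicode data file layout of "code point or range ; value # comment". The SAMPLE text is a small illustrative excerpt rather than the real file, and the function names are made up for this example. Falling back to "N" for code points not listed is a simplification.

```python
# Illustrative excerpt in the data file's layout, not the full file.
SAMPLE = """\
# code point(s);East_Asian_Width value
0020;Na
00A1;A
1100..115F;W
FF01..FF60;F
"""


def parse_east_asian_width(text):
    """Return a list of (first, last, value) tuples from data file text."""
    ranges = []
    for line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code_points, value = (field.strip() for field in line.split(";"))
        # A field is either a single code point or "first..last".
        first, _, last = code_points.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), value))
    return ranges


def lookup(ranges, ch, default="N"):
    """Return the raw property value for one character (simplified default)."""
    cp = ord(ch)
    for first, last, value in ranges:
        if first <= cp <= last:
            return value
    return default


ranges = parse_east_asian_width(SAMPLE)
print(lookup(ranges, "\uFF21"))  # FULLWIDTH LATIN CAPITAL LETTER A
```

This only yields the raw values; what they mean and how to use them is still explained in UAX11.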

Regards,
Koji

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Asmus Freytag
Sent: Thursday, May 05, 2011 4:13 AM
To: John Daggett
Cc: fantasai; www-style@w3.org; WWW International; Addison Phillips
Subject: Re: [css3-writing-modes] referring to Unicode

On 5/4/2011 12:18 AM, John Daggett wrote:
> fantasai wrote:
>
>> The EastAsianWidth.txt file is referenced from UAX11. UAX11 gives
>> the explanation of what it means, how to use it, etc. So I think
>> that referring to UAX11 is the correct thing to do here. I'll let
>> Addison correct me if I'm wrong.
>>
>> I really don't think it's at all ambiguous what the spec means by
>> "do this with characters classified as fullwidth (F), see UAX11".
>> You think it's ambiguous?
> It's not ambiguous, it just buries the underlying reference.  Unless immediately
> obvious, we should be defining CSS properties with respect to specific
> properties in the Unicode database and consistently referring to the location
> of that database, including other references that explain the handling of
> those properties in more detail.
>
> For example:
>
>      The East_Asian_Width property of the Unicode database [UAX44] can
>      be used to ... (see [UAX11] for more background on this property).
>
> Regards,
>
>
UAX#11 EAW is a good example of why it's important not to bypass the 
documentation for Unicode Property data. The UAX makes clear that there 
are two levels of character classification, one of which takes into 
account context other than character properties.

For many applications, the main issue is whether some characters behave 
like "wide" characters (i.e. similar to ideographic characters) or not. 
UAX#11 gives a prescription for how to make this determination based on a
number of basic character properties. However, one important class of
characters in the data is "A" (for ambiguous).

According to UAX#11, the intent for these characters is to use context
to determine whether they need to be handled like ideographs or like
"regular" characters. Without applying this resolution step, characters
of class "A" cannot be handled correctly.

On a system that maps these characters to legacy character sets and uses
legacy fonts, they would be displayed as "wide" characters, while on
systems that no longer follow these legacy practices, some or all of
these characters would be displayed as ordinary (aka narrow) characters.
Therefore, to handle these characters, one must know whether the
environment treats (all or some of) them as a legacy system would.
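
As a rough illustration of that resolution step (a sketch only, not something UAX#11 or the CSS draft prescribes): Python's standard unicodedata module exposes the raw East_Asian_Width value, and the helper below applies context to class "A". The function name and the legacy_east_asian flag are invented for this example, and treating all "A" characters the same way is a simplification, since an environment may treat only some of them as wide.

```python
import unicodedata


def resolve_width(ch, legacy_east_asian):
    """Resolve one character to "wide" or "narrow".

    legacy_east_asian is a hypothetical flag standing in for the context:
    True when the environment treats ambiguous characters the way a legacy
    East Asian system would.
    """
    raw = unicodedata.east_asian_width(ch)  # 'F', 'W', 'A', 'H', 'Na' or 'N'
    if raw in ("F", "W"):
        return "wide"
    if raw == "A":
        # The raw value alone does not decide this case; context does.
        return "wide" if legacy_east_asian else "narrow"
    return "narrow"


# U+03B1 GREEK SMALL LETTER ALPHA has the raw value "A".
print(resolve_width("\u03b1", legacy_east_asian=True))
print(resolve_width("\u03b1", legacy_east_asian=False))
```

The same character comes out differently depending on the flag, which is the contingent behaviour that the raw database values do not show on their own.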

This contingent quality of the classification is something that's not 
apparent from the raw values in the database, and rises above mere 
"background" information.

A./

PS: the same ambiguity in character handling carries through to UAX#14, 
based on the same issue: legacy and non-legacy systems, fonts, etc. 
differ fundamentally in how they represent certain characters, hence it 
is necessary to supply context.

Received on Sunday, 8 May 2011 17:24:07 UTC