Re: [draft] Unicode Normalization: requests for CSS-WG, HTML-CG agendum from Mark Davis on 2009-02-10 (public-i18n-core@w3.org from January to March 2009)

From: Mark Davis <mark.davis@icu-project.org>
Date: Tue, 10 Feb 2009 13:01:32 -0800
To: fantasai <fantasai.lists@inkedblade.net>
Cc: Martin Duerst <duerst@it.aoyama.ac.jp>, "Phillips, Addison" <addison@amazon.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <30b660a20902101301j3e20a168gede34ec49e48dc9d@mail.gmail.com>

The CJK compatibility characters are also variants of the corresponding
'ordinary' character in that either character could appear in either form.
As a matter of fact, the glyphic shapes in the sources (e.g. JIS) have changed
over time.

The Unicode Consortium does recognize that particular glyphic shapes are
sometimes important, and has developed a much more comprehensive mechanism
to deal with it. See http://unicode.org/reports/tr37/

Mark

On Mon, Feb 9, 2009 at 16:16, fantasai <fantasai.lists@inkedblade.net> wrote:

>
> Martin Duerst wrote:
>
>> I haven't read everything, but if your claim ("overly-aggressive")
>> is true, then early normalization would be better than late matching,
>> because it would allow those producers that, for whatever reason,
>> insist that there is a difference to simply not normalize
>> these codepoints.
>>
>
> The argument is that certain normalization mappings in NFC/NFD
> are more like the types of mappings that happen in NFKC/NFKD than
> like the compose/decompose/ordering mappings. Therefore early
> normalization would cause data loss in the content, whereas late
> matching at, e.g. the selectors level, would avoid such data loss
> while still allowing such strings to match.
>
> See Ambrose Li's and Robert Burns's comments:
> http://lists.w3.org/Archives/Public/www-style/2009Feb/0229.html
>
> ~fantasai
>
>
>
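
The distinction drawn in the quoted message between early normalization and
late matching can be sketched roughly as follows, using Python's standard
unicodedata module. The choice of U+F900 (a CJK compatibility ideograph whose
canonical decomposition is the single character U+8C48) and the helper names
early_normalize and late_match are assumptions made for illustration; they do
not come from the thread.

```python
import unicodedata

# U+F900 is a CJK compatibility ideograph; its canonical (singleton)
# decomposition is the unified ideograph U+8C48.
COMPAT = "\uF900"
UNIFIED = "\u8C48"


def early_normalize(text):
    """Hypothetical helper: normalize content to NFC before storing it."""
    return unicodedata.normalize("NFC", text)


def late_match(a, b):
    """Hypothetical helper: compare under NFC without altering the inputs."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Early normalization: the stored content no longer records which of the
# two code points the author originally used.
stored = early_normalize(COMPAT)
distinction_lost = stored == UNIFIED

# Late matching: both strings are kept as written, yet they still compare
# as equivalent at match time (for example, when matching a selector).
still_distinct = COMPAT != UNIFIED
matches = late_match(COMPAT, UNIFIED)
```

Under these assumptions, NFC replaces the compatibility ideograph with the
unified one, so the distinction disappears from early-normalized content,
while late matching leaves both code points in place and still treats them
as equal.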

Received on Tuesday, 10 February 2009 21:02:14 UTC