W3C home > Mailing lists > Public > ietf-charsets@w3.org > July to September 1999

Re: Charlint (aka charlie)

From: Harald Alvestrand <Harald@Alvestrand.no>
Date: Wed, 14 Jul 1999 10:29:11 +0200
To: "Martin J. Duerst" <duerst@w3.org>
Cc: ietf-charsets@iana.org
Message-id: <4.2.0.56.19990714102244.01b86870@dokka.maxware.no>
At 09:43 14.07.99 +0900, Martin J. Duerst wrote:

> > Unicode TR 15 refers to an Unicode character database for composition.
> > Is this database changing with every new amendment to the base standard,
> > or is it a stable reference that can be compiled into programs?
> >
> > If it is unstable, how does charlint deal with it?
>
>There are various aspects:
>
>- Decompositions for existing characters are supposed not to change.
>   TR 15 significantly increases the pressure to not change.
>
>- Decompositions for newly defined precomposed characters are dealt with
>   by keeping these characters decomposed in the normalized form. If
>   charlint meets a character it doesn't know (an undefined codepoint),
>   it can warn or abort.
>
>- Support for 'compiling' is currently not part of charlint, but it
>   is an idea for future work.

I was thinking of the composition database; I thought they were different,
from the text in TR15.
There's been some pressure to add new precomposed characters (W with
circumflex? Did this get in already?) to Unicode; if these are added to the 
decomposition database, it does not hurt the correctness of normalization
of existing text that uses the decomposed forms.

If they are added to the composition database, existing normalized text
will turn "incorrect". This is bad.

(This is not really relevant to charlint, but to the underlying issues;
IETF may have to take a stand on normalization Real Soon Now....)

                          Harald

(Favourite decomposition: The Arabic "character" that decomposes into
"In the name of Allah, the compassionate and merciful" - 21
characters, including 3 spaces....)

--
Harald Tveit Alvestrand, Maxware, Norway
Harald.Alvestrand@maxware.no
Received on Wednesday, 14 July 1999 04:49:02 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 5 June 2006 15:10:51 GMT