
Re: Charlint (aka charlie)

From: Martin J. Duerst <duerst@w3.org>
Date: Fri, 16 Jul 1999 09:02:01 +0900
To: Harald Alvestrand <Harald@Alvestrand.no>
Cc: ietf-charsets@iana.org
Message-id: <199907160000.JAA17415@mail.sfc.keio.ac.jp>

At 10:29 99/07/14 +0200, Harald Alvestrand wrote:

> I was thinking of the composition database; I thought they were different,
> from the text in TR15.
> There's been some pressure to add new precomposed characters (W with
> circumflex? Did this get in already?) to Unicode; if these are added to the 
> decomposition database, it does not hurt the correctness of normalization
> of existing text that uses the decomposed forms.
> 
> If they are added to the composition database, existing normalized text
> will turn "incorrect". This is bad.

They will only be added to the 'decomposition database'. Note, however,
that the data does not come as a 'decomposition database' and a
'composition database', but as a single 'database' plus a list of
'composition exclusions'. Your 'decomposition database' is obtained
directly from the 'database'; the 'composition database' is the
'database' minus the 'exclusions'. The decomposition of any precomposed
character defined after Unicode V 3.0 will automatically be put both
into the 'database' and into the 'exclusions', so such a character can
be decomposed but is never produced by composition, and existing
normalized text stays correct.
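
To make the 'database' minus 'exclusions' relationship concrete, here is
a rough sketch in Python. It is only an illustration under stated
assumptions: the names DATABASE, EXCLUSIONS, composition_table, decompose
and compose are made up for this example, the table holds three sample
entries rather than the real Unicode data, U+E000 (a private-use code
point) stands in for a precomposed character added after V 3.0, and the
pairwise compose step is much simpler than the algorithm in TR15.

```python
# Toy data, NOT the real Unicode files. Keys are precomposed characters,
# values are their decompositions.
DATABASE = {
    "\u00C5": "A\u030A",  # A WITH RING ABOVE -> A + COMBINING RING ABOVE
    "\u0174": "W\u0302",  # W WITH CIRCUMFLEX -> W + COMBINING CIRCUMFLEX
    "\uE000": "Q\u0302",  # hypothetical stand-in for a later addition
}

# Characters that may be decomposed but must never be composed.
EXCLUSIONS = {"\uE000"}


def composition_table(database, exclusions):
    """Invert the database, leaving out every excluded character."""
    return {
        decomposed: precomposed
        for precomposed, decomposed in database.items()
        if precomposed not in exclusions
    }


def decompose(text, database):
    """Replace each precomposed character by its decomposition."""
    return "".join(database.get(ch, ch) for ch in text)


def compose(text, table):
    """Naive composition: replace adjacent base + mark pairs found in table."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


COMPOSITION = composition_table(DATABASE, EXCLUSIONS)
```

With these tables, compose("W\u0302", COMPOSITION) yields the precomposed
W with circumflex, while compose("Q\u0302", COMPOSITION) returns its input
unchanged because the stand-in character is excluded. Text that was
normalized before the character existed therefore remains normalized,
yet decompose("\uE000", DATABASE) still works.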


> (This is not really relevant to charlint, but to the underlying issues;
> IETF may have to take a stand on normalization Real Soon Now....)

Exactly. I think the Unicode Consortium and the W3C have a solution
ready, and would be glad to help the IETF adopt it.


Regards,    Martin.


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org
Received on Thursday, 15 July 1999 20:02:08 GMT
