RE: Normalizing transcoders from Tex Texin on 2007-11-30 (www-international@w3.org from October to December 2007)

From: Tex Texin <tex@yahoo-inc.com>
Date: Thu, 29 Nov 2007 23:12:10 -0800
To: "Martin Duerst" <duerst@it.aoyama.ac.jp>, <www-international@w3.org>
Message-ID: <012AB2B223CB3F4BB846962876F47217712DC7@SNV-EXVS08.ds.corp.yahoo.com>

Martin, that is interesting news.

When converting UTF-8 to windows-1258, does it also change characters that are not available as composed characters in 1258 to their combining forms?
Most transcoders would raise an error or convert them to "?".
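
To make the two behaviours concrete, here is a minimal Python sketch. It is not iconv's implementation; the helper name encode_cp1258_decomposing and the TONE_MARKS set are hypothetical, and it assumes Python's built-in "cp1258" codec is a plain, non-normalizing mapping for windows-1258. It first tries each character as-is, and only when that fails splits off the tone mark so that the remaining base letter can be sent precomposed, followed by the combining mark.

```python
import unicodedata

# The five Vietnamese tone marks that windows-1258 carries as combining
# characters: grave, acute, tilde, hook above, dot below.
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}


def encode_cp1258_decomposing(text: str) -> bytes:
    """Encode to windows-1258, falling back to base letter + combining tone.

    Hypothetical helper for illustration only.
    """
    out = bytearray()
    for ch in unicodedata.normalize("NFC", text):
        try:
            # Character exists in windows-1258 as-is.
            out += ch.encode("cp1258")
            continue
        except UnicodeEncodeError:
            pass
        # Fully decompose, pull the tone mark out, and recompose the rest
        # (for example e + circumflex becomes a single precomposed letter).
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if c not in TONE_MARKS)
        tones = "".join(c for c in decomposed if c in TONE_MARKS)
        candidate = unicodedata.normalize("NFC", base) + tones
        # Anything still unencodable becomes "?", as a plain transcoder would do.
        out += candidate.encode("cp1258", errors="replace")
    return bytes(out)


if __name__ == "__main__":
    word = "Vi\u1ec7t"  # "Viet" with U+1EC7, which has no single byte in windows-1258
    print(word.encode("cp1258", errors="replace"))  # the plain codec substitutes "?"
    print(encode_cp1258_decomposing(word))          # precomposed base + combining dot below
```

Without errors="replace", the plain codec raises UnicodeEncodeError for the same input, which corresponds to the "error" case mentioned above.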
 
Tex

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of Martin Duerst
Sent: Thursday, November 29, 2007 11:01 PM
To: www-international@w3.org
Subject: Normalizing transcoders


[This is mostly a topic for a/the WG, related to normalization and the Normalization part of the Character Model, but I'm sending it here because I'm expecting wider input.]

The Character Model: Normalization introduces the concept of a Normalizing Transcoder (http://www.w3.org/TR/charmod-norm/#sec-NormalizingTranscoder).

Until yesterday, I was under the impression that such transcoders were mostly theoretical. But yesterday, I discovered that the GNU iconv implementation on my Cygwin system implements a normalizing transcoder for windows-1258 -> UTF-8.
Windows-1258 is probably the most widely used legacy encoding for Vietnamese, and Vietnamese is in practice the language most in need of a clear normalization policy.
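
As a rough illustration of what a normalizing transcoder does in this direction, here is a minimal Python sketch. It is not how GNU iconv is implemented; the function name decode_cp1258_normalizing is hypothetical, and it assumes Python's built-in "cp1258" codec performs a plain byte-to-code-point mapping without normalization. The sketch decodes the legacy bytes and then applies NFC, so that a base letter followed by a combining tone mark comes out as one precomposed character.

```python
import unicodedata


def decode_cp1258_normalizing(data: bytes) -> str:
    """Decode windows-1258 bytes and return NFC-normalized text.

    Hypothetical helper for illustration only.
    """
    # Step 1: plain transcoding. Tone marks stay as separate combining characters.
    text = data.decode("cp1258")
    # Step 2: normalization. Base letters and tone marks are composed where possible.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # "Viet" as typically stored in windows-1258:
    # 0xEA is e with circumflex, 0xF2 is the combining dot below.
    raw = b"Vi\xea\xf2t"

    plain = raw.decode("cp1258")
    normalized = decode_cp1258_normalizing(raw)

    print(len(plain), [hex(ord(c)) for c in plain])
    print(len(normalized), [hex(ord(c)) for c in normalized])

    # The UTF-8 output of the two approaches differs in length as well.
    print(plain.encode("utf-8"), normalized.encode("utf-8"))
```

A non-normalizing transcoder would stop after step 1, which leaves the UTF-8 output in a form that is neither NFC nor fully decomposed.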

I would like to take this as an opportunity to collect information on other normalizing transcoders. If you know of some, please reply to this mailing list.

Regards,     Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Friday, 30 November 2007 07:12:26 UTC