Re: Migrating legacy multilingual data to utf-8

From: William Tan <wil@dready.org>
Date: Thu, 24 Feb 2005 00:44:05 +1100
Message-ID: <421C88A5.8090402@dready.org>
To: Deborah Cawkwell <deborah.cawkwell@bbc.co.uk>
CC: www-international@w3.org

Migrating the data is usually not a problem. Assuming you have a
standard SQL text column type (char, varchar, etc.), all you need to do
is write a script that converts the data to UTF-8 and either updates it
in place or copies it over to a temporary table.
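
A minimal sketch of such a conversion script in Python follows. It
assumes the encoding of each row is known (for example, recorded next
to the text or inferred from its language) and leaves out the database
reads and writes. The names to_utf8 and convert_rows are hypothetical,
chosen only for illustration.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8."""
    # Strict decoding raises UnicodeDecodeError if the bytes do not
    # match the declared encoding, instead of silently corrupting text.
    return raw.decode(source_encoding).encode("utf-8")


def convert_rows(rows):
    """Convert (row_id, raw_bytes, encoding) tuples to UTF-8.

    Returns (converted, failed): converted holds (row_id, utf8_bytes),
    failed holds (row_id, error_message) for rows needing manual review.
    """
    converted, failed = [], []
    for row_id, raw, encoding in rows:
        try:
            converted.append((row_id, to_utf8(raw, encoding)))
        except (UnicodeDecodeError, LookupError) as exc:
            # LookupError covers encoding names Python does not know.
            failed.append((row_id, str(exc)))
    return converted, failed
```

Collecting failures rather than stopping at the first one makes it
easier to see how many rows carry a wrong or missing encoding label
before anything is written back.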

The headaches usually come from the code that handles the data: all of
it has to be updated to recognize that the data is now in UTF-8, and
not whatever encoding it previously assumed. How painful that is really
depends on your setup.
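
To illustrate why the handling code matters, here is a small Python
sketch of what happens when migrated UTF-8 data is read by code that
still assumes the old encoding. The helper read_text and the sample
string are assumptions made for this example only.

```python
def read_text(raw: bytes, assumed_encoding: str = "utf-8") -> str:
    """Decode stored bytes using whatever encoding the caller assumes."""
    # errors="replace" keeps the demo from raising on undefined bytes.
    return raw.decode(assumed_encoding, errors="replace")


# Data as it would look after migration: stored as UTF-8.
stored = "Čeština".encode("utf-8")

# Code that was not updated still decodes as windows-1250: garbled text.
legacy_view = read_text(stored, "windows-1250")

# Code that was updated decodes as UTF-8 and gets the original text back.
updated_view = read_text(stored)
```

Here updated_view equals the original string, while legacy_view does
not, which is the kind of mismatch every reader and writer of the
column has to be checked for.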


Deborah Cawkwell wrote:

>We have legacy multilingual data stored in a Postgres database.
>In our database text is typically stored in the character encoding
>in which it was entered. This typically corresponds to the language of
>the text, e.g. Czech: windows-1250; Chinese: gb2312
>We wish to take more advantage of Unicode, and hence we are considering
>migrating the data to UTF-8.
>What is the best way to do this?
>Any advice or experience would be welcome.
Received on Wednesday, 23 February 2005 13:45:06 UTC
