
Re: Migrating legacy multilingual data to utf8

From: Addison Phillips <addison.phillips@quest.com>
Date: Wed, 9 Mar 2005 12:00:36 -0800
Message-ID: <634978A7DF025A40BFEF33EB191E13BC0A88085A@irvmbxw01.quest.com>
To: <www-international@w3.org>
Hi Deborah,

I agree with what Tex and Andrea have noted. I should point out that while migrating to UTF-8 delivery (the actual data conversion) is sometimes a serious technical problem, using a UTF-8-encoded database or UTF-8-encoded data files probably is not.

For example, many databases perform character encoding conversions in the client layer. Depending on the technology you use to access the database, your client applications may never realize that the storage mechanism is now using Unicode to encode character data internally. 

This suggests a strategy of back-to-front conversion. Using a single encoding in your back end allows you to do things with your data that currently are very difficult to achieve, such as re-purposing content or merging content on the fly--and it makes it easier to manage content, since none of the content requires special handling unique to that content any more. The encoding the user-agent sees may remain as some form of legacy encoding, even for a very long time: converting to legacy encodings can occur in the very outer layers of the user interface.
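
A minimal sketch of the back-to-front idea, in Python. It assumes content is decoded once on the way in, stored only as UTF-8, and converted to a legacy encoding only at the outermost layer. The function names `import_legacy` and `render_for_client` are hypothetical and chosen for illustration; a real system would probably do this in its database driver or web framework rather than by hand.

```python
# Hypothetical helpers: the back end holds UTF-8 only, and legacy
# encodings appear only at the edges of the system.

def import_legacy(raw: bytes, source_encoding: str) -> bytes:
    """Decode legacy bytes once, at ingest, and return UTF-8 for storage."""
    return raw.decode(source_encoding).encode("utf-8")


def render_for_client(stored: bytes, client_encoding: str) -> bytes:
    """Convert stored UTF-8 to the encoding the user agent expects."""
    text = stored.decode("utf-8")
    # Characters the legacy encoding cannot represent are emitted as
    # numeric character references instead of raising an error.
    return text.encode(client_encoding, errors="xmlcharrefreplace")


# Content that arrived in two different legacy encodings...
japanese = import_legacy("日本語".encode("shift_jis"), "shift_jis")
french = import_legacy("café".encode("iso-8859-1"), "iso-8859-1")

# ...can be merged in the back end with no per-item special handling.
merged = japanese + b" / " + french

# The user agent can still be served a legacy encoding.
latin1_page = render_for_client(french, "iso-8859-1")
```

Here the merge step needs no knowledge of where each piece of content came from, because the only encoding-aware code is at ingest and at output.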

The specific details of a Unicode strategy vary greatly, depending on your organization's implementation details and other choices. Choosing to migrate data to Unicode provides benefits that make a project of this nature a clear choice to most management folks, provided that it can be accommodated within the schedules and budgets assigned to particular products. The challenge is to remove the fear, uncertainty, and doubt from the equation.

Best Regards,


Addison P. Phillips
Globalization Architect, Quest Software

Chair, Internationalization Core Working Group

Internationalization is not a feature.
It is an architecture. 

Received on Wednesday, 9 March 2005 20:01:17 UTC
