Re: Migrating legacy multilingual data to utf8

Hi Deborah,

I agree with what Tex and Andrea have noted. I should point out, though, that while migrating to delivery of UTF-8 (the actual data conversion) is a technical problem, sometimes a serious one, merely using a UTF-8-encoded database or UTF-8 data files probably is not.
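The data conversion depends on knowing which legacy encoding each record was actually written in. Here is a minimal Python sketch, assuming the records arrive as raw bytes tagged with their source encoding. `LEGACY_RECORDS` and `to_utf8` are hypothetical names used only for illustration:

```python
# Hypothetical sample data: (raw bytes, encoding the record was written in).
LEGACY_RECORDS = [
    ("café".encode("iso-8859-1"), "iso-8859-1"),
    ("日本語".encode("shift_jis"), "shift_jis"),
    ("Мир".encode("cp1251"), "cp1251"),
]


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode a record from its legacy encoding and re-encode it as UTF-8.

    errors="strict" raises UnicodeDecodeError when the bytes are not valid
    in the declared encoding, so many mislabeled records fail loudly instead
    of silently turning into mojibake. Single-byte encodings such as
    ISO-8859-1 accept every byte value, so mislabeling cannot be detected
    this way for them.
    """
    return raw.decode(source_encoding, errors="strict").encode("utf-8")


# Convert every record; each entry is now UTF-8 bytes.
converted = [to_utf8(raw, enc) for raw, enc in LEGACY_RECORDS]
```

Mislabeled data is where this gets hard. Shift_JIS bytes declared as UTF-8, for instance, raise an error here instead of converting, and someone has to decide what each such record really is.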

For example, many databases perform character encoding conversion in the client layer. Depending on the technology you use to access the database, your client applications may never notice that the storage mechanism now uses Unicode to encode character data internally.
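To make the idea concrete, here is a minimal Python sketch of such a client layer. `Utf8Store` and `LegacyClient` are hypothetical classes, not any database's API. The conversion they perform is similar in spirit to what settings such as PostgreSQL's `client_encoding` do: the application keeps reading and writing bytes in its own legacy encoding while the store holds UTF-8.

```python
class Utf8Store:
    """Hypothetical storage layer: every value is kept internally as UTF-8."""

    def __init__(self) -> None:
        self._rows: dict[str, bytes] = {}

    def put_text(self, key: str, text: str) -> None:
        self._rows[key] = text.encode("utf-8")

    def get_text(self, key: str) -> str:
        return self._rows[key].decode("utf-8")

    def raw(self, key: str) -> bytes:
        # Exposes the stored bytes only so the sketch can show they are UTF-8.
        return self._rows[key]


class LegacyClient:
    """Hypothetical client layer that converts to and from the client's
    encoding, so application code only ever sees bytes in that encoding."""

    def __init__(self, store: Utf8Store, client_encoding: str) -> None:
        self.store = store
        self.client_encoding = client_encoding

    def write(self, key: str, client_bytes: bytes) -> None:
        # Client encoding -> Unicode -> UTF-8 storage.
        self.store.put_text(key, client_bytes.decode(self.client_encoding))

    def read(self, key: str) -> bytes:
        # UTF-8 storage -> Unicode -> client encoding. Raises
        # UnicodeEncodeError if the stored text contains characters the
        # client's encoding cannot represent.
        return self.store.get_text(key).encode(self.client_encoding)


store = Utf8Store()
latin1_app = LegacyClient(store, "iso-8859-1")
utf8_app = LegacyClient(store, "utf-8")

latin1_app.write("greeting", b"caf\xe9")  # Latin-1 bytes from an old client
stored = store.raw("greeting")            # held internally as UTF-8
```

Neither application had to change; only the storage encoding did. The remaining cost sits at the edge, where text that a client's encoding cannot represent has to be dealt with.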

This suggests a strategy of back-to-front conversion. Using a single encoding in your back end lets you do things with your data that are currently very difficult, such as repurposing content or merging content on the fly. It also makes content easier to manage, since no piece of content requires its own special handling any more. The encoding the user agent sees may remain some legacy encoding, even for a very long time: conversion to legacy encodings can happen in the outermost layers of the user interface.
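A minimal Python sketch of back-to-front conversion, assuming the back end already holds all text as Unicode strings and the outermost layer emits HTML. `merge_fragments` and `render_for_user_agent` are hypothetical names. `xmlcharrefreplace` is a standard Python codec error handler that writes unrepresentable characters as numeric character references, which is only appropriate for HTML or XML output:

```python
def merge_fragments(fragments: list[str]) -> str:
    """Back end: all content is already Unicode, so fragments that
    originally came from different legacy encodings combine with no
    per-fragment special handling."""
    return " ".join(fragments)


def render_for_user_agent(page: str, charset: str) -> bytes:
    """Outermost layer: convert to whatever charset this user agent is
    served. Characters the charset cannot represent become HTML numeric
    character references instead of raising an error."""
    return page.encode(charset, errors="xmlcharrefreplace")


# One fragment originally from an ISO-8859-1 source, one from Shift_JIS.
page = merge_fragments(["café", "日本語"])

legacy_bytes = render_for_user_agent(page, "iso-8859-1")  # legacy user agent
utf8_bytes = render_for_user_agent(page, "utf-8")         # modern user agent
```

The same merged page can be served as UTF-8 or as a legacy encoding; only the last step differs, which is what lets the user-facing encoding lag behind the back end.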

The specific details of a Unicode strategy vary greatly, depending on your organization's implementation details and other choices. Migrating data to Unicode provides benefits that make a project of this nature a clear choice for most management folks, provided it can be accommodated within the schedules and budgets assigned to particular products. The challenge is to remove the fear, uncertainty, and doubt from the equation.

Best Regards,

Addison

Addison P. Phillips
Globalization Architect, Quest Software
http://www.quest.com


Chair, Internationalization Core Working Group
http://www.w3.org/International


Internationalization is not a feature.
It is an architecture. 

Received on Wednesday, 9 March 2005 20:01:17 UTC