- From: Tex Texin <tex@xencraft.com>
- Date: Sat, 27 Aug 2005 05:24:10 -0700
- To: Richard Ishida <ishida@w3.org>
- CC: www-international@w3.org
Richard,

Hi. It's nice to see the steady stream of FAQs and updates. Good going!

A couple of comments:

1) This is rather subjective, but the link "it's useful" would be better replaced by a crisp sentence or two on the benefit of moving to UTF-8. I also think linking into the middle of the tutorial is disorienting and unexpected. It's not the end of the world, but a good UI makes you trust links and comforts you by giving you what you expect. Perhaps some indication of the nature of the link's target is called for (FAQ, tutorial, article...).

2) The FAQ should point out a few of the risks, and either how to reduce each risk or where to go to learn more about it. In particular, a FAQ on changing encodings should say what to look out for and how to check that the change actually succeeded. Most of the following are not high-probability, but we should warn naive users to consider the possibility.

Risk a: When changing the encoding to UTF-8, it is critical that the encoding of the original data be known accurately and precisely. Much of the world's data is mislabeled: ISO 8859-1 instead of windows-1252, Big5 instead of Big5-HKSCS, CP936 for GB2312, CP949 for KS C 5601, and so forth (and not just Microsoft encodings). Many editors will merrily convert the data to UTF-8 as if it were ISO 8859-1 rather than the encoding it actually is.

Risk b: The conversion tables or programs should be up to date. Some converters are now seriously out of date, and Unicode has more choices for characters now...

Risk c: Some old software might produce incorrect UTF-8, especially with respect to surrogates (for example, encoding each half of a surrogate pair separately instead of encoding the supplementary character as a single four-byte sequence).

Risk d: For some legacy encodings, it might be worth pointing out that a converter should generate NFC.

3) A different kind of risk is understanding the type of data being represented, and whether changing the encoding changes its semantics.
Risk e: URLs. If the document's encoding changes, any URLs in it that contain a query portion might now be broken links, unless the query is first put into an ASCII-compatible form.

Risk f: Forms and applications. If the document contains a form, changing the encoding means the form will send data to the server in UTF-8 rather than the original encoding. The server application may need a corresponding change to take this into account.

Risk g: CSS. If a CSS document does not contain an encoding declaration, it can inherit the encoding of the referring document. Changing the encoding of the (X)HTML document may therefore require the CSS documents it references to change encoding too. For style sheets shared by several documents, this can be a problem unless all of them are changed at the same time.

Risk h: Embedded scripts. Does any PHP, JavaScript, etc. within the document now need to have its code altered?

Given time, I could probably come up with a few more. If others contribute to the list, it might make a nice separate FAQ or document on Unicode conversion considerations.

4) QA. The other piece, then, is how you know that the conversion was successful, other than that the process seemed to complete. Is the result valid UTF-8? Did the characters convert appropriately? (For example, did a yen sign convert to a backslash or to a currency sign, based on context?) Does the document still have the same meaning? (Do users think a character has changed?) Does it still integrate appropriately with other applications (e.g., CGI)?

Most readers on www-international would intuitively recognize situations where any of the above risks might be probable, and would either not do the conversion or first address the risk, so the point might seem trite. But many of the people seeking out the FAQ might not anticipate that problems can occur, so the risks should be made clear without scaring them off.
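For readers who want something concrete, here is a minimal Python 3 sketch of a few of these checks: decoding with the true source encoding (risk a), composing to NFC (risk d), rejecting malformed UTF-8 such as encoded surrogate halves (risk c), and percent-encoding a query value (risk e). It assumes only Python's standard codecs, unicodedata and urllib.parse; the function names are hypothetical, and the C1-control test is a heuristic, not proof that a conversion is correct.

```python
import unicodedata
from urllib.parse import quote


def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode legacy bytes as NFC UTF-8.

    source_encoding must be what the bytes are *actually* in (risk a);
    a wrong label still "succeeds" but yields the wrong characters.
    """
    # errors="strict": bytes invalid in the source encoding raise
    # UnicodeDecodeError instead of being silently replaced.
    text = data.decode(source_encoding, errors="strict")
    # Risk d: some legacy encodings (e.g. windows-1258) store accents as
    # separate combining marks, so compose them to NFC before writing.
    text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical QA check: is the output well-formed UTF-8?

    Python's strict decoder rejects encoded surrogate halves (risk c),
    overlong forms and truncated sequences.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def has_c1_controls(text: str) -> bool:
    """Heuristic QA check: U+0080..U+009F rarely belong in a web page,
    so their presence often means windows-1252 data was converted as
    if it were ISO 8859-1."""
    return any("\x80" <= ch <= "\x9f" for ch in text)


if __name__ == "__main__":
    # windows-1252 bytes: curly quotes (0x93/0x94) and a euro sign (0x80)
    legacy = b"\x93Hi\x94 costs \x805"
    right = convert_to_utf8(legacy, "windows-1252")
    wrong = convert_to_utf8(legacy, "iso-8859-1")  # mislabeled source

    # Both outputs are valid UTF-8, so validity alone is not enough...
    print(is_valid_utf8(right), is_valid_utf8(wrong))  # True True
    # ...but the heuristic flags the mislabeled conversion.
    print(has_c1_controls(right.decode("utf-8")))  # False
    print(has_c1_controls(wrong.decode("utf-8")))  # True

    # Risk d: "e" + combining acute (0xEC in windows-1258) becomes U+00E9.
    print(convert_to_utf8(b"e\xec", "windows-1258") == "\u00e9".encode("utf-8"))  # True

    # Risk e: percent-encoding a query value with the encoding the server
    # expects keeps the link ASCII, independent of the page's encoding.
    print(quote("caf\u00e9", encoding="iso-8859-1"))  # caf%E9
```

Note that both conversions pass the validity check; only knowing the true source encoding makes the first one correct. Questions such as whether a yen sign should become a backslash, or whether the document still means the same thing, still need a human.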
hth
tex

Richard Ishida wrote:
>
> After incorporating comments from the review phase, the GEO Working Group has published the FAQ-based article:
>
> Changing (X)HTML page encoding to UTF-8
> http://www.w3.org/International/questions/qa-changing-encoding
> By Richard Ishida, W3C
>
> Aimed at newcomers to internationalization who want to change the encoding of their (X)HTML pages, this article provides an answer to the question: How do I change the encoding of my (X)HTML pages to UTF-8?
>
--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Saturday, 27 August 2005 12:24:21 UTC