- From: Yves Arrouye <yves@realnames.com>
- Date: Tue, 25 Sep 2001 14:10:59 -0700
- To: www-international@w3.org
> If you look at Michael Kaplan's website www.trigeminal.com
> you will see an example of a page that contains a
> sentence in many different languages and that page is
> encoded in UTF-8. But this page is HTML, not XML.

(I am sure this is covered by the resources sent by Thierry. This is just a small example of good practice, besides the obvious one of making sure that the encoding of the document can support all the required characters.)

If you want to use many languages in the same XML document, it is most useful, just as in HTML, to mark the language of content using xml:lang. One could for example have an XML snippet looking like this (encoded in ISO 8859-1 here, but the same applies regardless of the encoding):

    <item>
      <catid>C-2353J</catid>
      <description xml:lang="en-US">This great summer tire [...]</description>
      <description xml:lang="en-GB">This great summer tyre [...]</description>
      <description xml:lang="fr-FR">Cet excellent pneu d'été [...]</description>
    </item>

This tagging not only enables processors to extract the relevant information for different users and purposes; in the case of CJK languages, it is also very important to know the language in order to select the appropriate glyphs for unified Han characters. This is, I believe, the main reason <span> was introduced in HTML: so that one could use a lang attribute on <span> for this and other purposes. (Ready to stand corrected :))

YA
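
To illustrate how a processor might extract the relevant description for a given user, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name pick_description and the fallback rule (exact tag first, then any description sharing the primary language subtag) are assumptions made for illustration, not part of the original message or of any standard matching algorithm. The sketch only reads xml:lang on the <description> element itself and does not handle values inherited from ancestor elements.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ITEM = """<item>
  <catid>C-2353J</catid>
  <description xml:lang="en-US">This great summer tire [...]</description>
  <description xml:lang="en-GB">This great summer tyre [...]</description>
  <description xml:lang="fr-FR">Cet excellent pneu d'été [...]</description>
</item>"""


def pick_description(item, wanted):
    """Return the text of the <description> best matching `wanted`.

    Hypothetical helper: prefers an exact (case-insensitive) match on
    xml:lang, otherwise the first description with the same primary
    language subtag, otherwise None.
    """
    wanted = wanted.lower()
    primary = wanted.split("-")[0]
    fallback = None
    for desc in item.findall("description"):
        lang = desc.get(XML_LANG, "").lower()
        if lang == wanted:
            return desc.text
        if fallback is None and lang.split("-")[0] == primary:
            fallback = desc.text
    return fallback


item = ET.fromstring(ITEM)
print(pick_description(item, "en-GB"))  # exact match on en-GB
print(pick_description(item, "fr-CA"))  # falls back to the fr-FR text
```

A request for a language with no tagged description at all (for example "de-DE") returns None here, leaving the caller to decide on a default.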
Received on Tuesday, 25 September 2001 17:14:38 UTC