RE: Multi-Language Support

> If you look at Michael Kaplan's website www.trigeminal.com
> you will see an example of a page that contains a
> sentence in many different languages and that page is
> encoded in UTF-8.

But this page is HTML, not XML.

(I am sure this is covered by the resources Thierry sent. What follows is
just a small example of good practice, besides the obvious one of making sure
that the document's encoding can represent all the required characters.)

If you want to use many languages in the same XML document, it is very
useful, just as in HTML, to mark the language of the content with xml:lang.
For example, one could have an XML snippet like the following (encoded in
ISO 8859-1 here, but the same applies whatever the encoding):

	<item>
	  <catid>C-2353J</catid>
	  <description xml:lang="en-US">This great summer tire [...]</description>
	  <description xml:lang="en-GB">This great summer tyre [...]</description>
	  <description xml:lang="fr-FR">Cet excellent pneu d'été [...]</description>
	</item>
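One detail worth keeping in mind: per the XML 1.0 specification, xml:lang applies to the element it is set on and to everything inside it, unless a descendant overrides it. The following is a minimal sketch using Python's standard xml.etree.ElementTree that computes the effective language of each element in the snippet above. The xml:lang="en" on <item> is a hypothetical addition (it is not in the snippet above) so that the inheritance is visible on <catid>; the function name effective_languages is likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical: xml:lang="en" on <item> is added here to show inheritance.
SNIPPET = """<item xml:lang="en">
  <catid>C-2353J</catid>
  <description xml:lang="en-US">This great summer tire</description>
  <description xml:lang="en-GB">This great summer tyre</description>
  <description xml:lang="fr-FR">Cet excellent pneu d'été</description>
</item>"""

def effective_languages(elem, inherited=None):
    """Yield (tag, language) pairs, letting children inherit xml:lang
    from their parent unless they declare their own."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_languages(child, lang)

root = ET.fromstring(SNIPPET)
for tag, lang in effective_languages(root):
    print(tag, lang)
```

Here <catid> has no xml:lang of its own and so reports "en" from its parent, while each <description> reports its own tag.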

This tagging not only enables processors to extract the relevant information
for different users or purposes; for CJK languages it is also essential to
know the language in order to select the appropriate glyphs for unified Han
characters. This is, I believe, the main reason <span> was introduced in
HTML: so that one could put a lang attribute on <span> for this and other
purposes. (Ready to stand corrected :))
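As an illustration of a processor extracting the right text for a given user, here is a rough sketch that picks the <description> best matching a requested language tag. The matching scheme is my own simplification, loosely modelled on the "lookup" idea in RFC 4647 (try the full tag, then drop trailing subtags), and is not a full implementation of that RFC; pick_description is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SNIPPET = """<item>
  <catid>C-2353J</catid>
  <description xml:lang="en-US">This great summer tire</description>
  <description xml:lang="en-GB">This great summer tyre</description>
  <description xml:lang="fr-FR">Cet excellent pneu d'été</description>
</item>"""

def pick_description(item, wanted):
    """Return the text of the <description> best matching `wanted`.

    Simplified matching (an assumption, not RFC 4647 verbatim): compare
    case-insensitively, first on the full tag, then on shorter prefixes
    ("fr-CA" -> "fr"), accepting a description whose tag equals the
    prefix or starts with it followed by "-". Returns None if nothing fits.
    """
    descriptions = [(d.get(XML_LANG, "").lower(), d.text)
                    for d in item.iter("description")]
    subtags = wanted.lower().split("-")
    while subtags:
        prefix = "-".join(subtags)
        for lang, text in descriptions:
            if lang == prefix or lang.startswith(prefix + "-"):
                return text
        subtags.pop()  # drop the last subtag and try a broader match
    return None

root = ET.fromstring(SNIPPET)
print(pick_description(root, "en-GB"))  # exact match on en-GB
print(pick_description(root, "fr-CA"))  # falls back to the fr-FR text
print(pick_description(root, "de"))     # no German description
```

A real application would more likely take the user's preferences from something like an Accept-Language list and try each entry in order; the single-tag version above just keeps the idea visible.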

YA

Received on Tuesday, 25 September 2001 17:14:38 UTC