LANG= for character-mapping
Hello to you internationalisationisers,
I would like to know how the HTML LANG-attribute should be linked
up to a particular character-set. In fact what I'm looking for is an
HTML-equivalent for the TEI ``writing system declarations''.
Are there any thoughts about such a thing?
I am aware that, er, as a newcomer to this list, I might actually
raise a point which has already been discussed. Well -- sorry,
you may flame me if you want to.
What we (NHDA) would like to do is to serve documents containing
*multiple languages*. We're not so much interested in serving a
directory with multiple translations of the same instance.
Consider a document containing French, German and Russian.
HTML 3.2 offers us the possibility of marking divisions, paragraphs,
<span>'s and so on with lang="ru" | lang="fr" | lang="de".
But then what?
Now suppose the charset parameter of the HTTP Content-Type header is
set to some Russian character encoding (Microsoft code page 1251,
KOI8-R or ISO 8859-5 -- take your pick). What happens to entities like
&eacute; and &ouml;? Browsers like Navigator, Explorer and Mosaic will
map them blindly to #233 and #246, so they appear as arbitrary Russian
characters.
(How about Arena/Amaya? I haven't checked those.)
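To make the problem concrete, here is a small Python sketch. It is not
what any browser actually runs; it only assumes, as described above, that
the entity is first turned into the byte value 233 or 246 and that the
byte is then displayed in the declared charset. The names ENTITY_BYTES,
CHARSETS and render are made up for this illustration.

```python
import unicodedata

# Assumption: the browser resolves the entity to its Latin-1 byte value
# and then displays that byte in whatever charset the document declares.
ENTITY_BYTES = {"&eacute;": 233, "&ouml;": 246}
CHARSETS = ["latin-1", "cp1251", "koi8_r", "iso8859_5"]


def render(entity, charset):
    """Return the character a blind byte-for-byte mapping would display."""
    return bytes([ENTITY_BYTES[entity]]).decode(charset)


for entity in ENTITY_BYTES:
    for charset in CHARSETS:
        ch = render(entity, charset)
        # Print code point and Unicode name only, so the output stays ASCII.
        print(f"{entity:9} {charset:10} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under latin-1 both entities come out as intended; under each of the three
Russian encodings the same byte lands on a Cyrillic letter, and on a
different one each time.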
Do we have to publish in Unicode instead, then -- i.e. let most
browsers just break and wait for the *perfect browser* to come along?
I don't think so.
I would say the LANG attribute is a very appropriate place (amongst
other uses) to indicate a specific character mapping (i.e. "8 bits to
Unicode").
I may be wrong, but I haven't seen very much about this attribute
lately. I thought it actually appeared in earlier versions of the
CSS-draft. It doesn't any more.
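The idea of LANG selecting an "8 bits to Unicode" mapping could be
sketched roughly as follows in Python. The table LANG_TO_CHARSET and the
function to_unicode are hypothetical and not part of any HTML
specification; the charset chosen for each language is only an example.

```python
# Hypothetical table: which 8-bit charset a given LANG value would select.
# Illustrative only; nothing in HTML defines such a mapping.
LANG_TO_CHARSET = {"fr": "latin-1", "de": "latin-1", "ru": "koi8_r"}


def to_unicode(spans):
    """Decode (lang, raw_bytes) pairs into a single Unicode string."""
    return "".join(raw.decode(LANG_TO_CHARSET[lang]) for lang, raw in spans)


# One document, three spans, each stored in its own 8-bit encoding.
document = [
    ("fr", b"\xe9t\xe9 "),   # French word with two e-acute bytes (233)
    ("de", b"sch\xf6n "),    # German word with one o-umlaut byte (246)
    ("ru", b"\xc4\xc1"),     # two Cyrillic letters in KOI8-R
]

text = to_unicode(document)
# Print code points only, so the output stays ASCII.
print(" ".join(f"U+{ord(c):04X}" for c in text))
```

Note that a language alone does not pin down one encoding: Russian has at
least the three candidates mentioned above, so the table would have to be
declared per document rather than assumed.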
So, how about some IDREF-linking to make things work?
;;; Hans van Mourik
;;; Netherlands Historical Data Archive
;;; PO Box 9515
;;; 2300 RA Leiden, The Netherlands