LANG= for character-mapping

From: Hans van Mourik <MOURIK@rullet.LeidenUniv.nl>
Date: Tue, 23 Jul 1996 20:09:13 +0100 (MET)
To: www-international@w3.org
Message-id: <01I7FAPKRVTU8YDH60@rullet.LeidenUniv.nl>
Hello to you internationalisationisers,

I would like to know how the HTML LANG attribute should be linked
to a particular character set. In fact, what I'm looking for is an
HTML equivalent of the TEI ``writing system declarations''.
Are there any thoughts about such a thing?

  I am aware that, er, as a newcomer to this list, I may be raising
  a point which has already been discussed. If so, sorry; feel free
  to flame me.

What we (NHDA) would like to do is serve documents containing
*multiple languages*. We're less interested in serving a directory
of separate translations of the same document.

Consider a document containing French, German and Russian.
HTML 3.2 lets us mark divisions, paragraphs, <span>s and so on
with lang="ru", lang="fr" or lang="de".
But then what?

Now suppose the charset parameter of the HTTP Content-Type header is
set to some Russian character encoding (Microsoft code page 1251,
KOI8-R or ISO 8859-5; take your pick). What happens to entities like
&eacute; and &ouml;? Browsers like Navigator, Explorer and Mosaic
blindly map them to code positions 233 and 246 of that charset, so
they appear as arbitrary Cyrillic characters.
  (How about Arena/Amaya? I haven't checked those.)
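A small Python sketch of that failure, assuming the browser resolves each entity to its ISO 8859-1/Unicode number and then emits that number unchanged as a single byte in the declared charset. The codec names are Python's; `blind_mapping` and `CHARSETS` are illustrative names, not part of any browser.

```python
import html

# Python codec names for the three Russian charsets mentioned above.
CHARSETS = ("cp1251", "koi8_r", "iso8859_5")

def blind_mapping(entity: str) -> dict:
    """Resolve an HTML entity to its code point, then reinterpret that
    number as one raw byte in each Russian charset (the 'blind' mapping)."""
    code_point = ord(html.unescape(entity))  # &eacute; -> 233, &ouml; -> 246
    raw = bytes([code_point])                # same number, no conversion
    return {cs: raw.decode(cs) for cs in CHARSETS}

for entity in ("&eacute;", "&ouml;"):
    print(entity, blind_mapping(entity))
```

For &eacute; this yields й (cp1251), И (KOI8-R) and щ (ISO 8859-5); for &ouml; it yields ц, Ж and і.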

Do we have to publish it in Unicode instead, then? That is, let most
browsers just break and wait for the *perfect browser* to come along?
I don't think so.
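The dilemma arises because none of the Russian 8-bit charsets named above contains é or ö, while ISO 8859-1 has no Cyrillic; only a Unicode encoding covers the whole mixed text. A quick check in Python (the sample string and `can_encode` are illustrative):

```python
# French, German and Russian words in one string.
SAMPLE = "été, schön, щи"

def can_encode(text: str, charset: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

for cs in ("cp1251", "koi8_r", "iso8859_5", "iso8859_1", "utf_8"):
    print(cs, can_encode(SAMPLE, cs))
```

Only `utf_8` reports True; each 8-bit charset lacks either the accented Latin letters or the Cyrillic ones.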

I would say the LANG attribute is well suited (among other things)
to indicating a specific character mapping (i.e. "8 bits to Unicode").
I may be wrong, but I haven't seen much about this attribute lately.
I believe it appeared in earlier versions of the CSS draft; it no
longer does.

So, how about some IDREF linking to make things work?
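One way such linking might look, as a purely hypothetical sketch: a document-level table of declarations keyed by ID (in the spirit of TEI writing system declarations), an IDREF-style link from each LANG value to one declaration, and a per-span decode from 8 bits to Unicode. None of these names (`DECLARATIONS`, `LANG_TO_WSD`, `Span`, `to_unicode`) come from HTML, TEI or any browser; Python codec names stand in for the mappings.

```python
from dataclasses import dataclass

# Hypothetical declarations keyed by ID; each names an 8-bit-to-Unicode
# mapping (a Python codec name here).
DECLARATIONS = {
    "wsd-latin1": "iso8859_1",
    "wsd-koi8r": "koi8_r",
}

# Hypothetical IDREF-style links from LANG values to declaration IDs.
LANG_TO_WSD = {"fr": "wsd-latin1", "de": "wsd-latin1", "ru": "wsd-koi8r"}

@dataclass
class Span:
    lang: str    # value of the element's LANG attribute
    data: bytes  # the element's raw 8-bit content

def to_unicode(span: Span) -> str:
    """Decode a span's bytes with the mapping its LANG value links to."""
    charset = DECLARATIONS[LANG_TO_WSD[span.lang]]
    return span.data.decode(charset)

# Each span is stored in the 8-bit charset its language declaration names.
doc = [
    Span("fr", "été".encode("iso8859_1")),
    Span("de", "schön".encode("iso8859_1")),
    Span("ru", "щи".encode("koi8_r")),
]
print(" ".join(to_unicode(s) for s in doc))
```

With this table the same byte 0xE9 comes out as é inside a lang="fr" span and as И inside a lang="ru" span, a per-language distinction that a single document-wide charset cannot express.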

;;; Hans van Mourik
;;; mourik@rullet.leidenuniv.nl
;;;                               Netherlands Historical Data Archive
;;;                               PO Box 9515
;;;                               2300 RA Leiden, The Netherlands
;;;                               (+31)-70-5272719
Received on Tuesday, 23 July 1996 16:00:17 GMT
