Re: LANG= for character-mapping
Albert Lunde wrote:
>At 8:09 PM 7/23/96, Hans van Mourik wrote:
>>Hello to you internationalisationisers,
>>I would like to know how the HTML LANG-attribute should be linked
>>up to a particular character-set. In fact what I'm looking for is an
>>HTML-equivalent for the TEI ``writing system declarations''.
>>Are there any thoughts about such a thing?
>It is my impression that the intention of the various HTML and HTTP drafts
>that have addressed this is that "language" and character encoding (a.k.a.
>MIME charset) are, so to speak, "independent variables". In the general
>case, neither determines the other. There are different HTTP headers for
>charset and language.
Exactly. In addition, LANG can be changed inside the document to
make multilingual documents possible, whereas MIME "charset" stays
the same for the whole document. That causes no problems.
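To illustrate the "independent variables" point, here is a minimal
sketch in Python. The response text is made up for the example and
the variable names are my own; it only shows that the charset and
the language travel in separate headers, while LANG can still vary
inside the body.

```python
from email import message_from_string

# Hypothetical HTTP-style response, invented for this illustration.
raw = (
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "Content-Language: nl\n"
    "\n"
    '<P>Hallo</P>\n<P LANG="fr">Bonjour</P>\n'
)

msg = message_from_string(raw)

# One charset for the whole document, taken from Content-Type.
charset = msg.get_content_charset()

# The default language comes from a different header entirely.
language = msg["Content-Language"]

# The body may still switch language locally through LANG.
body = msg.get_payload()

print(charset, language, 'LANG="fr"' in body)
```

Changing "nl" to another language tag leaves the charset untouched,
and vice versa.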
>It's been a while since I read the TEI documents.
>Taking a look at them, it appears that the "writing system declaration"
>specifies:
>(1) the language
>(2) the writing system (script, alphabet, syllabary) used to write the language
>(3) the coded character set, entity names, or transliteration scheme used
>to represent the graphic characters of the writing system.
>There is stuff defined in HTML and HTTP specs that addresses (1) and (3)
>independently, but not much is said about (2) or the combination of the
>three.
>Perhaps someone wiser than me about the TEI can say more.
Not that I am wiser about TEI, but I now understand a little about
the "writing system declaration". (2) is not a problem at all: it is
evident from the ISO 10646 characters themselves which script they
belong to.
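As a rough sketch of what I mean for (2), again in Python: the
character names in ISO 10646 / Unicode usually begin with the script
name, so a first guess at the script can be read off the character
itself. The helper script_guess is my own name for this, and taking
the first word of the name is only a heuristic, not a proper script
lookup.

```python
import unicodedata

def script_guess(ch: str) -> str:
    """Heuristic: use the first word of the character's name as its script."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

# Latin a, Greek alpha, Cyrillic ya (written as escapes to stay ASCII-safe).
for ch in ("a", "\u03b1", "\u044f"):
    print(repr(ch), script_guess(ch))
```

Characters shared between scripts, such as digits and punctuation,
do not fit this scheme, so a real implementation would need more
care.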
As for (3), TEI has a very different range of applications than HTML.
For example, transliterations may be very important because of
the possibility of including texts that have already been input
with some transliteration method. In the context of the WWW,
transliteration was also discussed, but mainly in the context
of helping a user who knows the language but not the script.
This is largely a client-side issue. As for character encoding
in general, HTML relies on HTTP and MIME in this respect, which
is very reasonable for an internet application, whereas TEI
had to or wanted to define its own stuff.