Re: xml:lang question, markup for things like 'kursee', 'arigato'?

Quoting Dan Brickley <danbri@w3.org>:

> An xml:lang question... If I have a string that's the
> transliteration of something in, say, Arabic or Japanese, do I use
> xml:lang="ja" the same way as if it'd been in Japanese characters? Or is
> there an idiom to indicate transliteration?
> 
> e.g. 'kursee' is an anglo-friendly transliteration of the Arabic
> for 'chair'... what xml:lang to wrap around it?

Currently you would mark them as Japanese or Arabic respectively. It seems
likely (indeed almost certain) that RFC 3066's replacement will encode script
information. In the meantime there are a handful of registered tags that
carry script information: sr-Cyrl, sr-Latn, uz-Cyrl, uz-Latn, az-Arab,
az-Cyrl and az-Latn.
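
As a rough illustration of how those script-bearing tags differ from plain
language tags, here is a minimal Python sketch. The helper name split_tag is
hypothetical, and the rule that a four-letter second subtag names a script is
an assumption drawn from the registered tags listed above, not something
RFC 3066 itself defines (it treats them as whole registered tags).

```python
# Registered RFC 3066 tags from the list above that carry script information.
REGISTERED_SCRIPT_TAGS = [
    "sr-Cyrl", "sr-Latn", "uz-Cyrl", "uz-Latn",
    "az-Arab", "az-Cyrl", "az-Latn",
]


def split_tag(tag):
    """Split a language tag into (language, script or None).

    Hypothetical helper, not a full tag parser. Assumption: a second
    subtag made of exactly four letters is a script code.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()
    return language, script


# Every registered tag above yields a script; a bare "ja" or "ar" does not,
# which is why a transliteration cannot yet be told apart from native script.
for tag in REGISTERED_SCRIPT_TAGS + ["ja", "ar"]:
    print(tag, split_tag(tag))
```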

> (BTW what's the correct way to refer to these terms? 'phonetic spellings
> in roman alphabet'? Or, er, latin? I get confused embarrassingly easily by
> this stuff.)

"The Latin script" seems the most common expression these days, but I've never
seen "Roman alphabet" get flamed. I don't think "Roman" is applied to Latin
variants like Fraktur, Gaelic or Carolingian scripts.

> It might well be that what I'm asking goes beyond the limited reach of
> xml:lang, and a higher level representation is needed to capture
> everything I'm trying to say. But still, I'd like to know what if
> anything I ought to be saying at the xml:lang level...

In the meantime use xml:lang="ja", xml:lang="ar", etc.
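
For illustration only, the Python sketch below tags the two transliterated
words from the subject line that way and reads the values back with the
standard library's ElementTree. The element names (glossary, term, gloss) and
the helper effective_lang are made up for this example; only the xml:lang
attribute and its inheritance by child elements come from XML itself.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: transliterated terms tagged with the language
# they belong to, even though they are written in the Latin script.
doc = ET.fromstring(
    '<glossary xml:lang="en">'
    '<term xml:lang="ar">kursee</term>'
    '<term xml:lang="ja">arigato</term>'
    '<gloss>chair</gloss>'
    '</glossary>'
)


def effective_lang(root):
    """Yield (text, language) pairs, inheriting xml:lang from ancestors."""
    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        if elem.text and elem.text.strip():
            yield elem.text.strip(), lang
        for child in elem:
            yield from walk(child, lang)
    yield from walk(root, None)


langs = dict(effective_lang(doc))
for text, lang in langs.items():
    print(text, lang)
```

Note that nothing in this markup records that 'kursee' is a transliteration
rather than Arabic script; that distinction would need a script-aware tag or
a higher-level representation.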

-- 
Jon Hanna
<http://www.hackcraft.net/>
"…it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt

Received on Wednesday, 16 June 2004 09:44:10 UTC