
RE: xml:lang question, markup for things like 'kursee', 'arigato'?

From: Misha Wolf <Misha.Wolf@reuters.com>
Date: Wed, 16 Jun 2004 15:12:10 +0100
Message-ID: <T6a3a6df79cc407b73df68@dtcseuvig6.reuters.com>
To: www-international@w3.org
Cc: Dan Brickley <danbri@w3.org>

I'm not at all sure about John's and Jon's answers.  As it 
happens, I was pondering the very same question just 20 mins 
before Dan's mail arrived.  In my case, I was trying to decide 
what xml:lang values to use for brief Turkish phrases which 
have been degraded to the Latin alphabet as used for English.
Both the Turkish writing system and the English writing system 
use the Latin script.  It would surely not be helpful to mark 
both the original phrase and the degraded version as "tr-Latn"?

Misha


-----Original Message-----
From: www-international-request@w3.org
[mailto:www-international-request@w3.org] On Behalf Of Jon Hanna
Sent: 16 June 2004 14:44
To: Dan Brickley
Cc: www-international@w3.org
Subject: Re: xml:lang question, markup for things like 'kursee',
'arigato'?



Quoting Dan Brickley <danbri@w3.org>:

> An xml:lang question... If I have a string that's the
> transliteration of something in, say, Arabic or Japanese, do I use
> xml:lang="ja" the same way as if it'd been in Japanese characters? Or
> is there an idiom to indicate transliteration?
> 
> eg 'kursee' is an anglo-friendly transliteration of the Arabic
> for 'chair'... what xml:lang to wrap around it?

Currently you would mark them as Japanese or Arabic respectively. It seems
likely (i.e. almost definite) that RFC 3066's replacement will encode script
information (in the meantime there are a handful of registered tags with
script information: sr-Cyrl, sr-Latn, uz-Cyrl, uz-Latn, az-Arab, az-Cyrl,
az-Latn).

> (BTW what's the correct way to refer to these terms? 'phonetic spellings
> in roman alphabet'? Or, er, latin? I get confused embarrassingly easily by
> this stuff.)

"The Latin script" seems the most common expression these days, but I've never
seen "Roman Alphabet" get flames. I don't think "Roman" is applied to Latin
variants like Fraktur, Gaelic or Carolingian scripts.

> It might well be that what I'm asking goes beyond the limited reach of
> xml:lang, and a higher level representation is needed to capture
> everything I'm trying to say. But still, I'd like to know what if
> anything I ought to be saying at the xml:lang level...

In the meantime use xml:lang="ja", xml:lang="ar", etc.
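
A minimal sketch of that advice, assuming Python's standard
xml.etree.ElementTree. The helper names lang_of and split_tag and the sample
sentence are hypothetical, invented for illustration. It tags the
transliterated word as Arabic and shows how a script-bearing tag such as
sr-Latn could be split into language and script:

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is predefined and always maps to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample: the transliterated word is simply tagged as Arabic.
doc = ET.fromstring(
    '<p>The Arabic for chair is <span xml:lang="ar">kursee</span>.</p>'
)


def lang_of(elem):
    """Return the xml:lang value set directly on elem, or None."""
    return elem.get(f"{{{XML_NS}}}lang")


def split_tag(tag):
    """Split a tag such as 'sr-Latn' into (language, script or None).

    Assumption: a four-letter second subtag is treated as a script code,
    as in the registered tags listed above. This is not a full parser.
    """
    parts = tag.split("-")
    script = None
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1]
    return parts[0].lower(), script


span = doc.find("span")
print(lang_of(span))          # language tag on the transliterated word
print(split_tag("sr-Latn"))   # registered tag that carries a script
print(split_tag("ar"))        # plain language tag, no script subtag
```

Note that a plain "ar" tag carries no script information at all, so a
consumer of this markup cannot tell the transliteration from text in Arabic
characters by the tag alone.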

-- 
Jon Hanna
<http://www.hackcraft.net/>
"...it has been truly said that hackers have even more words for
equipment failures than Yiddish has for obnoxious people." - jargon.txt



Received on Wednesday, 16 June 2004 10:12:48 GMT
