Re: xml:lang question, markup for things like 'kursee', 'arigato'?

From: John Cowan <cowan@ccil.org>
Date: Wed, 16 Jun 2004 09:34:58 -0400
To: Dan Brickley <danbri@w3.org>
Cc: www-international@w3.org
Message-ID: <20040616133453.GF28499@ccil.org>

Dan Brickley scripsit:

> An xml:lang question... If I have a string that's the
> transliteration of something in, say, Arabic or Japanese, do I use
> xml:lang="ja" the same way as if it'd been in Japanese characters? Or is

There are three basic answers to that:

1) You can simply use "ja".  Japanese in Latin characters cannot be
confused with Japanese in native characters (which now include Latin
ones), because Unicode encodes each script with distinct characters.

2) If you have a specific need, you can apply for the language tag
"ja-latn" to be registered.  This is a relatively streamlined process
conducted on the mailing list ietf-languages@alvestrand.no.  The list
discusses the question for a few weeks, Michael Everson (in his copious
spare time) blesses or damns it, and in the former case IANA eventually
adds it to the registry.  Currently we have such tags for languages
that are frequently written in more than one script, such as Serbian
and Azerbaijani.

3) The proposed successor to RFC 3066 will, if it passes the IETF process,
allow the creation of tags like "ja-latn" on the fly.
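
To make answer 1 concrete, here is a minimal Python sketch of how software
could tell the two cases apart from the characters alone.  It is an
illustration under stated assumptions, not a recommended algorithm: the
helper names script_prefixes and suggest_tag are hypothetical, and the
first word of each Unicode character name is used as a rough stand-in for
the Unicode Script property, which the Python standard library does not
expose directly.  The "-latn" suffix is only meaningful if a tag like
"ja-latn" is actually available under answer 2 or 3.

```python
import unicodedata


def script_prefixes(text):
    """Return the first word of the Unicode name of every letter in text.

    This is a heuristic: names such as "LATIN SMALL LETTER A" or
    "HIRAGANA LETTER A" happen to start with the script name.
    """
    prefixes = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                prefixes.add(name.split()[0])
    return prefixes


def suggest_tag(text, lang="ja"):
    """Hypothetical helper: add "-latn" only when every letter is Latin."""
    if script_prefixes(text) == {"LATIN"}:
        return lang + "-latn"
    return lang


print(suggest_tag("arigato"))
print(suggest_tag("ありがとう"))
```

Plain "ja" remains a valid answer in both cases; the sketch only shows that
the distinction is recoverable from the text itself.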

> (BTW what's the correct way to refer to these terms? 'phonetic spellings
> in roman alphabet'? Or, er, latin? I get confused embarrassingly easily by
> this stuff.)

Transcriptions.  The difference between transliteration and transcription
is this:  a transliteration is a reversible equivalence between one script
and another; a transcription is the expression of a language written in
one script in a form that seems reasonable to readers of another language
written in another script.  There are various standard and non-standard
transliterations between Cyrillic script and Latin script, for example;
there are English and French and German transcriptions of Russian.
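
The reversibility point can be shown with a small Python sketch.  Both
tables below are toy examples made up for illustration; they are not taken
from any published transliteration standard or transcription scheme, and
the helper names convert and invert are hypothetical.  The first table is
one-to-one, so it can be inverted; the second maps two Cyrillic letters to
the same Latin letter, so the original spelling cannot be recovered.

```python
# Toy one-to-one table (illustrative assumption, not a published standard).
# "\u00e8" is LATIN SMALL LETTER E WITH GRAVE.
TRANSLIT = {"э": "\u00e8", "е": "e", "т": "t", "о": "o"}

# Toy transcription table: two Cyrillic letters collapse onto Latin "e".
TRANSCRIBE = {"э": "e", "е": "e", "т": "t", "о": "o"}


def convert(text, table):
    """Map each character through table, leaving unknown characters alone."""
    return "".join(table.get(ch, ch) for ch in text)


def invert(table):
    """Return the inverse table, or None if two sources share a target."""
    inverse = {}
    for src, dst in table.items():
        if dst in inverse:
            return None  # not reversible
        inverse[dst] = src
    return inverse


word = "это"
latin = convert(word, TRANSLIT)
print(convert(latin, invert(TRANSLIT)) == word)  # round trip succeeds
print(invert(TRANSCRIBE))                        # no inverse exists
```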

Unicode calls the Roman script "Latin" to avoid confusion with roman
(upright) type as opposed to slanted or italic; one may refer to a roman
Cyrillic font, for example.

-- 
At the end of the Metatarsal Age, the dinosaurs         John Cowan
abruptly vanished. The theory that a single             cowan@ccil.org
catastrophic event may have been responsible            www.reutershealth.com
has been strengthened by the recent discovery of        www.ccil.org/~cowan
a worldwide layer of whipped cream marking the
Creosote-Tutelary boundary.             --Science Made Stupid
Received on Wednesday, 16 June 2004 09:30:50 GMT
