
Re: xml:lang question, markup for things like 'kursee', 'arigato'?

From: John Cowan <cowan@ccil.org>
Date: Wed, 16 Jun 2004 11:18:43 -0400
To: Misha Wolf <Misha.Wolf@reuters.com>
Cc: www-international@w3.org
Message-ID: <20040616151843.GG28499@ccil.org>

Misha Wolf scripsit:

> In my case, I was trying to decide what xml:lang values to use for
> brief Turkish phrases which have been degraded to the Latin alphabet
> as used for English.  Both the Turkish writing system and the English
> writing system use the Latin script.  It would surely not be helpful
> to mark both the original phrase and the degraded version as "tr-Latn"?

In the RFC 3066 bis regime (which we are not yet in) you could
use "tr-x-misspelled".  Today, I think "tr" is the best you can do.
There is plenty of English text properly tagged but orthographically
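
To make the difference between the two tags concrete, here is a minimal
Python sketch of how a consumer might separate the ordinary part of a tag
from a private-use tail such as "x-misspelled". The function names
split_private_use and primary_language are hypothetical, made up for this
illustration, and the code does no validation against RFC 3066 or its
successor; it only splits on hyphens.

```python
def split_private_use(tag):
    """Split a language tag into (public_part, private_subtags).

    Illustrative only: treats the first "x" subtag after the primary
    subtag as the start of a private-use sequence. No validation is done.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered[1:]:
        i = lowered.index("x", 1)
        return "-".join(subtags[:i]), subtags[i + 1:]
    return tag, []


def primary_language(tag):
    """Return the lowercased primary language subtag, e.g. "tr"."""
    return tag.split("-")[0].lower()


if __name__ == "__main__":
    for tag in ("tr", "tr-Latn", "tr-x-misspelled"):
        print(tag, primary_language(tag), split_private_use(tag))
```

A processor that matches only on the primary subtag would treat all three
tags as Turkish, while one that inspects the private-use tail could tell
the degraded spelling apart from the original.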

John Cowan  www.ccil.org/~cowan  www.reutershealth.com  cowan@ccil.org
We want more school houses and less jails; more books and less arsenals;
more learning and less vice; more constant work and less crime; more
leisure and less greed; more justice and less revenge; in fact, more of
the opportunities to cultivate our better natures.  --Samuel Gompers
Received on Wednesday, 16 June 2004 11:14:34 UTC
