- From: Sue Ellen Wright <sellenwright@gmail.com>
- Date: Tue, 6 Feb 2007 14:42:05 -0500
- To: "Ivan Herman" <ivan@w3.org>
- Cc: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>, public-swd-wg@w3.org, public-esw-thes@w3.org, nagamori@slis.tsukuba.ac.jp
- Message-ID: <e35499310702061142q71a0ff84uc054479b0ba40571@mail.gmail.com>
Hi, All,

Do look at IETF 4646, which is the new replacement for 3066. It spells
out precisely how to report script codes as part of the language tag. I
strongly recommend the incorporation of 4646 into all apps involving
multilingual tagging.

Bye for now
Sue Ellen Wright

On 2/6/07, Ivan Herman <ivan@w3.org> wrote:
>
> Alistair,
>
> http://www.w3.org/International/articles/language-tags/Overview.en.php
>
> might be a good place to find information.
>
> http://www.w3.org/International/resource-index.php?topic=lang
>
> gives a bunch of other references, some of them may be relevant...
>
> Ivan
>
> Miles, AJ (Alistair) wrote:
> > Hi all,
> >
> > Just jotting down some notes prior to raising an issue ...
> >
> > At DC2006 I spoke to Mitsuharu Nagamori from the University of
> > Tsukuba (cced) about a SKOS encoding of the Japanese National
> > Library classification scheme. Mitsuharu and I discussed design
> > options for representing the features of the classification scheme.
> > Mitsuharu also taught me about the various scripts that are used for
> > the Japanese written language. I am still very ignorant about the
> > Japanese language so please forgive me if I make any errors in this
> > email.
> >
> > As I understand it, there are several different scripts available
> > for writing Japanese [1]. These are the Kanji script (characters of
> > Chinese origin), the Hiragana script (a syllabary), the Katakana
> > script (also a syllabary) and the Latin alphabet.
> >
> > In the JNL classification scheme, all four scripts may be used.
> >
> > The general situation in which a concept may be labelled using
> > multiple scripts within the same language gives rise to a number of
> > potential issues.
> >
> > Firstly, an application may wish to distinguish between labels in
> > different scripts, for display purposes. How is the script of a
> > label to be represented in an RDF graph?
> >
> > I found a standard list of script names [2], I believe for Japanese
> > the values are as follows ..
> >
> > * Hani (Kanji)
> > * Hira (Hiragana)
> > * Kana (Katakana)
> > * Latn (Latin)
> >
> > I then had a look at RFC 3066 [3] to see if the script names could
> > be used within language tags. To paraphrase, [3] says that a
> > language tag can be built up from any number of subtags separated by
> > "-" character. If I've understood it correctly, the first subtag is
> > supposed to be the language code (from ISO 639-1 or ISO 639-2 e.g.
> > "en"), the second subtag is supposed to be a country code (from ISO
> > 3166), and the third subtag can be anything you want. So e.g. you
> > can have "sgn-US-MA" for Martha's Vineyard Sign Language, which is
> > found in the state of Massachusetts, US.
> >
> > So can you have e.g. "ja-JP-Kana" for japanese - Japan - Katakana
> > script?
> >
> > Then I found this email from Jeremy Carroll [4] that suggests you
> > can put the script name and the country code the other way around,
> > e.g. "zh-hant-TW". Does anyone know what the rules are for including
> > script names in language tags, and where this is specified?
> >
> > If it is possible to embed script names in language tags, then the
> > representational issue can be resolved.
> >
> > Secondly, this bears on the cardinality of the skos:prefLabel
> > property. The SKOS Core Guide [5] currently says, "A concept should
> > have no more than one preferred lexical label per language."
> > However, when working with multiple scripts a concept scheme would
> > need one preferred lexical label per language per script.
> >
> > Thirdly, this is another scenario which gives rise to the need for
> > expressing relationships between lexical labels. E.g. if a concept
> > has both preferred and alternative labels in multiple scripts, an
> > application might want to display equivalent labels from different
> > scripts beside each other to aid with reading.
> >
> > That's all I have for now.
> >
> > Cheers,
> >
> > Alistair.
> >
> > [1] http://en.wikipedia.org/wiki/Japanese_writing_system
> > [2] http://www.unicode.org/iso15924/iso15924-codes.html
> > [3] http://www.ietf.org/rfc/rfc3066.txt
> > [4] http://www.alvestrand.no/pipermail/ietf-languages/2004-March/001809.html
> > [5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmulti
> >
> > --
> > Alistair Miles
> > Research Associate
> > CCLRC - Rutherford Appleton Laboratory
> > Building R1 Room 1.60
> > Fermi Avenue
> > Chilton
> > Didcot
> > Oxfordshire OX11 0QX
> > United Kingdom
> > Web: http://purl.org/net/aliman
> > Email: a.j.miles@rl.ac.uk
> > Tel: +44 (0)1235 445440
> >
>
> --
>
> Ivan Herman, W3C Semantic Web Activity Lead
> URL: http://www.w3.org/People/Ivan/
> PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>

--
Sue Ellen Wright
Institute for Applied Linguistics
Kent State University
Kent OH 44242 USA

sellenwright@gmail.com
swright@kent.edu
sewright@neo.rr.com
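
For anyone wanting to try the subtag order discussed in this thread, here is a minimal sketch in Python. It assumes the RFC 4646 ordering of language, then an optional four-letter script, then an optional region. The helper name `split_tag` and the simplified pattern are illustrative only and not taken from the RFC: the sketch ignores extended language subtags, variants, extensions, private-use subtags and grandfathered registrations, so it is not a full validator.

```python
import re

# Simplified, assumed pattern (not the full RFC 4646 grammar):
# 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits).
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def split_tag(tag):
    """Split a tag into language, script and region subtags.

    Returns None when the tag does not fit the simplified pattern.
    Hypothetical helper for illustration only.
    """
    m = _TAG.fullmatch(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    return {
        # Tags are case-insensitive; the casing below follows the
        # usual convention (lowercase language, titlecase script,
        # uppercase region).
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(split_tag("ja-Kana"))      # Katakana label, no region
print(split_tag("zh-hant-TW"))   # script before region
print(split_tag("ja-JP-Kana"))   # script after region: rejected here
```

Under this ordering a Katakana label would carry a tag such as "ja-Kana" (or "ja-Kana-JP" if the region matters), with the script placed before the region as in the "zh-hant-TW" example.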
Received on Tuesday, 6 February 2007 19:42:13 UTC