- From: Ivan Herman <ivan@w3.org>
- Date: Tue, 06 Feb 2007 17:25:47 +0100
- To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
- CC: public-swd-wg@w3.org, public-esw-thes@w3.org, nagamori@slis.tsukuba.ac.jp
- Message-ID: <45C8AC0B.1090805@w3.org>
Alistair,

http://www.w3.org/International/articles/language-tags/Overview.en.php

might be a good place to find information.

http://www.w3.org/International/resource-index.php?topic=lang

gives a bunch of other references; some of them may be relevant...

Ivan

Miles, AJ (Alistair) wrote:
> Hi all,
>
> Just jotting down some notes prior to raising an issue ...
>
> At DC2006 I spoke to Mitsuharu Nagamori from the University of Tsukuba (cced) about a SKOS encoding of the Japanese National Library classification scheme. Mitsuharu and I discussed design options for representing the features of the classification scheme. Mitsuharu also taught me about the various scripts that are used for the Japanese written language. I am still very ignorant about the Japanese language so please forgive me if I make any errors in this email.
>
> As I understand it, there are several different scripts available for writing Japanese [1]. These are the Kanji script (characters of Chinese origin), the Hiragana script (a syllabary), the Katakana script (also a syllabary) and the Latin alphabet.
>
> In the JNL classification scheme, all four scripts may be used.
>
> The general situation in which a concept may be labelled using multiple scripts within the same language gives rise to a number of potential issues.
>
> Firstly, an application may wish to distinguish between labels in different scripts, for display purposes. How is the script of a label to be represented in an RDF graph?
>
> I found a standard list of script names [2]. I believe for Japanese the values are as follows ..
>
> * Hani (Kanji)
> * Hira (Hiragana)
> * Kana (Katakana)
> * Latn (Latin)
>
> I then had a look at RFC 3066 [3] to see if the script names could be used within language tags. To paraphrase, [3] says that a language tag can be built up from any number of subtags separated by a "-" character. If I've understood it correctly, the first subtag is supposed to be the language code (from ISO 639-1 or ISO 639-2, e.g. "en"), the second subtag is supposed to be a country code (from ISO 3166), and the third subtag can be anything you want. So e.g. you can have "sgn-US-MA" for Martha's Vineyard Sign Language, which is found in the state of Massachusetts, US.
>
> So can you have e.g. "ja-JP-Kana" for Japanese - Japan - Katakana script?
>
> Then I found this email from Jeremy Carroll [4] that suggests you can put the script name and the country code the other way around, e.g. "zh-hant-TW". Does anyone know what the rules are for including script names in language tags, and where this is specified?
>
> If it is possible to embed script names in language tags, then the representational issue can be resolved.
>
> Secondly, this bears on the cardinality of the skos:prefLabel property. The SKOS Core Guide [5] currently says, "A concept should have no more than one preferred lexical label per language." However, when working with multiple scripts a concept scheme would need one preferred lexical label per language per script.
>
> Thirdly, this is another scenario which gives rise to the need for expressing relationships between lexical labels. E.g. if a concept has both preferred and alternative labels in multiple scripts, an application might want to display equivalent labels from different scripts beside each other to aid with reading.
>
> That's all I have for now.
>
> Cheers,
>
> Alistair.
>
> [1] http://en.wikipedia.org/wiki/Japanese_writing_system
> [2] http://www.unicode.org/iso15924/iso15924-codes.html
> [3] http://www.ietf.org/rfc/rfc3066.txt
> [4] http://www.alvestrand.no/pipermail/ietf-languages/2004-March/001809.html
> [5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmulti
>
> --
> Alistair Miles
> Research Associate
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> Web: http://purl.org/net/aliman
> Email: a.j.miles@rl.ac.uk
> Tel: +44 (0)1235 445440
>
>

--
Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
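
On the subtag-order question in the quoted message: RFC 4646, which replaced RFC 3066 in 2006, places a four-letter ISO 15924 script subtag after the language and before the region, so the Katakana case would be written "ja-Kana-JP" (or simply "ja-Kana") rather than "ja-JP-Kana". The following is a minimal Python sketch of that ordering, not a full implementation of the RFC. The function name `split_language_tag` and the simplified pattern are assumptions made for illustration; variants, extensions, private-use subtags and older registered tags such as "sgn-US-MA" are deliberately not handled.

```python
import re

# Simplified shape: language, optional script, optional region.
# Assumption: RFC 4646 ordering only; variants, extensions, private-use
# and grandfathered tags (e.g. "sgn-US-MA") are out of scope here.
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"               # ISO 639 language code
    r"(?:-(?P<script>[A-Za-z]{4}))?"             # ISO 15924 script code
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"    # ISO 3166 or UN M.49 region
)


def split_language_tag(tag):
    """Split a tag into language, script and region (hypothetical helper).

    Returns a dict with conventional casing, or None when the tag does
    not fit the simplified language[-script][-region] shape.
    """
    match = _TAG.fullmatch(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


if __name__ == "__main__":
    for tag in ("ja-Kana-JP", "zh-hant-TW", "ja-Hira", "ja-JP-Kana"):
        print(tag, split_language_tag(tag))
```

With this shape, "zh-hant-TW" splits into language "zh", script "Hant" and region "TW", while "ja-JP-Kana" is rejected because the script follows the region. An application that needs one preferred label per language per script could key its labels on the (language, script) pair returned here.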
Received on Tuesday, 6 February 2007 16:25:49 UTC