
Re: [SKOS] languages and scripts

From: Ivan Herman <ivan@w3.org>
Date: Tue, 06 Feb 2007 17:25:47 +0100
Message-ID: <45C8AC0B.1090805@w3.org>
To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
CC: public-swd-wg@w3.org, public-esw-thes@w3.org, nagamori@slis.tsukuba.ac.jp


might be a good place to find information.


gives a bunch of other references, some of them may be relevant...


Miles, AJ (Alistair) wrote:
> Hi all,
> Just jotting down some notes prior to raising an issue ...
> At DC2006 I spoke to Mitsuharu Nagamori from the University of Tsukuba (cced) about a SKOS encoding of the Japanese National Library classification scheme. Mitsuharu and I discussed design options for representing the features of the classification scheme. Mitsuharu also taught me about the various scripts that are used for the Japanese written language. I am still very ignorant about the Japanese language so please forgive me if I make any errors in this email.
> As I understand it, there are several different scripts available for writing Japanese [1]. These are the Kanji script (characters of Chinese origin), the Hiragana script (a syllabary), the Katakana script (also a syllabary) and the Latin alphabet.
> In the JNL classification scheme, all four scripts may be used. 
> The general situation in which a concept may be labelled using multiple scripts within the same language gives rise to a number of potential issues.
> Firstly, an application may wish to distinguish between labels in different scripts, for display purposes. How is the script of a label to be represented in an RDF graph?
> I found a standard list of script names [2]. I believe the values relevant to Japanese are as follows:
>  * Hani (Kanji)
>  * Hira (Hiragana)
>  * Kana (Katakana)
>  * Latn (Latin)
> I then had a look at RFC 3066 [3] to see if the script names could be used within language tags. To paraphrase, [3] says that a language tag can be built up from any number of subtags separated by the "-" character. If I've understood it correctly, the first subtag is supposed to be the language code (from ISO 639-1 or ISO 639-2, e.g. "en"), the second subtag is supposed to be a country code (from ISO 3166), and the third subtag can be anything you want. So, for example, you can have "sgn-US-MA" for Martha's Vineyard Sign Language, which is found in the state of Massachusetts, US.
> So can you have e.g. "ja-JP-Kana" for Japanese - Japan - Katakana script?
> Then I found this email from Jeremy Carroll [4] that suggests you can put the script name and the country code the other way around, e.g. "zh-hant-TW". Does anyone know what the rules are for including script names in language tags, and where this is specified?
> If it is possible to embed script names in language tags, then the representational issue can be resolved.
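
As a rough illustration of the tag question above, here is a minimal sketch of splitting a tag into language, script and region. It assumes the language-script-region ordering defined by RFC 4646 (the successor to RFC 3066), in which the script is a four-letter ISO 15924 code placed before the region. The pattern and the `split_tag` helper are hypothetical and cover only a small subset of the syntax; this is not a full parser.

```python
import re

# Simplified pattern: language, optional 4-letter script, optional region.
# Illustrative subset only (assumption): no variants, extensions,
# private-use or grandfathered tags are handled here.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Return (language, script, region); parts that are absent are None."""
    m = _TAG.match(tag)
    if m is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    return (
        m.group("language").lower(),
        script.title() if script else None,   # conventional casing, e.g. "Kana"
        region.upper() if region else None,   # conventional casing, e.g. "TW"
    )
```

Under these assumptions, `split_tag("zh-hant-TW")` yields the language, script and region in that order, while a tag written region-first such as "ja-JP-Kana" is rejected by this simplified pattern.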
> Secondly, this bears on the cardinality of the skos:prefLabel property. The SKOS Core Guide [5] currently says, "A concept should have no more than one preferred lexical label per language." However, when working with multiple scripts a concept scheme would need one preferred lexical label per language per script.
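
To make the cardinality point concrete, here is a small sketch of a check for "at most one preferred label per language per script". It assumes the script is carried inside the language tag (e.g. "ja-Kana"), so that uniqueness per tag gives uniqueness per language and script. The `duplicate_pref_labels` helper and the sample labels are hypothetical and are not taken from the classification scheme itself.

```python
from collections import defaultdict


def duplicate_pref_labels(pref_labels):
    """Map each language tag carrying more than one preferred label to those labels.

    pref_labels is an iterable of (label_text, language_tag) pairs.
    Tags are compared case-insensitively.
    """
    by_tag = defaultdict(list)
    for text, tag in pref_labels:
        by_tag[tag.lower()].append(text)
    return {tag: texts for tag, texts in by_tag.items() if len(texts) > 1}


# Hypothetical preferred labels for one concept, one per script.
labels = [
    ("図書館", "ja-Hani"),
    ("としょかん", "ja-Hira"),
    ("トショカン", "ja-Kana"),
    ("toshokan", "ja-Latn"),
]
```

With the sample data, `duplicate_pref_labels(labels)` returns an empty dict, since each tag occurs once. If the same four labels were all tagged plain "ja", every one of them would be reported as a clash under the current one-per-language rule.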
> Thirdly, this is another scenario which gives rise to the need for expressing relationships between lexical labels. E.g. if a concept has both preferred and alternative labels in multiple scripts, an application might want to display equivalent labels from different scripts beside each other to aid with reading. 
> That's all I have for now.
> Cheers,
> Alistair.
> [1] http://en.wikipedia.org/wiki/Japanese_writing_system
> [2] http://www.unicode.org/iso15924/iso15924-codes.html
> [3] http://www.ietf.org/rfc/rfc3066.txt
> [4] http://www.alvestrand.no/pipermail/ietf-languages/2004-March/001809.html
> [5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmulti
> --
> Alistair Miles
> Research Associate
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> Web: http://purl.org/net/aliman
> Email: a.j.miles@rl.ac.uk
> Tel: +44 (0)1235 445440


Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Tuesday, 6 February 2007 16:25:48 UTC
