Re: [SKOS] languages and scripts

From: Ivan Herman <ivan@w3.org>
Date: Tue, 06 Feb 2007 17:25:47 +0100
Message-ID: <45C8AC0B.1090805@w3.org>
To: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>
CC: public-swd-wg@w3.org, public-esw-thes@w3.org, nagamori@slis.tsukuba.ac.jp

Alistair,

http://www.w3.org/International/articles/language-tags/Overview.en.php

might be a good place to find information.

http://www.w3.org/International/resource-index.php?topic=lang

gives a bunch of other references; some of them may be relevant...

Ivan

Miles, AJ (Alistair) wrote:
> Hi all,
> 
> Just jotting down some notes prior to raising an issue ...
> 
> At DC2006 I spoke to Mitsuharu Nagamori from the University of Tsukuba (cc'd) about a SKOS encoding of the Japanese National Library classification scheme. Mitsuharu and I discussed design options for representing the features of the classification scheme. Mitsuharu also taught me about the various scripts that are used for written Japanese. I am still very ignorant about the Japanese language, so please forgive me if I make any errors in this email.
> 
> As I understand it, there are several different scripts available for writing Japanese [1]. These are the Kanji script (characters of Chinese origin), the Hiragana script (a syllabary), the Katakana script (also a syllabary) and the Latin alphabet.
> 
> In the JNL classification scheme, all four scripts may be used. 
> 
> The general situation in which a concept may be labelled using multiple scripts within the same language gives rise to a number of potential issues.
> 
> Firstly, an application may wish to distinguish between labels in different scripts, for display purposes. How is the script of a label to be represented in an RDF graph?
> 
> I found a standard list of script codes (ISO 15924) [2]. I believe the values for Japanese are as follows:
> 
>  * Hani (Kanji)
>  * Hira (Hiragana)
>  * Kana (Katakana)
>  * Latn (Latin)
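For the display case, the script of a label can often be guessed from its characters. The sketch below is a heuristic only, assuming that Unicode character-name prefixes are a good enough proxy for the four codes above; the mapping table and the function name `guess_script` are hypothetical, and a real application would use the Unicode Script property instead.

```python
import unicodedata
from typing import Optional

# Hypothetical heuristic: map Unicode character-name prefixes to the
# ISO 15924 codes listed above. This is NOT the Unicode Script property.
NAME_PREFIX_TO_SCRIPT = (
    ("CJK UNIFIED IDEOGRAPH", "Hani"),
    ("HIRAGANA", "Hira"),
    ("KATAKANA", "Kana"),
    ("LATIN", "Latn"),
)


def guess_script(label: str) -> Optional[str]:
    """Return the ISO 15924 code if all recognised characters share one script.

    Characters whose names match none of the prefixes (spaces, digits,
    punctuation, fullwidth Latin) are skipped. Mixed-script or
    unrecognised labels give None.
    """
    scripts = set()
    for ch in label:
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        for prefix, code in NAME_PREFIX_TO_SCRIPT:
            if name.startswith(prefix):
                scripts.add(code)
                break
    return scripts.pop() if len(scripts) == 1 else None


for label in ("日本", "にほん", "ニホン", "Nihon", "日本の"):
    print(label, guess_script(label))
```

The last label mixes Kanji and Hiragana, so the sketch returns None for it; labels like that are one reason the script is better stated explicitly in the data than inferred.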
> 
> I then had a look at RFC 3066 [3] to see if the script names could be used within language tags. To paraphrase, [3] says that a language tag can be built up from any number of subtags separated by the "-" character. If I've understood it correctly, the first subtag is supposed to be the language code (from ISO 639-1 or ISO 639-2, e.g. "en"), the second subtag, if it has two letters, is supposed to be a country code (from ISO 3166), and the third subtag can be anything you want. So, e.g., you can have "sgn-US-MA" for Martha's Vineyard Sign Language, which is found in the state of Massachusetts, US.
> 
> So can you have, e.g., "ja-JP-Kana" for Japanese - Japan - Katakana script?
> 
> Then I found this email from Jeremy Carroll [4] that suggests you can put the script name and the country code the other way around, e.g. "zh-hant-TW". Does anyone know what the rules are for including script names in language tags, and where this is specified?
> 
> If it is possible to embed script names in language tags, then the representational issue can be resolved.
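On the ordering question: RFC 4646 (September 2006), which obsoletes RFC 3066 as BCP 47, defines a script subtag of four letters taken from ISO 15924 and places it after the language subtag and before the region subtag. That gives "ja-Kana-JP" rather than "ja-JP-Kana", matching the "zh-hant-TW" form in [4]. The sketch below splits a tag positionally under that ordering; it is a simplification, the names `LangTag` and `split_tag` are hypothetical, and extended-language, variant, extension and private-use subtags are deliberately left uninterpreted.

```python
from typing import List, NamedTuple, Optional


class LangTag(NamedTuple):
    language: str
    script: Optional[str]   # ISO 15924 code, e.g. "Kana"
    region: Optional[str]   # ISO 3166 code, e.g. "JP"
    rest: List[str]         # any further subtags, left uninterpreted


def split_tag(tag: str) -> LangTag:
    """Split a tag of the form language[-Script][-REGION][-...].

    Simplified reading of the RFC 4646 ordering: a 4-letter subtag right
    after the language is a script, and a following 2-letter or 3-digit
    subtag is a region. Anything else ends up in `rest`.
    """
    parts = tag.split("-")
    language, i = parts[0].lower(), 1
    script = region = None
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        script = parts[i].title()  # tags are case-insensitive; normalise to "Kana"
        i += 1
    if i < len(parts) and (
        (len(parts[i]) == 2 and parts[i].isalpha())
        or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        region = parts[i].upper()
        i += 1
    return LangTag(language, script, region, parts[i:])


print(split_tag("ja-Kana-JP"))  # script and region both recognised
print(split_tag("zh-hant-TW"))  # the ordering used in Jeremy Carroll's email
print(split_tag("ja-JP-Kana"))  # "Kana" is in the wrong slot, so it lands in rest
```

Parsing by position is what makes the order matter: under this scheme "ja-JP-Kana" is still syntactically a tag, but "Kana" is no longer read as a script.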
> 
> Secondly, this bears on the cardinality of the skos:prefLabel property. The SKOS Core Guide [5] currently says, "A concept should have no more than one preferred lexical label per language." However, when working with multiple scripts, a concept scheme would need one preferred lexical label per language per script.
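If the script is carried in the language tag, "one preferred label per language per script" reduces to "one preferred label per tag". A minimal sketch of that check, assuming a concept's preferred labels are available as (lexical form, language tag) pairs; the function name and input shape are hypothetical and not part of SKOS. Note that it compares whole tags, so "ja-Kana" and "ja-Kana-JP" would count as different keys.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tags_with_duplicate_pref_labels(
    pref_labels: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return language tags used by more than one prefLabel of a concept.

    `pref_labels` holds (lexical form, language tag) pairs for a single
    concept. Tags are compared case-insensitively. Because the script is
    part of the tag, "ja-Hani" and "ja-Kana" are separate keys.
    """
    counts = Counter(tag.lower() for _, tag in pref_labels)
    return sorted(tag for tag, n in counts.items() if n > 1)


labels = [("日本", "ja-Hani"), ("にほん", "ja-Hira"),
          ("ニホン", "ja-Kana"), ("Nihon", "ja-Latn")]
print(tags_with_duplicate_pref_labels(labels))                             # []
print(tags_with_duplicate_pref_labels(labels + [("ニッポン", "ja-kana")]))  # ['ja-kana']
```

Had all four labels been tagged plain "ja", the same check would flag them, which is the conflict with the Guide's current wording.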
> 
> Thirdly, this is another scenario which gives rise to the need for expressing relationships between lexical labels. For example, if a concept has both preferred and alternative labels in multiple scripts, an application might want to display equivalent labels from different scripts beside each other to aid reading.
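A sketch of the side-by-side display, assuming labels are given as (lexical form, language tag) pairs with the script as a four-letter subtag right after the language (the RFC 4646 order); the function name `group_for_display` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_for_display(
    labels: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Group (lexical form, language tag) pairs as {language: {script: [forms]}}.

    The script is read from a 4-letter subtag directly after the language;
    labels without one are filed under "".
    """
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for form, tag in labels:
        parts = tag.split("-")
        has_script = len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha()
        script = parts[1].title() if has_script else ""
        grouped[parts[0].lower()][script].append(form)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {lang: dict(by_script) for lang, by_script in grouped.items()}


print(group_for_display([("日本", "ja-Hani"), ("にほん", "ja-Hira"), ("Japan", "en")]))
```

Grouping is enough to line up one preferred label per script. Once a concept has several alternative labels in each script, though, grouping alone cannot say which Katakana form goes with which Kanji form; that pairing is exactly what an explicit label-to-label relationship would record.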
> 
> That's all I have for now.
> 
> Cheers,
> 
> Alistair.
> 
> [1] http://en.wikipedia.org/wiki/Japanese_writing_system
> [2] http://www.unicode.org/iso15924/iso15924-codes.html
> [3] http://www.ietf.org/rfc/rfc3066.txt
> [4] http://www.alvestrand.no/pipermail/ietf-languages/2004-March/001809.html
> [5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmulti
> --
> Alistair Miles
> Research Associate
> CCLRC - Rutherford Appleton Laboratory
> Building R1 Room 1.60
> Fermi Avenue
> Chilton
> Didcot
> Oxfordshire OX11 0QX
> United Kingdom
> Web: http://purl.org/net/aliman
> Email: a.j.miles@rl.ac.uk
> Tel: +44 (0)1235 445440

-- 

Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Tuesday, 6 February 2007 16:25:49 GMT