Re: [SKOS] languages and scripts

Hi, All,
Do look at RFC 4646, the IETF's replacement for RFC 3066. It spells out
precisely how to include script codes as part of the language tag. I strongly
recommend adopting RFC 4646 in all applications that involve multilingual
tagging.
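
As a rough illustration: in RFC 4646 the script subtag comes directly after
the language subtag and before any region subtag, e.g. "ja-Kana" or
"zh-Hant-TW". The Python sketch below is only a simplified reading of that
ordering. It handles language, script and region subtags and nothing else;
the names TAG_RE and split_tag are made up for this example and are not part
of any standard library.

```python
import re

# Simplified pattern: language, optional script, optional region only.
# RFC 4646 also defines extended language, variant, extension and
# private-use subtags, which this sketch deliberately ignores.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"              # ISO 639 code
    r"(?:-(?P<script>[A-Za-z]{4}))?"             # ISO 15924 code
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"   # ISO 3166 or UN M.49 code
)

def split_tag(tag):
    """Return the subtags of a simple language tag, or None if the tag
    does not fit the simplified language-script-region pattern."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    # Tags are case-insensitive; this applies the conventional casing.
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }

print(split_tag("ja-Kana"))     # script directly after the language
print(split_tag("zh-hant-TW"))  # script before region
print(split_tag("ja-JP-Kana"))  # script after region: rejected by this pattern
```

Under this pattern a script code placed after the region does not match, so
Katakana labels would be tagged "ja-Kana" (or "ja-Kana-JP").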
Bye for now
Sue Ellen Wright


On 2/6/07, Ivan Herman <ivan@w3.org> wrote:
>
> Alistair,
>
> http://www.w3.org/International/articles/language-tags/Overview.en.php
>
> might be a good place to find information.
>
> http://www.w3.org/International/resource-index.php?topic=lang
>
> gives a bunch of other references, some of them may be relevant...
>
> Ivan
>
> Miles, AJ (Alistair) wrote:
> > Hi all,
> >
> > Just jotting down some notes prior to raising an issue ...
> >
> > At DC2006 I spoke to Mitsuharu Nagamori from the University of Tsukuba
> > (cc'd) about a SKOS encoding of the Japanese National Library
> > classification scheme. Mitsuharu and I discussed design options for
> > representing the features of the classification scheme. Mitsuharu also
> > taught me about the various scripts that are used for the Japanese
> > written language. I am still very ignorant about the Japanese language,
> > so please forgive me if I make any errors in this email.
> >
> > As I understand it, there are several different scripts available for
> > writing Japanese [1]. These are the Kanji script (characters of Chinese
> > origin), the Hiragana script (a syllabary), the Katakana script (also a
> > syllabary) and the Latin alphabet.
> >
> > In the JNL classification scheme, all four scripts may be used.
> >
> > The general situation in which a concept may be labelled using multiple
> > scripts within the same language gives rise to a number of potential
> > issues.
> >
> > Firstly, an application may wish to distinguish between labels in
> > different scripts, for display purposes. How is the script of a label to
> > be represented in an RDF graph?
> >
> > I found a standard list of script names [2]. I believe the values for
> > Japanese are as follows:
> >
> >  * Hani (Kanji)
> >  * Hira (Hiragana)
> >  * Kana (Katakana)
> >  * Latn (Latin)
> >
> > I then had a look at RFC 3066 [3] to see if the script names could be
> > used within language tags. To paraphrase, [3] says that a language tag
> > can be built up from any number of subtags separated by the "-"
> > character. If I've understood it correctly, the first subtag is supposed
> > to be the language code (from ISO 639-1 or ISO 639-2, e.g. "en"), the
> > second subtag is supposed to be a country code (from ISO 3166), and the
> > third subtag can be anything you want. So, for example, you can have
> > "sgn-US-MA" for Martha's Vineyard Sign Language, which is found in the
> > state of Massachusetts, US.
> >
> > So can you have e.g. "ja-JP-Kana" for Japanese - Japan - Katakana
> > script?
> >
> > Then I found this email from Jeremy Carroll [4] that suggests you can
> > put the script name and the country code the other way around, e.g.
> > "zh-hant-TW". Does anyone know what the rules are for including script
> > names in language tags, and where this is specified?
> >
> > If it is possible to embed script names in language tags, then the
> > representational issue can be resolved.
> >
> > Secondly, this bears on the cardinality of the skos:prefLabel property.
> > The SKOS Core Guide [5] currently says, "A concept should have no more
> > than one preferred lexical label per language." However, when working
> > with multiple scripts a concept scheme would need one preferred lexical
> > label per language per script.
> >
> > Thirdly, this is another scenario which gives rise to the need for
> > expressing relationships between lexical labels. For example, if a
> > concept has both preferred and alternative labels in multiple scripts,
> > an application might want to display equivalent labels from different
> > scripts beside each other to aid with reading.
> >
> > That's all I have for now.
> >
> > Cheers,
> >
> > Alistair.
> >
> > [1] http://en.wikipedia.org/wiki/Japanese_writing_system
> > [2] http://www.unicode.org/iso15924/iso15924-codes.html
> > [3] http://www.ietf.org/rfc/rfc3066.txt
> > [4] http://www.alvestrand.no/pipermail/ietf-languages/2004-March/001809.html
> > [5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmulti
> > --
> > Alistair Miles
> > Research Associate
> > CCLRC - Rutherford Appleton Laboratory
> > Building R1 Room 1.60
> > Fermi Avenue
> > Chilton
> > Didcot
> > Oxfordshire OX11 0QX
> > United Kingdom
> > Web: http://purl.org/net/aliman
> > Email: a.j.miles@rl.ac.uk
> > Tel: +44 (0)1235 445440
> >
> >
> >
>
> --
>
> Ivan Herman, W3C Semantic Web Activity Lead
> URL: http://www.w3.org/People/Ivan/
> PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>


-- 
Sue Ellen Wright
Institute for Applied Linguistics
Kent State University
Kent OH 44242 USA
sellenwright@gmail.com
swright@kent.edu
sewright@neo.rr.com

Received on Tuesday, 6 February 2007 19:42:16 UTC