
Re: [SKOS] languages and scripts

From: Sue Ellen Wright <sellenwright@gmail.com>
Date: Tue, 6 Feb 2007 14:42:05 -0500
Message-ID: <e35499310702061142q71a0ff84uc054479b0ba40571@mail.gmail.com>
To: "Ivan Herman" <ivan@w3.org>
Cc: "Miles, AJ (Alistair)" <A.J.Miles@rl.ac.uk>, public-swd-wg@w3.org, public-esw-thes@w3.org, nagamori@slis.tsukuba.ac.jp
Hi, All,
Do look at RFC 4646 from the IETF, which is the new replacement for RFC 3066.
It spells out precisely how to include script codes as part of the language
tag. I strongly recommend adopting RFC 4646 in all applications that involve
multilingual tagging.
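
In case it helps, here is a rough Python sketch of the language-script-region
ordering, applied to Alistair's second point below (one preferred label per
language per script). It assumes only the simple language[-script][-region]
shape, not the full RFC 4646 grammar. The names split_tag and
duplicate_pref_labels and the sample labels are made up for illustration.

```python
import re
from collections import defaultdict

# Simplified shape only: language[-script][-region].
# Assumption: this is NOT the full RFC 4646 grammar (no extended language
# subtags, variants, extensions or private-use subtags).
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Split a tag into (language, script, region); missing parts are None."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError("not a simple language-script-region tag: %r" % tag)
    language = match.group("language").lower()
    script = match.group("script")
    region = match.group("region")
    return (
        language,
        script.title() if script else None,  # conventional casing, e.g. "Kana"
        region.upper() if region else None,  # conventional casing, e.g. "JP"
    )


def duplicate_pref_labels(labels):
    """Given (text, tag) pairs, return the (language, script) keys that
    carry more than one preferred label, mapped to the offending texts."""
    by_key = defaultdict(list)
    for text, tag in labels:
        language, script, _region = split_tag(tag)
        by_key[(language, script)].append(text)
    return {key: texts for key, texts in by_key.items() if len(texts) > 1}


# Hypothetical preferred labels for one concept, one per script.
labels = [
    ("日本", "ja-Hani"),
    ("にほん", "ja-Hira"),
    ("ニホン", "ja-Kana"),
    ("Nihon", "ja-Latn"),
]
print(duplicate_pref_labels(labels))
```

With this pattern "zh-hant-TW" splits into zh / Hant / TW, while "ja-JP-Kana"
is rejected because the script subtag has to come before the region.
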
Bye for now
Sue Ellen Wright


On 2/6/07, Ivan Herman <ivan@w3.org> wrote:
>
> Alistair,
>
> http://www.w3.org/International/articles/language-tags/Overview.en.php
>
> might be a good place to find information.
>
> http://www.w3.org/International/resource-index.php?topic=lang
>
> gives a bunch of other references, some of them may be relevant...
>
> Ivan
>
> Miles, AJ (Alistair) wrote:
> > Hi all,
> >
> > Just jotting down some notes prior to raising an issue ...
> >
> > At DC2006 I spoke to Mitsuharu Nagamori from the University of Tsukuba
> > (cc'd) about a SKOS encoding of the Japanese National Library
> > classification scheme. Mitsuharu and I discussed design options for
> > representing the features of the classification scheme. Mitsuharu also
> > taught me about the various scripts that are used for the Japanese
> > written language. I am still very ignorant about the Japanese language,
> > so please forgive me if I make any errors in this email.
> >
> > As I understand it, there are several different scripts available for
> > writing Japanese [1]. These are the Kanji script (characters of Chinese
> > origin), the Hiragana script (a syllabary), the Katakana script (also a
> > syllabary) and the Latin alphabet.
> >
> > In the JNL classification scheme, all four scripts may be used.
> >
> > The general situation in which a concept may be labelled using multiple
> > scripts within the same language gives rise to a number of potential
> > issues.
> >
> > Firstly, an application may wish to distinguish between labels in
> > different scripts, for display purposes. How is the script of a label to
> > be represented in an RDF graph?
> >
> > I found a standard list of script names [2]. I believe the values for
> > Japanese are as follows:
> >
> >  * Hani (Kanji)
> >  * Hira (Hiragana)
> >  * Kana (Katakana)
> >  * Latn (Latin)
> >
> > I then had a look at RFC 3066 [3] to see if the script names could be
> > used within language tags. To paraphrase, [3] says that a language tag
> > can be built up from any number of subtags separated by a "-" character.
> > If I've understood it correctly, the first subtag is supposed to be the
> > language code (from ISO 639-1 or ISO 639-2, e.g. "en"), the second subtag
> > is supposed to be a country code (from ISO 3166), and the third subtag can
> > be anything you want. So e.g. you can have "sgn-US-MA" for Martha's
> > Vineyard Sign Language, which is found in the state of Massachusetts, US.
> >
> > So can you have e.g. "ja-JP-Kana" for Japanese - Japan - Katakana
> > script?
> >
> > Then I found this email from Jeremy Carroll [4] that suggests you can
> > put the script name and the country code the other way around, e.g.
> > "zh-hant-TW". Does anyone know what the rules are for including script
> > names in language tags, and where this is specified?
> >
> > If it is possible to embed script names in language tags, then the
> > representational issue can be resolved.
> >
> > Secondly, this bears on the cardinality of the skos:prefLabel property.
> > The SKOS Core Guide [5] currently says, "A concept should have no more
> > than one preferred lexical label per language." However, when working
> > with multiple scripts a concept scheme would need one preferred lexical
> > label per language per script.
> >
> > Thirdly, this is another scenario which gives rise to the need for
> > expressing relationships between lexical labels. E.g. if a concept has
> > both preferred and alternative labels in multiple scripts, an application
> > might want to display equivalent labels from different scripts beside
> > each other to aid with reading.
> >
> > That's all I have for now.
> >
> > Cheers,
> >
> > Alistair.
> >
> > [1] http://en.wikipedia.org/wiki/Japanese_writing_system
> > [2] http://www.unicode.org/iso15924/iso15924-codes.html
> > [3] http://www.ietf.org/rfc/rfc3066.txt
> > [4] http://www.alvestrand.no/pipermail/ietf-languages/2004-March/001809.html
> > [5] http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102/#secmulti
> > --
> > Alistair Miles
> > Research Associate
> > CCLRC - Rutherford Appleton Laboratory
> > Building R1 Room 1.60
> > Fermi Avenue
> > Chilton
> > Didcot
> > Oxfordshire OX11 0QX
> > United Kingdom
> > Web: http://purl.org/net/aliman
> > Email: a.j.miles@rl.ac.uk
> > Tel: +44 (0)1235 445440
> >
> >
> >
>
> --
>
> Ivan Herman, W3C Semantic Web Activity Lead
> URL: http://www.w3.org/People/Ivan/
> PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
>
>


-- 
Sue Ellen Wright
Institute for Applied Linguistics
Kent State University
Kent OH 44242 USA
sellenwright@gmail.com
swright@kent.edu
sewright@neo.rr.com
Received on Tuesday, 6 February 2007 19:42:16 GMT
