Could ISO-639 languages be defined as skos concepts?

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Mon, 18 Dec 2006 18:54:18 +0100
Message-ID: <4586D5CA.2050701@mondeca.com>
To: public-esw-thes@w3.org, Thomas Baker <baker@sub.uni-goettingen.de>

Hello all

Some thoughts about languages ...

ISO-639 languages are used in XML, in RDF, and in SKOS through their 
codes, which serve as values of the xml:lang attribute.
But for various applications, it would be useful to define those 
languages as proper RDF resources.

So far, the only attempt I've found to do so in RDF is 
http://downlode.org/rdf/iso-639/ and the description it provides is 
quite basic.
In the OASIS Geolang TC, we defined quite a while ago so-called 
Published Subject Identifiers with a similarly basic description.
I've also seen that the current work on DCMI domains and ranges is 
moving towards defining a "Language" class as the range of dc:language.
See http://dublincore.org/usageboardwiki/PropertyDomainsAndRanges.
I think this is a pretty good idea, and of course it would be good to 
have standard URIs and authoritative descriptions for instances of this 
class, based on ISO-639 codes.
Since I think we may have to wait quite a while before ISO delivers 
such a thing in its own namespace - and I would be happy to be proven 
wrong here - I wonder what kind of initiative could move this forward. 
Is it DCMI's intention to define those instances in its own namespace? 
(Tom, any clues on that?)

In any case, I very much like the idea of having those languages 
defined as SKOS concepts in a language ConceptScheme, including the 
hierarchy of language families as defined by Ethnologue and re-used on 
Wikipedia. See e.g.

We have in Ethnologue a bank of precious data about languages that could 
be mined into RDF, and in Wikipedia the translations of language names 
into many other languages. On that latter point, discovering and 
maintaining translations of each language name in the other languages is 
an N² issue. I've been through that for Mondeca internal use, and it's a 
headache even for 20 or so languages. 100 languages means 10,000 names 
... and supposing each of the 6,000+ languages identified by 
Ethnologue had a name in every other one, we would come up with very big 
files indeed.
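
To make the N² arithmetic above concrete, here is a minimal sketch in 
Python. The function name and the include_self switch are illustrative 
assumptions, not part of any existing tool; the figure of 6,000 is just 
the lower bound quoted above.

```python
def name_count(n_languages, include_self=True):
    """Number of language names needed if each of n languages is named
    in each of the n languages.

    include_self=True counts a language's name in itself as well (N * N),
    which matches the "100 languages means 10,000 names" figure.
    include_self=False counts only names in *other* languages (N * (N - 1)).
    """
    n = n_languages
    return n * n if include_self else n * (n - 1)


if __name__ == "__main__":
    for n in (20, 100, 6000):
        print(n, name_count(n), name_count(n, include_self=False))
```

Either way of counting grows quadratically, so the difference between 
N * N and N * (N - 1) hardly matters at Ethnologue scale.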

So, we have public concepts, a lot of data to mine, and use cases; 
all we need is a namespace to which ISO 639 codes can be appended to 
forge URIs.
Who is likely to host and maintain that namespace?
http://www.w3.org/2004/02/skos/language#  ?
http://purl.org/dc/language/  ?

Or maybe Ethnologue? I don't know how they feel about making their data 
available for the Semantic Web. Does anyone here have contacts among the 
Ethnologue folks?
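
As a rough illustration of forging URIs by appending ISO 639 codes to a 
namespace, and of describing each language as a SKOS concept, here is a 
minimal Python sketch that emits Turtle by plain string formatting. 
Everything in it is an assumption for the sake of the example: the 
namespace is simply one of the candidates listed above (nothing is 
published there), the function names are hypothetical, using the 
namespace itself as the ConceptScheme URI is an arbitrary choice, and the 
"roa" parent (the ISO 639-2 collective code for Romance languages) merely 
stands in for a language-family hierarchy.

```python
# Candidate namespace taken from the message; hypothetical, nothing is
# actually published at these URIs.
NAMESPACE = "http://purl.org/dc/language/"
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def language_uri(code, namespace=NAMESPACE):
    """Forge a URI by appending an ISO 639 code to the namespace."""
    return namespace + code


def concept_to_turtle(code, labels, broader=None, namespace=NAMESPACE):
    """Serialize one language as a skos:Concept in Turtle.

    labels maps an xml:lang code to the language's name in that language.
    broader, if given, is the code of a parent concept (e.g. a family).
    Names are assumed to contain no double quotes or backslashes,
    since no escaping is done here.
    """
    lines = [f"<{language_uri(code, namespace)}> a skos:Concept ;"]
    # Assumption: the namespace URI doubles as the ConceptScheme URI.
    lines.append(f"    skos:inScheme <{namespace}> ;")
    if broader is not None:
        lines.append(f"    skos:broader <{language_uri(broader, namespace)}> ;")
    for lang, name in sorted(labels.items()):
        lines.append(f'    skos:prefLabel "{name}"@{lang} ;')
    # Turn the trailing " ;" of the last line into the closing " ."
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)


if __name__ == "__main__":
    print(SKOS_PREFIX)
    print(concept_to_turtle("fr", {"en": "French", "fr": "français"},
                            broader="roa"))
```

Each entry in labels is one cell of the N² name table, so the same 
structure shows where the translation burden would accumulate.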

Thoughts welcome.


Bernard Vatant
Knowledge Engineering
3, cité Nollez 75018 Paris France
Web:    www.mondeca.com <http://www.mondeca.com>
Tel:       +33 (0) 871 488 459
Mail:     bernard.vatant@mondeca.com <mailto:bernard.vatant@mondeca.com>
Blog:    Leçons de Choses <http://mondeca.wordpress.com/>
Received on Monday, 18 December 2006 17:54:29 UTC
