- From: Bernard Vatant <bernard.vatant@mondeca.com>
- Date: Mon, 18 Dec 2006 18:54:18 +0100
- To: public-esw-thes@w3.org, Thomas Baker <baker@sub.uni-goettingen.de>
Hello all,

Some thoughts about languages ...

ISO 639 languages are used in XML, in RDF, and in SKOS, via their codes used as values of the xml:lang attribute. But for various applications, it would be interesting to define those languages as proper RDF resources. So far, the only attempt to do so that I've found in RDF is http://downlode.org/rdf/iso-639/ and the description it provides is quite basic. In the OASIS GeoLang TC, we defined quite a while ago so-called Published Subject Identifiers with a similarly basic description:
http://psi.oasis-open.org/iso/639/

I've also seen that the current work on DCMI domains and ranges is moving towards the definition of a "Language" class as the range of dc:language. See
http://dublincore.org/usageboardwiki/PropertyDomainsAndRanges
I think this is a pretty good idea, and of course it would be good to have standard URIs and authoritative descriptions for instances of this class, based on ISO 639 codes.

Since I think we may wait quite a while before ISO delivers such a thing in its own namespace (and I would be happy to be proven wrong here), I wonder what kind of initiative could move this forward. Is it DCMI's intention to define those instances in its own namespace? (Tom, any clues on that?)

In any case, I very much like the idea of having those languages defined as SKOS concepts in a language ConceptScheme, including the hierarchy of language families as defined by Ethnologue and re-used on Wikipedia. See e.g.
http://www.ethnologue.com/show_language.asp?code=fra
http://www.ethnologue.com/show_lang_family.asp?code=fra
http://en.wikipedia.org/wiki/French_language

We have in Ethnologue a bank of precious data about languages that could be mined into RDF, and in Wikipedia the translations of language names into many other languages.

On that latter point, discovering and maintaining translations of one language's name into other languages is an N² issue. I've been through that for Mondeca internal use, and it's a headache even for 20 or so languages. 100 languages means 10,000 names ... and supposing every one of the 6,000+ languages identified by Ethnologue had a name in each of the others, we would come up with very big files indeed.

So, we have public concepts, we have a lot of data to mine, and we have use cases. All we need is a namespace to which to append ISO 639 codes to forge URIs. Who is likely to host and maintain that namespace?
http://www.w3.org/2004/02/skos/language# ?
http://purl.org/dc/language/ ?
Or maybe Ethnologue? I don't know how they feel about making their stuff available for the SW. Does anyone around have contacts with the Ethnologue folks?

Thoughts welcome.

--
Bernard Vatant
Knowledge Engineering
----------------------------------------------------
Mondeca
3, cité Nollez 75018 Paris France
Web: www.mondeca.com <http://www.mondeca.com>
----------------------------------------------------
Tel: +33 (0) 871 488 459
Mail: bernard.vatant@mondeca.com
Blog: Leçons de Choses <http://mondeca.wordpress.com/>
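
To make the URI-forging idea in the message above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the namespace is one of the candidates floated in the message (nothing has been agreed or published there), the "scheme" URI and the helper names language_uri, to_turtle and name_count are hypothetical, and the sample labels are a tiny hand-written set rather than data mined from Ethnologue or Wikipedia.

```python
# Sketch only: forge language URIs from ISO 639 codes and describe each
# language as a SKOS concept. Namespace and scheme URI are assumptions.

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."

# Hypothetical namespace: one candidate from the message, not an agreed one.
LANGUAGE_NS = "http://purl.org/dc/language/"
SCHEME_URI = LANGUAGE_NS + "scheme"  # assumed URI for the ConceptScheme

# Tiny hand-written sample: ISO 639 code -> {language tag: language name}.
SAMPLE = {
    "fra": {"en": "French", "fr": "français"},
    "deu": {"en": "German", "fr": "allemand"},
}


def language_uri(code, namespace=LANGUAGE_NS):
    """Forge a URI by appending an ISO 639 code to the namespace."""
    return namespace + code


def to_turtle(code, labels):
    """Describe one language as a skos:Concept, in Turtle syntax."""
    lines = [
        f"<{language_uri(code)}> a skos:Concept ;",
        f"    skos:inScheme <{SCHEME_URI}> ;",
    ]
    # One skos:prefLabel per language tag, in a stable (sorted) order.
    label_lines = [
        f'    skos:prefLabel "{name}"@{tag}'
        for tag, name in sorted(labels.items())
    ]
    lines.append(" ;\n".join(label_lines) + " .")
    return "\n".join(lines)


def name_count(n_languages):
    """Names needed if every language is named in every language (N squared)."""
    return n_languages ** 2


if __name__ == "__main__":
    print(SKOS_PREFIX)
    for code, labels in SAMPLE.items():
        print()
        print(to_turtle(code, labels))
    print()
    print("Names needed for 100 languages:", name_count(100))
```

Keeping the namespace as a parameter means the same codes could be re-published under whichever host ends up maintaining the scheme. The name_count helper simply restates the N² arithmetic from the message: 100 languages give 10,000 names, and 6,000 languages would give 36,000,000.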
Received on Monday, 18 December 2006 17:54:29 UTC