Re: Could ISO-639 languages be defined as skos concepts?

Dear Colleagues,
I do think that there would be good reasons to express *language tags* as
RDF, but you will notice that I have written "language tags", not
"languages." I think it would be most productive to provide an RDF
expression of the elements in the Language Subtag Registry in IETF RFC 4645
rather than working through all the potential duplications and ambiguities
of the full ISO 639 (Parts 1-3) family of standards. I suggest that you check
out IETF RFC 4645 and RFC 4646 before launching any effort to create this
resource.
Felix, Mark, and Addison, has this possibly already been done anywhere? The
reason I cite the registry is that there are redundancies and anomalies,
particularly with respect to ISO 639-2, which are best avoided in creating
any new resource. Furthermore, it is the language tags, and not just
language code elements, that are specified for use in XML. Actually, it has
been our intention to include the language tags in the TC 37 Data Category
Registry, which is itself an RDF resource. I'm not so sure that the language
tags are there, however; probably just the language code elements.
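
To make the suggestion concrete, here is a minimal sketch (in Python) of how
registry records might be turned into SKOS concepts. It is only an
illustration under stated assumptions: the namespace http://example.org/ is
hypothetical, the sample records are a hand-typed excerpt in the registry's
"%%"-separated record format, and the choice of skos:notation and
skos:prefLabel is my own guess at a reasonable mapping, not an agreed design.
Real registry records can have folded lines and repeated Description fields,
which this sketch does not handle.

```python
# Hypothetical namespace; no such URI space has been agreed on.
BASE = "http://example.org/language-subtag/"

# Illustrative excerpt only, typed in the registry's record format.
SAMPLE = """\
Type: language
Subtag: fr
Description: French
Added: 2005-10-16
%%
Type: language
Subtag: de
Description: German
Added: 2005-10-16
%%
Type: region
Subtag: CA
Description: Canada
Added: 2005-10-16
"""


def parse_records(text):
    """Split the text on '%%' separators and return one dict per record."""
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def to_turtle(records):
    """Render each record as a skos:Concept in Turtle syntax."""
    scheme = f"<{BASE}scheme>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme .",
    ]
    for r in records:
        # The subtag type is kept in the URI so 'language' and 'region'
        # subtags cannot collide.
        uri = f"<{BASE}{r['Type']}/{r['Subtag']}>"
        lines.append(f"{uri} a skos:Concept ;")
        lines.append(f"    skos:inScheme {scheme} ;")
        lines.append(f'    skos:notation "{r["Subtag"]}" ;')
        lines.append(f'    skos:prefLabel "{r["Description"]}"@en .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(parse_records(SAMPLE)))
```

Working from the subtags rather than from the three parts of ISO 639 means
each language appears once, which is the point of starting from the registry.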
Best regards
Sue Ellen Wright

On 12/18/06, Bernard Vatant <bernard.vatant@mondeca.com> wrote:
>
>
> Hello all
>
> Some thoughts about languages ...
>
> ISO-639 languages are used in XML and in RDF, and in SKOS, via their
> codes, used as the value of the xml:lang attribute.
> But for various applications, it would be interesting to define those
> languages as proper RDF resources.
>
> So far, the only attempt to do so I've found in RDF is
> http://downlode.org/rdf/iso-639/ and the description it provides is
> quite basic.
> In OASIS Geolang TC, we had defined quite a while ago so-called
> Published Subject Identifiers with similar basic description.
> http://psi.oasis-open.org/iso/639/
> I've seen also that the current work on DCMI domains and ranges goes
> towards the definition of a "Language" class as the range of dc:language.
> See http://dublincore.org/usageboardwiki/PropertyDomainsAndRanges.
> I think this is a pretty good idea, and of course it would be good to
> have standard URIs and authoritative descriptions for instances of this
> class, based on ISO-639 codes.
> Since I think we can wait for quite a while before ISO delivers such a
> thing in its own namespace - and I would be happy to be proven wrong
> here - I wonder what kind of initiative could move this thing forward.
> Is it DCMI's intention to define those instances in its own namespace?
> (Tom, any clues on that?)
>
> In any case, I pretty much like the idea of having those languages
> defined as SKOS concepts in a language ConceptScheme, including the
> hierarchy of language family as defined by Ethnologue and re-used on
> Wikipedia. See e.g.
> http://www.ethnologue.com/show_language.asp?code=fra
> http://www.ethnologue.com/show_lang_family.asp?code=fra
> http://en.wikipedia.org/wiki/French_language
>
> We have in Ethnologue a bank of precious data about languages that could
> be mined into RDF, and in Wikipedia the translation of language names
> into many other languages. On that latter point, discovering and
> maintaining translations of each language name in other languages is an N²
> issue. I've been through that for Mondeca internal use, and it's a
> headache even for 20 or so languages. 100 languages means 10,000 names
> ... and supposing each of the 6,000+ languages identified by
> Ethnologue had a name in every other, we would come up with very big
> files indeed.
>
> So, we have public concepts, a lot of data to mine, and use cases;
> all we need is a namespace to which to append ISO 639 codes to forge URIs.
> Who is likely to host and maintain that namespace?
> http://www.w3.org/2004/02/skos/language#  ?
> http://purl.org/dc/language/  ?
>
> Or maybe Ethnologue? I don't know how they feel about making their stuff
> available for the SW. Any contact with Ethnologue folks around?
>
> Thoughts welcome.
>
> --
>
> Bernard Vatant
> Knowledge Engineering
> ----------------------------------------------------
> Mondeca
> *3, cité Nollez 75018 Paris France
> Web:    www.mondeca.com <http://www.mondeca.com>
> ----------------------------------------------------
> Tel:       +33 (0) 871 488 459
> Mail:     bernard.vatant@mondeca.com <mailto:bernard.vatant@mondeca.com>
> Blog:    Leçons de Choses <http://mondeca.wordpress.com/>
>
>
>


-- 
Sue Ellen Wright
Institute for Applied Linguistics
Kent State University
Kent OH 44242 USA
sellenwright@gmail.com
swright@kent.edu
sewright@neo.rr.com

Received on Tuesday, 19 December 2006 20:19:52 UTC