
Re: Could ISO-639 languages be defined as skos concepts?

From: Sue Ellen Wright <sellenwright@gmail.com>
Date: Fri, 22 Dec 2006 10:50:02 -0500
Message-ID: <e35499310612220750y32e5b526wa499adeb2fc8eb03@mail.gmail.com>
To: "Bernard Vatant" <bernard.vatant@mondeca.com>
Cc: "Felix Sasaki" <fsasaki@w3.org>, "Gerhard Budin" <gerhard.budin@univie.ac.at>, "Addison Phillips" <addison@yahoo-inc.com>, "Mark Davis" <mark.davis@jtcsv.com>, "Thomas Baker" <baker@sub.uni-goettingen.de>, public-esw-thes@w3.org
Hi, All,
I'm sure Felix will get back to us, but I know what he means about a finite
list. RFC 4646 defines rules for generating language tags from the
various code components that can be included. The number of possible
combinations is huge, although, as the document points out, some
combinations are unrealistic or silly (Aleut as spoken in Belgium is a great
example).
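
To make the size of that space concrete, here is a minimal sketch. It assumes
only three of the RFC 4646 components (language, script, region) and a handful
of sample subtags chosen for illustration; the names LANGUAGES, SCRIPTS,
REGIONS, make_tag and all_tags are hypothetical, and variants, extensions and
private-use subtags are left out, so the real space is far larger.

```python
from itertools import product

# Tiny illustrative samples (assumption); the IANA registry holds far more.
LANGUAGES = ["en", "fr", "ale"]      # "ale" is Aleut
SCRIPTS = [None, "Latn", "Cyrl"]     # None means "no script subtag"
REGIONS = [None, "US", "BE", "FR"]   # None means "no region subtag"


def make_tag(language, script=None, region=None):
    """Join the subtags that are present with hyphens: language-script-region."""
    return "-".join(part for part in (language, script, region) if part)


def all_tags(languages, scripts, regions):
    """Build every language[-script][-region] combination from the lists."""
    return [make_tag(l, s, r) for l, s, r in product(languages, scripts, regions)]


tags = all_tags(LANGUAGES, SCRIPTS, REGIONS)
print(len(tags))          # 3 * 3 * 4 combinations
print("ale-BE" in tags)   # well-formed, though hardly realistic
```

Even with three languages, the count is the product of the list sizes, which
is why a fixed, enumerated set of URIs for every possible tag is hard to pin
down.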
Bye for now
Sue Ellen


On 12/22/06, Bernard Vatant <bernard.vatant@mondeca.com> wrote:
>
> Hi Felix
>
> Thanks for jumping in.
>
> > I'm trying to understand what you want to achieve: Is it URIs for
> > language values, e.g. http://www.w3.org/2004/02/skos/language#en-US ?
> >
> Indeed. The whole point is to identify and represent languages as
> concepts, in order to be able to make RDF assertions about them, beyond
> the "tag" use.
>
> > I don't think that it is feasible to have everything after "#" as a
> > URI, since RFC 4646 or its successor defines a grammar for language tags.
> >
> Do you mean there is a technical issue that prevents building valid URIs
> out of language tags?
> Note that although a single # namespace is the first idea that comes to
> mind, it's not the only option.
> Could be as well http://www.w3.org/2004/02/skos/language/en/US or even
> an opaque URI http://www.w3.org/2004/02/skos/language#1234
> In any case, subtag elements and other properties such as revision date will
> be explicitly attached as properties. You can't rely on the URI string
> to carry semantics. This is a "Semantic Web Axiom" :-)
> > That is, you cannot have a finite set of URIs built out of that.
> >
> Sorry, I don't get the point. What do you mean by a "finite set"?
> Could you expand on that?
> > Have you thought of registering an XPointer scheme at W3C? E.g.
> > something like "language()" which can be used e.g. in
> > http://www.w3.org/2004/02/skos/language#(en-US) . You would have to
> > define that the scheme data "()" contains a BCP 47 identifier.
> >
> I think I see what you have in mind, but remember RDF is not mainly
> about the structure of a published XML document, but about the semantics
> of URIs.
> Besides the language values themselves, and even before, we need a
> namespace for the ontology, the "Language" class, the different "subtag"
> properties etc.
> And defining a namespace is more or less dependent on the vocabulary
> publication.
> See e.g. http://www.w3.org/TR/swbp-vocab-pub/
>
> Hope that helps, and that we are not talking past each other.
>
> Regards
>
> Bernard
> > Felix
> >
> > Bernard Vatant wrote:
> >
> >> Sue Ellen
> >>
> >> Thanks for all this. I will munch over it and try to come up with
> >> something by the first week of January, when everybody is out of the
> >> bubbles ... :-)
> >>
> >> Bernard
> >>
> >> Sue Ellen Wright wrote:
> >>
> >>> Hi, All,
> >>> Indeed, I suspect that lots of people would be delighted if someone
> >>> wants to go ahead with this for SKOS, provided that no one has already
> >>> started such a project. Rather than searching for IANA, you want to
> >>> reference IETF BCP 47, which will be your permanent ID reference for
> >>> the Language Tags. My contacts on BCP 47 are Felix Sasaki, Addison
> >>> Phillips, and Mark Davis, but as noted, they may possibly be off line
> >>> right now, as many people are. On the ISO side, Gerhard Budin is the
> >>> Chair of ISO TC 37/SC 2, whose WG 2 is responsible for the 639 family
> >>> of standards. I know that he shares my view that any new initiatives
> >>> in this area should be oriented toward the set of codes and the syntax
> >>> rules contained in the current IETF RFC 4645, 4646 and 4647, taking
> >>> into consideration any successor recommendations of the IETF. (There
> >>> is, for instance, a current effort to update the recently approved
> >>> RFCs to bring documents into compliance with the new ISO 639-3, which
> >>> essentially identifies the SIL Ethnologue codes as the extended codes
> >>> for comprehensive identification of languages.) Also bear in mind (I
> >>> probably said this in another email) that when it comes to xml:lang,
> >>> we need to concern ourselves with language tags per IETF, not just
> >>> language codes alone.
> >>>
> >>> Sorry I'm not coming up with the absolute final answer here, but
> >>> sooner or later, one of the IETF guys will check his mail!
> >>> Best regards
> >>> Sue Ellen
> >>>
> >>>
> >>> On 12/21/06, Bernard Vatant <bernard.vatant@mondeca.com> wrote:
> >>>
> >>>     Sue Ellen
> >>>     > I think you are absolutely right about this not being a significant
> >>>     > task: the main issue is to get a variety of people from a number of
> >>>     > communities of practice to agree on a single approach.
> >>>     Sure enough. But we could help by proposing at least one. :-)
> >>>     > SKOS would certainly be one avenue. There may be others, and in the
> >>>     > end, we may need more than one flavor in order to conform to
> >>>     > requirements in a given environment, which is OK as long as we can map
> >>>     > successfully back and forth.
> >>>     Yes, this is a good use case for mapping, either SKOS-to-SKOS mapping,
> >>>     or mapping from some RDF dialect to another. You know it's one of my
> >>>     favourite topics.
> >>>     > I'm hoping that sooner or later one of the guys from W3C will weigh
> >>>     > into this discussion and let us know whether they are already
> >>>     > addressing this issue.
> >>>     I've been searching the W3C I18n Activity
> >>>     http://www.w3.org/International/ which looks to me like the place where
> >>>     such things should happen, but at first sight there seems to be no
> >>>     connection between this activity and the SW activity. I will investigate
> >>>     further.
> >>>     > It's a bad time of year to hope to catch everybody monitoring their
> >>>     > email!
> >>>     Indeed. By the way, Happy Xmas to all :-)
> >>>
> >>>     Bernard
> >>>     > There will be an ISO TC 37 meeting in January where we'll be
> >>>     > addressing issues regarding our own metadata registry, and this will
> >>>     > surely come up.
> >>>     > Best regards
> >>>     > Sue Ellen
> >>>     >
> >>>     > On 12/21/06, Bernard Vatant <bernard.vatant@mondeca.com> wrote:
> >>>     >
> >>>     >     Hi Sue Ellen
> >>>     >
> >>>     >     Thanks for your insights. Do you have pointers to the discussions you
> >>>     >     mention, and/or any contact with people taking part in them, and who
> >>>     >     would see some interest in RDF-ization of those resources? (assuming
> >>>     >     such a class definition is satisfiable).
> >>>     >     Actually when one looks at
> >>>     >     http://www.iana.org/assignments/language-subtag-registry, the technical
> >>>     >     task of migrating its content into RDF, as long as a relevant vocabulary
> >>>     >     is defined, is quite trivial.
> >>>     >     After that it's mainly a political issue. :-)
> >>>     >     But there is a point that has not been answered so far in my original
> >>>     >     question. Would SKOS be a relevant format for such a representation?
> >>>     >
> >>>     >     Bernard
> >>>     >
> >>>     >
> >>>     >     Sue Ellen Wright wrote:
> >>>     >     > Hi, All,
> >>>     >     > There are serious discussions going on concerning the IETF language
> >>>     >     > subtag registry and the ISO implementations of the 639 family of
> >>>     >     > codes, so I think it makes sense to coordinate any efforts in this
> >>>     >     > direction with the folks working on those two sets of standards. IETF
> >>>     >     > RFC 4647 spells out means for matching codes, but it would make things a
> >>>     >     > lot simpler if we had a more or less standard format for representing
> >>>     >     > them in RDF.
> >>>     >     > Bye for now
> >>>     >     > Sue Ellen
> >>>     >     >
> >>>     >     >
> >>>     >     > On 12/20/06, Thomas Baker <baker@sub.uni-goettingen.de> wrote:
> >>>     >     >
> >>>     >     >
> >>>     >     >     On Mon, Dec 18, 2006 at 06:54:18PM +0100, Bernard Vatant wrote:
> >>>     >     >     > ISO-639 languages are used in XML and in RDF, and in SKOS, via their
> >>>     >     >     > codes used as values of the xml:lang attribute.
> >>>     >     >     > But for various applications, it would be interesting to define those
> >>>     >     >     > languages as proper RDF resources.
> >>>     >     >     >
> >>>     >     >     > So far, the only attempt to do so I've found in RDF is
> >>>     >     >     > http://downlode.org/rdf/iso-639/ and the description it provides is
> >>>     >     >     > quite basic.
> >>>     >     >     ...
> >>>     >     >
> >>>     >     >     > So, we have public concepts, a lot of data to mine, we have use cases,
> >>>     >     >     > all we need is a namespace to which to append ISO 639 codes to forge URIs.
> >>>     >     >     > Who is likely to host and maintain that namespace?
> >>>     >     >     > http://www.w3.org/2004/02/skos/language# ?
> >>>     >     >     > http://purl.org/dc/language/ ?
> >>>     >     >     ...
> >>>     >     >     > Since I think we can wait for quite a while before ISO delivers such a
> >>>     >     >     > thing in its own namespace - and I would be happy to be proven wrong
> >>>     >     >     > here - I wonder what kind of initiative could move this thing forward.
> >>>     >     >     > Is it DCMI's intention to define those instances in its own namespace
> >>>     >     >     > (Tom, any clues on that?).
> >>>     >     >
> >>>     >     >     Well, I agree with the need :-)
> >>>     >     >
> >>>     >     >     Several years ago, we considered opening a DCMI service for the
> >>>     >     >     "registration" of URIs identifying controlled vocabularies for
> >>>     >     >     use as encoding schemes in metadata.  While the demand for such
> >>>     >     >     a service was clear, the project did not look maintainable,
> >>>     >     >     sustainable, and scalable.
> >>>     >     >
> >>>     >     >     Unless URIs are coined "once and for all" and "with no
> >>>     >     >     guarantees" (and how useful is that?), it is not clear
> >>>     >     >     how such a namespace host should operate over time.  The
> >>>     >     >     impulse to "just do it" comes up against hard questions.
> >>>     >     >     Even just maintaining URIs for entities in a separately
> >>>     >     >     maintained ISO standard would involve a significant commitment.
> >>>     >     >
> >>>     >     >     Tom
> >>>     >     >
> >>>     >     >     --
> >>>     >     >     Tom Baker - tbaker@tbaker.de - baker@sub.uni-goettingen.de
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Sue Ellen Wright
> >>> Institute for Applied Linguistics
> >>> Kent State University
> >>> Kent OH 44242 USA
> >>> sellenwright@gmail.com
> >>> swright@kent.edu
> >>> sewright@neo.rr.com
> >>>
> >>>
> >>>
> >
> >
> >
> >
>
> --
>
> Bernard Vatant
> Knowledge Engineering
> ----------------------------------------------------
> Mondeca
> 3, cité Nollez 75018 Paris France
> Web:    www.mondeca.com
> ----------------------------------------------------
> Tel:       +33 (0) 871 488 459
> Mail:     bernard.vatant@mondeca.com
> Blog:    Leçons de Choses <http://mondeca.wordpress.com/>
>
>


-- 
Sue Ellen Wright
Institute for Applied Linguistics
Kent State University
Kent OH 44242 USA
sellenwright@gmail.com
swright@kent.edu
sewright@neo.rr.com
Received on Friday, 22 December 2006 15:50:18 GMT
