- From: Bernard Vatant <bernard.vatant@mondeca.com>
- Date: Fri, 17 Feb 2012 12:10:18 +0100
- To: "M. Scott Marshall" <mscottmarshall@gmail.com>
- Cc: Gerard de Melo <gdemelo@mpi-inf.mpg.de>, Lars Marius Garshol <lars.garshol@bouvet.no>, Barry Norton <barry.norton@ontotext.com>, public-lod@w3.org
- Message-ID: <CAK4ZFVEJGKCet004C-mYeheFsiEOaPKjFbK5Kvj53U-163KNLQ@mail.gmail.com>
Hi all,

I wanted to answer Gerard yesterday, but some parts of the answer have already been addressed by Lars Marius, whom I'm happy to read here, and even happier to see we agree, given that in the past we sometimes agreed to disagree on those tricky issues of identification and URIs (... and the no less tricky subject of the quality of beer, although on that one I have to bow in respect to his authority :)

I think there are two aspects which are to be kept distinct for references as important as languages: the stability of identifiers, and the quality of descriptions available for those URIs. And a third one is what is identified ...

On the first point, I guess if LoC is not able to ensure stable URIs inside its own domain, who will? And from both a social (trust) point of view and a technical one, I prefer to have URIs in the id.loc.gov namespace rather than in some more or less opaque purl one. For example, all the fundamental W3C specs at the basis of the whole RDF ecosystem are in the w3.org domain, and the W3C has a policy of URI stability which IMO can be adopted by LoC.

Now for data quality. With all due respect to the amazing work of lexvo.org, I think Gerard's argument about ISO 639-3 being "better" than ISO 639-2 is off topic here. This is to be discussed inside ISO 639 committees :) The point is that we have 639-1 and 639-2 and 639-3 and now 639-5 and it's a mess, OK, but that's the legacy our systems have to cope with. What do we need in linked data land? At a minimum, an exact mirror of those codes in the form of stable URIs, as close as possible to the source authority for those codes, and built in such a way that both the publication authority and the match with the ISO normative source are absolutely unambiguous. Seems to me that http://id.loc.gov/vocabulary/iso639-2/grc provides exactly this.
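To make the "exact mirror" idea concrete, here is a minimal Python sketch, assuming only the URI pattern visible in the example above. The helper names (loc_uri, split_uri) are hypothetical, and the shape check says nothing about whether a code is actually assigned in ISO 639-2; this is an illustration, not anything LoC publishes.

```python
import re
from urllib.parse import urlsplit

# Namespace taken from the example URI quoted in this message.
LOC_ISO639_2 = "http://id.loc.gov/vocabulary/iso639-2/"

# ISO 639-2 codes are three lowercase ASCII letters. This checks the shape
# only, not whether the code is actually assigned in the standard.
_CODE_SHAPE = re.compile(r"[a-z]{3}")


def loc_uri(code: str) -> str:
    """Build the id.loc.gov URI mirroring an ISO 639-2 code (hypothetical helper)."""
    if not _CODE_SHAPE.fullmatch(code):
        raise ValueError(f"not shaped like an ISO 639-2 code: {code!r}")
    return LOC_ISO639_2 + code


def split_uri(uri: str) -> tuple[str, str, str]:
    """Recover (publishing host, code list, code) from such a URI (hypothetical helper)."""
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[0] != "vocabulary":
        raise ValueError(f"unexpected URI layout: {uri!r}")
    return parts.netloc, segments[1], segments[2]


if __name__ == "__main__":
    uri = loc_uri("grc")
    print(uri)
    print(split_uri(uri))
```

The point of the round trip is that the publishing authority (the host), the normative source (the code list) and the code itself can all be read straight off the URI, with nothing to look up.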

Of course one can ask why LoC does not (yet) also publish URIs for 639-3, but hopefully it's in the pipes, as well as ISO 3166 countries, as Lars Marius points out (those were also in the original OASIS Published Subjects publication ...). But id.loc.gov does have 639-5 entries.

That other data sets will provide better or more complete information about the things identified by those URIs is not a problem. I think it's OK if a reference URI provides just the minimal description needed for disambiguation and context, as a basis for maximal re-use. To take a completely different example, what is the most reused URI in the LOD, beyond the URIs in the standards themselves (RDF, RDFS, OWL)? Certainly http://xmlns.com/foaf/0.1/Person. What does FOAF itself provide about this class? Not much. But the fact that millions of triples use it makes it a reference, both at vocabulary and data level, and can help to figure out what a foaf:Person can be. For example, go to http://labs.mondeca.com/endpoint/lov_aggregator and run the proposed default query ...

The referent "in the real world" of http://id.loc.gov/vocabulary/iso639-2/grc is as fuzzy as the referent of http://xmlns.com/foaf/0.1/Person. It's indeed a conceptualization of a language, which has been defined by the ISO 639-2 standard according to criteria most people won't argue about, and some will disagree with for good reasons. And that's why we have 639-3. Like any classification of languages, this one defines arbitrary limits in a continuum. What a language boundary is in the real world is and will always be an open question. But information systems simply rely on codes provided by an authority to which they defer the tricky task of deciding about it. So, when you say your publication is written in French, yes, you refer to a certain concept of French when using a URI based on an ISO code, and I've no problem with that at all.
When you use xml:lang="fr", I can't say exactly what it refers to in the real, complex world of languages, but all systems using it treat it as the same thing, and by BCP 47 it's French as defined by ISO 639-1.

Best regards

Bernard

2012/2/17 M. Scott Marshall <mscottmarshall@gmail.com>

> Hi Bernard, Gerard, (and now Lars),
>
> Thanks for the pointers. It seems like we are better off pointing
> directly to lexvo if we want URIs that will
>
> 1) enable us to precisely and unambiguously refer to any official
> language (including, for example, Cantonese)
>
> 2) provide the name of the language in many languages (potentially
> useful for search indexes and labels in applications).
>
> However, there is a URI longevity issue whenever PURLs are not used
> (see full explanation of issues at http://sharedname.org ). Providing
> a neutral namespace that can be redirected when domain names change is
> the most effective way to create a persistent URI that won't contain
> historical artifacts when the 'name brand'-based domain name changes
> (as has been repeatedly demonstrated by history). So, ideally, an
> organization with long-term governance (not project bound) would
> maintain a namespace such as http://sharedname.org/lang/ that could be
> redirected from lexvo to future-lexvo domains/URLs.
>
> [Lars - your message came in just as I was about to press <send>. I'm
> confused by your reply. What about the problems with LOC lang ids that
> Gerard pointed out? Is that what you meant by "If only they could do
> ISO 3166 countries as well..."?]
>
> Best,
> Scott
>
> On Thu, Feb 16, 2012 at 8:21 PM, Gerard de Melo <gdemelo@mpi-inf.mpg.de> wrote:
> > Hi Bernard,
> >
> >
> > I think now we should forget about URIs published by pioneer projects such
> > as OASIS TC, lingvoj.org and lexvo.org, and stick to URIs published by
> > genuine authority Library of Congress which is as close to the primary
> > source as can be. So if you want to use a URI for Ancient Greek as defined
> > by ISO 639-2, please use http://id.loc.gov/vocabulary/iso639-2/grc.
> >
> > BTW Lars Marius, hello, what do you think? URIs at id.loc.gov are really
> > what we were dreaming to achieve in 2001, right?
> >
> >
> > Now of course I may be a bit biased here, but I do not believe that the
> > id.loc.gov service solves
> > all of the problems. This is from the Lexvo.org FAQ [1]:
> >
> > The advantage of using those URIs is that they are maintained by the Library
> > of Congress. However, there are also several issues to consider. First of
> > all, ISO 639-2 is orders of magnitude smaller than ISO 639-3 and for example
> > lacks an adequate code for Cantonese, which is spoken by over 60 million
> > speakers.
> > More importantly, the LOC's URIs do not describe languages per se but rather
> > describe code-mediated conceptualizations of languages. This implies, for
> > instance, that the French language (<http://lexvo.org/id/iso639-3/fra>) has
> > two different counterparts at the LOC,
> > <http://id.loc.gov/vocabulary/iso639-2/fra> and
> > <http://id.loc.gov/vocabulary/iso639-2/fre>, which each have slightly
> > different properties.
> > Finally, connecting your data to Lexvo.org's information is likely to be
> > more useful in practical applications. It offers information about the
> > languages themselves, e.g. where they are spoken, while the LOC mostly
> > provides information about the codes, e.g. when the codes were created and
> > updated and what kind of code they are.
> > In practice, you can also use both codes simultaneously in your data.
> > However, you need to be very careful to make sure that you are asserting
> > that a publication is written in French rather than in some concept of
> > French created on January, 1, 1970 in the United States.
> >
> >
> > Best,
> > Gerard
> >
> > [1] http://www.lexvo.org/linkeddata/faq.html
> >
> > --
> > Gerard de Melo [demelo@icsi.berkeley.edu]
> > http://www.icsi.berkeley.edu/~demelo/
>

--
Bernard Vatant
Vocabularies & Data Engineering
Tel : + 33 (0)9 71 48 84 59
Skype : bernard.vatant
Linked Open Vocabularies <http://labs.mondeca.com/dataset/lov>
--------------------------------------------------------
Mondeca
3 cité Nollez 75018 Paris, France
www.mondeca.com
Follow us on Twitter : @mondecanews <http://twitter.com/#%21/mondecanews>
Received on Friday, 17 February 2012 11:11:11 UTC