Re: Incorrect lang tags Re: Princeton WordNet RDF

Perhaps you want to take my suggestion for handling codes ...

http://lists.w3.org/Archives/Public/public-lod/2014Apr/0105.html

The codes are a shorthand for links and labels.  By using a lookup table with 1461 (possibly duplicate) entries you can create a map (of synthetic bi-annual versions) that keeps the labels in sync with the permalinks and updates automatically.  The problem is that if you pare down the valid code list on a per-application basis, the abilities of the applications diverge.

For example, ET (upper case) is the Country Code for Ethiopia (http://id.loc.gov/vocabulary/countries/et) and et (lower case) is the ISO-639-1 code for Estonian (http://id.loc.gov/vocabulary/iso639-1/et).  There are no semantics involved, just a lot of confusion when everybody knows only their favorite 10 codes from each set.
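
As a rough sketch of the shorthand idea, the case of a two-letter code can select which id.loc.gov vocabulary it expands to. The function name and the case rule are my own assumptions for illustration; only the two URL patterns come from the example above.

```python
# Hypothetical helper (not part of any library): expand a bare two-letter
# code into an id.loc.gov permalink, using letter case to pick the code set.
LOC_BASE = "http://id.loc.gov/vocabulary"


def code_to_permalink(code: str) -> str:
    """Upper case is read as a country code, lower case as an ISO 639-1 code."""
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter ASCII code, got {code!r}")
    if code.isupper():
        # Country codes are written in upper case but appear lower-cased in the URL.
        return f"{LOC_BASE}/countries/{code.lower()}"
    if code.islower():
        return f"{LOC_BASE}/iso639-1/{code}"
    # Mixed case ("Et") is ambiguous under this convention, so refuse to guess.
    raise ValueError(f"mixed-case code is ambiguous: {code!r}")
```

With this, code_to_permalink("ET") and code_to_permalink("et") give the Ethiopia and Estonian links respectively, so the two never collide as long as the case is preserved.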

There is an initial problem of using a single code set to begin with, but going forward, it would be worth the effort to fix inconsistencies.  RDF can be quite a mess without some forethought.
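
For the specific inconsistency in the quoted message below (three-letter ISO 639-2 tags where BCP 47 calls for the two-letter ISO 639-1 code), a normalization pass might look roughly like this. The table holds only a few illustrative entries and the function name is hypothetical; a real fix would load the full ISO 639 mapping.

```python
# Illustrative subset only: ISO 639-2 (terminology) codes that have an
# ISO 639-1 equivalent. A complete table would come from the ISO 639 registry.
ISO639_2_TO_1 = {
    "eng": "en",
    "fra": "fr",
    "deu": "de",
    "est": "et",
}


def normalize_lang_tag(tag: str) -> str:
    """Swap a three-letter primary subtag for its two-letter form when one exists."""
    # Split off the primary language subtag; keep any region/script suffix as-is.
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    # Codes without a two-letter equivalent (e.g. 'zsm') pass through unchanged.
    return ISO639_2_TO_1.get(primary, primary) + sep + rest
```

So "eng" becomes "en", while "zsm" stays "zsm" because it has no two-letter form.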

--Gannon
--------------------------------------------
On Wed, 4/16/14, Bernard Vatant <bernard.vatant@mondeca.com> wrote:

 Subject: Incorrect lang tags Re: Princeton WordNet RDF
 To: "John P. McCrae" <jmccrae@cit-ec.uni-bielefeld.de>, "Linking Open Data" <public-lod@w3.org>
 Date: Wednesday, April 16, 2014, 4:56 PM
 
 John
 
 Looking at the data in more detail, it appears that the lang tags
 systematically use ISO 639-2 codes (three-letter codes), even when
 an ISO 639-1 code exists and should be used, as per BCP 47.
 
 
 See e.g., http://www.w3.org/RDF/Validator/rdfval?URI=http%3A%2F%2Fwordnet-rdf.princeton.edu%2Fwn31%2F109637345-n.rdf
 
 
 The W3C validator is right, except where it is not up to date
 with the latest ISO 639 values, as in:
 Error: {W116} ISO-639 does not define language: 'zsm'.[Line = 53, Column = 50]
 
 Nope, there is such a code in ISO 639-3 :)
 
 
 See http://www.lingvoj.org/languages/tag-zsm.html
 
 and source http://www-01.sil.org/iso639-3/documentation.asp?id=zsm
 
 
 
 Hope you can fix this easily!
 
 Bernard
 
 2014-04-16 15:30 GMT+02:00 John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>:
 
 
 Princeton University in collaboration with the Cognitive Interaction
 Technology Excellence Center of Bielefeld University are proud to
 announce the first RDF version of WordNet 3.1, now available at:
 
  http://wordnet-rdf.princeton.edu/
 
 This version, based on the current development of the WordNet project,
 intends to be a nucleus for the Linguistic Linked Open Data cloud and
 the global WordNet projects. The data are accessible in five formats
 (HTML+RDFa, RDF/XML, Turtle, N-Triples and JSON-LD) as well as by
 querying a SPARQL endpoint. The model is itself based on the lemon
 model and follows the guidelines of the W3C OntoLex Community Group.
 
 We have incorporated direct links to the previous W3C WordNets, UBY,
 Lexvo.org, VerbNet as well as translations collected by the Open
 Multilingual WordNet Project. Furthermore, we include links within the
 resource for previous versions of WordNets to further enable linking.
 We are interested in incorporating any resources that are linked to
 WordNet and would greatly appreciate suggestions.
 
 Regards,
 
 
 
 John P. McCrae, Christiane Fellbaum & Philipp Cimiano
 
 
 
 -- 
 Bernard Vatant
 Vocabularies & Data Engineering
 
 

Received on Thursday, 17 April 2014 23:05:55 UTC