- From: Christian Chiarcos <christian.chiarcos@web.de>
- Date: Mon, 26 Nov 2018 00:08:07 +0100
- To: Andy Seaborne <andy@seaborne.org>
- Cc: Hugh Glaser <hugh@glasers.org>, SW-forum <semantic-web@w3.org>
- Message-ID: <CAC1YGdgBJ-B+R5Lpw1jtYpzAAJ2qgPKuMvpYJN+1Es8AKRf0pw@mail.gmail.com>
On Sun, 25 Nov 2018 at 21:21, Andy Seaborne <andy@seaborne.org> wrote:

> Hugh, Christian,
>
> You can do what you describe already: see for example SKOS-XL which
> also discusses some issues.
>
> The web already has language tags RFC5646 and I guess that is how they
> ended up in RDF via xml:lang and HTML. We should work with and use the
> outputs of these communities, not redo their work.
>

We should, and I was not suggesting anything else. But there is absolutely no reason not to explicate the meaning of RFC5646 (= BCP47) codes with proper RDF semantics (instead of treating them as unstructured strings) *if that solves an acceptability problem*. I think it would, if this is being used to simplify string comparison.

With the current notation reconsidered as a shorthand for a URI-based representation (which includes ISO 639 codes as required by BCP47), and BCP47 remaining the preferred way to identify languages (certainly, because it's compact), we would still gain a lot. Not just more transparent and more flexible matching between strings whose BCP47 codes differ but overlap (or that have no code at all), but also the capability to provide language codes for language varieties that BCP47 simply doesn't support.

And this is not an ad hoc extension, nor anything that needs to be redone. It reuses the output of existing communities and term bases for circumstances, language varieties and speaker communities for which BCP47 just fails. (No blame put on BCP47; it inherits its limitations from ISO639: both ISO639-2 and ISO639-3 have a selection bias; ISO639-6 would have covered language varieties down to the level of dialects, but it was withdrawn in 2014.*)

* To be fair, BCP47 supports variant subtags and a registration process for these, but replicating existing term bases of language identifiers such as glottolog.org within the IANA Language Subtag Registry would really mean redoing a lot of work.
At the moment, Glottolog provides URIs and documentation for 8,481 language varieties. The IANA Language Subtag Registry has only 97 non-redundant, non-deprecated "variant" sub-tags.

skosxl:Label does provide a class for strings, which is basically what I was suggesting, indeed -- but notationally, this is way too verbose to be appealing to RDF novices. Hence the suggestion to change the interpretation of the regular string notation (and the criteria for string identity).

BTW: Triplifying language metadata is not the only way to implement this, of course. A great alternative would also just be a list of language, region, script identifiers, etc., in the same order and exactly as defined in BCP47. But (internally) a list of URIs, not strings, and with the capability of being extensible with "foreign" URIs.

Best,
Christian

-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931
Received on Sunday, 25 November 2018 23:08:40 UTC