Re: "Language-tagged strings Re: Toward easier RDF: a proposal"

On Sun, 25 Nov 2018 at 21:21, Andy Seaborne <andy@seaborne.org> wrote:

> Hugh, Christian,
>
> You can do what you describe already : see for example SKOS-XL which
> also discusses some issues.
>
> The web already has language tags RFC5646 and I guess that is how they
> ended up in RDF via xml:lang and HTML. We should work with and use the
> outputs of these communities, not redo their work.
>

We should, and I was not suggesting anything else. But there is absolutely
no reason not to explicate the meaning of RFC5646 (= BCP47) codes with
proper RDF semantics (instead of treating them as unstructured strings)
*if that solves an acceptability problem*. I think it would, if this is
being used to simplify string comparison. With the current notation
reconsidered as a shorthand for a URI-based representation (which includes
ISO 639 codes as required by BCP47), and BCP47 remaining the preferred way
to identify languages (certainly, because it's compact), we would still
gain a lot: not just more transparent and more flexible matching between
strings with different but overlapping BCP47 codes (or with none), but also
the capability to provide language codes for language varieties that BCP47
simply doesn't support. This is neither an ad hoc extension nor anything
that needs to be redone; it re-uses the output of existing
communities/term bases for circumstances/language varieties/speaker
communities for which BCP47 just fails. (No blame on BCP47; it
inherits its limitations from ISO 639: both ISO 639-2 and ISO 639-3 have a
selection bias; ISO 639-6 would have covered language varieties down to the
level of dialects, but it was withdrawn in 2014.*)

* To be fair, BCP47 supports variant subtags and a registration process for
these, but replicating existing term bases of language identifiers such as
glottolog.org within the IANA Language Subtag Registry would really mean
redoing a lot of work. At the moment, Glottolog provides URIs and
documentation for 8,481 language varieties. The IANA Language Subtag
Registry has only 97 non-redundant, non-deprecated "variant" subtags.

skosxl:Label does indeed provide a class for strings, which is basically
what I was suggesting -- but notationally, this is way too verbose to be
appealing to RDF novices. Hence the suggestion to change the interpretation
of the regular string notation (and the criteria for string identity).
BTW: Triplifying language metadata is not the only way to implement this,
of course. A great alternative would also just be a list of language,
region, script identifiers, etc., in the same order and exactly as defined
in BCP47. But (internally) a list of URIs, not strings, and with the
capability of being extensible with "foreign" URIs.
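
To make the list-of-URIs idea concrete, here is a minimal Python sketch. It
is an illustration under stated assumptions, not a proposal for a concrete
API: the example.org namespaces, the function names (expand, compatible,
same_string) and the "contained in" matching rule are all hypothetical, and
only plain language[-script][-region] tags are handled.

```python
from typing import Tuple

# Hypothetical namespaces, for illustration only; a real deployment would
# pick established URIs for languages, scripts and regions.
LANG = "http://example.org/lang/"
SCRIPT = "http://example.org/script/"
REGION = "http://example.org/region/"


def expand(tag: str) -> Tuple[str, ...]:
    """Expand a language[-script][-region] tag into URIs, in BCP47 order."""
    if not tag:
        return ()  # untagged string: no language metadata at all
    subtags = tag.split("-")
    uris = [LANG + subtags[0].lower()]
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            uris.append(SCRIPT + sub.title())
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            uris.append(REGION + sub.upper())
        else:
            # Variants, extensions, etc. are out of scope for this sketch.
            raise ValueError(f"subtag not handled by this sketch: {sub!r}")
    return tuple(uris)


def compatible(a: Tuple[str, ...], b: Tuple[str, ...]) -> bool:
    """True if one identifier list is contained in the other (assumed rule)."""
    sa, sb = set(a), set(b)
    return sa <= sb or sb <= sa


def same_string(lit1, lit2) -> bool:
    """Compare (lexical form, identifier list) pairs under the relaxed rule."""
    return lit1[0] == lit2[0] and compatible(lit1[1], lit2[1])


# A "foreign" URI (hypothetical) appended for a variety BCP47 does not cover.
variety = expand("de-DE") + ("http://example.org/variety/some-dialect",)
```

With these definitions, ("Haus", expand("de")) and ("Haus", expand("de-DE"))
compare as the same string, while the de-AT and de-DE versions do not. Note
that such a relation is not transitive (de matches both de-AT and de-DE), so
it is a matching criterion rather than an equivalence.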

Best,
Christian
-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos@informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931

Received on Sunday, 25 November 2018 23:08:40 UTC