Re: Question: replacing language codes in a SPARQL BIND statement? from Felix Sasaki on 2016-03-14 (public-ontolex@w3.org from March 2016)

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 14 Mar 2016 22:37:50 +0100
To: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
Cc: public-ontolex@w3.org, "A list for those interested in open data in linguistics." <open-linguistics@lists.okfn.org>, "christian.chiarcos@web.de" <christian.chiarcos@web.de>
Message-Id: <B70AF52F-4D86-4836-AC0A-7CD9F235AD9B@w3.org>

> On 13.03.2016 at 12:09, Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de> wrote:
> 
> Dear all,
> 
> this is a general technical question, albeit one specific to working with multilinguality issues in multiple lemon/ontolex dictionaries, hence I'm asking here in the first place.
> 
> Imagine the following situation: I use the Russian DBnary (provided in a slightly extended variant of the old lemon) and an ontolex dictionary for Chalkan (with Russian glosses). Both provided by third parties, and I do not want to manipulate the data prior to querying. Now, I want to use DBnary to retrieve an English gloss for the Chalkan words in a single SPARQL query.
> 
> If both dictionaries use the same xml:lang representation, this works rather well (I skip the query for reasons of brevity): I bind the Russian gloss from the Chalkan dictionary to variable ?ru and start searching DBnary for a data property that assigns ?ru as literal.
> 
> It is more complicated, though, if both files use different language codes, e.g., ISO-639-3 (rus) and ISO-639-2 (ru) for Russian, or if a language code with region sub-tag is used (e.g., ru-RU). Is there any way to use, say, BIND to bind the string value of ?ru to a new variable which uses ISO-639-2 codes instead of the original ISO-639-3 (resp. ISO-639-2+ISO-3166) code?


xml:lang allows only BCP 47 language tags, and there the options you describe (e.g. ISO-639-3 vs. ISO-639-2) are not both available: where a two-letter code exists for a language, BCP 47 requires it, so "ru" is valid and "rus" is not. So if you use a language tag validator you can at least detect that an xml:lang value is not valid.
For example, validate this document:
<!DOCTYPE html>
<html lang="ru">
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
    <title>Test</title>
  </head>
  <body>
</body>
</html>
via
https://validator.w3.org/#validate_by_input
Now validate the same document with
<html lang="rus">
and you get an error.
Of course in your workflow you don’t want to integrate the HTML validator as your language tag validator. But the underlying library
https://about.validator.nu/
has a class to validate language tags on its own.
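
To illustrate the kind of check involved, here is a rough Python sketch. It is not the validator.nu class and not a full BCP 47 validator: the mapping table THREE_TO_TWO and all function names are made up for this example, and the table covers only a handful of languages. It only shows how a three-letter code such as "rus" can be detected and folded to the two-letter code, and how tags can be compared while ignoring a region subtag such as "-RU".

```python
# Hypothetical, deliberately tiny mapping from three-letter ISO 639 codes
# to the two-letter codes that BCP 47 requires when one exists.
# A real check would consult the IANA language subtag registry instead.
THREE_TO_TWO = {"rus": "ru", "eng": "en", "deu": "de", "fra": "fr"}


def primary_subtag(tag: str) -> str:
    """Return the lower-cased first subtag of a language tag."""
    return tag.strip().lower().split("-")[0]


def primary_language(tag: str) -> str:
    """Return the primary subtag, folded to the two-letter code if known."""
    primary = primary_subtag(tag)
    return THREE_TO_TWO.get(primary, primary)


def uses_shortest_code(tag: str) -> bool:
    """False if the tag uses a three-letter code listed in THREE_TO_TWO."""
    return primary_subtag(tag) not in THREE_TO_TWO


def same_language(tag_a: str, tag_b: str) -> bool:
    """Compare two tags by primary language only, ignoring other subtags."""
    return primary_language(tag_a) == primary_language(tag_b)


if __name__ == "__main__":
    for tag in ("ru", "rus", "ru-RU"):
        print(tag, uses_shortest_code(tag), primary_language(tag))
```

With such a helper, same_language("rus", "ru-RU") holds, which is the comparison needed when two dictionaries tag Russian differently. Whether dropping the region subtag is acceptable depends on the data.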

Best,

Felix

> 
> At the moment, I see only one way to solve this problem, i.e., using FILTER, str() and a string comparison of both variables. This should be fairly inefficient, though, as I presume the FILTER is applied only after all potential bindings for both variables for Russian terms have been determined.
> 
> Am I overlooking anything?
> 
> Best,
> Christian
> -- 
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
> 
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos@informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931
> 
>

Received on Monday, 14 March 2016 21:38:05 UTC