- From: Felix Sasaki <fsasaki@w3.org>
- Date: Mon, 14 Mar 2016 22:37:50 +0100
- To: Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de>
- Cc: public-ontolex@w3.org, "A list for those interested in open data in linguistics." <open-linguistics@lists.okfn.org>, "christian.chiarcos@web.de" <christian.chiarcos@web.de>
- Message-Id: <B70AF52F-4D86-4836-AC0A-7CD9F235AD9B@w3.org>
> On 13.03.2016 at 12:09, Christian Chiarcos <chiarcos@informatik.uni-frankfurt.de> wrote:
>
> Dear all,
>
> this is a general technical question, albeit one specific to working with multilinguality issues in multiple lemon/ontolex dictionaries, hence I'm asking here in the first place.
>
> Imagine the following situation: I use the Russian DBnary (provided in a slightly extended variant of the old lemon) and an ontolex dictionary for Chalkan (with Russian glosses). Both are provided by third parties, and I do not want to manipulate the data prior to querying. Now, I want to use DBnary to retrieve an English gloss for the Chalkan words in a single SPARQL query.
>
> If both dictionaries use the same xml:lang representation, this works rather well (I skip the query for reasons of brevity): I bind the Russian gloss from the Chalkan dictionary to variable ?ru and start searching DBnary for a data property that assigns ?ru as literal.
>
> It is more complicated, though, if both files use different language codes, e.g., ISO-639-3 (rus) and ISO-639-1 (ru) for Russian, or if a language code with region sub-tag is used (e.g., ru-RU). Is there any way to use, say, BIND to bind the string value of ?ru to a new variable which uses ISO-639-1 codes instead of the original ISO-639-3 (resp. ISO-639-1+ISO-3166) code?

xml:lang allows only BCP 47 language tags, and there the options you describe (e.g. ISO-639-3 vs. ISO-639-1) are not available. So if you use a language tag validator, you can at least detect that an xml:lang value is not valid. E.g. validate

<!DOCTYPE html>
<html lang="ru">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>Test</title>
</head>
<body>
</body>
</html>

via https://validator.w3.org/#validate_by_input. Now validate the same with <html lang="rus"> and you get an error.

Of course, in your workflow you don’t want to integrate the HTML validator as your language tag validator. But the underlying library https://about.validator.nu/ has a class to validate language tags on its own.

Best,

Felix

> At the moment, I see only one way to solve this problem, i.e., using FILTER, str() and a string comparison of both variables. This should be fairly inefficient, though, as I presume the FILTER is applied only after all potential bindings for both variables for Russian terms have been determined.
>
> Am I overlooking anything?
>
> Best,
> Christian
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos@informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931
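
To illustrate the tag mismatch discussed in this thread (rus vs. ru vs. ru-RU), here is a minimal Python sketch of reducing such tags to a common primary subtag before comparing them. It is not part of the original exchange and is neither a BCP 47 validator nor the validator.nu class mentioned above. The function names and the small mapping table are hypothetical and hand-written for this example; a real workflow would presumably draw on the full IANA language subtag registry.

```python
# Hypothetical, hand-written excerpt of an ISO 639-3 -> ISO 639-1 mapping.
# Only a few entries are listed here for illustration.
ISO639_3_TO_1 = {
    "rus": "ru",
    "eng": "en",
    "deu": "de",
    "fra": "fr",
}


def primary_language(tag: str) -> str:
    """Reduce a language tag to a lowercase primary subtag.

    Region and other subtags (e.g. the "RU" in "ru-RU") are dropped,
    and a three-letter code is replaced by its two-letter equivalent
    when the illustrative table above has one.
    """
    primary = tag.replace("_", "-").split("-")[0].lower()
    return ISO639_3_TO_1.get(primary, primary)


def same_language(tag_a: str, tag_b: str) -> bool:
    """Compare two tags by their normalized primary subtag only."""
    return primary_language(tag_a) == primary_language(tag_b)


if __name__ == "__main__":
    for tag in ("ru", "rus", "ru-RU"):
        print(tag, "->", primary_language(tag))
```

Comparing on the primary subtag alone deliberately ignores region differences, which matches the use case described above (matching Russian glosses across dictionaries) but would be too coarse wherever regional variants matter.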
Received on Monday, 14 March 2016 21:38:05 UTC