Re: BPMLOD and string metadata

hello Christian,

BCP47 does in fact represent all the ISO 639-3 languages.  What may be 
causing the confusion here is that where a code exists in ISO 639-1, 
BCP47 uses that 2-letter code for the initial language subtag (because 
that was and is more widely used). [1]
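A minimal Python sketch of that preference, assuming a small hand-picked excerpt of codes. The table `TWO_LETTER` and the function `primary_language_subtag` are hypothetical names used only for illustration; a real implementation would derive the mapping from the IANA Language Subtag Registry rather than hard-code it.

```python
# Hypothetical excerpt: a few ISO 639-2/639-3 codes that also have an
# ISO 639-1 two-letter code. Real code should read the IANA registry.
TWO_LETTER = {
    "eng": "en",
    "fra": "fr",  # ISO 639-2/T
    "fre": "fr",  # ISO 639-2/B (input normalisation only)
    "deu": "de",
    "ger": "de",
    "cym": "cy",
    "wel": "cy",
}


def primary_language_subtag(iso639_code: str) -> str:
    """Prefer the ISO 639-1 code when one exists, else keep the 3-letter code."""
    code = iso639_code.lower()
    return TWO_LETTER.get(code, code)


print(primary_language_subtag("eng"))
print(primary_language_subtag("haw"))
```

So English comes out as "en", while Hawaiian ("haw"), which has no ISO 639-1 code, keeps its three-letter code as the language subtag.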

Of course, one reason BCP47 was created is that the ISO lists (even 639-3) 
alone cannot express sufficient information to adequately label text 
with language information (often additional information is needed, such 
as the script used, variants, the region where it is used, etc.)  BCP47 allows that. 
For a list of available codes you might want to try
https://r12a.github.io/app-subtags/index.html
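To show how that extra information fits into a single tag, here is a rough Python sketch that splits a simple BCP47 tag into language, script, region and variant subtags. It is a deliberate simplification under stated assumptions: extended language subtags, extensions, private-use and grandfathered tags are not handled, and `parse_tag` is a hypothetical helper, not part of any library.

```python
import re

# Simplified BCP47 pattern: language[-script][-region][-variant...]
# (extlang, extensions, private use and grandfathered tags are NOT covered)
LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


def parse_tag(tag: str) -> dict:
    """Split a simple language tag into its subtags, normalising case."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    script, region = m.group("script"), m.group("region")
    return {
        "language": m.group("language").lower(),        # e.g. "sr"
        "script": script.title() if script else None,  # e.g. "Latn"
        "region": region.upper() if region else None,  # e.g. "RS" or "419"
        "variants": [v.lower() for v in m.group("variants").split("-") if v],
    }


print(parse_tag("sr-Latn-RS"))
print(parse_tag("de-CH-1996"))
```

Serbian in Latin script as used in Serbia ("sr-Latn-RS") and German as used in Switzerland with the 1996 orthography ("de-CH-1996") are the kinds of distinctions an ISO 639 code alone cannot carry.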

For an overview of BCP47, see
https://www.w3.org/International/articles/language-tags/index.en.html

With regard to the multiple lexicalisations, it's important to separate the 
notions of metadata belonging to a specific string (which is what we 
have been working on) and the localisation of strings.  We find those 
two are often conflated in the minds of the developers we speak with.  
For your COVID example, you'd need 3 separate strings, and to 
_internationalise_ the data there needs to be an ability to attach 
language and direction metadata to each.  A _localisation_ mechanism 
works at a higher level to manage alternative strings, and I think your 
COVID example refers to something similar (though monolingual).
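The distinction can be sketched in Python as follows. `LabeledString`, `to_jsonld`, `by_lang` and `alternatives` are illustrative names only; the only things borrowed from a real spec are the JSON-LD 1.1 keywords `@value`, `@language` and `@direction`. Each string carries its own language and direction metadata (the internationalisation part), while grouping alternatives is left to a separate, higher-level layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LabeledString:
    """One string plus its own language and base-direction metadata."""
    value: str
    lang: str                        # a BCP47 language tag
    direction: Optional[str] = None  # "ltr", "rtl", or None if undeclared

    def to_jsonld(self) -> dict:
        """Render as a JSON-LD 1.1 value object."""
        obj = {"@value": self.value, "@language": self.lang}
        if self.direction is not None:
            obj["@direction"] = self.direction
        return obj


# Three distinct English strings, each carrying its own metadata.
labels = [
    LabeledString("Severe acute respiratory syndrome coronavirus 2", "en", "ltr"),
    LabeledString("SARS-CoV-2", "en", "ltr"),
    LabeledString("Wuhan Corona virus", "en", "ltr"),
]

# A dict keyed only by language keeps one string per language:
# later entries overwrite earlier ones.
by_lang = {s.lang: s.value for s in labels}

# A higher-level (localisation-style) grouping can keep all alternatives.
alternatives: Dict[str, List[LabeledString]] = {}
for s in labels:
    alternatives.setdefault(s.lang, []).append(s)

print(len(by_lang), len(alternatives["en"]))
print(labels[1].to_jsonld())
```

Keying a map by language alone collapses the three English labels into one, which is the limitation raised in the quoted message; keeping a list per language (or per concept) avoids that without touching the per-string metadata.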

I might as well mention here that one thing we have NOT been working on 
is language/direction change _within_ a string.  This can be achieved 
using markup or Unicode formatting characters for direction, but there 
is no good mechanism for identifying string-internal language change 
other than markup.  The string-meta document addresses the problem of 
establishing a base direction and language for a string _as a whole_ 
(noting that the information may be declared as the default for all the 
strings in a resource by a top-level directive).
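As a rough illustration of the "default for the whole resource" idea, here is a Python sketch. The resource layout (`lang`, `dir`, `strings` keys) and `resolve_metadata` are hypothetical and not taken from the string-meta document; JSON-LD 1.1 offers a comparable mechanism by allowing `@language` and `@direction` defaults in a context. Language or direction changes within a string are out of scope here.

```python
from typing import Dict, List, Optional, Tuple


def resolve_metadata(resource: Dict) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (value, language, direction) for every string in a resource.

    Per-string metadata wins; otherwise the resource-level default applies.
    """
    default_lang = resource.get("lang")
    default_dir = resource.get("dir")
    return [
        (s["value"], s.get("lang", default_lang), s.get("dir", default_dir))
        for s in resource["strings"]
    ]


# Hypothetical resource: English/ltr declared once at the top level,
# with one Arabic string overriding both defaults.
resource = {
    "lang": "en",
    "dir": "ltr",
    "strings": [
        {"value": "Hello"},
        {"value": "مرحبا", "lang": "ar", "dir": "rtl"},
    ],
}

for value, lang, direction in resolve_metadata(resource):
    print(value, lang, direction)
```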

Hope that helps.

Btw, I'd much prefer us to have conversations like this in GitHub 
issues, rather than by email.  Jorge, you have my permission to copy my 
comments to an issue.  (That makes it MUCH easier to find information 
later, and to follow and manage threads, not to mention the automatic 
cross-referencing that you get.)

ri



[1] https://www.w3.org/International/questions/qa-lang-2or3.en.html


Christian Chiarcos wrote on 02/02/2023 09:58:
> Dear Richard, dear all,
>
> just skimming through your documents, I was wondering what the 
> recommended <https://w3c.github.io/string-meta/#language-metadata> 
> metadata approach looks like in practice. Would the general 
> recommendation be to use language indexing 
> <https://w3c.github.io/string-meta/#localization-considerations>, 
> then? I see some issues with that because the same concept can have 
> multiple lexicalizations in the same language (say, "Severe acute 
> respiratory syndrome coronavirus 2"@en alongside "SARS‑CoV‑2"@en, 
> "Wuhan Corona virus"@en, etc.), but the use of a dict here implies you 
> get at most one string per language.
>
> Also, are there any constraints or recommendations about the metadata 
> vocabulary (apologies if I overlooked them)? From the linguistic side, 
> BCP47 has been criticized a lot because people would like to add more 
> metadata than ISO 639 or BCP47 support (Gillis-Webber & Tittel 2019, 
> 2020); BCP47 covers ISO 639-1 and ISO 639-2 only, but not ISO 639-3 
> (which is needed for "smaller" languages); ISO 639-3 is insufficient 
> by itself (so that people introduce alternative classifications, e.g., 
> Nordhoff & Hammarström 2011); and most people seem to actually prefer to 
> identify languages by URIs in order to provide explicit metadata (De 
> Melo 2015, Nordhoff & Hammarström 2011).
>
> So far, it seems this discussion in the LLOD community is largely 
> detached from the discussion in the W3C Internationalization Working 
> Group, but these things should definitely be connected to get the 
> perspectives of spec developers, providers and consumers of 
> linguistic/language data covered. Thank you for taking the initiative!
>
> Best,
> Christian
>
> Refs:
>
> Gillis-Webber, F., & Tittel, S. (2019). The shortcomings of language 
> tags for linked data when modeling lesser-known languages. In /2nd 
> Conference on Language, Data and Knowledge (LDK 2019)/. Schloss 
> Dagstuhl-Leibniz-Zentrum fuer Informatik.
>
> Gillis-Webber, F., & Tittel, S. (2020, May). A framework for shared 
> agreement of language tags beyond ISO 639. In /Proceedings of the 
> Twelfth Language Resources and Evaluation Conference/ (pp. 3333-3339).
>
> De Melo, G. (2015). Lexvo.org: Language-related information for the 
> linguistic linked data cloud. /Semantic Web/, /6/(4), 393-400.
>
> Nordhoff, S., & Hammarström, H. (2011). Glottolog/Langdoc: Defining 
> dialects, languages, and language families as collections of 
> resources. In /First International Workshop on Linked Science 2011, in 
> conjunction with the International Semantic Web Conference (ISWC 2011)/.
>
>
> On Thu, 2 Feb 2023 at 09:57, Jorge Gracia del Río 
> <jogracia@unizar.es> wrote:
>
>     Dear Richard,
>
>     Thanks for this update! We will certainly take a closer look at
>     the report.
>
>     Best,
>     Jorge
>
>
>     On Wed, 1 Feb 2023 at 18:14, r12a (<ishida@w3.org>) wrote:
>
>         dear BPMLOD folks,
>
>         Best wishes for your relaunch!
>
>         Since the last round of work on BPMLOD the W3C
>         Internationalization Working Group has spent a lot of time
>         talking with spec developers about how to attach metadata to
>         strings to indicate the language and the directionality of the
>         string.  For example, JSON-LD adopted some new approaches to
>         allow the management of this information.[1]  I wonder whether
>         this is something that would be of interest to the BPMLOD group.
>
>         We produced a document called Strings on the Web: Language and
>         Direction Metadata (https://w3c.github.io/string-meta/)
>         which gives an overview of our current thinking.
>
>         best regards,
>         Richard
>
>
>         [1]
>         https://www.w3.org/TR/json-ld11/#string-internationalization

Received on Thursday, 2 February 2023 10:37:08 UTC