
Re: [dxwg] [I18N] References to ISO-639 vs. BCP47 (#959)

From: Andrea Perego via GitHub <sysbot+gh@w3.org>
Date: Thu, 03 Oct 2019 20:15:44 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-538110226-1570133742-sysbot+gh@w3.org>
Thanks for suggesting the inclusion of a "health warning", @aphillips. It would indeed help address the possible confusion you pointed out, which is caused by our pointer to the inconsistent DCMI definition. There, the textual definition says `dct:language` should be used with ISO language codes, whereas the declared range is not a literal but the class `dct:LinguisticSystem`.
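To make the inconsistency concrete, here is a minimal Turtle sketch of the two readings. The `ex:` namespace and dataset names are hypothetical; the Library of Congress URI is just one example of a URI identifying a language.

````turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# Reading suggested by the textual definition: an ISO 639 code as a literal.
# This does not fit the declared range of dct:language (dct:LinguisticSystem).
ex:dataset1 dct:language "en" .

# Reading consistent with the declared range:
# the object is a resource (URI) identifying the language.
ex:dataset2 dct:language <http://id.loc.gov/vocabulary/iso639-1/en> .
````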

So, we can deal with this by adding a note that clarifies this point.

@aphillips, if you think this can solve the issue, we'll create a draft PR for you to review.

Just for the record, I dug a bit into this.

The inconsistent DCMI definition was actually discussed by the GLD WG while working on the first release of DCAT (see https://www.w3.org/2011/gld/track/issues/26), and the discussion ended with the decision to recommend the use of URIs.
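A minimal sketch of what the URI-based approach looks like in a DCAT description, assuming the Library of Congress ISO 639-1 URIs as the language vocabulary (other authority URI sets would work the same way). `ex:dataset` and its title are made up for illustration. Note that the BCP 47 language tag (`@en`) still has its place on the literal title; only the value of `dct:language` is a URI.

````turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:dataset a dcat:Dataset ;
  # Literal with a BCP 47 language tag
  dct:title "Example dataset"@en ;
  # Languages of the dataset identified by URIs, not by literal codes
  dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ,
               <http://id.loc.gov/vocabulary/iso639-1/fr> .
````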

The different DCMI guidelines do not resolve the confusion either. For example, in [the chapter about creating metadata](https://www.dublincore.org/resources/userguide/creating_metadata/#Language) of the Dublin Core User Guide, they keep saying:

> For the identification of languages please follow RFC 4646. Best practice would be to select a value from the three letter language tags of ISO 639 (e.g. http://www.sil.org/iso639-3/codes.asp).

However, the associated links to examples point to [the relevant section of the Publishing Metadata chapter](https://www.dublincore.org/resources/userguide/publishing_metadata/#dcterms:language), which instead states:

> The range of dcterms:language it [sic!] the class dcterms:LinguisticSystem. All values used with dcterms:language have to be instances of this class. Therefore the property may only be used with non-literal values.
> ````turtle
> ex:myBook dcterms:title "A great deliverance" ;
>   dcterms:language [ rdf:value "eng"^^dcterms:RFC4646 ] .
> ````
> or
> ````turtle
> ex:myBook dcterms:title "A great deliverance" ;
>   dcterms:language <http://lexvo.org/id/iso639-3/eng>
> ...
> ex:mySong dcterms:title "The Power of Orange Knickers"
>   dcterms:language _:eng
> _:eng rdfs:Label "English"
>   ex:639-1 "en"  
>   ex:639-2 "eng"
> ````

So, irrespective of the inconsistent free-text statements, the DCMI editors seem to intend `dct:language` as an object property (because of its range), and the relevant examples confirm that it is not meant to be used with literals.
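The practical consequence of the range can be sketched with the RDFS range rule (rdfs3): any object of `dct:language` is inferred to be a `dct:LinguisticSystem`. With a literal object, the rule would make the literal itself a `dct:LinguisticSystem`, a triple that cannot even be written in Turtle, since literals cannot be subjects. The Lexvo URI is taken from the DCMI example quoted above; `ex:myBook` is illustrative.

````turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Axiom published by DCMI
dct:language rdfs:range dct:LinguisticSystem .

# Asserted
ex:myBook dct:language <http://lexvo.org/id/iso639-3/eng> .

# Entailed under RDFS (rule rdfs3), shown here as a comment:
# <http://lexvo.org/id/iso639-3/eng> a dct:LinguisticSystem .
````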

If this is the case, language codes are meant to be used only when describing instances of class `dct:LinguisticSystem`, i.e., the languages themselves. Describing languages, however, is not in the scope of DCAT, but rather of reference registers and controlled vocabularies.
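A rough sketch of this division of labour, assuming a hypothetical language register under `http://example.org/language-register/`: the register describes the language and carries the codes (here via `skos:notation`, one possible choice), while a DCAT record only points at the register entry.

````turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix reg:  <http://example.org/language-register/> .   # hypothetical register

# Maintained by the register, not by the DCAT catalogue
reg:eng a dct:LinguisticSystem ;
  rdfs:label "English"@en ;
  skos:notation "en" , "eng" .   # ISO 639-1 and ISO 639-2/3 codes

# A DCAT record only references the language
<http://example.org/dataset> dct:language reg:eng .
````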

Of course, this may change in the future if DCMI relaxes its axioms, possibly going as far as making an object property also a datatype property (as @aisaac noted).

However, as @makxdekkers argued, this would lead to a backward-compatibility issue (at least when `dct:language` is used in the scope of DCAT), and it would not help interoperability.

Actually, there is the option of using the corresponding property from DCMI Elements with literals, namely [`dc:language`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#elements-language), which, unlike `dct:language`, has no formal range and can therefore take literal values. However, considering the current status of the DCAT2 specification, and given that no use case was submitted to motivate support for language tags in DCAT, I don't think this alternative can be included in DCAT2.
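For comparison, a minimal sketch of the two properties side by side. `ex:dataset` is hypothetical, and typing the literal as `xsd:language` is one possible choice, not something DCMI or DCAT prescribe.

````turtle
@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

ex:dataset
  # DCMI Elements: a language tag given as a literal value
  dc:language "en-GB"^^xsd:language ;
  # DCMI Terms: range dct:LinguisticSystem, so a URI is expected
  dct:language <http://id.loc.gov/vocabulary/iso639-1/en> .
````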

GitHub Notification of comment by andrea-perego
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/959#issuecomment-538110226 using your GitHub account
Received on Thursday, 3 October 2019 20:15:46 UTC
