Re: dct:language range WAS: ISSUE-2 (olyerickson): dct:language should be added to DCAT [Best Practices for Publishing Linked Data] from Stasinos Konstantopoulos on 2011-12-09 (public-gld-wg@w3.org from December 2011)

From: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>
Date: Sat, 10 Dec 2011 00:28:30 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: "Maali, Fadi" <fadi.maali@deri.org>, Government Linked Data Working Group WG <public-gld-wg@w3.org>
Message-ID: <CANaM+WF=G4+wNq7vnQiHFNnJcP7FOgOrV1rNUp-9ZpKgmL+nKg@mail.gmail.com>

Hi again,

On 9 December 2011 20:34, Richard Cyganiak <richard@cyganiak.de> wrote:
> Hi Stasinos,
>
> On 9 Dec 2011, at 03:12, Stasinos Konstantopoulos wrote:
>> There are various alternatives that the group might want to consider,
>> depending on how specific we want to be and how badly (if at all) we
>> strive to define dct:language semantics in RDFS or a natural language
>> definition is also acceptable.
>
> I think that a crisp human-readable definition is actually more important than a crisp RDFS definition.
>
> Another, even more important, concern is that the property has to be able to express the data that is actually available in source catalogues. If all catalogues we care about already use ISO language codes to indicate languages, then it's fine to require the use of these codes in the RDF expression. But if a fair share of catalogues use other means of indicating the language (e.g., a free-text field that may not cleanly map to ISO language codes), then our chosen property has to support that too and can't require something based on ISO language codes.
>
> As a general principle, requiring a data representation that is more strict, more formal or more fine-grained than what the data providers have available at the moment would limit re-use.

It's hard to imagine anybody having data that won't fit ISO 639.
Besides listing pretty much every documented language there is
(including extinct and made-up languages like Klingon), it also lists
useful clusters ("macrolanguages"), such as "Arabic" (ara), that allow
one to underspecify when a more detailed description is not available
("ara" subsumes 30 varieties of Arabic, all with their own
three-letter code). It also includes three-letter codes for
"undetermined" (und), "multiple and cannot list all" (mul), and "no
linguistic content, not applicable" (zxx).
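
To make the point concrete, here is a rough sketch of how a free-text
language field from a catalogue might be mapped onto ISO 639 codes,
falling back to a macrolanguage or to "und" when nothing more specific
is known. The function name to_iso639, the tiny lookup tables, and the
free-text spellings are all made up for illustration; a real mapping
would be built from the published ISO 639-3 code tables.

```python
# Hypothetical, hand-picked subset of ISO 639-3 codes (illustration only).
KNOWN_CODES = {
    "english": "eng",
    "modern greek": "ell",
    "klingon": "tlh",
    "arabic": "ara",            # macrolanguage
    "standard arabic": "arb",   # individual language under "ara"
    "egyptian arabic": "arz",   # individual language under "ara"
}

# Macrolanguage names used to underspecify when the variety is unknown.
MACROLANGUAGES = {"arabic": "ara"}

# Special-purpose ISO 639 codes mentioned above.
UNDETERMINED = "und"
MULTIPLE = "mul"
NO_LINGUISTIC_CONTENT = "zxx"

# Assumed free-text spellings; actual catalogues will differ.
MULTIPLE_LABELS = {"multiple", "various", "multilingual"}
NOT_APPLICABLE_LABELS = {"n/a", "not applicable", "no linguistic content"}


def to_iso639(free_text):
    """Map a free-text language label to a three-letter ISO 639 code."""
    if free_text is None:
        return UNDETERMINED
    # Normalise case and collapse runs of whitespace.
    label = " ".join(free_text.lower().split())
    if not label:
        return UNDETERMINED
    if label in NOT_APPLICABLE_LABELS:
        return NO_LINGUISTIC_CONTENT
    if label in MULTIPLE_LABELS:
        return MULTIPLE
    if label in KNOWN_CODES:
        return KNOWN_CODES[label]
    # Unknown variety: underspecify with the macrolanguage if one is named.
    words = label.split()
    for name, code in MACROLANGUAGES.items():
        if name in words:
            return code
    return UNDETERMINED
```

With this sketch, a label such as "Gulf Arabic" that is missing from
the table still ends up as "ara", and anything unrecognised becomes
"und" rather than being dropped.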

Stasinos

Received on Friday, 9 December 2011 22:29:04 UTC