Re: dct:language range WAS: ISSUE-2 (olyerickson): dct:language should be added to DCAT [Best Practices for Publishing Linked Data]

+1 to Richard's point.

I am extremely concerned about over-specifying this (or any) DCAT
field. In my view it should be possible for "entry level" adopters to
implement sensible DCAT without a deep understanding of RDF/RDFS/etc.

If a provider wishes to specify the language(s) of their catalogs
using only literals (for example), that should be good enough (and
should be the minimum). If a more sophisticated provider is willing
and able to do this less ambiguously using URIs, etc., we should
enable this as well (but not require it).
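
To make the two levels concrete, here is a minimal sketch in Python of
what I have in mind; it is an illustration, not a proposal for the spec.
The function name language_triple, the iso639_3 flag, and the
example.org catalog URI are all hypothetical, and the lexvo-style URI
pattern is only an assumption about what a more sophisticated provider
might pick.

```python
# Property URI for dct:language (Dublin Core terms namespace).
DCT_LANGUAGE = "http://purl.org/dc/terms/language"

# Assumption: lexvo-style language URIs keyed by ISO 639-3 code.
LEXVO_BASE = "http://lexvo.org/id/iso639-3/"


def language_triple(subject_uri, value, iso639_3=False):
    """Return one N-Triples line stating the language of subject_uri.

    By default the value is emitted as a plain literal (the entry-level
    case). If the provider declares that the value is an ISO 639-3 code,
    it is emitted as a URI instead.
    """
    if iso639_3:
        obj = "<" + LEXVO_BASE + value + ">"
    else:
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"' + escaped + '"'
    return "<" + subject_uri + "> <" + DCT_LANGUAGE + "> " + obj + " ."


if __name__ == "__main__":
    catalog = "http://example.org/catalog"  # hypothetical catalog URI
    print(language_triple(catalog, "English"))             # literal form
    print(language_triple(catalog, "eng", iso639_3=True))  # URI form
```

The point of the flag is that the provider opts in to the stricter
form; nothing is inferred from the value itself, so a free-text field
from a source catalogue passes through unchanged as a literal.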

On Fri, Dec 9, 2011 at 1:34 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> Hi Stasinos,
>
> On 9 Dec 2011, at 03:12, Stasinos Konstantopoulos wrote:
>> There are various alternatives that the group might want to consider,
>> depending on how specific we want to be and on whether we strive to
>> define dct:language semantics in RDFS or whether a natural language
>> definition is also acceptable.
>
> I think that a crisp human-readable definition is actually more important than a crisp RDFS definition.
>
> Another, even more important, concern is that the property has to be able to express the data that is actually available in source catalogues. If all catalogues we care about already use ISO language codes to indicate languages, then it's fine to require the use of these codes in the RDF expression. But if a fair share of catalogues use other means of indicating the language (e.g., a free-text field that may not cleanly map to ISO language codes), then our chosen property has to support that too and can't require something based on ISO language codes.
>
> As a general principle, requiring a data representation that is more strict, more formal or more fine-grained than what the data providers have available at the moment would limit re-use.
>
> Best,
> Richard
>
>
>
>> Some ideas:
>>
>> 1. Specify explicitly that dct:language rdfs:range
>> http://lexvo.org/ontology#Language . This would be, IMHO, ideal if
>> this vocabulary were maintained by ISO themselves but that is not the
>> case. There are other alternatives, such as ISO 639 RDF [1] and SIL
>> [2], but I think lexvo is most appropriate.
>>
>> 2. Specify only that dct:language rdfs:range
>> http://purl.org/dc/terms/LinguisticSystem , which is less restrictive
>> and would allow one to immediately switch to whatever supersedes
>> lexvo but leaves a software agent not much wiser about what to expect
>> as a value of dct:language.
>>
>> 3. Specify, in natural language, that the range can be any vocabulary
>> with a direct and obvious mapping to ISO 639. Very flexible, but again
>> not very informative for software agents; unless there is a way to
>> provide this mapping in a machine-friendly notation like a regular
>> expression.
>>
>> 4. Specify our own vocabulary in W3C space, combining the advantages
>> of (1) and persistent URIs. W3C has started this discussion some years
>> back [3] but there seems to be no concrete outcome; please shout if I
>> have overlooked something.
>>
>> Some further thoughts on combining (3) and (4): ISO 639 is actively
>> maintained, and the RDF vocabulary needs to be updated every time a
>> language is added. Since (again, please shout if I am wrong) ISO do
>> not publish an RDF version of ISO 639, it seems to me that it makes
>> more sense for this group to define a persistent languages namespace
>> within W3C and a regular expression that maps from that namespace to
>> ISO 639 and never need to change anything in order to be up to date
>> with ISO 639.
>>
>> Stasinos
>>
>>
>>
>> [1] http://downlode.org/Code/RDF/ISO-639
>>
>> [2] http://www.ethnologue.com/language_index.asp This is very rich LOD
>> but lacks an RDF formalization, so there is no language class defined
>> or anything, but there is a semi-structured mapping of macrolanguages
>> such as "Arabic" to their constituents
>> (http://www.sil.org/iso639-3/macrolanguages.asp) and there are also
>> cross-links between 639-2 and 639-3 codes (cf, eg,
>> http://www.sil.org/iso639-3/documentation.asp?id=gre and
>> http://www.sil.org/iso639-3/documentation.asp?id=ell) and the
>> Ethnologue database
>> (http://www.ethnologue.com/show_language.asp?code=ell)
>>
>> [3] http://www.w3.org/wiki/Languages_as_RDF_Resources
>>
>
>



-- 
John S. Erickson, Ph.D.
Director, Web Science Operations
Tetherless World Constellation (RPI)
<http://tw.rpi.edu> <olyerickson@gmail.com>
Twitter & Skype: olyerickson

Received on Friday, 9 December 2011 18:43:30 UTC