Re: DCAT comments - dataset dependecy - http://www.w3.org/TR/2013/WD-vocab-dcat-20130801/ from John Erickson on 2013-10-31 (public-gld-comments@w3.org from October 2013)

From: John Erickson <olyerickson@gmail.com>
Date: Thu, 31 Oct 2013 13:23:36 -0400
To: Fadi Maali <fadi.maali@deri.org>
Cc: Johan De Smedt <johan.de-smedt@tenforce.com>, public-gld-comments@w3.org
Message-ID: <CAC1Gg8TEXBX=anw3cdmu-Afkw80cQT_+rtv7rwuKAauXVuT-cg@mail.gmail.com>
I support Fadi's solution to this. I think that DCAT properly
represents the common case (languages specified at the Dataset level)
but accommodates the special (and rare) case of datasets whose language
varies at the distribution level. Thanks, Fadi!

John

On Wed, Oct 30, 2013 at 5:50 AM, Fadi Maali <fadi.maali@deri.org> wrote:
> Hello Johan,
>
> Please see the text I added at: https://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html#Property:dataset_language
>
> Notice that the text mentions that publishers can add dct:language to instances of dcat:Distribution but I didn't add it to the set of properties of dcat:Distribution for two reasons:
> 1. This is only needed for the case when the dataset has multiple languages and is provided through different distributions based on the language. As DCAT has no notion of optional properties, adding dct:language to Distribution to address the particular case of multi-language datasets might, arguably, be confusing, as the language property would then be defined on the three levels of Catalog, Dataset and Distribution.
> 2. Doing so now goes beyond editorial changes and requires going through another call for comments for DCAT.
>
> Does that properly address your comment?
>
> Best regards,
> Fadi Maali
> --------------------------------------------------
> Fadi Maali
> PhD student @ Insight Galway (formerly DERI)
> Irish Research Council Embark Scholarship holder
> http://www.deri.ie/users/fadi-maali
>
> On 30 Oct 2013, at 18:32, Johan De Smedt <johan.de-smedt@tenforce.com> wrote:
>
>> Hi Fadi,
>>
>> It makes a lot of sense to me to have language as an optional parameter on the distribution class - as per your example below.
>> However, I did not see this possibility in the model at http://www.w3.org/TR/vocab-dcat/
>>
>> I would support
>> - adding the clarification you make
>> - adding dct:language for that purpose as an optional property on dcat:Distribution.
>>
>> This would cover my main concerns.
>>
>> Kind Regards,
>>
>> Johan De Smedt
>>> -----Original Message-----
>>> From: Fadi Maali [mailto:fadi.maali@deri.org]
>>> Sent: Wednesday, 30 October, 2013 08:19
>>> To: Johan De Smedt
>>> Cc: public-gld-comments@w3.org
>>> Subject: Re: DCAT comments - dataset dependecy - http://www.w3.org/TR/2013/WD-vocab-dcat-20130801/
>>>
>>> Further comments inline...
>>>
>>>
>>> On 30 Oct 2013, at 17:46, Johan De Smedt <johan.de-smedt@tenforce.com> wrote:
>>>
>>>> Hi Fadi,
>>>>
>>>> In-line I deleted what is ok for me and answered some of your questions.
>>>>
>>>> Kind Regards,
>>>>
>>>> Johan De Smedt
>>>>
>>>>> -----Original Message-----
>>>>> From: Fadi Maali [mailto:fadi.maali@deri.org]
>>>>> Sent: Wednesday, 30 October, 2013 06:43
>>>>> To: Johan De Smedt
>>>>> Cc: public-gld-comments@w3.org
>>>>> Subject: Re: DCAT comments - dataset dependecy - http://www.w3.org/TR/2013/WD-vocab-dcat-20130801/
>>>>>
>>>>> Hello Johan,
>>>>> Thanks for following up.
>>>>>
>>>>> Some comments inline...
>>>>>
>>>>> On 29 Oct 2013, at 16:58, Johan De Smedt <johan.de-smedt@tenforce.com> wrote:
>>>>>
>>>>>> Hi Sandro, Fadi,
>>>>>>
>>>>>> 1) [JDS:>] [...cut...]
>>>>>>
>>>>>> 2) In case there is still room for amending some text, I would suggest:
>>>>>> a) [JDS:>] [...cut...].
>>>>>
>>>>>> b) To make the usage note on dcat:mediaType more explicit.
>>>>>>    Add to usage note: “Best practice for retrieving data using dcat:downloadURL is to set the HTTP header ‘Accept’ to the value of dcat:mediaType.”
>>>>>
>>>>> While this sounds like a reasonable recommendation, my personal opinion is that the vocabulary specification
>>>>> should not include it, as it relates to deployment. Thoughts on this?
>>>>>
>>>>>> c) [JDS:>] [...cut...]
>>>>
>>>>>> d) It is not clear how a multilingual dataset that has different distributions per language can be registered:
>>>>>>    either -d.1- using a different dcat:downloadURL
>>>>>>         With the current model, this situation can be handled unambiguously by having multiple (further unrelated) datasets.
>>>>>>         If this is considered best practice, this could be clarified in a usage note on the dataset dct:language
>>>>>>    or -d.2- using the same downloadURL but with different values for the HTTP header Accept-Language
>>>>>>         With the current model this could be handled by adding a usage note on the dataset dct:language and on the distribution dcat:downloadURL
>>>>>
>>>>> What about different distributions (each with its own downloadURL) for the same dataset?
>>>> [JDS:>] That is the case as detailed in -d.1- above - right?
>>>
>>> I was referring to different "distributions" while you mentioned "multiple further unrelated data sets".
>>> Based on the example you provided below, I gather you meant multiple distributions.
>>>
>>>
>>>> Let's take EU CELLAR, which actually provides examples for both d.1 and d.2.
>>>> The -d.1- case (multiple download URL)
>>>> - There is only 1 dataset with multiple format and language combinations; each distribution may have a different URL per language.
>>>> GET http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06.DEU
>>>> - with: Accept=application/xml; notice=branch
>>>> GET http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06.ENG
>>>> - with: Accept=application/xml; notice=branch
>>>> For DCAT, different datasets are required, as the distribution in DCAT does not provide for detailing the language covered by that distribution.
>>>> Alternatively in DCAT,
>>>> - either 1 dataset is registered with 1 distribution, no downloadURL, and an accessURL,
>>>>   requiring EU CELLAR to make additional landing pages to solve this ambiguity in DCAT,
>>>> - or 2 datasets are registered (one per language) - this would bring it to 20+ datasets as there are over 20 languages supported
>>>> The -d.2- case (1 download URL)
>>>> GET http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06
>>>> - with: Accept=application/xml; notice=branch
>>>> gives a different result with either of the following:
>>>> - Accept-Language=en
>>>> - Accept-Language=de
>>>> The suggested usage note would cover this case without any change to DCAT or the dataset
>>> publisher.
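
A minimal sketch of the -d.1- and -d.2- requests described above, in Python. It only builds the requests and does not send them. The helper name negotiated_request is hypothetical; the URLs and the Accept value are the ones quoted in the example, and how CELLAR answers them today is not verified here.

```python
from urllib.request import Request

# Values quoted in the example above.
BASE = "http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06"
MEDIA_TYPE = "application/xml; notice=branch"


def negotiated_request(url, media_type, language=None):
    """Build (but do not send) a GET request for one distribution.

    media_type plays the role of dcat:mediaType; language, when given,
    is sent as Accept-Language (the -d.2- case).
    """
    headers = {"Accept": media_type}
    if language is not None:
        headers["Accept-Language"] = language
    return Request(url, headers=headers, method="GET")


# -d.1-: one URL per language, no language header needed.
d1_german = negotiated_request(BASE + ".DEU", MEDIA_TYPE)
d1_english = negotiated_request(BASE + ".ENG", MEDIA_TYPE)

# -d.2-: the same URL, distinguished only by Accept-Language.
d2_german = negotiated_request(BASE, MEDIA_TYPE, language="de")
d2_english = negotiated_request(BASE, MEDIA_TYPE, language="en")
```

Passing any of these objects to urllib.request.urlopen would send the request. In the -d.2- case the two requests share one URL, which is why a catalog entry carrying only dcat:downloadURL cannot tell the language variants apart.
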
>>>>
>>>> On usage of content negotiation with HTTP header, see also:
>>>> - http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html
>>>> - http://www.ietf.org/rfc/rfc2295.txt
>>>> Would DCAT be clearer if these were added as references, in line with the usage note I suggest adding?
>>>
>>> IMHO, the right way to model this is by separate distributions of a single dataset. As you mentioned,
>>> language can be described only
>>> on the dataset level of DCAT. I suggest having multiple values on the dataset level for the language and
>>> specifying the specific language of each distribution.
>>>
>>> Example:
>>> :ds1 dct:language lng:en, lng:de;
>>>      dcat:distribution :dist1, :dist2 .
>>> :dist1 a dcat:Distribution;
>>>        dct:language lng:en;
>>>        dcat:accessURL <url-en> .
>>> :dist2 a dcat:Distribution;
>>>        dct:language lng:de;
>>>        dcat:accessURL <url-de> .
>>>
>>> This modelling is equivalent to the text "this dataset is available in English and German. It can be
>>> accessed via dist1, which is in English, or via dist2, which is in German."
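
The modelling above can be sketched in plain Python as follows. This is an illustration only, not part of DCAT: the class and function names are hypothetical, and deriving the dataset-level languages from the distributions is an assumption made here for brevity (in the example they are stated explicitly on :ds1).

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    name: str        # local name, e.g. "dist1"
    language: str    # dct:language value, e.g. "en"
    access_url: str  # dcat:accessURL value


@dataclass
class Dataset:
    name: str
    distributions: list = field(default_factory=list)

    def languages(self):
        # Assumption: the dataset-level dct:language values are the union
        # of the languages declared on the distributions.
        return sorted({d.language for d in self.distributions})

    def in_language(self, language):
        # Distributions whose dct:language matches the requested language.
        return [d for d in self.distributions if d.language == language]


def to_turtle(ds):
    """Render Turtle-like text in the shape of the example above."""
    langs = ", ".join(f"lng:{code}" for code in ds.languages())
    dists = ", ".join(f":{d.name}" for d in ds.distributions)
    lines = [f":{ds.name} dct:language {langs};",
             f"    dcat:distribution {dists} ."]
    for d in ds.distributions:
        lines += [f":{d.name} a dcat:Distribution;",
                  f"    dct:language lng:{d.language};",
                  f"    dcat:accessURL <{d.access_url}> ."]
    return "\n".join(lines)


# The same placeholders as in the example: url-en and url-de.
ds1 = Dataset("ds1", [Distribution("dist1", "en", "url-en"),
                      Distribution("dist2", "de", "url-de")])
german = ds1.in_language("de")
```

A consumer that wants the German version would pick the distribution returned by ds1.in_language("de") and follow its access URL, which is the selection step the dataset-level language alone cannot support.
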
>>>
>>> If that makes sense to you and to others, I can add the required clarification text to indicate how to
>>> handle multi-language datasets.
>>>
>>> Many thanks!
>>>
>>> - Fadi
>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Fadi Maali
>>>>>
>>>>>
>>>>>> Sorry for these late results from an implementation exercise we carried out with the EU Publication Office CELLAR platform.
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> Johan De Smedt
>>>>>> Chief Technology Officer
>>>>>>
>>>>>> mail: johan.de-smedt@tenforce.com
>>>>>> mobile: +32 477 475934
>>>>
>>
>
>



-- 
John S. Erickson, Ph.D.
Director, Web Science Operations
Tetherless World Constellation (RPI)
<http://tw.rpi.edu> <olyerickson@gmail.com>
Twitter & Skype: olyerickson
Received on Thursday, 31 October 2013 17:24:04 UTC