Re: DCAT comments - dataset dependecy - http://www.w3.org/TR/2013/WD-vocab-dcat-20130801/ from Fadi Maali on 2013-10-30 (public-gld-comments@w3.org from October 2013)

From: Fadi Maali <fadi.maali@deri.org>
Date: Wed, 30 Oct 2013 18:18:47 +1100
To: Johan De Smedt <johan.de-smedt@tenforce.com>
Cc: <public-gld-comments@w3.org>
Message-Id: <BA2B1CDD-0479-4A1B-9564-D4AB7BD5C2BB@deri.org>
Further comments inline...


On 30 Oct 2013, at 17:46, Johan De Smedt <johan.de-smedt@tenforce.com> wrote:

> Hi Fadi,
> 
> In-line, I deleted what is OK for me and answered some of your questions.
> 
> Kind Regards,
> 
> Johan De Smedt 
> 
>> -----Original Message-----
>> From: Fadi Maali [mailto:fadi.maali@deri.org]
>> Sent: Wednesday, 30 October, 2013 06:43
>> To: Johan De Smedt
>> Cc: public-gld-comments@w3.org
>> Subject: Re: DCAT comments - dataset dependecy - http://www.w3.org/TR/2013/WD-vocab-dcat-
>> 20130801/
>> 
>> Hello Johan,
>> Thanks for following up.
>> 
>> Some comments inline...
>> 
>> On 29 Oct 2013, at 16:58, Johan De Smedt <johan.de-smedt@tenforce.com> wrote:
>> 
>>> Hi Sandro, Fadi,
>>> 
>>> 1) [JDS:>] [...cut...]
>>> 
>>> 2) In case there is still room for amending some text, I would suggest:
>>> a) [JDS:>] [...cut...].
>> 
>>> b) To make the usage note on dcat:mediaType more explicit.
>>>     Add to usage note: “Best practice for retrieving data using dcat:downloadURL is to set the HTTP header ‘Accept’ to a value of dcat:mediaType.”
>> 
>> While this sounds right to be recommended, my personal opinion is that the vocabulary specification
>> should not include this recommendation as it relates to the deployment… thoughts on this?
>> 
>>> c) [JDS:>] [...cut...]
> 
>>> d) It is not clear how a multilingual dataset that has different distributions per language can be registered:
>>>     either -d.1- using a different dcat:downloadURL
>>>          With the current model, this situation can be handled unambiguously by having multiple (further unrelated) data sets.
>>>          If this is considered best practice, this could be clarified in a usage note on the dataset dct:language
>>>     or -d.2- using the same downloadURL but with different values for the HTTP header Accept-Language
>>>          With the current model this could be handled by adding a usage note on the dataset dct:language and on the distribution dcat:downloadURL
>> 
>> What about different distributions (each with its own downloadURL) for the same dataset?
> [JDS:>] That is the case as detailed in -d.1- above - right?

I was referring to different "distributions" while you mentioned "multiple further unrelated data sets". Based on the example you provided below, I gather you meant multiple distributions.


> Let's take EU CELLAR, which actually provides examples of both d.1 and d.2.
> The -d.1- case (multiple download URLs)
> - There is only 1 dataset with multiple format and language combinations, each distribution may have a different URL per language.
> GET http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06.DEU
> - with: Accept=application/xml; notice=branch
> GET http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06.ENG
> - with: Accept=application/xml; notice=branch
> For DCAT, different datasets are required, as the distribution in DCAT does not provide for detailing the language covered by that distribution.
> Alternatively in DCAT,
> - either 1 dataset is registered with 1 distribution, no downloadURL, and an accessURL,
>   requiring EU CELLAR to make additional landing pages to resolve this ambiguity in DCAT;
> - or 2 datasets are registered (one per language); this would bring it to 20+ datasets, as over 20 languages are supported.
> The -d.2- case (1 download URL)
> GET http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06
> - with: Accept=application/xml; notice=branch
> gives a different result with either of the following:
> - Accept-Language=en
> - Accept-Language=de
> The suggested usage note would cover this case without any change to DCAT or the dataset publisher.
> 
> On usage of content negotiation with HTTP header, see also:
> - http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html
> - http://www.ietf.org/rfc/rfc2295.txt
> Would DCAT be clearer if these were added as references, in line with the usage note I suggest adding?
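
As a rough illustration of the -d.2- case quoted above, the two requests could be built like this in Python. This is only a sketch: build_request is a hypothetical helper name, the header value is copied from the example as written, the requests are constructed but never sent, and how the server actually responds is not checked here.

```python
from urllib.request import Request

# URL taken from the -d.2- example quoted above.
CELLAR_URL = "http://publications.europa.eu/resource/oj/JOC_2006_331_R_0026_06"


def build_request(url: str, media_type: str, language: str) -> Request:
    """Build (but do not send) a GET request that negotiates format and language.

    Hypothetical helper for illustration only.
    """
    return Request(
        url,
        headers={"Accept": media_type, "Accept-Language": language},
        method="GET",
    )


# Same URL and media type; only the Accept-Language value differs.
req_en = build_request(CELLAR_URL, "application/xml; notice=branch", "en")
req_de = build_request(CELLAR_URL, "application/xml; notice=branch", "de")
```

Passing either object to urllib.request.urlopen would perform the actual request; that step is left out on purpose.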

IMHO, the right way to model this is by separate distributions of a single dataset. As you mentioned, language can be described only at the dataset level in DCAT. I suggest having multiple language values at the dataset level and specifying the specific language of each distribution.

Example:
:ds1 dct:language lng:en, lng:de;
     dcat:distribution :dist1, :dist2.
:dist1 a dcat:Distribution;
       dct:language lng:en;
       dcat:accessURL <url-en> .
:dist2 a dcat:Distribution;
       dct:language lng:de;
       dcat:accessURL <url-de> .

This modelling is equivalent to the text "this dataset is available in English and German. It can be accessed via dist1, which is in English, or via dist2, which is in German".
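
To show how a consumer might use this modelling, here is a minimal Python sketch that mirrors the example as plain dictionaries. The dictionary layout and the helper names distributions_for and is_consistent are assumptions made for illustration, not part of DCAT; "url-en" and "url-de" are the placeholders from the example, not real URLs.

```python
# Plain-Python mirror of the example: one dataset, two distributions.
# The structure and key names are hypothetical, chosen only for this sketch.
DATASET = {
    "languages": ["en", "de"],
    "distributions": [
        {"id": "dist1", "language": "en", "access_url": "url-en"},
        {"id": "dist2", "language": "de", "access_url": "url-de"},
    ],
}


def distributions_for(dataset: dict, language: str) -> list:
    """Return the distributions of `dataset` whose language matches."""
    return [d for d in dataset["distributions"] if d["language"] == language]


def is_consistent(dataset: dict) -> bool:
    """Check that the dataset-level languages equal the set of
    languages declared on the individual distributions."""
    per_distribution = {d["language"] for d in dataset["distributions"]}
    return per_distribution == set(dataset["languages"])
```

A client that wants the German version would call distributions_for(DATASET, "de") and follow the access URL of the result; is_consistent is an optional sanity check that the dataset-level language list matches what the distributions declare.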

If that makes sense to you and to others, I can add the required clarification text to indicate how to handle multi-language datasets.

Many thanks!

- Fadi

>> 
>> 
>> Regards,
>> Fadi Maali
>> 
>> 
>>> Sorry for these late results on an implementation exercise we made with the EU Publication Office
>> CELLAR platform.
>>> 
>>> Kind Regards,
>>> 
>>> Johan De Smedt
>>> Chief Technology Officer
>>> 
>>> mail: johan.de-smedt@tenforce.com
>>> mobile: +32 477 475934
>
Received on Wednesday, 30 October 2013 07:19:17 UTC