Re: [DCAT] accessURLs vs downloadURLs from Dave Reynolds on 2013-11-15 (public-gld-comments@w3.org from November 2013)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Fri, 15 Nov 2013 09:09:12 +0000
To: Chris Beer <chris@codex.net.au>, Fadi Maali <fadi.maali@deri.org>, Richard Cyganiak <richard@cyganiak.de>
CC: Luke Blaney <w3.mailing_lists@lukeblaney.co.uk>, John Erickson <olyerickson@gmail.com>, "public-gld-comments@w3.org Comments" <public-gld-comments@w3.org>, Christopher Gutteridge <cjg@ecs.soton.ac.uk>
Message-ID: <5285E4B8.1060502@gmail.com>

Hi Chris,

[N.B. This is not a formal working group response, simply a personal 
comment on the modelling issue.]

Even if dcat:downloadURL were 1:1 [*], that wouldn't preclude it from
being a subproperty of dcat:accessURL.

A possible paraphrased reading of the spec is that dcat:downloadURL is a
direct link to a URL from which you can retrieve the dataset: you expect
a GET on that URL to give you back the data directly. By contrast,
dcat:accessURL is some possibly indirect way of accessing the data: it
*might* give you the data, or it might give you a SPARQL endpoint, a
landing page, or something else.

If that reading is correct, then a download URL is always a perfectly
legitimate way of accessing the data and is thus a legal access URL, but
the converse is not true. That means that dcat:downloadURL is logically
a special case of dcat:accessURL, whether or not it is formally stated
as a subproperty.
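To make the "special case" argument concrete, here is a minimal sketch of what formally declaring the subproperty would entail, assuming standard RDFS semantics (rules rdfs5 and rdfs7). It is not an RDF library: triples are plain tuples of prefixed-name strings, and `ex:dist1`, `ex:dist2` and the example.org URLs are hypothetical.

```python
# Minimal RDFS subproperty reasoning over (subject, predicate, object) tuples.
# Prefixed names are plain strings; ex:dist1, ex:dist2 and the example.org
# URLs are hypothetical illustrations, not taken from the DCAT spec.
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def entail_subproperties(triples):
    """Close `triples` under rdfs5 (subPropertyOf is transitive) and
    rdfs7 (if p subPropertyOf q and s p o, then s q o)."""
    closure = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in closure if p == SUBPROPERTY_OF}
        # rdfs5: chain subproperty declarations (p < q and q < r gives p < r)
        new = {(a, SUBPROPERTY_OF, d) for (a, b) in sub for (c, d) in sub if b == c}
        # rdfs7: copy each triple up to every superproperty of its predicate
        new |= {(s, q, o) for (s, p, o) in closure for (a, q) in sub if p == a}
        if new <= closure:
            return closure
        closure |= new


schema = {("dcat:downloadURL", SUBPROPERTY_OF, "dcat:accessURL")}
data = {
    ("ex:dist1", "dcat:downloadURL", "http://example.org/data.csv"),
    ("ex:dist2", "dcat:accessURL", "http://example.org/landing-page"),
}
inferred = entail_subproperties(schema | data)

# Every download URL is also an access URL ...
print(("ex:dist1", "dcat:accessURL", "http://example.org/data.csv") in inferred)  # True
# ... but an access URL is never promoted to a download URL.
print(("ex:dist2", "dcat:downloadURL", "http://example.org/landing-page") in inferred)  # False
```

The inference only runs in one direction, which matches the point that the converse does not hold.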

My understanding of Luke's question was that he was checking whether
that reading is correct.

Possible responses to this could include:

(1) That's right, a downloadURL is always a legal accessURL. We'll
formally state that downloadURL is a subproperty of accessURL, which
will make the spec clearer without changing it.

(2) That's right, a downloadURL is a legal accessURL, but we don't want 
to enforce that as an entailment. Profiles of dcat may, for example, 
wish to impose a stronger separation where they only use accessURL for 
non-download locations, and we don't want to preclude such usage. We'll 
clarify the editorial text.

(3) No, accessURL is intended only for cases where the URL provides
indirect access, and should not be used for a direct download location.
[I include this option for completeness; that's not how I would read
the spec.]
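One way to see why option (2) avoids the formal entailment: a profile that reserves accessURL for non-download locations cannot be satisfied by any distribution that has a downloadURL once the option (1) subproperty inference is applied. The sketch below assumes a hypothetical profile rule ("no URL may be both the accessURL and the downloadURL of the same distribution"); the rule, the helper names and the example IRIs are illustrative only, not from the spec.

```python
# Hypothetical profile rule for option (2): accessURL is reserved for
# non-download locations. Rule, helper names and IRIs are illustrative.
DOWNLOAD = "dcat:downloadURL"
ACCESS = "dcat:accessURL"


def apply_subproperty(triples, sub, sup):
    """One-step rdfs7: every (s, sub, o) also yields (s, sup, o)."""
    return set(triples) | {(s, sup, o) for (s, p, o) in triples if p == sub}


def profile_violations(triples):
    """Return (distribution, url) pairs where the same URL is given as both
    an accessURL and a downloadURL of the same distribution."""
    downloads = {(s, o) for (s, p, o) in triples if p == DOWNLOAD}
    accesses = {(s, o) for (s, p, o) in triples if p == ACCESS}
    return downloads & accesses


data = {
    ("ex:dist1", DOWNLOAD, "http://example.org/data.csv"),
    ("ex:dist1", ACCESS, "http://example.org/landing-page"),
}
# As asserted, the data respects the stricter profile ...
print(profile_violations(data))  # set()
# ... but with the subproperty entailment every download URL also becomes
# an access URL, so the same data now violates the profile.
print(profile_violations(apply_subproperty(data, DOWNLOAD, ACCESS)))
# {('ex:dist1', 'http://example.org/data.csv')}
```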

Dave

[*] I note that both Fadi and Richard suggest there *can* be cases where
you have multiple downloadURLs, for example if a dataset dump has to be
split into multiple download files, even if that is not the norm.

On 15/11/13 01:22, Chris Beer wrote:
> Hi all.
>
> Apologies if this rambles; I wrote it a bit at a time between meetings and work tasks.
>
> I think the key here is that we are talking about access and download URLs, not URIs.
>
> If dcat:Distribution "Represents a specific available form of a dataset. Each dataset might be available in different forms, these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed", then it stands to reason that an accessURL is where you go to obtain access to the distribution, and that a downloadURL is the "physical" location of the distribution proper. This includes an API or endpoint: anything which sends data to the client side is a download.
>
> I feel the inferences, then, are: downloadURL is "1:1". Each unique distribution (or manifestation of the dataset in question, specified by MIME type etc.) can only have one location; even two identical copies of a dataset on the same server will have different URLs by virtue of the simple rule that two things cannot occupy the same space at the same time.
>
> AccessURL is both 1:many and many:1 when set against downloadURL: there may be multiple access locations I can come through in order to query or download a dataset, and multiple datasets may share a single accessURL (such as a members' login area). In short, accessURL becomes the actual linkage between different distributions. "hasaccessURL", so to speak, becomes the query showing you what's available from a certain place, whereas "hasdownloadURL" should only return a single result for any distribution.
>
> All in all, I do not see the logic in making downloadURL a subproperty of accessURL; they are very different things and are not necessarily linked.
>
> I believe the descriptions in the spec are accurate and unambiguous enough as they are, and any change to the spec should only clarify further, not change how the spec works.
>
> -1 to "Define dcat:downloadURL as sub property of dcat:accessURL".
>
> Cheers
>
> Chris
>
> ---- Richard Cyganiak wrote ----
>
>> On 14 Nov 2013, at 13:41, Fadi Maali <fadi.maali@deri.org> wrote:
>>>>>> I’d say no. I read the spec as saying that multiple downloadURLs indicate the same data in different formats.
>>>
>>> The more common case is to have downloadURL pointing to the entire dataset. If this is not the case, then I'd say yes, you can use multiple downloadURLs.
>>> Different formats go in different instances of dcat:Distribution, each instance having its format described using dct:format or dcat:mediaType.
>>
>> You’re right. What I said above didn’t make sense.
>>
>>>>>> themeTaxonomy
>>>>>> theme
>>>>>> keyword
>>>>>> contactPoint
>>>>>> accessURL
>>>>>> downloadURL
>>>>>> byteSize
>>>>>> mediaType
>>>>>>
>>>
>>> The domain of these properties was not defined in the ontology because they can be of more use outside the scope of DCAT, e.g. one might want to use dcat:theme and dcat:keyword without imposing that the subject is of type dcat:Dataset.
>>
>> The definition of dcat:theme is: “The main category of the dataset.”
>>
>> Using the property on something that definitely isn’t a dcat:Dataset would be an error.
>>
>> Same for dcat:keyword.
>>
>> For an example where what you’re trying to do was done well, see SKOS:
>> http://www.w3.org/TR/skos-reference/#L1541
>>
>> But I’m not sure that this is a good idea. If something doesn’t fit the DCAT model of catalog-dataset-distribution, then I think one shouldn’t use DCAT. DCAT isn’t a general-purpose vocabulary for tagging resources or for describing byte streams. The catalog-dataset-distribution model is sufficiently flexible to fit many use cases, but why would one want to use a handful of DCAT properties outside of use cases that fit the model?
>>
>> Best,
>> Richard
Received on Friday, 15 November 2013 09:09:46 UTC