Re: Exposing datasets with DCAT (partitioning, subsets..) from Maik Riechert on 2016-02-02 (public-sdw-comments@w3.org from February 2016)

From: Maik Riechert <m.riechert@reading.ac.uk>
Date: Tue, 2 Feb 2016 20:50:04 +0000
To: Jon Blower <j.d.blower@reading.ac.uk>
Cc: "public-sdw-comments@w3.org" <public-sdw-comments@w3.org>
Message-ID: <56B1167C.30308@reading.ac.uk>

Hi Jon,

You're right, by definition dct:hasPart allows anything. I put that
restriction there really just for myself, to try to figure out a best
practice, but as you say, the subsets can actually overlap. I didn't
think about that in my scenario 3.

So subdatasets may logically overlap each other. I wonder if that's a
problem in general. I guess the invariant should then be that
overlapping data must be identical; otherwise it would be quite tricky
to handle. I can't think of a case where the overlapping data is not
the same (after unpacking/decoding etc). Anyone?
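
To make that invariant a bit more concrete, here is a minimal sketch in
Python. It is only an illustration under assumptions of my own: each
subdataset is taken to be already decoded into a mapping from a
(lat, lon) grid cell to a value, and the names Cell, Subset and
overlap_conflicts as well as the sample values are hypothetical, not
part of DCAT or of the wiki page. It also compares values exactly,
which presumes both subsets decode to bit-identical numbers.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a decoded subdataset maps grid cells to values.
Cell = Tuple[float, float]      # (lat, lon) of a grid cell
Subset = Dict[Cell, float]      # cell -> decoded value


def overlap_conflicts(a: Subset, b: Subset) -> List[Cell]:
    """Return the cells present in both subsets whose values differ."""
    shared = a.keys() & b.keys()
    return sorted(cell for cell in shared if a[cell] != b[cell])


# Two made-up continent subsets whose bounding boxes share the cell at lon=40.0.
europe: Subset = {(50.0, 30.0): 281.2, (50.0, 40.0): 279.9}
asia: Subset = {(50.0, 40.0): 279.9, (50.0, 50.0): 277.4}

# An empty list means the overlapping data is identical, so the invariant holds.
print(overlap_conflicts(europe, asia))
```

A publisher could run a check like this before exposing overlapping
subdatasets via dct:hasPart; a non-empty result would list the cells
where the parts disagree.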

Cheers
Maik

On 02.02.2016 at 20:39, Jon Blower wrote:
> Hi Maik,
>
> I’ve only had a chance to skim this but I had a question about the first (well, second) line on the Wiki page! Is dct:hasPart strictly for *non-overlapping* subsets? I can think of use cases where subsets might overlap. For example, let’s say that there is a global surface temperature product that is available as a set of sub-datasets, one for each continent (for user convenience). The continent boundaries would probably be simple lat-lon bounding boxes and would probably overlap at their edges. Would this be allowed?
>
> Cheers,
> Jon
>
>
>> On 2 Feb 2016, at 11:02, Maik Riechert <m.riechert@reading.ac.uk> wrote:
>>
>> Hi all,
>>
>> There has been a lot of discussion about subsetting data. I'd like to give a slightly different perspective which is purely motivated from the point of view of someone who wants to publish data, and in parallel someone who wants to discover and access that data without much hassle.
>>
>> Of course it is hard to think about all scenarios, so I picked what I think are common ones:
>> - a bunch of static data files without any API
>> - an API without static data files
>> - both
>>
>> And then some specific variations on what structure the data has (yearly data files, daily data files, or another dimension used as a splitting point, such as space).
>>
>> It is in no way final or complete and may even be wrong, but here is what I came up with:
>> https://github.com/ec-melodies/wp02-dcat/wiki/DCAT-partitioning-ideas
>>
>> So it always starts by looking at what data exists and how it is exposed, and based on those constraints I tried to model that as DCAT datasets, sometimes with subdatasets. Again, it is purely motivated from a machine-access point of view. There may be other things to consider.
>>
>> The point of this wiki page is to have something concrete to discuss and not just abstract ideas. It should uncover problems, possible solutions, perspectives, etc.
>>
>> Happy to hear your thoughts,
>> Maik
>>

Received on Tuesday, 2 February 2016 20:50:38 UTC