Re: Exposing datasets with DCAT (partitioning, subsets..) from Jon Blower on 2016-02-02 (public-sdw-comments@w3.org from February 2016)

From: Jon Blower <j.d.blower@reading.ac.uk>
Date: Tue, 2 Feb 2016 21:07:08 +0000
To: Maik Riechert <m.riechert@reading.ac.uk>
CC: "public-sdw-comments@w3.org" <public-sdw-comments@w3.org>
Message-ID: <242C58D6-BD53-47F2-8D36-ED8F3583F2E4@reading.ac.uk>

Hi Maik,

Yes, the overlapping data must be identical in each subset; otherwise it would not make sense. (Unless anyone can think of counter-examples?)
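
For illustration, here is a minimal Python sketch of that invariant. It assumes, purely hypothetically, that each subset has already been decoded into a mapping from a (lat, lon) grid cell to a value. DCAT and dct:hasPart say nothing about this representation, and the function names are made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (not part of DCAT): a decoded subset is a
# mapping from a (lat, lon) grid cell to its data value.
Cell = Tuple[float, float]
Subset = Dict[Cell, float]


def overlaps_agree(a: Subset, b: Subset) -> bool:
    """True if every grid cell present in both subsets has the same value."""
    shared = a.keys() & b.keys()  # the cells where the two subsets overlap
    return all(a[cell] == b[cell] for cell in shared)


def check_parts(parts: Dict[str, Subset]) -> List[Tuple[str, str]]:
    """Return every pair of named parts whose overlapping cells disagree."""
    names = sorted(parts)
    conflicts = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if not overlaps_agree(parts[first], parts[second]):
                conflicts.append((first, second))
    return conflicts


# Toy values: "europe" and "africa" share one cell on their common edge.
europe = {(35.0, 10.0): 1.0, (40.0, 10.0): 2.0}
africa = {(30.0, 10.0): 3.0, (35.0, 10.0): 1.0}
print(check_parts({"europe": europe, "africa": africa}))  # prints []
```

For the continent example, an empty list means the sub-datasets agree wherever their bounding boxes overlap; any pair listed would violate the invariant.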

Cheers,
Jon

> On 2 Feb 2016, at 20:50, Maik Riechert <m.riechert@reading.ac.uk> wrote:
> 
> Hi Jon,
> 
> You're right: by definition, dct:hasPart allows anything. I added that restriction mainly for myself, to try to work out a best practice, but as you say, subsets can actually overlap. I hadn't considered that in my scenario 3.
> 
> So subdatasets may logically overlap each other. I wonder whether that's a problem in general. I guess the invariant should then be that the overlapping data must be identical; otherwise it would be quite tricky to handle. I can't think of a case where the overlapping data would differ (after unpacking, decoding, etc.). Anyone?
> 
> Cheers
> Maik
> 
> On 02.02.2016 at 20:39, Jon Blower wrote:
>> Hi Maik,
>> 
>> I’ve only had a chance to skim this, but I have a question about the first (well, second) line on the Wiki page! Is dct:hasPart strictly for *non-overlapping* subsets? I can think of use cases where subsets might overlap. For example, let’s say that there is a global surface temperature product that is available as a set of sub-datasets, one for each continent (for user convenience). The continent boundaries would probably be simple lat-lon bounding boxes and would probably overlap at their edges. Would this be allowed?
>> 
>> Cheers,
>> Jon
>> 
>> 
>>> On 2 Feb 2016, at 11:02, Maik Riechert <m.riechert@reading.ac.uk> wrote:
>>> 
>>> Hi all,
>>> 
>>> There has been a lot of discussion about subsetting data. I'd like to give a slightly different perspective, motivated purely by the point of view of someone who wants to publish data and, in parallel, someone who wants to discover and access that data without much hassle.
>>> 
>>> Of course, it is hard to cover every scenario, so I picked what I think are the common ones:
>>> - a bunch of static data files without any API
>>> - an API without static data files
>>> - both
>>> 
>>> And then some specific variations in how the data is structured (yearly or daily data files, or another dimension used as the splitting point, such as a spatial one).
>>> 
>>> It is in no way final or complete and may even be wrong, but here is what I came up with:
>>> https://github.com/ec-melodies/wp02-dcat/wiki/DCAT-partitioning-ideas
>>> 
>>> So I always start by looking at what data exists and how it is exposed; based on those constraints, I tried to model it as DCAT datasets, sometimes with subdatasets. Again, this is motivated purely by machine access; there may be other things to consider.
>>> 
>>> The point of this wiki page is to have something concrete to discuss rather than just abstract ideas. It should uncover problems, and possibly solutions and new perspectives.
>>> 
>>> Happy to hear your thoughts,
>>> Maik
>>> 
>

Received on Tuesday, 2 February 2016 21:07:41 UTC