Re: [dxwg] DCAT: Proposal for an updated definition for the concept “dataset” (#1195)

To address the original issue raised by @aidig (taking the subsequent discussion into account):
> 
> **dataset**
> collection of data that is regarded as a unit

In DCAT, the unit is the `dcat:Dataset` class itself. Perhaps the definition could be changed to:

"A collection of data regarded as a unit, published or curated by a single agent, and available for access or download in one or more representations".

>
> * Note 1 to entry: Typically, a dataset is collected for a certain purpose.
> * Note 2 to entry: Typically, a dataset is described using metadata elements including an identifier and a title.

IMO, this is already implied by the properties of the class, so it probably doesn't need further clarification.

> * Note 3 to entry: Typically, a dataset is available for use in one or more representations.

This is already mentioned in the first usage note. 

> * Note 4 to entry: Typically, a dataset is published or curated by a single agent.

This is already covered by the current DCAT definition.

> * Note 5 to entry: Typically, the data in a dataset are related through a common topic.
> * Note 6 to entry: Typically, the data in a dataset have the same syntactic structure.
> * Note 7 to entry: Typically, the data in a dataset are managed using the same governance processes.
> * Note 8 to entry: Typically, the data in a dataset have a shared data provenance.

As @makxdekkers mentioned earlier, and I agree, these seem too restrictive to include in the definition. Those interpretations are already possible with the current DCAT vocabulary (considering both datasets and distributions).

> * Note 9 to entry: The arrangement of data in one or more datasets is a decision, based on formal requirements or informal considerations.

I suggested adding another usage note highlighting this point. 

> Data catalog specific note:
>
> * Note 10 to entry: In the context of DCAT, a dataset is published or curated by a single agent.
>
> Geographic information specific note:
>
> * Note 10 to entry: In the context of geographic information, a dataset can be a smaller grouping of data which, though limited by some constraint such as spatial extent or feature type, is located physically within a larger dataset. Theoretically, a dataset can be as small as a single feature or feature attribute contained within a larger dataset.

I don't think we need to add anything to the definition or usage notes to address this specific case. Examples could, however, be added for this and other points.

> Data catalog specific note:
>
> * Note 11 to entry: In the context of DCAT, a dataset is available for access or download in one or more representations.
>
> Geographic information specific note:
>
> * Note 11 to entry: In the context of geographic information, a hardcopy map or chart can be considered a dataset.

I think this specific use case can be addressed through access to the dataset, so I don't think a clarification is needed.

What do people think about this proposal?

-- 
GitHub Notification of comment by agbeltran
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1195#issuecomment-817981977 using your GitHub account


Received on Monday, 12 April 2021 17:13:39 UTC