[dxwg] DCAT: Proposal for an updated definition for the concept “dataset” (#1195)

aidig has just created a new issue for https://github.com/w3c/dxwg:

== DCAT: Proposal for an updated definition for the concept “dataset” ==

# Background and problem statement: 

This is a joint proposal for an updated definition for the concept “dataset”, made by the Danish Agency for Digitisation [1] (member of W3C) and the Danish Agency for Data Supply and Efficiency [2] (member of OGC and, through Danish Standards, member of ISO/TC 211).

The Danish government wants to describe data consistently across authorities, processes and IT systems [3]. In order to achieve this, definitions of the same concepts must be aligned where possible, not only within domains but also across domains.

“Dataset” is used in many domains and is a highly relevant concept today, e.g. in the context of the European INSPIRE Directive [4] and the PSI Directive [5]. The Agency for Digitisation and the Agency for Data Supply and Efficiency therefore ask W3C, OGC and ISO/TC 211 to agree on a common definition and notes for “dataset”, and hereby submit a first draft for discussion.

# Proposal for updated definition and related notes:

> **dataset** 
> collection of data that is regarded as a unit

> - Note 1 to entry: Typically, a dataset is collected for a certain purpose.
> - Note 2 to entry: Typically, a dataset is described using metadata elements including an identifier and a title.
> - Note 3 to entry: Typically, a dataset is available for use in one or more representations.
> - Note 4 to entry: Typically, a dataset is published or curated by a single agent.
> - Note 5 to entry: Typically, the data in a dataset are related through a common topic.
> - Note 6 to entry: Typically, the data in a dataset have the same syntactic structure.
> - Note 7 to entry: Typically, the data in a dataset are managed using the same governance processes.
> - Note 8 to entry: Typically, the data in a dataset have a shared data provenance.
> - Note 9 to entry: The arrangement of data in one or more datasets is a decision, based on formal requirements or informal considerations.

> Data catalog specific notes | Geographic information specific notes 
> -- | --
> Note 10 to entry: In the context of DCAT, a dataset is published or curated by a single agent. | Note 10 to entry: In the context of geographic information, a dataset can be a smaller grouping of data which, though limited by some constraint such as spatial extent or feature type, is located physically within a larger dataset. Theoretically, a dataset can be as small as a single feature or feature attribute contained within a larger dataset.
> Note 11 to entry: In the context of DCAT, a dataset is available for access or download in one or more representations. | Note 11 to entry: In the context of geographic information, a hardcopy map or chart can be considered a dataset.

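As a basis for discussion, the notes above can be sketched as a small data structure. This is a minimal illustration, not part of the proposal: the class and field names, the identifier, the agency name and the `example.org` URLs are all hypothetical, and the JSON-LD output is simplified (for instance, the media type is given as a plain string rather than as a resource).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Distribution:
    """One representation of a dataset (Note 3; Note 11 for DCAT)."""
    download_url: str
    media_type: str


@dataclass
class Dataset:
    """A collection of data regarded as a unit, with its typical metadata."""
    identifier: str   # Note 2: metadata includes an identifier
    title: str        # Note 2: metadata includes a title
    publisher: str    # Note 4 / DCAT Note 10: a single agent
    distributions: List[Distribution] = field(default_factory=list)

    def to_jsonld(self) -> Dict[str, object]:
        """Return a simplified JSON-LD style description using DCAT terms."""
        return {
            "@context": {
                "dcat": "http://www.w3.org/ns/dcat#",
                "dct": "http://purl.org/dc/terms/",
                "foaf": "http://xmlns.com/foaf/0.1/",
            },
            "@type": "dcat:Dataset",
            "dct:identifier": self.identifier,
            "dct:title": self.title,
            "dct:publisher": {"@type": "foaf:Agent", "foaf:name": self.publisher},
            "dcat:distribution": [
                {
                    "@type": "dcat:Distribution",
                    "dcat:downloadURL": {"@id": d.download_url},
                    "dcat:mediaType": d.media_type,  # simplified: plain string
                }
                for d in self.distributions
            ],
        }


# Hypothetical example: one dataset, one agent, two representations.
example = Dataset(
    identifier="example-0001",
    title="Example address register",
    publisher="Example Agency",
    distributions=[
        Distribution("https://example.org/addresses.csv", "text/csv"),
        Distribution("https://example.org/addresses.gpkg",
                     "application/geopackage+sqlite3"),
    ],
)
```

Calling `json.dumps(example.to_jsonld(), indent=2)` (after `import json`) prints the description. An atypical case, such as a planned dataset that is not yet available, would simply have an empty `distributions` list, which the definition itself still permits.
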
# Examples
To highlight the need for each of the notes to the definition of “dataset”, the following list gives examples that are not “typical” according to those notes. The examples are not intended to be included in any standard, but are meant as a basis for the discussion. The examples follow the same numbering as the notes above.

- Example 1: A dataset that is too large or complex to analyze with current technologies, but that might be useful when technology evolves.
- Example 2: A temporary dataset, such as the result of an SQL query.
- Example 3: A planned dataset that is not yet available or collected.
- Example 4: OpenStreetMap data; in general, data collected via crowdsourcing.
- Example 5: A use case where data from many different domains (different subjects) are combined to solve a particular problem or need (one purpose).
- Example 6: A GeoPackage containing vector, raster and styling data; an EU dataset where some of the data are INSPIRE-harmonized and some are not.
- Example 7: Data collected at the local level in a country and then aggregated into one dataset covering the whole country; data submitted to the EEA by the EU member states and then aggregated into one dataset.
- Example 8: A dataset containing data under different licences; a dataset created by taking subsets from several other datasets.

# References
- [1] https://en.digst.dk/
- [2] https://eng.sdfe.dk/
- [3] THE GOVERNMENT / LOCAL GOVERNMENT DENMARK / DANISH REGIONS. The digitally coherent public sector [online]. Version 1.0. Agency for Digitisation, June 2017. Available from: https://arkitektur.digst.dk/sites/default/files/white_paper_on_a_common_public-sector_digital_architecture_pdfa.pdf
- [4] Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE) [online]. 25 April 2007. 32007L0002. Available from: http://data.europa.eu/eli/dir/2007/2/oj/eng
- [5] Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information [online]. 26 June 2019. 32019L1024. Available from: http://data.europa.eu/eli/dir/2019/1024/oj/eng


Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1195 using your GitHub account

Received on Tuesday, 17 December 2019 12:55:25 UTC