Re: [dxwg] Model Series of Data as Distributions of a single Dataset (#1429)

Here are a couple of sketches that try to clarify some of these relations:
![DatasetSeriesSubset](https://user-images.githubusercontent.com/513380/143928622-165fdbb5-6a4d-4e49-b3f9-f9f7af2613d8.png)


![Packaging](https://user-images.githubusercontent.com/513380/143926570-b885a90d-04b5-4cc0-982f-8e18bb15ef97.png)


DatasetSeriesSubset Diagram:

Dataset: A collection of data, published or curated by a single agent, available for access or download in one or more representations, **and containing information conforming to some schema.** NOTE: the identity of a dataset is based on the underlying schema, plus other variable criteria such as authorship, coverage extent, and update version.

DataSeries: a collection of datasets sharing the same schema, but differentiated based on some extent criteria like temporal or spatial coverage.  The 'member/inSeries' link from a Series to a Dataset is an association class that specifies parameters determining the extent of the series member. 
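
A minimal Python sketch of the series/member relation, treating the 'member/inSeries' link as its own object that carries the extent parameters. The class names, the `schema_id` field, and the `add_member` helper are illustrative assumptions, not terms from DCAT or from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    identifier: str
    schema_id: str  # hypothetical reference to the ContentModel


@dataclass
class SeriesMembership:
    """Association class for the 'member/inSeries' link."""
    dataset: Dataset
    extent: dict  # parameters that set this member's extent


@dataclass
class DataSeries:
    identifier: str
    schema_id: str
    members: list = field(default_factory=list)

    def add_member(self, dataset, **extent):
        # All members of a series share the series schema.
        if dataset.schema_id != self.schema_id:
            raise ValueError("member schema differs from series schema")
        membership = SeriesMembership(dataset, extent)
        self.members.append(membership)
        return membership


series = DataSeries("series/rainfall", schema_id="schema/rainfall-v1")
m = series.add_member(Dataset("rainfall/2020", "schema/rainfall-v1"), year=2020)
```

Putting the extent on the link rather than on the Dataset keeps the dataset description independent of any series it happens to belong to.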

ContentModel: a schema (conceptual, logical, or physical) that characterizes a dataset; defines entities, properties, domains, ranges, and other constraints for elements in the dataset.

Distribution: A specific representation of a dataset. A distribution has a Serialization based on some electronic format and a profile that determines how that format is used. The serialization for a distribution must implement the schema of the dataset that the distribution represents.

PackagedSubset: a dataset extracted as a subset of a sourceDataset, based on some query and parameter values for that query; its content is fixed, and it can be assigned an identifier.

FilteredDistribution: a representation of a subset of a Dataset based on some query and parameter values for the query, determined dynamically by a user requesting the data through some interface. It can be assigned an identifier so that the query can be repeated, but if the source dataset is updated, the actual content might vary over time.
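
The difference between the two subset types can be sketched roughly as follows. The `run_query` function, the record layout, and both identifier schemes (a content hash for the packaged subset, the serialized parameters for the filtered distribution) are assumptions made for illustration only.

```python
import hashlib
import json


def run_query(records, min_year):
    """Hypothetical query: keep records at or after min_year."""
    return [r for r in records if r["year"] >= min_year]


class PackagedSubset:
    """Content is copied once, so it stays fixed."""

    def __init__(self, source_records, **params):
        self.params = params
        self.records = [dict(r) for r in run_query(source_records, **params)]
        payload = json.dumps(self.records, sort_keys=True).encode("utf-8")
        # Assumed scheme: identify the subset by a hash of its content.
        self.identifier = hashlib.sha256(payload).hexdigest()[:12]


class FilteredDistribution:
    """Only the query is stored; content is computed on each request."""

    def __init__(self, source_records, **params):
        self.source = source_records  # live reference to the source
        self.params = params
        # Assumed scheme: the identifier names the query, not the content.
        self.identifier = "query:" + json.dumps(params, sort_keys=True)

    def fetch(self):
        return run_query(self.source, **self.params)


source = [{"year": 2019, "value": 1}, {"year": 2021, "value": 2}]
packaged = PackagedSubset(source, min_year=2020)
filtered = FilteredDistribution(source, min_year=2020)

# Updating the source changes what the filtered distribution returns,
# while the packaged subset keeps its original content.
source.append({"year": 2022, "value": 3})
```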

Serialization: a scheme for representing information electronically; based on some format (specified by a MIME type), with optional additional constraints on the format for greater specificity in content, e.g. XML schema, RDF vocabulary used, CSV profile.

parameters: values that specify criteria in a query to define a dataset subset, or that define the extent (temporal, spatial, other...) of a particular DataSeries member. The associated downloadURL could be a URITemplate into which the parameters are substituted.
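
As a rough illustration of the URITemplate idea, here is a simplified expansion that handles only plain `{name}` expressions (none of the RFC 6570 operators such as `{?name}`). The template, host, and parameter names are made up for the example.

```python
import re
from urllib.parse import quote


def expand(template, params):
    """Substitute {name} expressions with percent-encoded parameter values."""
    def substitute(match):
        name = match.group(1)
        # A missing parameter raises KeyError rather than being dropped.
        return quote(str(params[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


# Hypothetical downloadURL template for one series member.
template = "https://example.org/data/{region}/{year}.csv"
url = expand(template, {"region": "north west", "year": 2020})
# url == "https://example.org/data/north%20west/2020.csv"
```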


Packaging Diagram:

Bundle: a collection of files that are associated with a 'Dataset' search result, e.g. [DataONE](https://search.dataone.org/view/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fknb-lter-mcr%2F5042%2F10), [CKAN](https://catalog.data.gov/dataset/airborne-geophysical-survey-ajo-arizona), [MGDS](https://www.marine-geo.org/tools/search/Files.php?data_set_uid=29877). A bundle includes at least one Dataset distribution and at least one other file that is related to the dataset but is not a distribution.

Document: a file that is related to some dataset and included in a Bundle.

Package: a file that contains all the items in a bundle, e.g. a BagIt or ORE archive file.
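
A small sketch of turning a Bundle into a Package, using a plain zip archive with a JSON manifest. This is not a BagIt or ORE implementation; the `build_package` function, the role labels, and the `manifest.json` layout are assumptions chosen to show the idea.

```python
import io
import json
import zipfile


def build_package(bundle):
    """Pack a bundle into one zip file.

    bundle maps a file name to a (role, content bytes) pair.
    """
    roles = [role for role, _ in bundle.values()]
    # A bundle needs a distribution plus at least one related non-distribution file.
    if "distribution" not in roles or all(r == "distribution" for r in roles):
        raise ValueError("bundle needs a distribution and a related document")

    manifest = {name: role for name, (role, _) in bundle.items()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, (_, content) in bundle.items():
            archive.writestr(name, content)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


bundle = {
    "survey.csv": ("distribution", b"id,value\n1,42\n"),
    "readme.txt": ("document", b"Field notes for the survey.\n"),
}
package_bytes = build_package(bundle)
```

The manifest records which items are distributions and which are related documents, so the roles from the diagram survive packaging.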





-- 
GitHub Notification of comment by smrgeoinfo
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1429#issuecomment-981941766 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Monday, 29 November 2021 19:22:15 UTC