- From: Karen Coyle <kcoyle@kcoyle.net>
- Date: Tue, 15 Aug 2017 18:53:23 -0700
- To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
This is about ID47[1], which concerns a particular relationship between datasets: how they serve as updates to one another. This may be a narrower use case than ID32[2], which is about relationships between datasets.

1. The simplest case is that successive datasets over time are more recent versions of the data. The newer dataset may render older datasets obsolete in some cases. In other cases, such as successive censuses, earlier datasets may remain useful for applications like longitudinal studies. The key is that each dataset is complete in itself.

2. Another case is that datasets are additive - dataset B adds to dataset A. An example would be that dataset A is a CSV file with rows 1-99 and dataset B is a CSV file with rows 100-199. This is similar to a part/whole relationship, except that there is not necessarily a "whole", just parts, which are generally produced at different times. The datasets can be combined into a single dataset. The value of using the individual datasets on their own can vary.

3. A version of #2 (which may not need to be distinguished from it) is a publication pattern such as "monthly", where there is a base cumulative dataset and then periodic additive files until the next time that a cumulative dataset is produced. (This was a vital pattern in analog resources, but may be less used for digital ones.) It is probably expected that recipients can combine datasets in their applications, or at least treat them as a single dataset virtually.

4. This extends the concepts in #2 and #3. In this scenario, there is a "master" database that is updated in place. Other sites have copies of the database, and receive (or request/pull) updates. The update files contain "records" that, when processed, will result in a file or database that is in the same state as the "master" database.
The files contain new records, changed records (that must replace the older records with the same record ID), and delete records (that must be used to delete the older record with the same record ID). These files have minimal value on their own except as they can be used to update the master dataset. This update method is one that is used heavily in the library community. In the US, the Library of Congress holds the master database but the records are also stored and used by many dozens of institutions across the country (and the world).

These may not be all of the relevant types of update; suggestions welcome. Update patterns can be very complex, so this is another case in which DCAT may need to define a small number of very common values, with a hand-off to "somewhere else" for the long tail. It may also be useful for data consumers to know immediately whether a dataset is "stand alone" or requires other datasets to be complete.

kc

[1] https://w3c.github.io/dxwg/ucr/#ID47
[2] https://w3c.github.io/dxwg/ucr/#ID32

--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600
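
As an illustration of the update pattern in case 4 above, the following is a minimal sketch of how a site holding a copy might apply an update file. It is not taken from any actual library system: the `UpdateRecord` type, the action labels "new", "changed" and "delete", and the `apply_updates` function are all hypothetical names chosen for this example, and records are assumed to be plain dictionaries keyed by record ID.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class UpdateRecord:
    """One entry in an update file (hypothetical structure)."""
    record_id: str
    action: str                     # assumed labels: "new", "changed", "delete"
    payload: Optional[dict] = None  # record content; unused for "delete"


def apply_updates(local_copy: Dict[str, dict],
                  updates: Iterable[UpdateRecord]) -> Dict[str, dict]:
    """Return a new copy of the database with the update file applied.

    New and changed records replace whatever is stored under the same
    record ID; delete records remove the entry with that record ID.
    """
    result = dict(local_copy)  # leave the caller's copy untouched
    for rec in updates:
        if rec.action in ("new", "changed"):
            if rec.payload is None:
                raise ValueError(f"record {rec.record_id} has no content")
            result[rec.record_id] = rec.payload
        elif rec.action == "delete":
            # Ignore deletes for records this copy never received.
            result.pop(rec.record_id, None)
        else:
            raise ValueError(f"unknown action: {rec.action!r}")
    return result


# Example: bring a local copy in line with the master.
local = {"r1": {"title": "A"}, "r2": {"title": "B"}}
update_file = [
    UpdateRecord("r1", "changed", {"title": "A (revised)"}),
    UpdateRecord("r2", "delete"),
    UpdateRecord("r3", "new", {"title": "C"}),
]
synced = apply_updates(local, update_file)
```

The sketch also shows why such a file has little value on its own: it only makes sense relative to the copy it is applied to, which is the kind of dependency a data consumer would want to know about up front.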
Received on Wednesday, 16 August 2017 01:53:50 UTC