- From: Karen Coyle <kcoyle@kcoyle.net>
- Date: Wed, 16 Aug 2017 08:01:55 -0700
- To: public-data-shapes-wg@w3.org
Ignore! - kc

On 8/15/17 6:53 PM, Karen Coyle wrote:
> This is about ID47[1], which is about a certain relationship between
> datasets: how they serve as updates one to the other.
>
> This may be a narrower use case than ID32[2], which is about
> relationships between datasets.
>
> 1. The simplest case is that successive datasets over time are more
> recent versions of the data. The newer dataset may render older datasets
> obsolete in some cases. In other cases, such as successive censuses,
> earlier datasets may be useful for applications like longitudinal
> studies. The key is that each dataset is complete in itself.
>
> 2. Another case is that datasets are additive: dataset B adds to
> dataset A. An example would be that dataset A is a CSV file with rows
> 1-99 and dataset B is a CSV file with rows 100-199. This is similar to a
> part/whole relationship, except that there is not necessarily a "whole",
> just parts, which are generally produced at different times. The
> datasets can be combined into a single dataset. The value of using the
> individual datasets on their own can vary.
>
> 3. A version of #2 (which may not need to be distinguished from it) is
> a publication pattern such as "monthly", where there is a base
> cumulative dataset and then periodic additive files until the next time
> that a cumulative dataset is produced. (This was a vital pattern in
> analog resources, but may be less used for digital ones.) It is probably
> expected that recipients can combine the datasets in their applications,
> or at least treat them virtually as a single dataset.
>
> 4. This extends the concepts in #2 and #3. In this scenario, there is a
> "master" database that is updated in place. Other sites have copies of
> the database, and receive (or request/pull) updates. The update files
> contain "records" that, when processed, will result in a file or
> database that is in the same state as the "master" database. The files
> contain new records, changed records (that must replace the older
> records with the same record ID), and delete records (that must be used
> to delete the older record with the same record ID). These files have
> minimal value on their own except as they can be used to update copies
> of the master dataset. This update method is used heavily in the
> library community. In the US, the Library of Congress holds the master
> database, but the records are also stored and used by many dozens of
> institutions across the country (and the world).
>
> These may not be all of the relevant types of update; suggestions welcome.
>
> Update patterns can be very complex, so this is another case in which
> DCAT may need to define a small number of very common values, with a
> hand-off to "somewhere else" for the long tail. It may also be useful
> for data consumers to know immediately whether a dataset is "stand
> alone" or requires other datasets to be complete.
>
> kc
>
> [1] https://w3c.github.io/dxwg/ucr/#ID47
> [2] https://w3c.github.io/dxwg/ucr/#ID32

--
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600
Received on Wednesday, 16 August 2017 15:02:19 UTC