Re: ID47 - Update method - Oops! wrong list! from Karen Coyle on 2017-08-16 (public-data-shapes-wg@w3.org from August 2017)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Wed, 16 Aug 2017 08:01:55 -0700
To: public-data-shapes-wg@w3.org
Message-ID: <f0c31c6e-a55d-2fdf-35b0-c8b8db42fcf0@kcoyle.net>
Ignore! - kc

On 8/15/17 6:53 PM, Karen Coyle wrote:
> This is about ID47[1], which is about a certain relationship between
> datasets: how they serve as updates one to the other.
> 
> This may be a narrower use case than ID32[2], which is about
> relationships between datasets.
> 
> 1. The simplest case is that successive datasets over time are more
> recent versions of the data. The newer dataset may render older datasets
> obsolete in some cases. In other cases, such as in successive censuses,
> earlier datasets may be useful for applications like longitudinal
> studies. The key is that each dataset is complete in itself.
> 
> 2. Another case is that datasets are additive - dataset B adds to
> dataset A. An example would be that dataset A is a CSV file with rows
> 1-99 and dataset B is a CSV file with rows 100-199. This is similar to a
> part/whole relationship, except that there is not necessarily a "whole",
> just parts, which are produced generally at different times. The
> datasets can be combined into a single dataset. The value of using the
> individual datasets on their own can vary.
> 
> 3. A version of #2 (which may not need to be distinguished from it) is
> a publication pattern such as "monthly", where there is a base
> cumulative dataset and then periodic additive files until the next time
> that a cumulative dataset is produced. (This was a vital pattern in
> analog resources, but may be less used for digital ones.) It is probably
> expected that recipients can combine datasets in their applications, or
> at least treat them as a single dataset virtually.
> 
> 4. This extends the concepts in #2 and #3. In this scenario, there is a
> "master" database that is updated in place. Other sites have copies of
> the database, and receive (or request/pull) updates. The update files
> contain "records" that, when processed, will result in a file or
> database that is in the same state as the "master" database. The files
> contain new records, changed records (that must replace the older
> records with the same record ID), and delete records (that must be used
> to delete the older record with the same record ID). These files have
> minimal value on their own except as they can be used to update the
> master dataset. This update method is one that is used heavily in the
> library community. In the US, the Library of Congress holds the master
> database but the records are also stored and used by many dozens of
> institutions across the country (and the world).
> 
> These may not be all of the relevant types of update; suggestions welcome.
> 
> Update patterns can be very complex, so this is another case in which
> DCAT may need to define a small number of very common values, with a
> hand-off to "somewhere else" for the long tail. It may also be useful
> for data consumers to know immediately whether a dataset is "stand
> alone" or requires other datasets to be complete.
> 
> kc
> [1] https://w3c.github.io/dxwg/ucr/#ID47
> [2] https://w3c.github.io/dxwg/ucr/#ID32
> 
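
To make case #4 above concrete, here is a minimal sketch of how a site
holding a copy might process an update file. It is only an illustration:
the UpdateRecord type, the action labels "new", "changed" and "delete",
and the apply_updates function are hypothetical names chosen for this
example, not part of DCAT or of any library community format. It assumes
records are keyed by record ID and are processed in file order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class UpdateRecord:
    """One entry in an update file (hypothetical structure)."""
    record_id: str
    action: str                      # assumed labels: "new", "changed", "delete"
    payload: Optional[dict] = None   # record content; unused for "delete"


def apply_updates(local_copy: Dict[str, dict],
                  updates: Iterable[UpdateRecord]) -> Dict[str, dict]:
    """Return a new copy of the data with the updates applied in order."""
    result = dict(local_copy)  # leave the caller's copy untouched
    for rec in updates:
        if rec.action in ("new", "changed"):
            if rec.payload is None:
                raise ValueError(f"record {rec.record_id} has no content")
            # A changed record replaces the older record with the same ID.
            result[rec.record_id] = rec.payload
        elif rec.action == "delete":
            # A delete record removes the older record with the same ID.
            result.pop(rec.record_id, None)
        else:
            raise ValueError(f"unknown action: {rec.action}")
    return result


local = {"r1": {"title": "A"}, "r2": {"title": "B"}}
updates = [
    UpdateRecord("r3", "new", {"title": "C"}),
    UpdateRecord("r1", "changed", {"title": "A, 2nd ed."}),
    UpdateRecord("r2", "delete"),
]
synced = apply_updates(local, updates)
```

After the call, synced should hold the changed r1 and the new r3, with r2
removed, while the update file by itself says nothing about records it
does not mention. That is the sense in which such files have minimal
value on their own.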

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600
Received on Wednesday, 16 August 2017 15:02:19 UTC