ID47 - Update method from Karen Coyle on 2017-08-16 (public-data-shapes-wg@w3.org from August 2017)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Tue, 15 Aug 2017 18:53:23 -0700
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <64a44efb-a18b-d60d-8558-b231e07dd4b8@kcoyle.net>

This is about ID47[1], which is about a certain relationship
between datasets: how they serve as updates one to the other.

This may be a narrower use case than ID32[2], which is about
relationships between datasets.

1. The simplest case is that successive datasets over time are more
recent versions of the data. The newer dataset may render older datasets
obsolete in some cases. In other cases, such as in successive censuses,
earlier datasets may be useful for applications like longitudinal
studies. The key is that each dataset is complete in itself.

2. Another case is that datasets are additive - dataset B adds to
dataset A. An example would be that dataset A is a CSV file with rows
1-99 and dataset B is a CSV file with rows 100-199. This is similar to a
part/whole relationship, except that there is not necessarily a "whole",
just parts, which are produced generally at different times. The
datasets can be combined into a single dataset. The value of using the
individual datasets on their own can vary.

3. A version of #2 (which may not need to be distinguished from it) is
a publication pattern such as "monthly", in which there is a base
cumulative dataset and then periodic additive files until the next time
that a cumulative dataset is produced. (This was a vital pattern in
analog resources, but may be less used for digital ones.) It is probably
expected that recipients can combine datasets in their applications, or
at least treat them as a single dataset virtually.

4. This extends the concepts in #2 and #3. In this scenario, there is a
"master" database that is updated in place. Other sites have copies of
the database, and receive (or request/pull) updates. The update files
contain "records" that, when processed, will result in a file or
database that is in the same state as the "master" database. The files
contain new records, changed records (that must replace the older
records with the same record ID), and delete records (that must be used
to delete the older record with the same record ID). These files have
minimal value on their own except as they can be used to update the
master dataset. This update method is one that is used heavily in the
library community. In the US, the Library of Congress holds the master
database but the records are also stored and used by many dozens of
institutions across the country (and the world).
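
To make case #4 concrete, here is a minimal sketch of how a site holding a copy might process an update file. It is only an illustration under stated assumptions: the action labels ("new", "changed", "delete"), the tuple layout, and the function name apply_updates are all hypothetical, and no actual library exchange format is implied.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout of one update record: (action, record_id, payload).
# The action names are illustrative labels for the three kinds of record
# described in case #4; they do not come from any real format.
UpdateRecord = Tuple[str, str, Optional[dict]]


def apply_updates(
    local_copy: Dict[str, dict], updates: Iterable[UpdateRecord]
) -> Dict[str, dict]:
    """Return a new copy of the database with the update records applied in order."""
    result = dict(local_copy)  # leave the caller's copy untouched
    for action, record_id, payload in updates:
        if action in ("new", "changed"):
            # A changed record replaces the older record with the same ID.
            result[record_id] = payload
        elif action == "delete":
            # A delete record removes the older record with the same ID.
            result.pop(record_id, None)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return result


local = {"r1": {"title": "A"}, "r2": {"title": "B"}}
update_file = [
    ("new", "r3", {"title": "C"}),
    ("changed", "r1", {"title": "A, revised"}),
    ("delete", "r2", None),
]
synced = apply_updates(local, update_file)
```

After processing, the local copy should be in the same state as the "master" for the records covered. The sketch also shows why such an update file has little value on its own: it only makes sense relative to the copy it is applied to.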

These may not be all of the relevant types of update; suggestions welcome.

Update patterns can be very complex, so this is another case in which
DCAT may need to define a small number of very common values, with a
hand-off to "somewhere else" for the long tail. It may also be useful
for data consumers to know immediately whether a dataset is "stand
alone" or requires other datasets to be complete.

kc
[1] https://w3c.github.io/dxwg/ucr/#ID47
[2] https://w3c.github.io/dxwg/ucr/#ID32
-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234 (Signal)
skype: kcoylenet/+1-510-984-3600

Received on Wednesday, 16 August 2017 01:53:50 UTC