RE: What is a version?

Hi Phil,

Good points that we could build on for dataset versioning.

One problem that I see, though, is that the first principle is about user expectations. This assumes that there is a clean way to determine those expectations. In the case of datasets, user expectations might follow patterns that depend on both the kind of data the dataset represents and the kind of change.

Take the correction of an error in a dataset, e.g. when one of the observations was recorded erroneously because of a transcription error or a sensor malfunction. It could be considered sensible simply to replace the file and update dct:modified. After all, a user would expect to find the correct file rather than the file with the error.

But a problem might occur if someone had accessed the file with the error and written an article about it, criticising the data creator for sloppy work and linking to the dataset as it was when it still pointed to the file with the error. Replacing the file invalidates the article: its link now leads to a dataset description that points to the corrected file, so a reader has no idea what the problem was. In fact, the author of the article can now be criticised for spreading fake news.

The second principle could drive the decision between keeping the old file with a link to the new one (e.g. if the original dataset with the error came with some sort of persistence guarantee) and adding a property to the description of the dataset so that people understand what happened, the way the European DCAT-AP uses owl:versionInfo and adms:versionNotes.
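To make the option of keeping the old dataset with a link to the new one a bit more concrete, here is a minimal Python sketch that writes DCAT-AP-style Turtle for both versions, using owl:versionInfo, adms:versionNotes and dct:modified. The example URIs, dates and version strings are hypothetical, and the use of dct:isReplacedBy for the link from the old dataset to the corrected one is an assumption; the discussion does not name a linking property.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Namespace prefixes used by the generated Turtle.
PREFIXES = """@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
"""


@dataclass
class DatasetVersion:
    uri: str                           # hypothetical dataset URI
    version_info: str                  # value for owl:versionInfo
    version_notes: str                 # value for adms:versionNotes
    modified: date                     # value for dct:modified
    replaced_by: Optional[str] = None  # assumed link target (dct:isReplacedBy)


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_turtle(d: DatasetVersion) -> str:
    """Describe one dataset version as a Turtle statement block."""
    lines = [
        f"<{d.uri}> a dcat:Dataset ;",
        f"    owl:versionInfo {literal(d.version_info)} ;",
        f"    adms:versionNotes {literal(d.version_notes)} ;",
        f'    dct:modified "{d.modified.isoformat()}"^^xsd:date',
    ]
    if d.replaced_by:
        # Keep the old description and point it at the corrected dataset.
        lines[-1] += " ;"
        lines.append(f"    dct:isReplacedBy <{d.replaced_by}>")
    return "\n".join(lines) + " .\n"


# Hypothetical example: version 1.0 contained the error, 1.1 corrects it.
old = DatasetVersion(
    uri="http://example.org/dataset/readings/1.0",
    version_info="1.0",
    version_notes="Initial release.",
    modified=date(2017, 8, 1),
    replaced_by="http://example.org/dataset/readings/1.1",
)
new = DatasetVersion(
    uri="http://example.org/dataset/readings/1.1",
    version_info="1.1",
    version_notes="Corrected one observation mis-recorded through a sensor malfunction.",
    modified=date(2017, 8, 17),
)
print(PREFIXES + "\n" + to_turtle(old) + "\n" + to_turtle(new))
```

Because the old description survives, a link in an article criticising the erroneous data still resolves to what the author saw, and the version notes explain what changed.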


-----Original Message-----
From: Phil Archer [] 
Sent: 17 August 2017 11:34
Subject: What is a version?

Dear all,

One of the perma-discussions around datasets is when one should and should not declare that one dataset is a new version of another. Things like annual sets of figures are easy, but when one figure in a file of millions is corrected, is that a new version or not? It all depends on context.

This is a topic that comes up in my new role at GS1, where the discussion is not about datasets but about products and their identifiers. If a product has new packaging but is still the same product, in the same quantity etc., does it need a new GTIN (barcode)?

GS1 spent a lot of time on this question and came up with some guidelines that I think could be translated into useful advice in the context of the DXWG.

The page at sets it out most succinctly.

The first principle is:

Is a consumer or supply chain partner expected to distinguish new or changed products from previous/current products?

That might translate into:

Is an application developer or data consumer expected to distinguish new or changed datasets from previous/current versions of the data?

The second principle is:

Is there a regulatory/liability disclosure requirement to the consumer and/or trading partner?

That might translate into:

Is there a regulatory/liability aspect relevant to the data producer or consumer that is affected by the change in the data?

The third principle is:

Is there a material impact to the supply chain (i.e. how the product is shipped, stored or received)?

This is less readily translated into the DXWG context and is probably covered by the first principle, but for the DXWG it might apply to APIs rather than datasets.

GS1 adds 10 rules on top of these principles, but that is probably going too far into supply chain specifics. What struck me, though, was that the approach seemed helpful and might be something that DCAT 1.1 could include as (non-normative) guidance.
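Read as guidance, the translated principles amount to a short checklist. The Python sketch below assumes that a "yes" to any one question is enough to declare a new version; that combining rule is an assumed reading of the GS1 approach, and the class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataChange:
    """Answers to the translated principles for one change (hypothetical model)."""
    # Principle 1: is an application developer or data consumer expected to
    # distinguish the changed dataset from previous/current versions of the data?
    consumers_must_distinguish: bool
    # Principle 2: does the change affect a regulatory/liability aspect
    # relevant to the data producer or consumer?
    regulatory_or_liability_impact: bool
    # Principle 3 (loose translation): does the change materially affect how
    # the data is delivered, e.g. through an API?
    delivery_impact: bool


def triggered_principles(change: DataChange) -> List[str]:
    """Return the names of the principles answered 'yes', in checklist order."""
    answers = {
        "consumer expectation": change.consumers_must_distinguish,
        "regulatory/liability": change.regulatory_or_liability_impact,
        "delivery impact": change.delivery_impact,
    }
    return [name for name, yes in answers.items() if yes]


def needs_new_version(change: DataChange) -> bool:
    # Assumed rule: any single 'yes' is enough to declare a new version.
    return bool(triggered_principles(change))


# Example: a corrected figure in a dataset that someone has already cited
# critically, so the liability question is answered 'yes'.
correction = DataChange(
    consumers_must_distinguish=False,
    regulatory_or_liability_impact=True,
    delivery_impact=False,
)
print(needs_new_version(correction), triggered_principles(correction))
```

The sketch deliberately leaves the hard part, answering each question for a given kind of data and kind of change, to whoever describes the change.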



Phil Archer
+44 7887 767755

Received on Thursday, 17 August 2017 12:38:52 UTC