- From: Phil Archer <phila@w3.org>
- Date: Tue, 6 Jun 2017 08:51:51 +0100
- To: public-dxwg-wg@w3.org
Taking a standards design angle on this, we have a concept, versioning, that is understood in different ways by different people and in different circumstances. To some, even a minor correction to an error in a dataset is enough to call the result a new version. For others, it needs a very much larger scale change. Both positions can readily be justified, neither is right or wrong.

Do we think that 10 people will all agree on what is and isn't a version? I suggest not. And so the temptation is then to say, OK, we'll have a means of differentiating between "this is really still version X but we've corrected a few errors", and "this is a whole new version" with maybe several other types in between.

OK - but then the problem becomes one of usability. Will users actually bother to choose the right term? Information managers care about metadata far more than anyone else.

I would think in terms of:

- a very basic model of versioning;
- possible qualification/descriptions of how one version differs from another (which may be machine or human-only readable);
- replaces/replacedBy relationships;
- leaving it to profiles to provide more detailed semantics where necessary;
- Following DWBP, especially
-- https://www.w3.org/TR/dwbp/#dataVersioning
-- https://www.w3.org/TR/dwbp/#VersionIdentifiers

My 2 cents

Phil

On 05/06/2017 23:56, Rob Atkinson wrote:
> +1 for articulating version Use Cases in more detail. We will consolidate -
> and the current "high level" versioning UC is a placeholder for analysis of
> requirements - for which we will need more detail. For now add new Use
> Cases with detail and links specific to the specific case.
>
> Don't assume we need to handle this by, for example, extending DCAT or
> adopting a specific vocabulary
>
> I see several options:
> 1) do nothing
> 2) Add something to DCAT and hope it handles the versioning problem in
> general
> 3) Identify or create one or more versioning conceptual models and
> associated vocabularies and recommend adoption
> 4) provide guidance around use of microformats in existing or new DCAT
> elements (e.g. define semantics of 3.4.0 style version identifiers)
> 5) provide guidance on how dataset relationships express version evolution
> 6) other
>
> The only reason to go into this (solution space) in any detail now is to
> identify where external constraints should bleed into requirements - for
> example a need to be semantically and/or syntactically compatible with
> established version negotiation protocols (either at HTTP level or
> something like OAI). UCs should express these interactions.
>
> Could the people who have looked into this deeply please ensure any prior
> work that can and should be referenced is called out in related Use Cases.
>
> Rob Atkinson
>
> On Tue, 6 Jun 2017 at 05:11 Luiz Olavo Bonino <luiz.bonino@dtls.nl> wrote:
>
>> Dear all, first sorry for not being able to attend again the call due to a
>> national holiday here in the Netherlands. This discussion on versioning
>> interests me a lot. I have been struggling with DCAT's lack of support for
>> this and have been considering a conceptual model that could tackle it. For
>> me the issue here is reproducibility in the sense that given a dataset
>> identifier, we can expect the same set of data items to be retrieved.
>>
>> In my opinion, conceptually, for the evolution cases Nandana mentioned,
>> there is an entity named Dataset and periodically, there are Releases of
>> this dataset, i.e., a Release is a concrete instance of a Dataset. This
>> Release has some properties, including timestamp of the release and version
>> number.
>>
>> Of course, there are types of datasets that are not susceptible to being
>> released, i.e., retrieving the same dataset multiple times does not
>> guarantee that the same set of data items will be retrieved. These include,
>> as Makx mentioned, continuously changing datasets like those from information
>> systems, e.g., electronic health record data from hospitals. In this latter
>> case we could only rely on the create or last modified timestamp of
>> individual records.
>>
>> Does this make sense to you?
>>
>> *Luiz Olavo Bonino*
>> CTO FAIR Data
>>
>> Dutch Techcentre for Life Sciences
>> *Visiting address:* Catharijnesingel 54 | 3511 GC Utrecht
>> *Postal address:* Postbus 19245 | 3501 DE Utrecht
>>
>> E-mail: luiz.bonino@dtls.nl
>> Mobile: +31 6 24 61 9131
>> Skype: luizolavobonino
>> Website: www.dtls.nl
>>
>> On 5 Jun 2017, at 20:49, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>
>> Makx,
>>
>> Thank you. I think that going deeper into the various meanings of
>> versioning through additional use cases is a great idea. We can then
>> discuss those as a group. (This reminds me of the publication patterns
>> for serial publications - and like those it may be hard to cover every
>> case.)
>>
>> One aspect of versioning that may or may not be relevant but that I see
>> in my field is "updates in place" - that is, databases or datasets in
>> which updated records are included in the dataset, but there is no
>> replacement of the entire dataset (although that can usually be
>> requested). These require a call for "updates since ...", and there may
>> not be any regularity to the update schedule. These types of datasets
>> also require three types of updates: new, replace, delete.
>>
>> Does anyone else have this case, and if so, are you able to create a use
>> case for it?
>>
>> Thanks,
>> kc
>>
>> On 6/5/17 9:44 AM, Makx Dekkers wrote:
>>
>> Apologies for my slow reaction in the discussion today in the call on
>> the versioning use case,
>>
>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information
>> .
>> I was struggling with my connection and just managed to note in IRC that
>> I didn’t agree with the use case. Disagreeing is not the right word but
>> I felt that we maybe need to discuss first what we mean by ‘version’,
>> because in my work over the years I have engaged in discussions where
>> people didn’t have the same opinion on what we were talking about.
>>
>> As I see it, there may be various types of ‘versioning’ relationships
>> between datasets. For example:
>>
>> * Evolution: for example, a dataset that is published with
>>   year-to-date information; every week or month, new, recent data is
>>   appended to the existing data.
>> * Replacement: for example, existing data was wrong in some way, and a
>>   new dataset is published that replaces the old data.
>> * Snapshots: for example, continuously changing data like the state of
>>   traffic or weather maps with hourly snapshots.
>> * Time series: for example, annual budget data.
>> * Conversion: for example, data that is transformed from one
>>   coordinate system to another, or from one set of units to another;
>>   similar to translation of textual resources.
>> * Lower/higher granularity: for example, maps in different scales,
>>   images in different resolutions, compression like MP3 versus CD
>>   sound, and summaries of large amounts of data.
>>
>> In my mind, the use case
>>
>> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Dataset_Versioning_Information
>> is a useful placeholder for a number of more specific cases that might
>> have different requirements.
>> I am pretty sure that some of those
>> requirements could be satisfied by some explanatory text in the DCAT
>> specification; some others might need addition of other properties (or
>> even classes?) to DCAT.
>>
>> I am planning to write some of this up in separate use cases over the
>> next few weeks.
>>
>> Makx.
>>
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> m: 1-510-435-8234 (Signal)
>> skype: kcoylenet/+1-510-984-3600
>>
>

--
Phil Archer
Data Strategist, W3C
http://www.w3.org/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Tuesday, 6 June 2017 07:51:41 UTC
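
Editorial sketch: the "very basic model of versioning" with replaces/replacedBy relationships suggested in the message, combined with the Dataset/Release idea from the quoted reply, could look roughly like the Python below. This is only an illustration under assumptions. The class names `Dataset` and `Release`, the `publish` and `latest` methods, and the `change_note` field are hypothetical and are not taken from DCAT, DWBP, or any other vocabulary.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


# eq=False keeps identity-based comparison, which avoids comparing the
# mutually linked replaces/replaced_by fields recursively.
@dataclass(eq=False)
class Release:
    """A concrete, fixed instance of a dataset (hypothetical class)."""
    version: str                       # version identifier, e.g. "1.1"
    released: datetime                 # timestamp of the release
    change_note: str = ""              # free-text description of what changed
    replaces: Optional[Release] = None
    replaced_by: Optional[Release] = None


@dataclass
class Dataset:
    """The abstract dataset that is released periodically (hypothetical class)."""
    identifier: str
    releases: list[Release] = field(default_factory=list)

    def publish(self, version: str, released: datetime,
                change_note: str = "") -> Release:
        """Add a release and link it to the one it supersedes."""
        previous = self.releases[-1] if self.releases else None
        release = Release(version, released, change_note, replaces=previous)
        if previous is not None:
            previous.replaced_by = release
        self.releases.append(release)
        return release

    def latest(self) -> Optional[Release]:
        """Return the most recently published release, if any."""
        return self.releases[-1] if self.releases else None


if __name__ == "__main__":
    ds = Dataset("example-dataset")
    ds.publish("1.0", datetime(2017, 1, 1, tzinfo=timezone.utc))
    ds.publish("1.1", datetime(2017, 6, 6, tzinfo=timezone.utc),
               "corrected a few errors")
    print(ds.latest().version, "replaces", ds.latest().replaces.version)
```

Whether "1.1" counts as a new version or a correction of "1.0" is deliberately left to the free-text `change_note`, in line with the suggestion that finer semantics be left to profiles.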