Re: Versioning

When I say that data doesn’t actually change, I mean that the later census doesn’t give a new value for, e.g., the 2000 population of Podunk, Idaho. It gives instead the 2010 population of Podunk, Idaho. It is a different observation about the world. Suppose the Podunk population were listed as 401 in the originally published 2000 census. If a revision of the 2000 census came out saying that the official 2000 population of Podunk, Idaho, was in fact 502, that would be a new version of the 2000 census dataset. When publishing either dataset, I would certainly recommend that the release date be provided.
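To make the distinction concrete, here is a minimal Python sketch. The `CensusRelease` record and the `relation` helper are hypothetical illustrations, not part of any existing vocabulary or tool, and the release dates and the 2010 figure are placeholders rather than real census data:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CensusRelease:
    """Hypothetical record for one published census release."""
    reference_year: int   # the year the counts describe
    release_date: date    # when this release was published
    populations: dict[str, int] = field(default_factory=dict)  # place -> count


def relation(old: CensusRelease, new: CensusRelease) -> str:
    """Classify how `new` relates to `old`, per the census example."""
    if new.reference_year != old.reference_year:
        # A different observation about the world (e.g. 2010 vs. 2000):
        # a dataset that follows the old one, not a new version of it.
        return "following"
    if new.populations != old.populations:
        # Same reference year but revised values: a new version, which
        # consumers can tell apart from the old one by its release date.
        return "new version"
    return "same"


# Placeholder dates; 401 and 502 are the figures from the example above,
# while the 2010 figure is made up for illustration only.
original_2000 = CensusRelease(2000, date(2001, 1, 1), {"Podunk, Idaho": 401})
revised_2000 = CensusRelease(2000, date(2002, 1, 1), {"Podunk, Idaho": 502})
census_2010 = CensusRelease(2010, date(2011, 1, 1), {"Podunk, Idaho": 450})

print(relation(original_2000, revised_2000))  # new version
print(relation(original_2000, census_2010))   # following
```

The reference-year check comes first on purpose: a 2010 release is never treated as a version of the 2000 release, however similar its structure.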
-Annette


--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Aug 3, 2015, at 11:27 AM, Steven Adler <adler1@us.ibm.com> wrote:

> Annette,
> 
> Your example raises some interesting opportunities.
> 
> The US Census is performed every 10 years.  So there can be no comparison between 1996 and 2000.  The comparison is between 2000 and 2010.  And a decade is a long time in which a lot of data actually can change.  Demographic trends in the US are rapidly changing due to immigration, aging population, changing birthrates.  
> 
> The point is that "following" can create new versions of data when the scale of time is long.  I therefore recommend retaining "following" and adding a time modifier to indicate degrees of potential change.
> 
> What do you think about that?
> 
> 
> Best Regards,
> 
> Steve
> 
> Motto: "Do First, Think, Do it Again"
> 
> From: Annette Greiner <amgreiner@lbl.gov>
> To: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
> Cc: Makx Dekkers <mail@makxdekkers.com>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
> Date: 07/29/2015 01:45 AM
> Subject: Re: Versioning
> 
> 
> 
> I agree that the case of "following" does not create a new version. The 2000 census is not a new version of the 1996 census. Even if they have the same structure, I would see them as different datasets, because they don't contain any of the same actual data. 
> 
> On the other hand, I think that if a new dataset is made that has mostly the same data but some corrections, then that is a new version, whether the original dataset is still available or not. Think of the case where you published a dataset and it was available for one week. Then you posted a corrected version of that dataset because you found out that one of the columns had some wrong values. You posted the new dataset as a new version. It was available as a new version, along with the old one, for a week, and then your supervisor said "we shouldn't make that old version of the data available anymore". So you took it down. Does your taking down of the original suddenly make the revised version no longer something that should be versioned? I would argue no, in part because the decision of whether or not to version needs to be made earlier, but mostly because you may have had people who downloaded the original one during those two weeks in which it was available. If you silently update it, those people will have no idea that a new version is available and will have no way of distinguishing the old one from the new one. They would likely therefore continue to use the erroneous data.
> 
> In the rare case that you publish something and find an error and then are able to change it before anyone else downloads it, or so few people download it that you're able to contact them and explain the change, then I think it's fine to just change it without versioning.
> 
> As for adapting, I think that would be an alternative format of the same version.
> 
> I guess my overarching criterion is whether there is a possibility of confusion on the part of the data consumer. If a dataset is published as data for a different quarter or year, it should be clear from the title and other metadata that it's different from the one it follows. If data is modified silently, then you have the possibility of confusion. If a dataset is adapted to a different language, it will be obvious to anyone looking at it that it is different from the original language version.
> 
> -Annette
> 
> On Jul 28, 2015, at 2:05 PM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
> 
> > Hi Annette and Makx,
> > 
> > I think we (Makx and I) agree that "following" is not a case of versioning. @Annette, do you also agree? IMO, this case represents a collection of datasets that share structure but don't share data.
> > 
> > For the other cases, I think we should consider that a dataset is a version of another dataset when a new dataset is created based on an existing dataset, i.e., the two datasets will have some data and/or structure in common. I think this applies to both the superseding and adapting cases mentioned by Makx. On the other hand, if a dataset is modified without the creation of a new dataset, then there is no versioning. Does that make sense to you?
> > 
> > Thanks!
> > Bernadette
> > 
> > 
> > 
> > 2015-07-27 17:30 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:
> > I think you and Bernadette are defining superseding and modifying conversely, but I think both cases call for versioning. I would consider the case where a dataset is modified and wholly replaced with the corrected one as a case where versioning is needed. I also consider the case where a dataset is modified and the older version remains available as one where versioning is needed. If you have stored an older version and it presents itself as the exact same thing, it should be the exact same thing. Otherwise, you could reuse a deprecated version without knowing it.
> > -Annette
> > 
> > --
> > Annette Greiner
> > NERSC Data and Analytics Services
> > Lawrence Berkeley National Laboratory
> > 510-495-2935
> > 
> > On Jul 27, 2015, at 9:56 AM, Makx Dekkers <mail@makxdekkers.com> wrote:
> > 
> > > Annette,
> > >
> > > Good point.
> > >
> > > I was not implying that if data is modified, the old version should *never*
> > > remain available. Maybe a matter of definition: according to my
> > > categorisation, if a publisher modifies data and keeps the old version
> > > available (the one that may have errors, partial data, outdated
> > > information), it falls in the category of superseding.
> > >
> > > The definition of modifying is then "updating but not keeping the old data
> > > available". Sometimes you really want to stop people from accessing and
> > > using data that you know is wrong.
> > >
> > > Makx.
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Annette Greiner [mailto:amgreiner@lbl.gov]
> > >> Sent: 27 July 2015 18:20
> > >> To: Laufer <laufer@globo.com>
> > >> Cc: Makx Dekkers <mail@makxdekkers.com>; Data on the Web Best
> > >> Practices Working Group <public-dwbp-wg@w3.org>
> > >> Subject: Re: Versioning
> > >>
> > >> I agree with most of this, but I think that, except for real-time data,
> > >> modifying implies a new version. The question of whether something is
> > >> superseded seems to me orthogonal. If we didn't maintain a "latest version"
> > >> link for the BP doc, would modifications of it not call for a new version?
> > >> Limiting versioning to things that are wholly replaced suggests that old
> > >> versions should never remain available, which I think is not best practice.
> > >> -Annette
> > >>
> > >> On Jul 27, 2015, at 8:11 AM, Laufer <laufer@globo.com> wrote:
> > >>
> > >>> Thank you Makx for this text about some relations between datasets.
> > >>>
> > >>>> Do others agree with limiting versioning to the ‘Superseding’ category?
> > >>>
> > >>> I agree.
> > >>>
> > >>> And I think our document should include text telling readers that this
> > >>> is our understanding of versioning.
> > >>>
> > >>> But I have a question: what about the other "meanings"? Will there be
> > >>> any BPs for them?
> > >>>
> > >>> Best Regards,
> > >>> Laufer
> > >>>
> > >>> On Monday, 27 July 2015, Makx Dekkers <mail@makxdekkers.com> wrote:
> > >>> Thanks Bernadette,
> > >>>
> > >>> Good to know that your perspective is that versioning only refers to the
> > >>> ‘Superseding’ case. I fully agree with your perspective.
> > >>>
> > >>> However, you make some statements about the other types of changes
> > >>> that I don’t agree with.
> > >>>
> > >>> I do not agree that ‘Following’ creates different ‘states’ of the same
> > >>> dataset. To me, this year’s budget is only related to last year’s budget
> > >>> because they are both budgets, but they are not versions of the same thing.
> > >>> They may have the same granularity (e.g. expressed in thousands of dollars)
> > >>> but the structure could be different (e.g. because of organisational or
> > >>> regional changes). For me, time series (and spatial series) have nothing to do
> > >>> with versioning.
> > >>>
> > >>> I also do not agree that ‘Adapting’ creates a new state (as in data at a
> > >>> particular moment). All adaptations are equally valid and exist in parallel. To
> > >>> me, adaptations are almost in the same category as the different formats
> > >>> that DCAT groups as Distributions of a Dataset.
> > >>>
> > >>> Finally, I do agree that ‘Modifying’ creates a different state and not a new
> > >>> version. In many cases, a publisher might not even bother to keep the old file
> > >>> but would just change the dct:modified date in the metadata.
> > >>>
> > >>> Do others agree with limiting versioning to the ‘Superseding’ category?
> > >>>
> > >>> Makx.
> > >>>
> > >>> From: Bernadette Farias Lóscio [mailto:bfl@cin.ufpe.br]
> > >>> Sent: 27 July 2015 13:51
> > >>> To: Makx Dekkers <mail@makxdekkers.com>
> > >>> Cc: Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
> > >>> Subject: Re: Versioning
> > >>>
> > >>> Hi Makx,
> > >>>
> > >>> Thanks for bringing up this discussion and clarifying those differences. IMO this
> > >>> kind of distinction is important. However, I am not sure if we should call all
> > >>> the types of "updates" that you presented "versioning". I created the
> > >>> following table to help me visualize these updates in terms of data (or
> > >>> content) changes and structure changes.
> > >>>
> > >>>              | content change | structure change | notes
> > >>> Superseding  | yes            | yes              | new version
> > >>> Following    | yes            | no               | different spatial/temporal granularity
> > >>> Modifying    | yes            | no               | the data may have been updated or data may have been added
> > >>> Adapting     | yes            | no               | content is the same, but in different contexts
> > >>>
> > >>> I think that only in the first case (superseding) will there be a new version of
> > >>> the dataset. In the other cases, there will be different states of the same
> > >>> dataset, where a dataset state means the data in the dataset at a particular
> > >>> moment.
> > >>>
> > >>> Please let me know if I understood correctly and if these ideas make sense
> > >>> to you.
> > >>>
> > >>> Cheers,
> > >>>
> > >>> Bernadette
> > >>>
> > >>> Superseding:
> > >>>
> > >>> Content and structure might be very different but the publisher wants you
> > >>> to use the current resource rather than a resource that preceded it. The URL
> > >>> stays the same while the content changes, although the broad intention of
> > >>> the content stays the same.
> > >>>
> > >>> Examples:
> > >>>
> > >>> • Today’s website (or, more generally, web resource) versus last week’s (Memento);
> > >>> • Latest version link, e.g. latest published draft of BP http://www.w3.org/TR/dwbp/.
> > >>>
> > >>> Following:
> > >>>
> > >>> The type of content is the same but it covers a different time period. Both
> > >>> the new and the old data remain valid. (NB: spatial series, e.g. the same kind
> > >>> of data for different regions, are similar to temporal series in many respects.)
> > >>>
> > >>> Examples:
> > >>>
> > >>> • Sequences of annual budgets;
> > >>> • Daily meteorological observations;
> > >>> • Periodical census data.
> > >>>
> > >>> Modifying:
> > >>>
> > >>> Content, structure and data points are the same to some extent but the
> > >>> data may have been updated or data may have been added.
> > >>>
> > >>> Examples:
> > >>>
> > >>> • Correcting errors in values of data points, e.g. resulting from quality control or user feedback;
> > >>> • Adding data points, e.g. if measurements from different measuring devices come in at different times but belong together;
> > >>> • Updating values, e.g. in a Year-to-date file.
> > >>>
> > >>> Adapting:
> > >>>
> > >>> Content and structure are essentially the same but in different contexts.
> > >>>
> > >>> Examples:
> > >>>
> > >>> • Translations of text fields or labels;
> > >>> • Conversion of co-ordinate system;
> > >>> • Conversions of measures, e.g. °C to °F, imperial units to SI;
> > >>> • Changes in granularity.
> > >>>
> > >>> Should we somehow take such distinctions into account or should we lump them all together?
> > >>>
> > >>> --
> > >>> Bernadette Farias Lóscio
> > >>> Centro de Informática
> > >>> Universidade Federal de Pernambuco - UFPE, Brazil
> > >>> ----------------------------------------------------------------------------
> > >
> > >
> > 
> > 
> > 
> > 
> > 
> > -- 
> > Bernadette Farias Lóscio
> > Centro de Informática
> > Universidade Federal de Pernambuco - UFPE, Brazil
> > ----------------------------------------------------------------------------
> 
> 
> 
> 

Received on Monday, 3 August 2015 22:30:21 UTC