Re: Versioning

Annette,

Your example raises some interesting opportunities.

The US Census is performed every 10 years.  So there can be no comparison
between 1996 and 2000.  The comparison is between 2000 and 2010.  And a
decade is a long time, in which a lot of data can actually change.
Demographic trends in the US are changing rapidly due to immigration, an aging
population, and changing birth rates.

The point is that "following" can create new versions of data when the
scale of time is long.  I therefore recommend retaining "following" and
adding a time modifier to indicate degrees of potential change.

What do you think about that?


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"


From:    Annette Greiner <amgreiner@lbl.gov>
To:      Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Cc:      Makx Dekkers <mail@makxdekkers.com>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Date:    07/29/2015 01:45 AM
Subject: Re: Versioning

I agree that the case of "following" does not create a new version. The
2000 census is not a new version of the 1996 census. Even if they have the
same structure, I would see them as different datasets, because they don't
contain any of the same actual data.

On the other hand, I think that if a new dataset is made that has mostly
the same data but some corrections, then that is a new version, whether the
original dataset is still available or not. Think of the case where you
published a dataset and it was available for one week. Then you posted a
corrected version of that dataset because you found out that one of the
columns had some wrong values. You posted the new dataset as a new version.
It was available as a new version, along with the old one, for a week, and
then your supervisor said "we shouldn't make that old version of the data
available anymore". So you took it down. Does your taking down of the
original suddenly make the revised version no longer something that should
be versioned? I would argue no, in part because the decision of whether or
not to version needs to be made earlier, but mostly because you may have
had people who downloaded the original one during those two weeks in which
it was available. If you silently update it, those people will have no idea
that a new version is available and will have no way of distinguishing the
old one from the new one. They would likely therefore continue to use the
erroneous data.

In the rare case that you publish something and find an error and then are
able to change it before anyone else downloads it, or so few people
download it that you're able to contact them and explain the change, then I
think it's fine to just change it without versioning.

As for adapting, I think that would be an alternative format of the same
version.

I guess my overarching criterion is whether there is a possibility of
confusion on the part of the data consumer. If a dataset is published as
data for a different quarter or year, it should be clear from the title and
other metadata that it's different from the one it follows. If data is
modified silently, then you have the possibility of confusion. If a dataset
is adapted to a different language, it will be obvious to anyone looking at
it that it is different from the original language version.

-Annette
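Annette's "possibility of confusion" criterion can be read as a small decision rule. The Python sketch below is illustrative only: the `Change` enum and the `needs_new_version` function are hypothetical names, and the sketch assumes that her exception (nobody downloaded the earlier data, or so few people did that each can be contacted) applies to any change of existing data, not only to corrections.

```python
from enum import Enum


class Change(Enum):
    """Kinds of change discussed in this thread (Makx's categories)."""
    SUPERSEDING = "superseding"
    FOLLOWING = "following"
    MODIFYING = "modifying"
    ADAPTING = "adapting"


def needs_new_version(change: Change, consumers_can_be_notified: bool) -> bool:
    """Hypothetical encoding of the "possibility of confusion" criterion.

    consumers_can_be_notified is True if nobody downloaded the earlier data,
    or so few people did that each of them can be told about the change
    (assumption: this exception covers any change to existing data).
    """
    if change is Change.FOLLOWING:
        # Data for a different period or region is a different dataset;
        # its title and metadata already distinguish it from the one it follows.
        return False
    if change is Change.ADAPTING:
        # An adaptation is treated as an alternative format of the same version.
        return False
    # Superseding or modifying: a silent change could leave consumers
    # using erroneous data unless every one of them can be told directly.
    return not consumers_can_be_notified


# A correction to data that unknown people may already have downloaded.
print(needs_new_version(Change.MODIFYING, consumers_can_be_notified=False))  # True
```

Note that Steve's reply argues the opposite for "following" over long time scales, so the FOLLOWING branch reflects only Annette's position.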

On Jul 28, 2015, at 2:05 PM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:

> Hi Annette and Makx,
>
> I think we (Makx and I) agree that "following" is not a case of
> versioning. @Annette, do you also agree? IMO, this case represents a
> collection of datasets that share structure but don't share data.
>
> For the other cases, I think we should consider that a dataset is a
> version of another dataset when a new dataset is created based on an
> existing dataset, i.e., the two datasets will have some data and/or
> structure in common. I think this applies to both the superseding and
> adapting cases mentioned by Makx. On the other hand, if a dataset is
> modified without the creation of a new dataset, then there is no
> versioning. Does that make sense to you?
>
> Thanks!
> Bernadette
>
>
>
> 2015-07-27 17:30 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:
> I think you and Bernadette are defining superseding and modifying
> conversely, but I think both cases call for versioning. I would consider
> the case where a dataset is modified and wholly replaced with the corrected
> one as a case where versioning is needed. I also consider the case where a
> dataset is modified and the older version is still available as a case
> where versioning is needed. If you have stored an older version and it
> presents itself as the exact same thing, it should be the exact same
> thing. Otherwise, you could reuse a deprecated version without knowing it.
> -Annette
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Jul 27, 2015, at 9:56 AM, Makx Dekkers <mail@makxdekkers.com> wrote:
>
> > Annette,
> >
> > Good point.
> >
> > I was not implying that if data is modified, the old version should
> > *never* remain available. Maybe it is a matter of definition: according
> > to my categorisation, if a publisher modifies data and keeps the old
> > version available (the one that may have errors, partial data, outdated
> > information), it falls in the category of superseding.
> >
> > The definition of modifying is then "updating but not keeping the old
> > data available". Sometimes you really want to stop people from accessing
> > and using data that you know is wrong.
> >
> > Makx.
> >
> >
> >
> >> -----Original Message-----
> >> From: Annette Greiner [mailto:amgreiner@lbl.gov]
> >> Sent: 27 July 2015 18:20
> >> To: Laufer <laufer@globo.com>
> >> Cc: Makx Dekkers <mail@makxdekkers.com>; Data on the Web Best
> >> Practices Working Group <public-dwbp-wg@w3.org>
> >> Subject: Re: Versioning
> >>
> >> I agree with most of this, but I think that, except for real-time data,
> >> modifying implies a new version. The question of whether something is
> >> superseded seems to me orthogonal. If we didn't maintain a "latest
> >> version" link for the BP doc, would modifications of it not call for a
> >> new version? Limiting versioning to things that are wholly replaced
> >> suggests that old versions should never remain available, which I think
> >> is not best practice.
> >> -Annette
> >>
> >> On Jul 27, 2015, at 8:11 AM, Laufer <laufer@globo.com> wrote:
> >>
> >>> Thank you, Makx, for this text about some relations between datasets.
> >>>
> >>>> Do others agree with limiting versioning to the ‘Superseding’ category?
> >>>
> >>> I agree.
> >>>
> >>> And I think we should have text in our document telling readers that
> >>> this is our understanding of versioning.
> >>>
> >>> But I have a question: what about the other "meanings"? Will there be
> >>> any BPs for them?
> >>>
> >>> Best Regards,
> >>> Laufer
> >>>
> >>> On Monday, 27 July 2015, Makx Dekkers <mail@makxdekkers.com> wrote:
> >>> Thanks Bernadette,
> >>>
> >>> Good to know that your perspective is that versioning only refers to
> >>> the ‘Superseding’ case. I fully agree with your perspective.
> >>>
> >>> However, you make some statements about the other types of changes
> >>> that I don’t agree with.
> >>>
> >>> I do not agree that ‘Following’ creates different ‘states’ of the same
> >>> dataset. To me, this year’s budget is only related to last year’s
> >>> budget because they are both budgets, but they are not versions of the
> >>> same thing. They may have the same granularity (e.g. expressed in
> >>> thousands of dollars), but the structure could be different (e.g.
> >>> because of organisational or regional changes). For me, time series
> >>> (and spatial series) have nothing to do with versioning.
> >>>
> >>> I also do not agree that ‘Adapting’ creates a new state (as in data at
> >>> a particular moment). All adaptations are equally valid and exist in
> >>> parallel. To me, adaptations are almost in the same category as the
> >>> different formats that DCAT groups as Distributions of a Dataset.
> >>>
> >>> Finally, I do agree that ‘Modifying’ creates a different state and not
> >>> a new version. In many cases, a publisher might not even bother to keep
> >>> the old file but would just change the dct:modified date in the
> >>> metadata.
> >>>
> >>> Do others agree with limiting versioning to the ‘Superseding’ category?
> >>>
> >>> Makx.
> >>>
> >>> From: Bernadette Farias Lóscio [mailto:bfl@cin.ufpe.br]
> >>> Sent: 27 July 2015 13:51
> >>> To: Makx Dekkers <mail@makxdekkers.com>
> >>> Cc: Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
> >>> Subject: Re: Versioning
> >>>
> >>>
> >>>
> >>> Hi Makx,
> >>>
> >>> Thanks for bringing up this discussion and clarifying those
> >>> differences. IMO this kind of distinction is important. However, I am
> >>> not sure if we should call all the types of "updates" that you presented
> >>> "versioning". I created the following table to help me visualize these
> >>> updates in terms of data (or content) changes and structure changes.
> >>>
> >>>               content change   structure change   comment
> >>> Superseding   yes              yes                new version
> >>> Following     yes              no                 different spatial/temporal granularity
> >>> Modifying     yes              no                 the data may have been updated or data may have been added
> >>> Adapting      yes              no                 content is the same, but in different contexts
> >>>
> >>> I think that only in the first case (superseding) will there be a new
> >>> version of the dataset. In the other cases, there will be different
> >>> states of the same dataset, where a dataset state means the data in the
> >>> dataset at a particular moment.
> >>>
> >>> Please let me know if I understood correctly and if these ideas make
> >>> sense to you.
> >>>
> >>> Cheers,
> >>>
> >>> Bernadette
> >>>
> >>> Superseding:
> >>>
> >>> Content and structure might be very different, but the publisher wants
> >>> you to use the current resource rather than a resource that preceded
> >>> it. The URL stays the same while the content changes, although the
> >>> broad intention of the content stays the same.
> >>>
> >>> Examples:
> >>>
> >>> • Today’s website (or, more generally, web resource) versus last
> >>>   week’s (Memento);
> >>>
> >>> • Latest version link, e.g. latest published draft of BP
> >>>   http://www.w3.org/TR/dwbp/.
> >>>
> >>>
> >>>
> >>> Following:
> >>>
> >>> The type of content is the same but it covers a different time period.
> >>> Both the new and the old data remain valid. (NB: spatial series, e.g.
> >>> the same kind of data for different regions, are similar to temporal
> >>> series in many respects.)
> >>>
> >>> Examples:
> >>>
> >>> • Sequences of annual budgets;
> >>>
> >>> • Daily meteorological observations;
> >>>
> >>> • Periodical census data.
> >>>
> >>>
> >>>
> >>> Modifying:
> >>>
> >>> Content, structure and data points are the same to some extent, but the
> >>> data may have been updated or data may have been added.
> >>>
> >>> Examples:
> >>>
> >>> • Correcting errors in values of data points, e.g. resulting from
> >>>   quality control or user feedback;
> >>>
> >>> • Adding data points, e.g. if measurements from different measuring
> >>>   devices come in at different times but belong together;
> >>>
> >>> • Updating values, e.g. in a Year-to-date file.
> >>>
> >>>
> >>>
> >>> Adapting:
> >>>
> >>> Content and structure are essentially the same but in different
> >>> contexts.
> >>>
> >>> Examples:
> >>>
> >>> • Translations of text fields or labels;
> >>>
> >>> • Conversion of co-ordinate system;
> >>>
> >>> • Conversions of measures, e.g. °C to °F, imperial units to SI;
> >>>
> >>> • Changes in granularity.
> >>>
> >>>
> >>>
> >>> Should we somehow take such distinctions into account or should we
> >>> lump them all together?
> >>>
> >>> --
> >>>
> >>> Bernadette Farias Lóscio
> >>> Centro de Informática
> >>> Universidade Federal de Pernambuco - UFPE, Brazil
> >>>
> --
> Bernadette Farias Lóscio
> Centro de Informática
> Universidade Federal de Pernambuco - UFPE, Brazil
>
----------------------------------------------------------------------------

Received on Monday, 3 August 2015 18:29:17 UTC