Re: dwbp-ISSUE-94 (Git for data): Dataset versioning and dataset replication [Use Cases & Requirements Document]

I agree that the working group should not recommend specific tools for
dataset versioning and replication.

I think that this should be a more general recommendation, i.e., "the working
group should not recommend best practices based on specific tools".

In my opinion, this use case motivates the creation of a new challenge and
a new set of requirements. The challenge may be called "Data Versioning"
and a new requirement could be "Data versioning information should be
available."

kind regards,
Bernadette

2014-11-18 16:27 GMT-03:00 Annette Greiner <amgreiner@lbl.gov>:

> +1
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Nov 18, 2014, at 10:48 AM, Yaso <yaso@nic.br> wrote:
>
> > Hi all
> >
> > I think we can address ISSUE-94.
> >
> > Hosting datasets in a Git repository can be one (good) way to provide
> > provenance and track data, but this is also true of wiki pages, for
> > example (it's not usual, but it can be done in specific cases).
> >
> > I propose that we agree that the working group cannot recommend any
> > tool, although this does not exclude using the use cases that were
> > raised in the discussions to derive requirements.
> >
> > Does anyone have any comments?
> >
> > yaso
> >
> >
> >
> >
> >
> >
> > On 11/13/14 11:49 AM, Augusto Herrmann wrote:
> >> Hi.
> >>
> >> Another good example of using git for data is the directory of public
> >> bodies of governments all over the world that OKFn has been curating
> >> [1][2].
> >>
> >> I agree with Annette's argument that tools in this field are rapidly
> >> evolving, and the WG should probably not recommend a particular tool as
> >> a BP at this point.
> >>
> >> Also relevant to this discussion is Max Ogden's `dat` tool, which
> >> intends to be a 'git for data' [3][4]. It looks promising.
> >>
> >> [1] http://publicbodies.org/
> >> [2] https://github.com/okfn/publicbodies
> >> [3] http://www.wired.com/2014/08/dat/
> >> [4] https://github.com/maxogden/dat
> >>
> >> Best regards,
> >> Augusto Herrmann
> >>
> >> On Wed, Nov 12, 2014 at 10:38 AM, Yaso <yaso@nic.br> wrote:
> >>
> >>>
> >>> On 11/11/14, 7:29 PM, Annette Greiner wrote:
> >>>> we need to draw a line between data management practices and data
> >>>> publishing practices.
> >>>
> >>> Agree!
> >>>
> >>> But it's a thin line. We can achieve these (possible) best practices
> >>> either with a vocab or with a versioning document system (Git, HG, even
> >>> a wiki with YAML). I'm wondering if these items are not data management
> >>> practices AND publishing practices...
> >>>
> >>> - track changes in data
> >>> - provide the possibility to review the history of changes
> >>> - provide an audit trail
> >>> - get access to any previous version of the data, not only to the most
> >>>   recent version
> >>>
> >>> I agree that "get dataset updates more efficiently" is a
> >>> management practice only. For now, at least :-)
> >>>
> >>> yaso
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Brazilian Internet Steering Committee - CGI.br
> >>> W3C Brazil Office
> >>> @yaso - yaso.eu
> >>>
> >>> 55 11 5509-3537 (4025)
> >>> skype: yasocordova
> >>>
> >>>
> >>
> >
> >
> > --
> > Brazilian Internet Steering Committee - CGI.br
> > W3C Brazil Office
> > @yaso - yaso.eu
> >
> > 55 11 5509-3537 (4025)
> > skype: yasocordova
>
>
>


-- 
Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
----------------------------------------------------------------------------

Received on Thursday, 20 November 2014 13:06:03 UTC