
Re: dwbp-ISSUE-94 (Git for data): Dataset versioning and dataset replication [Use Cases & Requirements Document]

From: Annette Greiner <amgreiner@lbl.gov>
Date: Tue, 11 Nov 2014 13:29:01 -0800
Cc: "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-Id: <88409595-3706-4341-AE63-7EAC948A7284@lbl.gov>
To: Yaso <yaso@nic.br>
I think the field is moving too fast to tie recommendations to any specific product. I like the list of items you show below, and I think most of them could readily be turned into best practices. "Get dataset updates more efficiently" is the one exception: we can't raise a comparative without naming what it is compared *to*, and today's most efficient systems will be tomorrow's inefficient ones. Also, at some point we need to draw a line between data management practices and data publishing practices.
-Annette

On Nov 11, 2014, at 11:06 AM, Yaso <yaso@nic.br> wrote:

> About this issue
> 
> I read Rufus's post last year and paid attention to cases that could be
> instructive for the WG. Although OKFN has implemented many tools for data
> built on git concepts, I think the best experience we have had with
> GitHub and data was a hackathon that we held with the Ministry of
> Justice. We received PDF "data" [1], and the participants decided to
> work on these files to serialize them to CSV; the work was all done on
> GitHub.
> 
> We opened a list of issues and worked on the dataset until it was
> ready to be used in the projects [2].
> 
> Yes, all items raised by Peter Hanečák in the issue proved to be true
> for git:
> 
> track changes in data
> provide possibility to review the history of changes
> provide audit trail
> get access to whichever previous version of data, not only to most
> recent version
> get dataset updates more efficiently
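
The capabilities in the list above can be sketched with plain git commands on a throwaway repository. This is a minimal illustration only; the file name, commit messages, and paths below are hypothetical, not taken from the hackathon repository.

```shell
#!/bin/sh
# Hypothetical sketch: applying the listed capabilities (track changes,
# review history, audit trail, retrieve any previous version) to a CSV
# dataset under git. All names and contents are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.org"   # local identity for the demo repo
git config user.name  "Demo"

printf 'id,value\n1,10\n' > accidents.csv  # first version of the dataset
git add accidents.csv
git commit -qm "Initial import of dataset"

printf 'id,value\n1,10\n2,20\n' > accidents.csv  # a dataset update
git commit -qam "Add record 2"

git log --oneline -- accidents.csv           # review the history of changes (audit trail)
git diff HEAD~1 -- accidents.csv             # see exactly what changed in the data
git show HEAD~1:accidents.csv > previous.csv # retrieve any previous version, not only the latest
```

The same history/diff/checkout model is what a tool like GeoGig [3] adapts to geodata, which is why it reads as "inspired by git".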
> 
> However, I keep wondering whether it is ideal for the WG to give an
> opinion on the use of a particular tool for version control of data as
> a best practice, or whether we should just list use cases and try to
> extract procedures that can be turned into best practices.
> 
> If the second option is true, maybe we can look at http://data.okfn.org/
> 
> Just to illustrate what I mean about tools, this one [3], for
> geodata versioning, is also interesting and inspired by git's structure.
> 
> 
> [1]
> http://dados.gov.br/dataset?groups=defesa-seguranca&tags=acidentes+de+tr%C3%A2nsito
> 
> [2] https://github.com/W3CBrasil/PerguntasMJ/issues?q=is%3Aissue+is%3Aclosed
> 
> [3] http://geogig.org/
> 
> 
> yaso
> 
> On 11/10/14, 5:49 PM, the Data on the Web Best Practices Working Group Issue
> Tracker wrote:
>> dwbp-ISSUE-94 (Git for data): Dataset versioning and dataset replication [Use Cases & Requirements Document]
>> 
>> http://www.w3.org/2013/dwbp/track/issues/94
>> 
>> Raised by: Phil Archer
>> On product: Use Cases & Requirements Document
>> 
>> Another use case from Peter Hanečák (https://www.w3.org/2013/dwbp/wiki/Second-Round_Use_Cases#Dataset_versioning_and_dataset_replication) poses the problem of tracking changes to datasets, which, AFAIAC, is part of provenance, but he goes deeper than that, which might be instructive. He goes on to suggest that the way to provide this information is to host datasets in a Git repository.
>> 
>> How does the WG wish to handle this use case?
>> 
>> 
> 
> -- 
> Brazilian Internet Steering Committee - CGI.br
> W3C Brazil Office
> @yaso - yaso.eu
> 
> 55 11 5509-3537 (4025)
> skype: yasocordova
> 
Received on Tuesday, 11 November 2014 21:29:36 UTC
