Re: dataset branching (a la git)

Hi Jürgen,

I have been putting some thought cycles into this myself recently.

The use case I am riffing on is not a million miles away from yours.
Basically there is a taxonomy described with SKOS that is edited by multiple
users where the taxonomy is used as primary navigation on a website.

I want to let a user create a branch they can work on in isolation
(a feature branch) before merging their changes back into the devel branch.
The user should be able to preview/stage their changes to see how they would
look on the website.

At a certain point a stable release would be pushed to the master branch, and
this would be used on the production website.

Additionally (and a key requirement) I assume that the entire taxonomy being
managed is an RDF dataset (as opposed to a single RDF graph).

Putting that into Git terms, each commit would be equivalent to a certain state
of the RDF dataset.
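
As a rough sketch of what I mean (not an existing tool), a commit could be modelled as an immutable set of quads plus a pointer to its parent. The `Commit` class, the `commit` helper and the `ex:` names below are all hypothetical, and terms are plain strings to keep things short.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# A quad is (subject, predicate, object, graph name); plain strings keep the sketch simple.
Quad = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Commit:
    """One commit = one complete state of the RDF dataset (hypothetical model)."""
    quads: FrozenSet[Quad]
    parent: Optional["Commit"] = None
    message: str = ""

    @property
    def id(self) -> str:
        # Hash the parent id plus the sorted quads, so the id depends on state and history.
        h = hashlib.sha1()
        h.update((self.parent.id if self.parent else "").encode("utf-8"))
        for quad in sorted(self.quads):
            h.update("\x1f".join(quad).encode("utf-8"))
            h.update(b"\n")
        return h.hexdigest()


def commit(parent, added=(), removed=(), message=""):
    """Create a new dataset state from a parent state plus a set of changes."""
    base = parent.quads if parent else frozenset()
    return Commit((base - frozenset(removed)) | frozenset(added), parent, message)


SKOS = "http://www.w3.org/2004/02/skos/core#"
root = commit(None, added=[("ex:manta", SKOS + "prefLabel", '"Manta"@en', "ex:taxonomy")],
              message="initial taxonomy")
feature = commit(root, added=[("ex:manta", SKOS + "broader", "ex:mobulidae", "ex:taxonomy")],
                 message="place manta under its family")
```

A branch is then just a name pointing at a `Commit`, exactly as in Git.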

I think the big challenge is when merging branches.
How does one identify and resolve conflicts whilst maintaining structural
coherence of the data against some schema?
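
To illustrate why this is harder than in Git: a set-based three-way merge of quads never clashes textually, yet the result can still break the schema. The sketch below assumes quads are plain string tuples and checks just one SKOS integrity condition (at most one skos:prefLabel per concept per language). The function names and `ex:` identifiers are made up for the example.

```python
from collections import defaultdict

PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def three_way_merge(base, ours, theirs):
    """Merge two branches of quads relative to their common ancestor."""
    added = (ours - base) | (theirs - base)
    removed = (base - ours) | (base - theirs)
    return (base - removed) | added


def pref_label_conflicts(quads):
    """Find (concept, language) pairs that ended up with more than one skos:prefLabel."""
    labels = defaultdict(set)
    for s, p, o, g in quads:
        if p == PREF_LABEL:
            # Literals are written like '"Manta"@en'; the tag is whatever follows the closing quote.
            lang = o.rpartition('"')[2].lstrip("@")
            labels[(s, lang)].add(o)
    return {key: values for key, values in labels.items() if len(values) > 1}


G = "ex:taxonomy"
base = frozenset({("ex:manta", PREF_LABEL, '"Manta"@en', G)})
# Two users rename the same concept independently on their own feature branches.
ours = frozenset({("ex:manta", PREF_LABEL, '"Giant manta"@en', G)})
theirs = frozenset({("ex:manta", PREF_LABEL, '"Oceanic manta ray"@en', G)})

merged = three_way_merge(base, ours, theirs)
conflicts = pref_label_conflicts(merged)
```

Here `merged` holds both new labels and `conflicts` flags `("ex:manta", "en")`, so the merge is only caught by validating against the schema, not by comparing quads.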

Your example goes even beyond this, in that each authority might be thought of
as a remote repository or even a completely separate project.
Each repository/project can then have its own version history, branches, etc.

Another project might then be created to manage the equivalence links between
all the different projects.

Assuming each (state of an) RDF dataset has a SPARQL endpoint, one could
federate across these to find out what different people think about the same
thing.
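
Something along these lines, perhaps: a small helper that builds one SPARQL 1.1 federated query with a SERVICE block per endpoint. The endpoint URLs are placeholders, the assumption that every authority publishes its hierarchy with skos:broader and English skos:prefLabel is mine, and the label is not escaped, so treat it as a sketch only.

```python
def federated_broader_query(concept_label, endpoints):
    """Build one SPARQL 1.1 query asking each endpoint for the parent of a labelled concept."""
    blocks = []
    for url in endpoints:
        blocks.append(
            "  {\n"
            f"    SERVICE <{url}> {{\n"
            f"      ?concept skos:prefLabel \"{concept_label}\"@en ;\n"
            "               skos:broader ?parent .\n"
            "      ?parent skos:prefLabel ?parentLabel .\n"
            "    }\n"
            # Record which endpoint each answer came from.
            f"    BIND(<{url}> AS ?source)\n"
            "  }"
        )
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?source ?parentLabel WHERE {\n"
        + "\n  UNION\n".join(blocks)
        + "\n}"
    )


# Hypothetical endpoints, one per authority (or per dataset state).
endpoints = [
    "http://example.org/authority-a/sparql",
    "http://example.org/authority-b/sparql",
]
query = federated_broader_query("Manta birostris", endpoints)
print(query)
```

Each result row would then pair a source with what that source considers the parent of the concept.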

Regards,

John Walker
Principal Consultant & co-founder
Semaku B.V.
SFJ 4.009, Torenallee 20, 5617 BC Eindhoven
Mobile: +31 6 475 22030
Email: john.walker@semaku.com
Skype: jaw111

KvK: 58031405
BTW: NL852842156B01
IBAN: NL94 INGB 0008 3219 95



> On October 2, 2014 at 10:02 PM Jürgen Jakobitsch
> <j.jakobitsch@semantic-web.at> wrote:
> 
>  hi,
> 
>  when trying to classify the animals on pictures from a recent trip to eastern
> indonesia
>  meticulously realized that it is very hard if not impossible to branch
> datasets with ease.
>  while this might sound ignoreable at first sight it might as well be the
> reason for the giant global graph to develop a culture of duplicating and
> linking with the end effect of being very close to where we came from (many
> sql databases).
> 
>  what i mean will hopefully become clear with a simple example :
> 
>  the "manta birostris" (giant oceanic manta ray) is classified
> 
>  here <http://wikipedia.org> as
>  Kingdom: Animalia
>  Phylum: Chordata
>  Class: Chondrichthyes
>  Subclass: Elasmobranchii
>  Order: Myliobatiformes
>  Suborder: Myliobatidae
>  Family: Mobulidae
>  Genus: Manta
>  Species: Manta birostris
> 
>  here http://www.catalogueoflife.org/col/browse/tree/id/18879368 as
>  Kingdom: Animalia
>  Phylum: Chordata
>  Class: Elasmobranchii
>  Order: Myliobatiformes
>  Family: Myliobatidae
>  Genus: Manta
>  Species: Manta birostris
> 
>  here http://www.marinespecies.org/aphia.php?p=browser&id=105755#ct as
>  Kingdom: Animalia
>  Phylum: Chordata
>  Subphylum: Vertebrata
>  Superclass: Gnathostomata
>  Superclass Pisces (Unreviewed)
>  Class: Elasmobranchii (Unreviewed)
>  Subclass: Neoselachii (Unreviewed)
>  Infraclass: Batoidea (Unreviewed)
>  Order: Rajiformes
>  Family: Myliobatidae (Unreviewed)
>  Subfamily: Mobulinae
>  Genus: Manta
>  Species: Manta birostris
> 
>  here http://data.gbif.org/species/2419163/ as
>  Kingdom: Animalia
>  Phylum: Chordata
>  Class: Elasmobranchii
>  Order: Myliobatiformes
>  Family: Myliobatidae
>  Genus: Manta
>  Species: Manta birostris
> 
>  if only in theory we would triplify all these datasets and link them it still
> would be very hard to find out what different people think about the actually
> same being.
> 
>  now:
> 
>  my thinking was to create a flat list of uris for => all <= these
> classifications and create branches (graphs) with the hierarchies. but it is
> not as simple as it sounds because i cannot make the sparql engine follow a
> branch at certain uris and then rejoin the master graph again by whatever
> means. neither can i do such things on data level.
> 
>  i was thinking about like so [1] on a triple (quad) level.
> 
>  questions:
> 
>  1. is the problem described so that it is at least semi-understandable (or
> should i come up with some triples as example)
>  2. has this problem already been dealt with and i was only missing that day
> (please provide a link)
>  3. has this problem already been solved and i was only missing that day
> (please provide a link)
>  4. do you think it is worth dealing with
>      (i personally think so [think: scaling cooperation ])
>  5. would it be of enough interest to create a wg
> 
>  any pointers and thoughts highly appreciated
>  wkr turnguard
> 
> 
>  [1]  http://nvie.com/posts/a-successful-git-branching-model/
> 
>  | Jürgen Jakobitsch,
>  | Software Developer
>  | Semantic Web Company GmbH
>  | Mariahilfer Straße 70 / Neubaugasse 1, Top 8
>  | A - 1070 Wien, Austria
>  | Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22
> 
>  COMPANY INFORMATION
>  | web       : http://www.semantic-web.at/
>  | foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
>  PERSONAL INFORMATION
>  | web       : http://www.turnguard.com
>  | foaf      : http://www.turnguard.com/turnguard
>  | g+        : https://plus.google.com/111233759991616358206/posts
>  | skype     : jakobitsch-punkt
>  | xmlns:tg  = " http://www.turnguard.com/turnguard#"
> 

Received on Thursday, 2 October 2014 22:17:23 UTC