Re: dataset branching (a la git)

On 10/2/14 4:02 PM, Jürgen Jakobitsch wrote:
> hi,
>
> when trying to classify the animals in pictures from a recent trip to 
> eastern indonesia
> i realized that it is very hard if not impossible to branch 
> datasets with ease.
> while this might sound ignorable at first sight it might as well be 
> the reason for the giant global graph to develop a culture of 
> duplicating and linking with the end effect of being very close to 
> where we came from (many sql databases).
>
> what i mean will hopefully become clear with a simple example :
>
> the "manta birostris" (giant oceanic manta ray) is classified
>
> here wikipedia.org <http://wikipedia.org> as
> Kingdom:Animalia
> Phylum:Chordata
> Class:Chondrichthyes
> Subclass:Elasmobranchii
> Order:Myliobatiformes
> Suborder:Myliobatidae
> Family:Mobulidae
> Genus:Manta
> Species:Manta birostris
>
> here http://www.catalogueoflife.org/col/browse/tree/id/18879368 as
> Kingdom: Animalia
> Phylum: Chordata
> Class: Elasmobranchii
> Order: Myliobatiformes
> Family: Myliobatidae
> Genus: Manta
> Species: Manta birostris
>
> here http://www.marinespecies.org/aphia.php?p=browser&id=105755#ct as
> Kingdom: Animalia
> Phylum: Chordata
> Subphylum: Vertebrata
> Superclass: Gnathostomata
> Superclass: Pisces (Unreviewed)
> Class: Elasmobranchii (Unreviewed)
> Subclass: Neoselachii (Unreviewed)
> Infraclass: Batoidea (Unreviewed)
> Order: Rajiformes
> Family: Myliobatidae (Unreviewed)
> Subfamily: Mobulinae
> Genus: Manta
> Species: Manta birostris
>
> here http://data.gbif.org/species/2419163/ as
> Kingdom: Animalia
> Phylum: Chordata
> Class: Elasmobranchii
> Order: Myliobatiformes
> Family: Myliobatidae
> Genus: Manta
> Species: Manta birostris
>
> even if, in theory, we triplified all these datasets and linked them, 
> it would still be very hard to find out what different people think 
> about what is actually the same being.
>
> now:
>
> my thinking was to create a flat list of uris for => all <= these 
> classifications and create branches (graphs) with the hierarchies. but 
> it is not as simple as it sounds because i cannot make the sparql 
> engine follow a branch at certain uris and then rejoin the master graph 
> again by whatever means.
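
One possible reading of the branch-and-rejoin idea quoted above, as a minimal Python sketch rather than anything a SPARQL engine does today. Plain strings stand in for URIs, each graph is a child-to-parent mapping, and the names MASTER, BRANCHES, "worms" and lineage() are hypothetical. Treating the Catalogue of Life / GBIF listing as the master graph is an assumption made only for illustration.

```python
# Master graph: child -> parent, taken from the Catalogue of Life / GBIF
# listing in the post. Choosing it as "master" is an assumption.
MASTER = {
    "Manta birostris": "Manta",
    "Manta": "Myliobatidae",
    "Myliobatidae": "Myliobatiformes",
    "Myliobatiformes": "Elasmobranchii",
    "Elasmobranchii": "Chordata",
    "Chordata": "Animalia",
}

# Branches: only the edges that differ from (or extend) the master graph.
# The "worms" branch follows the marinespecies.org listing in the post.
BRANCHES = {
    "worms": {
        "Manta": "Mobulinae",
        "Mobulinae": "Myliobatidae",
        "Myliobatidae": "Rajiformes",
        "Rajiformes": "Batoidea",
        "Batoidea": "Neoselachii",
        "Neoselachii": "Elasmobranchii",
        "Elasmobranchii": "Pisces",
        "Pisces": "Gnathostomata",
        "Gnathostomata": "Vertebrata",
        "Vertebrata": "Chordata",
    },
}


def lineage(taxon, branch=None):
    """Walk from taxon to the root, preferring the branch's edge at each
    step and falling back to the master graph when the branch is silent."""
    overrides = BRANCHES.get(branch, {})
    path = [taxon]
    seen = {taxon}  # guards against cycles in malformed data
    while True:
        current = path[-1]
        parent = overrides.get(current, MASTER.get(current))
        if parent is None or parent in seen:
            break
        path.append(parent)
        seen.add(parent)
    return path


if __name__ == "__main__":
    print(lineage("Manta birostris"))
    print(lineage("Manta birostris", branch="worms"))
```

In this sketch the "worms" walk leaves the master graph at Manta and rejoins it at Chordata, because the branch has no edge of its own for Manta birostris or Chordata.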

You mean that you can't de-reference a SPARQL query pattern variable as 
part of a SPARQL query processing pipeline?

> neither can i do such things on data level.

If the data is in 5-star Linked Open Data form you have the data network 
in place. Then it's about a SPARQL query that crawls the data-network. 
Ultimately, each entity description document SHOULD end up being an 
internal triples/quad store document identifier (a/k/a named graph IRI).

Naturally, what I describe above is how Virtuoso will behave if you 
include input:grab pragmas in your SPARQL.
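
As a rough conceptual model of that crawl (not Virtuoso's implementation, and not its pragma syntax), the Python sketch below dereferences a seed document, stores its triples under the document's own IRI as a named graph, and follows object IRIs up to a depth limit. The WEB dictionary, the example.org IRIs, the ex: predicates and crawl() are all hypothetical stand-ins for real HTTP dereferencing.

```python
from collections import deque

# Hypothetical in-memory "web": document IRI -> triples found in that document.
WEB = {
    "http://example.org/a": [
        ("http://example.org/a", "ex:sameTaxonAs", "http://example.org/b"),
    ],
    "http://example.org/b": [
        ("http://example.org/b", "ex:sameTaxonAs", "http://example.org/c"),
    ],
    "http://example.org/c": [
        ("http://example.org/c", "ex:rank", "species"),
    ],
}


def crawl(seed, max_depth):
    """Breadth-first crawl: each fetched document becomes a named graph
    keyed by its own IRI; links are followed at most max_depth hops."""
    store = {}  # named graph IRI -> list of triples
    queue = deque([(seed, 0)])
    while queue:
        iri, depth = queue.popleft()
        if iri in store or iri not in WEB:
            continue  # already loaded, or not dereferenceable (e.g. a literal)
        store[iri] = list(WEB[iri])
        if depth == max_depth:
            continue  # depth limit reached, do not follow further links
        for _subject, _predicate, obj in store[iri]:
            queue.append((obj, depth + 1))
    return store


if __name__ == "__main__":
    for graph_iri, triples in crawl("http://example.org/a", max_depth=1).items():
        print(graph_iri, len(triples))
```

The point of keying the store by document IRI is that a later query can restrict itself to, or compare across, the graphs contributed by each source.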
>
> i was thinking about like so [1] on a triple (quad) level.
>
> questions:
>
> 1. is the problem described so that it is at least semi-understandable 
> (or should i come up with some triples as example)

I think so, but not 100% certain :)

> 2. has this problem already been dealt with and i was only missing 
> that day (please provide a link)

Sorta, in some other conversations about LOD cloud crawling and SPARQL.

> 3. has this problem already been solved and i was only missing that 
> day (please provide a link)
> 4. do you think it is worth dealing with
>     (i personally think so [think: scaling cooperation ])
> 5. would it be of enough interest to create a wg
>
> any pointers and thoughts highly appreciated
> wkr turnguard
>


-- 
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

Received on Thursday, 2 October 2014 21:43:19 UTC