Re: dataset branching (a la git) from Jürgen Jakobitsch on 2014-10-02 (public-lod@w3.org from October 2014)

From: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
Date: Fri, 3 Oct 2014 00:19:47 +0200
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: public-lod@w3.org
Message-ID: <CAETaefwH3N+-kPPCbjeu+5VGtTRD5jnv=00Em1T9_xENqBgdDA@mail.gmail.com>
ok - i guess i should come up with an example :

what i want to achieve is for example that people can rewrite part of a
dataset and be able to get their version of the complete dataset.

i.e. (java code)

i clone a whole repository, change one single line in one java file and
am still able to compile the whole project.

i.e. (rdf code)

master data (in graph http://graphs.net/master) (a flat list)

<http://s.org/a> <http://p.net/label> "europe" .
<http://s.org/b> <http://p.net/label> "central europe" .
<http://s.org/c> <http://p.net/label> "austria" .
<http://s.org/d> <http://p.net/label> "carinthia" .
<http://s.org/e> <http://p.net/label> "klagenfurt" .
<http://s.org/f> <http://p.net/label> "st.martin" .

person A (in graph http://graphs.net/persons/a) (= a branch with a
hierarchy)
(note : person A is at time T1 not an expert and doesn't know about
"carinthia" being an austrian state)

<http://s.org/a> skos:narrower <http://s.org/b> .
<http://s.org/b> skos:narrower <http://s.org/c> .
<http://s.org/c> skos:narrower <http://s.org/e> .

person B (in graph http://graphs.net/persons/b) (= a branch with a [better]
hierarchy)
(note : person B is an expert on austrian geography and knows about
"carinthia" being an austrian state)

<http://s.org/a> skos:narrower <http://s.org/b> .
<http://s.org/b> skos:narrower <http://s.org/c> .
<http://s.org/c> skos:narrower <http://s.org/d> .
<http://s.org/d> skos:narrower <http://s.org/e> .

what happened becomes clear when you take one step back and realize that
all the relations (skos:narrower) that person A and person B agree on have
been duplicated.
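
a minimal sketch of that duplication, assuming each graph is modelled as a plain python set of (s, p, o) tuples. the variable names and the helper s() are made up for illustration and are not part of any rdf library:

```python
# sketch only: a named graph is modelled as a set of (subject, predicate, object)
NARROWER = "skos:narrower"


def s(name):
    """Build a uri in the http://s.org/ namespace used in the example."""
    return "http://s.org/" + name


# graph http://graphs.net/persons/a
person_a = {
    (s("a"), NARROWER, s("b")),
    (s("b"), NARROWER, s("c")),
    (s("c"), NARROWER, s("e")),
}

# graph http://graphs.net/persons/b
person_b = {
    (s("a"), NARROWER, s("b")),
    (s("b"), NARROWER, s("c")),
    (s("c"), NARROWER, s("d")),
    (s("d"), NARROWER, s("e")),
}

duplicated = person_a & person_b  # triples stated in both branches
only_in_b = person_b - person_a   # what person B really contributes

for triple in sorted(duplicated):
    print("duplicated:", triple)
for triple in sorted(only_in_b):
    print("new in b:  ", triple)
```

half of person B's graph only repeats what person A already said.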

now say person C is a senior expert on the municipalities and boroughs in
the city of "klagenfurt".
person C agrees with the graph from person B but wants to extend it. in
this simple example person C => could <=
simply add triples in http://graphs.net/persons/c beginning with
<http://s.org/e> skos:narrower <http://s.org/f> .

and i could do a

SELECT *
FROM  <http://graphs.net/master>
FROM  <http://graphs.net/persons/b>
FROM  <http://graphs.net/persons/c>
WHERE { ?s ?p ?o }

to get a complete and happy result.
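
to make that concrete, here is a small sketch in python. it assumes the three graphs are plain sets of (s, p, o) tuples and mimics the multiple FROM clauses by taking their union (the query then runs over the merge of the named graphs). all variable names and the path_from() helper are made up for illustration, and the child lookup assumes every node has at most one skos:narrower, which holds in this example only:

```python
# sketch only: every named graph is a plain python set of (s, p, o) tuples
S = "http://s.org/"
LABEL = "http://p.net/label"
NARROWER = "skos:narrower"

names = {"a": "europe", "b": "central europe", "c": "austria",
         "d": "carinthia", "e": "klagenfurt", "f": "st.martin"}

# graph http://graphs.net/master (the flat list)
master = {(S + key, LABEL, name) for key, name in names.items()}

# graph http://graphs.net/persons/b
person_b = {
    (S + "a", NARROWER, S + "b"),
    (S + "b", NARROWER, S + "c"),
    (S + "c", NARROWER, S + "d"),
    (S + "d", NARROWER, S + "e"),
}

# graph http://graphs.net/persons/c (only the extension, nothing copied)
person_c = {(S + "e", NARROWER, S + "f")}

# FROM master, FROM persons/b, FROM persons/c: query the merged graph
dataset = master | person_b | person_c

labels = {s: o for (s, p, o) in dataset if p == LABEL}
child = {s: o for (s, p, o) in dataset if p == NARROWER}  # one child per node here


def path_from(node):
    """Follow skos:narrower from node and collect the labels on the way."""
    path = [labels[node]]
    while node in child:
        node = child[node]
        path.append(labels[node])
    return path


print(" > ".join(path_from(S + "a")))
# europe > central europe > austria > carinthia > klagenfurt > st.martin
```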

now, besides copying triples like
<http://s.org/a> skos:narrower <http://s.org/b> .
<http://s.org/b> skos:narrower <http://s.org/c> .
this example works when appending to the end of the hierarchy.

what you cannot simply do is for example replace a triple in a branch
(graph)

say person D agrees with person B mostly, only "Central Europe" is no
political entity and therefore doesn't belong in the hierarchy.

person D could actually only copy the graph and adjust the triples
accordingly (but that is again copying)
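
a rough sketch of what "copy and adjust" means for person D, again assuming graphs are plain python sets of (s, p, o) tuples. the names are made up for illustration, and the replacement triple (europe directly above austria) is my guess at what person D would state:

```python
# sketch only: a named graph is modelled as a set of (s, p, o) tuples
S = "http://s.org/"
NARROWER = "skos:narrower"

# graph http://graphs.net/persons/b
person_b = {
    (S + "a", NARROWER, S + "b"),
    (S + "b", NARROWER, S + "c"),
    (S + "c", NARROWER, S + "d"),
    (S + "d", NARROWER, S + "e"),
}

central_europe = S + "b"

# person D: copy B's graph, but drop every triple that mentions "central europe"
person_d = {t for t in person_b if central_europe not in (t[0], t[2])}
# assumption: "europe" now points straight at "austria"
person_d.add((S + "a", NARROWER, S + "c"))

copied = person_d & person_b  # triples that exist twice after the copy

print(len(person_d), "triples in d,", len(copied), "of them copied unchanged from b")
# 3 triples in d, 2 of them copied unchanged from b
```

simply adding persons/d to the FROM list next to persons/b would not help, because the merged graph would still contain the two "central europe" triples from person B.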


now this copying i don't like.

let's come back to the initial example of a biological classification.
i just triplified the catalogueoflife.org downloadable dataset and currently
have 1775844 entities and with a couple of different opinions from
a couple of different scientists this soon goes into billions of triples.

;-) i still should think about how to express the problem that i see but i
need to start somewhere and writing such things down really helps
sometimes..

wkr j






| Jürgen Jakobitsch,
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web       : http://www.semantic-web.at/
| foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web       : http://www.turnguard.com
| foaf      : http://www.turnguard.com/turnguard
| g+        : https://plus.google.com/111233759991616358206/posts
| skype     : jakobitsch-punkt
| xmlns:tg  = "http://www.turnguard.com/turnguard#"

2014-10-02 23:42 GMT+02:00 Kingsley Idehen <kidehen@openlinksw.com>:

>  On 10/2/14 4:02 PM, Jürgen Jakobitsch wrote:
>
> hi,
>
>  when trying to classify the animals on pictures from a recent trip to
> eastern indonesia
> meticulously, i realized that it is very hard if not impossible to branch
> datasets with ease.
> while this might sound ignorable at first sight it might as well be the
> reason for the giant global graph to develop a culture of duplicating and
> linking with the end effect of being very close to where we came from (many
> sql databases).
>
>  what i mean will hopefully become clear with a simple example :
>
> the "manta birostris" (giant oceanic manta ray) is classified
>
>  here wikipedia.org as
> Kingdom: Animalia
> Phylum: Chordata
> Class: Chondrichthyes
> Subclass: Elasmobranchii
> Order: Myliobatiformes
> Suborder: Myliobatidae
> Family: Mobulidae
> Genus: Manta
> Species: Manta birostris
>
>  here http://www.catalogueoflife.org/col/browse/tree/id/18879368 as
> Kingdom: Animalia
> Phylum: Chordata
> Class: Elasmobranchii
> Order: Myliobatiformes
> Family: Myliobatidae
> Genus: Manta
> Species: Manta birostris
>
>  here http://www.marinespecies.org/aphia.php?p=browser&id=105755#ct as
> Kingdom: Animalia
> Phylum: Chordata
> Subphylum: Vertebrata
> Superclass: Gnathostomata
> Superclass: Pisces (Unreviewed)
> Class: Elasmobranchii (Unreviewed)
> Subclass: Neoselachii (Unreviewed)
> Infraclass: Batoidea (Unreviewed)
> Order: Rajiformes
> Family: Myliobatidae (Unreviewed)
> Subfamily: Mobulinae
> Genus: Manta
> Species: Manta birostris
>
>  here http://data.gbif.org/species/2419163/ as
> Kingdom: Animalia
> Phylum: Chordata
> Class: Elasmobranchii
> Order: Myliobatiformes
> Family: Myliobatidae
> Genus: Manta
> Species: Manta birostris
>
>  even if in theory we would triplify all these datasets and link them it
> still would be very hard to find out what different people think about
> what is actually the same being.
>
>  now:
>
>  my thinking was to create a flat list of uris for => all <= these
> classifications and create branches (graphs) with the hierarchies. but it
> is not as simple as it sounds because i cannot make the sparql engine
> follow a branch at certain uris and then rejoin the master graph again by
> whatever means.
>
>
> You mean that you can't de-reference a SPARQL query pattern variable as
> part of a SPARQL query processing pipeline?
>
>  neither can i do such things on data level.
>
>
> If the data is in 5-star Linked Open Data form you have the data network
> in place. Then it's about a SPARQL query that crawls the data-network.
> Ultimately, each entity description document SHOULD end up being an
> internal triples/quad store document identifier (a/k/a named graph IRI).
>
> Naturally, what I describe above is how Virtuoso will behave if you
> include input:grab pragmas in your SPARQL.
>
>
>  i was thinking about like so [1] on a triple (quad) level.
>
>  questions:
>
> 1. is the problem described so that it is at least semi-understandable (or
> should i come up with some triples as example)
>
>
> I think so, but not 100% certain :)
>
>  2. has this problem already been dealt with and i was only missing that
> day (please provide a link)
>
>
> Sorta, in some other conversations about LOD cloud crawling and SPARQL.
>
>  3. has this problem already been solved and i was only missing that day
> (please provide a link)
> 4. do you think it is worth dealing with
>     (i personally think so [think: scaling cooperation ])
> 5. would it be of enough interest to create a wg
>
>  any pointers and thoughts highly appreciated
> wkr turnguard
>
>
>
> --
> Regards,
>
> Kingsley Idehen 
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog 1: http://kidehen.blogspot.com
> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
> Twitter Profile: https://twitter.com/kidehen
> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen
> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
>
>
Received on Thursday, 2 October 2014 22:20:15 UTC