Re: dataset branching (a la git) from Kingsley Idehen on 2014-10-02 (public-lod@w3.org from October 2014)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 02 Oct 2014 19:02:03 -0400
To: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
CC: public-lod@w3.org
Message-ID: <542DD96B.7080009@openlinksw.com>
On 10/2/14 6:19 PM, Jürgen Jakobitsch wrote:
> ok - i guess i should come up with an example :
>
> what i want to achieve is for example that people can rewrite part of 
> a dataset and be able to get their version of the complete dataset.

Okay.

>
> i.e. (java code)
>
> i clone a whole repository, change one single line in one java file 
> and still be able to compile the whole project.
>
> i.e. (rdf code)
>
> master data (in graph http://graphs.net/master) (a flat list)
>
> <http://s.org/a> <http://p.net/label> "europe" .
> <http://s.org/b> <http://p.net/label> "central europe" .
> <http://s.org/c> <http://p.net/label> "austria" .
> <http://s.org/d> <http://p.net/label> "carinthia" .
> <http://s.org/e> <http://p.net/label> "klagenfurt" .
> <http://s.org/f> <http://p.net/label> "st.martin" .

Nanotation [1] markers for generating sample data from this post, if 
required further on in the discussion.

## Nanotation Start ##

</document1>
<#europe> <#label> "europe" .
<#centralEurope> <#label> "central europe" .
<#austria> <#label> "austria" .
<#carinthia> <#label> "carinthia" .
<#klagenfurt> <#label> "klagenfurt" .
<#stMartin> <#label> "st.martin" .

## Nanotation End ##
>
> person A (in graph http://graphs.net/persons/a) (= a branch with a 
> hierarchy)
> (note : person A is at time T1 not an expert and doesn't know about 
> "carinthia" being an austrian state)
>
> <http://s.org/a> skos:narrower <http://s.org/b> .
> <http://s.org/b> skos:narrower <http://s.org/c> .
> <http://s.org/c> skos:narrower <http://s.org/e> .
>
## Nanotation Start ##

</document2>
<#europe> skos:narrower <#uk>.
<#centralEurope> skos:narrower <#bulgaria> .
<#austria> skos:narrower <#vienna> .

## Nanotation End ##
> person B (in graph http://graphs.net/persons/b) (= a branch with a 
> [better] hierarchy)
> (note : person B is an expert on austrian geography and knows about 
> "carinthia" being an austrian state)
>
> <http://s.org/a> skos:narrower <http://s.org/b> .
> <http://s.org/b> skos:narrower <http://s.org/c> .
> <http://s.org/c> skos:narrower <http://s.org/d> .
> <http://s.org/d> skos:narrower <http://s.org/e> .
>

## Nanotation Start ##
## Vienna and Carinthia conflict

</document3>
<#austria> skos:narrower <#carinthia> .

## Nanotation End ##
> what happened becomes clear when you take one step back and realize that 
> all the relations (skos:narrower) have been duplicated.
>
> now say person C is a senior expert on the municipalities and boroughs 
> in the city of "klagenfurt".
> person C agrees with the graph from person B but wants to extend it. 
> in this simple example person C => could <=
> simply add triples in http://graphs.net/persons/c beginning with 
> <http://s.org/e> skos:narrower <http://s.org/f> .
>
> and i could do a
>
> SELECT
> FROM  <http://graphs.net/master>
> FROM  <http://graphs.net/persons/b>
> FROM  <http://graphs.net/persons/c>
>
> to get a complete and happy result.

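PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT *
FROM </document1>
FROM </document2>
FROM </document3>
WHERE { ?s ?p ?o .
        # drop the one triple that conflicts with </document3>
        FILTER ( !( ?s = <#austria> && ?p = skos:narrower && ?o = <#vienna> ) )
      }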

OR

## Using NOT FROM extension we've implemented

SELECT *
NOT FROM </document2>
WHERE { ?s ?p ?o . }


There are other options.
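
The two options above (filtering out one triple, or leaving out a whole 
named graph) can be sketched outside of any SPARQL engine. The following 
is only a toy model in plain Python, assuming each named graph is a set of 
(subject, predicate, object) tuples. The helper names merged() and 
all_but() are made up for illustration and are not part of any product.

```python
# Toy model: each named graph is a set of (subject, predicate, object) tuples.
# Plain Python sets, not a SPARQL engine; data mirrors part of the Nanotation samples.
NARROWER = "skos:narrower"

dataset = {
    "/document1": {
        ("#europe", "#label", "europe"),
        ("#austria", "#label", "austria"),
        ("#carinthia", "#label", "carinthia"),
    },
    "/document2": {
        ("#europe", NARROWER, "#uk"),
        ("#centralEurope", NARROWER, "#bulgaria"),
        ("#austria", NARROWER, "#vienna"),
    },
    "/document3": {
        ("#austria", NARROWER, "#carinthia"),
    },
}


def merged(dataset, graph_names):
    """Union of the chosen graphs, comparable to listing several FROM clauses."""
    result = set()
    for name in graph_names:
        result |= dataset[name]
    return result


def all_but(dataset, excluded):
    """Union of every graph except the excluded ones, comparable to NOT FROM."""
    return merged(dataset, [name for name in dataset if name not in excluded])


# Option 1: merge all three graphs, then filter out the one conflicting triple.
conflict = ("#austria", NARROWER, "#vienna")
option1 = {t for t in merged(dataset, list(dataset)) if t != conflict}

# Option 2: leave the whole of /document2 out of the merge.
option2 = all_but(dataset, {"/document2"})
```

Note the difference: option 1 keeps the other statements from /document2 
(e.g. europe/uk), while option 2 drops everything that graph contributed.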

>
> now, besides copying triples like
> <http://s.org/a> skos:narrower <http://s.org/b> .
> <http://s.org/b> skos:narrower <http://s.org/c> .
> this example works when appending to the end of the hierarchy.
>
> what you cannot simply do is for example replace a triple in a branch 
> (graph)

But you can filter out a named graph.

Of course there's more. I could even generate live data from the 
Nanotations embedded in this post, but that's a last resort. I have a 
similar example of triples created via nanotation-laced tweets that might 
demonstrate this shuffling in and out of named graphs used in a SPARQL 
processing pipeline [2][3][4][5][6][7].

>
> say person D agrees with person B mostly, only "Central Europe" is no 
> political entity and therefore has no place in the hierarchy.
>
> person D could actually only copy the graph and adjust the triples 
> accordingly (but that is again copying)
>
>
> now this copying i don't like.
>
> let's come back to the initial example of a biological classification.
> i just triplified the catalogoflife.org downloadable dataset and 
> currently have 1775844 entities and with a couple of different opinions 
> from a couple of different scientists this soon goes into billions of 
> triples.
>
> ;-) i still should think about how to express the problem that i see but 
> i need to start somewhere and writing such things down really helps 
> sometimes..
>
> wkr j
>
>

Hopefully, this addresses your fundamental question?
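
For the person D case in the quoted text (agreeing with person B except 
for "Central Europe"), the same filtering idea might look roughly like 
this. Again a plain Python sketch rather than SPARQL. The graph for 
person D and its single bridging triple are my assumptions, they are not 
in your example, and view_for_d() is a hypothetical helper.

```python
NARROWER = "skos:narrower"

# Person B's hierarchy from the quoted example (http://s.org/ IRIs shortened to a..e).
graph_b = {
    ("a", NARROWER, "b"),
    ("b", NARROWER, "c"),
    ("c", NARROWER, "d"),
    ("d", NARROWER, "e"),
}

# Hypothetical graph for person D: holds only the one triple D has to add,
# linking "europe" (a) directly to "austria" (c).
graph_d = {("a", NARROWER, "c")}

CENTRAL_EUROPE = "b"


def view_for_d(base, additions, dropped_node):
    """Reuse base, minus hierarchy triples that touch dropped_node, plus additions."""
    kept = {
        (s, p, o)
        for (s, p, o) in base
        if not (p == NARROWER and dropped_node in (s, o))
    }
    return kept | additions


d_view = view_for_d(graph_b, graph_d, CENTRAL_EUROPE)
```

Person D stores one triple instead of a copy of person B's graph, and the 
combined view is computed when the data is read.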

Links:

[1] http://bit.ly/blog-post-about-nanotation
[2] http://linkeddata.uriburner.com/c/9GDYGU3 -- Everything
[3] 
http://linkeddata.uriburner.com/fct/rdfdesc/usage.vsp?g=https%3A%2F%2Ftwitter.com%2Fhashtag%2FNoSilo%23this 
-- all the named graphs contributing to the SPARQL solution behind this page
[4] http://linkeddata.uriburner.com/c/9CJLOKIL -- same page with a 
specific named graph (internal document DB id/name) designated as the 
data source
[5] 
http://linkeddata.uriburner.com/fct/rdfdesc/usage.vsp?g=https%3A%2F%2Ftwitter.com%2Fhashtag%2FNoSilo%23this 
-- shows the designated named graph data source (hatched in the UI)
[6] 
http://linkeddata.uriburner.com/fct/rdfdesc/usage.vsp?g=https%3A%2F%2Ftwitter.com%2Fhashtag%2FNoSilo%23this 
-- two named graphs specifically designated as data sources
[7] http://linkeddata.uriburner.com/c/9CT5GRUZ -- effect of the two 
named graphs specifically designated as data sources.

Kingsley


> 2014-10-02 23:42 GMT+02:00 Kingsley Idehen <kidehen@openlinksw.com>:
>
>     On 10/2/14 4:02 PM, Jürgen Jakobitsch wrote:
>>     hi,
>>
>>     when trying to classify the animals on pictures from a recent
>>     trip to eastern indonesia
>>     meticulously realized that it is very hard if not impossible to
>>     branch datasets with ease.
>>     while this might sound ignoreable at first sight it might as well
>>     be the reason for the giant global graph to develop a culture of
>>     duplicating and linking with the end effect of being very close
>>     to where we came from (many sql databases).
>>
>>     what i mean will hopefully become clear with a simple example :
>>
>>     the "manta birostris" (giant oceanic manta ray) is classified
>>
>>     here wikipedia.org as
>>     Kingdom:Animalia
>>     Phylum:Chordata
>>     Class:Chondrichthyes
>>     Subclass:Elasmobranchii
>>     Order:Myliobatiformes
>>     Suborder:Myliobatidae
>>     Family:Mobulidae
>>     Genus:Manta
>>     Species:Manta birostris
>>
>>     here http://www.catalogueoflife.org/col/browse/tree/id/18879368 as
>>     Kingdom: Animalia
>>     Phylum: Chordata
>>     Class: Elasmobranchii
>>     Order: Myliobatiformes
>>     Family: Myliobatidae
>>     Genus: Manta
>>     Species: Manta birostris
>>
>>     here http://www.marinespecies.org/aphia.php?p=browser&id=105755#ct as
>>     Kingdom: Animalia
>>     Phylum: Chordata
>>     Subphylum: Vertebrata
>>     Superclass: Gnathostomata
>>     Superclass Pisces (Unreviewed)
>>     Class: Elasmobranchii (Unreviewed)
>>     Subclass: Neoselachii (Unreviewed)
>>     Infraclass: Batoidea (Unreviewed)
>>     Order: Rajiformes
>>     Family: Myliobatidae (Unreviewed)
>>     Subfamily: Mobulinae
>>     Genus: Manta
>>     Species: Manta birostris
>>
>>     here http://data.gbif.org/species/2419163/ as
>>     Kingdom: Animalia
>>     Phylum: Chordata
>>     Class: Elasmobranchii
>>     Order: Myliobatiformes
>>     Family: Myliobatidae
>>     Genus: Manta
>>     Species: Manta birostris
>>
>>     if only in theory we would triplify all these datasets and link
>>     them it still would be very hard to find out what different
>>     people think about the actually same being.
>>
>>     now:
>>
>>     my thinking was to create a flat list of uris for => all <= these
>>     classifications and create branches (graphs) with the
>>     hierarchies. but it is not as simple as it sounds because i
>>     cannot make the sparql engine follow a branch at certain uris and
>>     then rejoin the master graph again by whatever means.
>
>     You mean that you can't de-reference a SPARQL query pattern
>     variable as part of a SPARQL query processing pipeline?
>
>>     neither can i do such things on data level.
>
>     If the data is in 5-star Linked Open Data form you have the data
>     network in place. Then it's about a SPARQL query that crawls the
>     data-network. Ultimately, each entity description document SHOULD
>     end up being an internal triples/quad store document identifier
>     (a/k/a named graph IRI).
>
>     Naturally, what I describe above is how Virtuoso will behave if
>     you include input:grab pragmas in your SPARQL.
>>
>>     i was thinking about like so [1] on a triple (quad) level.
>>
>>     questions:
>>
>>     1. is the problem described so that it is at least
>>     semi-understandable (or should i come up with some triples as
>>     example)
>
>     I think so, but not 100% certain :)
>
>>     2. has this problem already been dealt with and i was only
>>     missing that day (please provide a link)
>
>     Sorta, in some other conversations about LOD cloud crawling and
>     SPARQL.
>
>>     3. has this problem already been solved and i was only missing
>>     that day (please provide a link)
>>     4. do you think it is worth dealing with
>>         (i personally think so [think: scaling cooperation ])
>>     5. would it be of enough interest to create a wg
>>
>>     any pointers and thoughts highly appreciated
>>     wkr turnguard
>>
>
>
>     -- 
>     Regards,
>
>     Kingsley Idehen 
>     Founder & CEO
>     OpenLink Software
>     Company Web:http://www.openlinksw.com
>     Personal Weblog 1:http://kidehen.blogspot.com
>     Personal Weblog 2:http://www.openlinksw.com/blog/~kidehen
>     Twitter Profile:https://twitter.com/kidehen
>     Google+ Profile:https://plus.google.com/+KingsleyIdehen/about
>     LinkedIn Profile:http://www.linkedin.com/in/kidehen
>     Personal WebID:http://kingsley.idehen.net/dataspace/person/kidehen#this
>
>


-- 
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Received on Thursday, 2 October 2014 23:02:28 UTC