Re: dataset branching (a la git) from Jürgen Jakobitsch on 2014-10-02 (public-lod@w3.org from October 2014)

From: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
Date: Fri, 3 Oct 2014 00:35:12 +0200
To: public-lod@w3.org
Message-ID: <CAETaefzkAJBHfyE342=qL4A_JZAY3ZZVODXVJzx9KQ42BZufHg@mail.gmail.com>
thank you all for your input (i was just compiling my example when i
received your (adrian's) skos mail).

maybe the solution would be to go down the skos-xl road and create
something like skos-xxl

where i could state that

<http://s.org/a> <http://p.net/label> "europe" .
<http://s.org/b> <http://p.net/label> "central europe" .
<http://s.org/c> <http://p.net/label> "austria" .
<http://s.org/d> <http://p.net/label> "carinthia" .
<http://s.org/e> <http://p.net/label> "klagenfurt" .
<http://s.org/f> <http://p.net/label> "st.martin" .

<http://s.org/a> skos-xxl:narrower _:n1 .
_:n1 rdf:resource <http://s.org/b> ;
        taxon:acceptedBy <http://persons.net/a>, <http://persons.net/b>,
                         <http://persons.net/c>, <http://persons.net/d> .
<http://s.org/b> skos-xxl:narrower _:n2 .
_:n2 rdf:resource <http://s.org/c> ;
        taxon:acceptedBy <http://persons.net/a>, <http://persons.net/b>,
                         <http://persons.net/c> .
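
a minimal python sketch of the idea above, assuming each narrower link is
stored once together with the set of people who accept it. the RELATIONS list
layout and the function name hierarchy_for are made up for illustration; they
are not part of skos-xl or of any existing library.

```python
from collections import defaultdict

# Hypothetical in-memory model: one entry per "narrower" link, carrying the
# set of people who accept it (mirrors the _:n1 / _:n2 nodes in the example).
RELATIONS = [
    ("http://s.org/a", "http://s.org/b",
     {"http://persons.net/a", "http://persons.net/b",
      "http://persons.net/c", "http://persons.net/d"}),
    ("http://s.org/b", "http://s.org/c",
     {"http://persons.net/a", "http://persons.net/b",
      "http://persons.net/c"}),
]


def hierarchy_for(person, relations=RELATIONS):
    """Return {broader: [narrower, ...]} using only links `person` accepts."""
    tree = defaultdict(list)
    for broader, narrower, accepted_by in relations:
        if person in accepted_by:
            tree[broader].append(narrower)
    return dict(tree)
```

with this data, hierarchy_for("http://persons.net/d") only contains the
a -> b link, because person d is not listed on the second relation. no link
has to be copied per person; a "branch" is just a filter over shared links.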

when i think about a typical website use case, the performance impact would
not be dramatic (see for example the taxon tree here [1]).

i'm gonna investigate that road a little...

wkr turnguard

[1] http://www.catalogueoflife.org/col/browse/tree


| Jürgen Jakobitsch,
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web       : http://www.semantic-web.at/
| foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web       : http://www.turnguard.com
| foaf      : http://www.turnguard.com/turnguard
| g+        : https://plus.google.com/111233759991616358206/posts
| skype     : jakobitsch-punkt
| xmlns:tg  = "http://www.turnguard.com/turnguard#"

2014-10-03 0:19 GMT+02:00 Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>:

> ok - i guess i should come up with an example :
>
> what i want to achieve is for example that people can rewrite part of a
> dataset and be able to get their version of the complete dataset.
>
> i.e. (java code)
>
> i clone a whole repository, change one single line in one java file and
> still be able to compile the whole project.
>
> i.e. (rdf code)
>
> master data (in graph http://graphs.net/master) (a flat list)
>
> <http://s.org/a> <http://p.net/label> "europe" .
> <http://s.org/b> <http://p.net/label> "central europe" .
> <http://s.org/c> <http://p.net/label> "austria" .
> <http://s.org/d> <http://p.net/label> "carinthia" .
> <http://s.org/e> <http://p.net/label> "klagenfurt" .
> <http://s.org/f> <http://p.net/label> "st.martin" .
>
> person A (in graph http://graphs.net/persons/a) (= a branch with a
> hierarchy)
> (note : person A is at time T1 not an expert and doesn't know about
> "carinthia" being an austrian state)
>
> <http://s.org/a> skos:narrower <http://s.org/b> .
> <http://s.org/b> skos:narrower <http://s.org/c> .
> <http://s.org/c> skos:narrower <http://s.org/e> .
>
> person B (in graph http://graphs.net/persons/b) (= a branch with a
> [better] hierarchy)
> (note : person B is an expert on austrian geography and knows about
> "carinthia" being an austrian state)
>
> <http://s.org/a> skos:narrower <http://s.org/b> .
> <http://s.org/b> skos:narrower <http://s.org/c> .
> <http://s.org/c> skos:narrower <http://s.org/d> .
> <http://s.org/d> skos:narrower <http://s.org/e> .
>
> what happened becomes clear when you take one step back and realize that all
> the relations (skos:narrower) have been duplicated.
>
> now say person C is a senior expert on the municipalities and boroughs in
> the city of "klagenfurt".
> person C agrees with the graph from person B but wants to extend it. in
> this simple example person C => could <=
> simply add triples in http://graphs.net/persons/c beginning with
> <http://s.org/e> skos:narrower <http://s.org/f> .
>
> and i could then do a
>
> SELECT *
> FROM  <http://graphs.net/master>
> FROM  <http://graphs.net/persons/b>
> FROM  <http://graphs.net/persons/c>
> WHERE { ?s ?p ?o }
>
> to get a complete and happy result.
>
> now, besides copying triples like
> <http://s.org/a> skos:narrower <http://s.org/b> .
> <http://s.org/b> skos:narrower <http://s.org/c> .
> this example works when appending to the end of the hierarchy.
>
> what you cannot simply do is for example replace a triple in a branch
> (graph)
>
> say person D agrees with person B mostly, only "Central Europe" is no
> political entity and therefore has no place in the hierarchy.
>
> person D could actually only copy the graph and adjust the triples
> accordingly (but that is again copying)
>
>
> now this copying i don't like.
>
> let's come back to the initial example of a biological classification.
> i just triplified the catalogueoflife.org downloadable dataset and
> currently have 1775844 entities, and with a couple of different opinions from
> a couple of different scientists this soon goes into billions of triples.
>
> ;-) i still should think about how to express the problem that i see but i
> need to start somewhere and writing such things down really helps
> sometimes..
>
> wkr j
>
> 2014-10-02 23:42 GMT+02:00 Kingsley Idehen <kidehen@openlinksw.com>:
>
>>  On 10/2/14 4:02 PM, Jürgen Jakobitsch wrote:
>>
>> hi,
>>
>>  when trying to meticulously classify the animals in pictures from a recent
>> trip to eastern indonesia, i realized that it is very hard, if not
>> impossible, to branch datasets with ease.
>> while this might sound ignorable at first sight, it might as well be the
>> reason for the giant global graph to develop a culture of duplicating and
>> linking, with the end effect of being very close to where we came from (many
>> sql databases).
>>
>>  what i mean will hopefully become clear with a simple example :
>>
>> the "manta birostris" (giant oceanic manta ray) is classified
>>
>>  here wikipedia.org as
>> Kingdom: Animalia
>> Phylum: Chordata
>> Class: Chondrichthyes
>> Subclass: Elasmobranchii
>> Order: Myliobatiformes
>> Suborder: Myliobatidae
>> Family: Mobulidae
>> Genus: Manta
>> Species: Manta birostris
>>
>>  here http://www.catalogueoflife.org/col/browse/tree/id/18879368 as
>> Kingdom: Animalia
>> Phylum: Chordata
>> Class: Elasmobranchii
>> Order: Myliobatiformes
>> Family: Myliobatidae
>> Genus: Manta
>> Species: Manta birostris
>>
>>  here http://www.marinespecies.org/aphia.php?p=browser&id=105755#ct as
>> Kingdom: Animalia
>> Phylum: Chordata
>> Subphylum: Vertebrata
>> Superclass: Gnathostomata
>> Superclass: Pisces (Unreviewed)
>> Class: Elasmobranchii (Unreviewed)
>> Subclass: Neoselachii (Unreviewed)
>> Infraclass: Batoidea (Unreviewed)
>> Order: Rajiformes
>> Family: Myliobatidae (Unreviewed)
>> Subfamily: Mobulinae
>> Genus: Manta
>> Species: Manta birostris
>>
>>  here http://data.gbif.org/species/2419163/ as
>> Kingdom: Animalia
>> Phylum: Chordata
>> Class: Elasmobranchii
>> Order: Myliobatiformes
>> Family: Myliobatidae
>> Genus: Manta
>> Species: Manta birostris
>>
>>  even if, in theory, we triplified all these datasets and linked them, it
>> would still be very hard to find out what different people think about
>> what is actually the same being.
>>
>>  now:
>>
>>  my thinking was to create a flat list of uris for => all <= these
>> classifications and create branches (graphs) with the hierarchies. but it
>> is not as simple as it sounds because i cannot make the sparql engine
>> follow a branch at certain uris and then rejoin the master graph again by
>> whatever means.
>>
>>
>> You mean that you can't de-reference a SPARQL query pattern variable as
>> part of a SPARQL query processing pipeline?
>>
>>  neither can i do such things on data level.
>>
>>
>> If the data is in 5-star Linked Open Data form you have the data network
>> in place. Then it's about a SPARQL query that crawls the data-network.
>> Ultimately, each entity description document SHOULD end up being an
>> internal triples/quad store document identifier (a/k/a named graph IRI).
>>
>> Naturally, what I describe above is how Virtuoso will behave if you
>> include input:grab pragmas in your SPARQL.
>>
>>
>>  i was thinking about like so [1] on a triple (quad) level.
>>
>>  questions:
>>
>> 1. is the problem described so that it is at least semi-understandable
>> (or should i come up with some triples as example)
>>
>>
>> I think so, but not 100% certain :)
>>
>>  2. has this problem already been dealt with and i was only missing that
>> day (please provide a link)
>>
>>
>> Sorta, in some other conversations about LOD cloud crawling and SPARQL.
>>
>>  3. has this problem already been solved and i was only missing that day
>> (please provide a link)
>> 4. do you think it is worth dealing with
>>     (i personally think so [think: scaling cooperation ])
>> 5. would it be of enough interest to create a wg
>>
>>  any pointers and thoughts highly appreciated
>> wkr turnguard
>>
>>
>>
>> --
>> Regards,
>>
>> Kingsley Idehen 
>> Founder & CEO
>> OpenLink Software
>> Company Web: http://www.openlinksw.com
>> Personal Weblog 1: http://kidehen.blogspot.com
>> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>> Twitter Profile: https://twitter.com/kidehen
>> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
>>
>>
>
Received on Thursday, 2 October 2014 22:35:40 UTC