Re: dataset branching (a la git)

i meant adrian = john...



| Jürgen Jakobitsch,
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web       : http://www.semantic-web.at/
| foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web       : http://www.turnguard.com
| foaf      : http://www.turnguard.com/turnguard
| g+        : https://plus.google.com/111233759991616358206/posts
| skype     : jakobitsch-punkt
| xmlns:tg  = "http://www.turnguard.com/turnguard#"

2014-10-03 0:35 GMT+02:00 Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>:

> thank you all for your input (i was just compiling my example when i
> retrieved your (adrian) skos-mail).
>
> maybe the solution would be to go down the skos-xl road and create
> something like skos-xxl
>
> where i could state that
>
> <http://s.org/a> <http://p.net/label> "europe" .
> <http://s.org/b> <http://p.net/label> "central europe" .
> <http://s.org/c> <http://p.net/label> "austria" .
> <http://s.org/d> <http://p.net/label> "carinthia" .
> <http://s.org/e> <http://p.net/label> "klagenfurt" .
> <http://s.org/f> <http://p.net/label> "st.martin" .
>
> <http://s.org/a> skos-xxl:narrower _:n1 .
> _:n1 rdf:resource <http://s.org/b>;
>         taxon:acceptedBy <http://persons.net/a>,<http://persons.net/b>,
>                          <http://persons.net/c>,<http://persons.net/d> .
> <http://s.org/b> skos-xxl:narrower _:n2 .
> _:n2 rdf:resource <http://s.org/c>;
>         taxon:acceptedBy <http://persons.net/a>,<http://persons.net/b>,
>                          <http://persons.net/c> .
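
A minimal Python sketch of how the skos-xxl idea quoted above might be read back, assuming each narrower link is stored once together with the set of people who accept it. The `LINKS` list and the `hierarchy_for` helper are hypothetical stand-ins for a triple store and a query, not an existing API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in: (broader, narrower, accepted_by),
# copied from the two skos-xxl links in the quoted example.
LINKS = [
    ("http://s.org/a", "http://s.org/b",
     {"http://persons.net/a", "http://persons.net/b",
      "http://persons.net/c", "http://persons.net/d"}),
    ("http://s.org/b", "http://s.org/c",
     {"http://persons.net/a", "http://persons.net/b",
      "http://persons.net/c"}),
]


def hierarchy_for(person, links=LINKS):
    """Return {broader: [narrower, ...]} using only links the person accepts."""
    tree = defaultdict(list)
    for broader, narrower, accepted_by in links:
        if person in accepted_by:
            tree[broader].append(narrower)
    return dict(tree)


view_c = hierarchy_for("http://persons.net/c")  # accepts both links
view_d = hierarchy_for("http://persons.net/d")  # accepts only a -> b
```

Each link exists once, so a new opinion costs one extra `acceptedBy` value rather than a copy of the relation.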
>
> when i think about a typical website usecase the performance impact would
> not be dramatic (see for example the taxon tree here [1]).
>
> i'm gonna investigate that road a little...
>
> wkr turnguard
>
> [1] http://www.catalogueoflife.org/col/browse/tree
>
>
> 2014-10-03 0:19 GMT+02:00 Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>:
>
>> ok - i guess i should come up with an example :
>>
>> what i want to achieve is for example that people can rewrite part of a
>> dataset and be able to get their version of the complete dataset.
>>
>> i.e. (java code)
>>
>> i clone a whole repository, change one single line in one java file and
>> am still able to compile the whole project.
>>
>> i.e. (rdf code)
>>
>> master data (in graph http://graphs.net/master) (a flat list)
>>
>> <http://s.org/a> <http://p.net/label> "europe" .
>> <http://s.org/b> <http://p.net/label> "central europe" .
>> <http://s.org/c> <http://p.net/label> "austria" .
>> <http://s.org/d> <http://p.net/label> "carinthia" .
>> <http://s.org/e> <http://p.net/label> "klagenfurt" .
>> <http://s.org/f> <http://p.net/label> "st.martin" .
>>
>> person A (in graph http://graphs.net/persons/a) (= a branch with a
>> hierarchy)
>> (note : person A is at time T1 not an expert and doesn't know about
>> "carinthia" being an austrian state)
>>
>> <http://s.org/a> skos:narrower <http://s.org/b> .
>> <http://s.org/b> skos:narrower <http://s.org/c> .
>> <http://s.org/c> skos:narrower <http://s.org/e> .
>>
>> person B (in graph http://graphs.net/persons/b) (= a branch with a
>> [better] hierarchy)
>> (note : person B is an expert on austrian geography and knows about
>> "carinthia" being an austrian state)
>>
>> <http://s.org/a> skos:narrower <http://s.org/b> .
>> <http://s.org/b> skos:narrower <http://s.org/c> .
>> <http://s.org/c> skos:narrower <http://s.org/d> .
>> <http://s.org/d> skos:narrower <http://s.org/e> .
>>
>> what happened becomes clear when you take one step back and realize that
>> all the relations (skos:narrower) have been duplicated.
>>
>> now say person C is a senior expert on the municipalities and boroughs
>> in the city of "klagenfurt".
>> person C agrees with the graph from person B but wants to extend it. in
>> this simple example person C => could <=
>> simply add triples in http://graphs.net/persons/c beginning with
>> <http://s.org/e> skos:narrower <http://s.org/f> .
>>
>> and i could do a
>>
>> SELECT ?s ?p ?o
>> FROM  <http://graphs.net/master>
>> FROM  <http://graphs.net/persons/b>
>> FROM  <http://graphs.net/persons/c>
>> WHERE { ?s ?p ?o }
>>
>> to get a complete and happy result.
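
A small Python sketch of what the query over several graphs amounts to: in SPARQL, multiple FROM clauses make the default graph the merge of the named graphs. The `QUADS` dictionary and `default_graph` helper are hypothetical, and the data is only a subset of the quoted example.

```python
NARROWER = "skos:narrower"
LABEL = "http://p.net/label"

# Hypothetical quad store: graph IRI -> set of (subject, predicate, object).
QUADS = {
    "http://graphs.net/master": {
        ("http://s.org/e", LABEL, "klagenfurt"),
        ("http://s.org/f", LABEL, "st.martin"),
    },
    "http://graphs.net/persons/b": {
        ("http://s.org/d", NARROWER, "http://s.org/e"),
    },
    "http://graphs.net/persons/c": {
        ("http://s.org/e", NARROWER, "http://s.org/f"),
    },
}


def default_graph(*graph_iris, quads=QUADS):
    """Merge the listed graphs into one set of triples (set union)."""
    merged = set()
    for iri in graph_iris:
        merged |= quads[iri]
    return merged


merged = default_graph(
    "http://graphs.net/master",
    "http://graphs.net/persons/b",
    "http://graphs.net/persons/c",
)
```

Because the merge is a plain union, it can only add triples. That is why appending to the end of the hierarchy works, while replacing a triple from another branch does not.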
>>
>> now, besides copying triples like
>> <http://s.org/a> skos:narrower <http://s.org/b> .
>> <http://s.org/b> skos:narrower <http://s.org/c> .
>> this example works when appending to the end of the hierarchy.
>>
>> what you cannot simply do is for example replace a triple in a branch
>> (graph)
>>
>> say person D agrees with person B mostly, only "Central Europe" is no
>> political entity and therefore doesn't belong in the hierarchy.
>>
>> person D could actually only copy the graph and adjust the triples
>> accordingly (but that is again copying)
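
One conceivable way around the copying, sketched in Python: keep person D's branch as a small delta (removed and added triples) over person B's graph. This is an illustration only, not something proposed in the thread. The `apply_delta` helper and the delta layout are hypothetical, and it assumes person D would link "europe" directly to "austria".

```python
N = "skos:narrower"
A, B, C, D, E = (f"http://s.org/{x}" for x in "abcde")

# Person B's hierarchy from the quoted example.
PERSON_B = {(A, N, B), (B, N, C), (C, N, D), (D, N, E)}

# Hypothetical branch for person D: only the differences are stored.
PERSON_D_DELTA = {
    "removed": {(A, N, B), (B, N, C)},  # drop "central europe"
    "added": {(A, N, C)},               # assumed: europe -> austria
}


def apply_delta(base, delta):
    """Return a new triple set: base minus removals plus additions."""
    return (base - delta["removed"]) | delta["added"]


person_d = apply_delta(PERSON_B, PERSON_D_DELTA)
```

The delta holds three triples regardless of how large the base graph is, and the base graph itself stays untouched.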
>>
>>
>> now this copying i don't like.
>>
>> let's come back to the initial example of a biological classification.
>> i just triplified the catalogueoflife.org downloadable dataset and
>> currently have 1775844 entities and with a couple of different opinions from
>> a couple of different scientists this soon goes into billions of triples.
>>
>> ;-) i still should think about how to express the problem that i see but i
>> need to start somewhere and writing such things down really helps
>> sometimes..
>>
>> wkr j
>>
>> 2014-10-02 23:42 GMT+02:00 Kingsley Idehen <kidehen@openlinksw.com>:
>>
>>>  On 10/2/14 4:02 PM, Jürgen Jakobitsch wrote:
>>>
>>> hi,
>>>
>>>  when trying to meticulously classify the animals on pictures from a
>>> recent trip to eastern indonesia i realized that it is very hard if not
>>> impossible to branch datasets with ease.
>>> while this might sound ignorable at first sight it might as well be the
>>> reason for the giant global graph to develop a culture of duplicating and
>>> linking with the end effect of being very close to where we came from (many
>>> sql databases).
>>>
>>>  what i mean will hopefully become clear with a simple example :
>>>
>>> the "manta birostris" (giant oceanic manta ray) is classified
>>>
>>>  here wikipedia.org as
>>> Kingdom: Animalia
>>> Phylum: Chordata
>>> Class: Chondrichthyes
>>> Subclass: Elasmobranchii
>>> Order: Myliobatiformes
>>> Suborder: Myliobatidae
>>> Family: Mobulidae
>>> Genus: Manta
>>> Species: Manta birostris
>>>
>>>  here http://www.catalogueoflife.org/col/browse/tree/id/18879368 as
>>> Kingdom: Animalia
>>> Phylum: Chordata
>>> Class: Elasmobranchii
>>> Order: Myliobatiformes
>>> Family: Myliobatidae
>>> Genus: Manta
>>> Species: Manta birostris
>>>
>>>  here http://www.marinespecies.org/aphia.php?p=browser&id=105755#ct as
>>> Kingdom: Animalia
>>> Phylum: Chordata
>>> Subphylum: Vertebrata
>>> Superclass: Gnathostomata
>>> Superclass: Pisces (Unreviewed)
>>> Class: Elasmobranchii (Unreviewed)
>>> Subclass: Neoselachii (Unreviewed)
>>> Infraclass: Batoidea (Unreviewed)
>>> Order: Rajiformes
>>> Family: Myliobatidae (Unreviewed)
>>> Subfamily: Mobulinae
>>> Genus: Manta
>>> Species: Manta birostris
>>>
>>>  here http://data.gbif.org/species/2419163/ as
>>> Kingdom: Animalia
>>> Phylum: Chordata
>>> Class: Elasmobranchii
>>> Order: Myliobatiformes
>>> Family: Myliobatidae
>>> Genus: Manta
>>> Species: Manta birostris
>>>
>>>  if only in theory we would triplify all these datasets and link them
>>> it still would be very hard to find out what different people think about
>>> what is actually the same being.
>>>
>>>  now:
>>>
>>>  my thinking was to create a flat list of uris for => all <= these
>>> classifications and create branches (graphs) with the hierarchies. but it
>>> is not as simple as it sounds because i cannot make the sparql engine
>>> follow a branch at certain uris and then rejoin the master graph again by
>>> whatever means.
>>>
>>>
>>> You mean that you can't de-reference a SPARQL query pattern variable as
>>> part of a SPARQL query processing pipeline?
>>>
>>>  neither can i do such things on data level.
>>>
>>>
>>> If the data is in 5-star Linked Open Data form you have the data network
>>> in place. Then it's about a SPARQL query that crawls the data-network.
>>> Ultimately, each entity description document SHOULD end up being an
>>> internal triples/quad store document identifier (a/k/a named graph IRI).
>>>
>>> Naturally, what I describe above is how Virtuoso will behave if you
>>> include input:grab pragmas in your SPARQL.
>>>
>>>
>>>  i was thinking about like so [1] on a triple (quad) level.
>>>
>>>  questions:
>>>
>>> 1. is the problem described so that it is at least semi-understandable
>>> (or should i come up with some triples as example)
>>>
>>>
>>> I think so, but not 100% certain :)
>>>
>>>  2. has this problem already been dealt with and i was only missing
>>> that day (please provide a link)
>>>
>>>
>>> Sorta, in some other conversations about LOD cloud crawling and SPARQL.
>>>
>>>  3. has this problem already been solved and i was only missing that
>>> day (please provide a link)
>>> 4. do you think it is worth dealing with
>>>     (i personally think so [think: scaling cooperation ])
>>> 5. would it be of enough interest to create a wg
>>>
>>>  any pointers and thoughts highly appreciated
>>> wkr turnguard
>>>
>>>
>>>
>>> --
>>> Regards,
>>>
>>> Kingsley Idehen 
>>> Founder & CEO
>>> OpenLink Software
>>> Company Web: http://www.openlinksw.com
>>> Personal Weblog 1: http://kidehen.blogspot.com
>>> Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
>>> Twitter Profile: https://twitter.com/kidehen
>>> Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
>>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>> Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
>>>
>>>
>>
>

Received on Thursday, 2 October 2014 22:40:25 UTC