Re: triple Indexing for Apps like Cimba from Brent Shambaugh on 2015-01-18 (public-lod@w3.org from January 2015)

From: Brent Shambaugh <brent.shambaugh@gmail.com>
Date: Sun, 18 Jan 2015 12:51:35 -0600
To: "public-lod@w3.org Data" <public-lod@w3.org>, "sematic-web@w3.org" <sematic-web@w3.org>
Cc: Henry Story <henry.story@bblfish.net>, Andrei Sambra <andrei.sambra@gmail.com>
Message-Id: <C38E1AD0-E16A-4700-937E-7939A0B311A2@gmail.com>

I was reading through something I wrote and found that SwarmLinda is not P2P (see the aggregation of data section, Tolksdorf ref 117) [1], but that could have its benefits.

I think I said it was P2P on the semantic-web list when I asked about finding a triple on the web (federated SPARQL in a distributed and decentralized way). Thanks again for your input (I am still going through it).

[1] http://bshambaugh.org/Master_17.html
[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.2876

> On Jan 18, 2015, at 11:18 AM, Brent Shambaugh <brent.shambaugh@gmail.com> wrote:
> 
> Andrei (and others in the reply all?),
> 
> Last year you gave a talk about cimba.co at MIT. During the Q&A there was some discussion about what sort of index or triple retrieval mechanism there would be. Sandro Hawke put up the talk, which I linked to here [0]. I was wondering if you came up with something.
> 
> Thanks for your time.
> 
> My thoughts:
> 
> From what I have read, it is difficult to index everything. The best you can do is index triples that are "important" and that will eventually lead you to less important triples that you might want.
> 
> Perhaps this is accomplished by some form of semantic clustering? Perhaps this clustering is accomplished by some sort of distributed RDF store, such as SwarmLinda [1]. Or perhaps it is accomplished by indexing only the names of linked data containers, along with some sort of description of what they are about. Or perhaps collections, which seem to have less defined structure describing what they are about and can exist (iirc) at multiple network nodes with different ownership, are described in some way and cleaned up to be more queryable using the swarm intelligence provided by SwarmLinda, or something similar, like building a folksonomy with Twitter tags [2]. I might need to compare these more, but it seems you are looking at semantic and syntactic similarities, where the semantic similarities need some sort of global reference to make things more manageable/possible.
> 
> For the index, you need either a centralized or a decentralized index. If you want to be a purist about decentralization, even YaCy won't do, since there are 4 nodes that are not decentralized [3]. Not knowing much, I think there may be times when you want a centralized index. Perhaps P2P would introduce too much latency and use too much bandwidth in the network. Perhaps sometimes you want P2P because you are constructing a mesh network, where you might even want local versions of some ontologies because you are closed off for some reason.
> 
> [0] http://adistributedeconomy.blogspot.com/2014/12/links-to-building-social-applications.html?m=1
> [1] http://www.mi.fu-berlin.de/inf/publications/techreports/tr2009/B-09-04/TR-B-09-04.pdf?1346662692
> [2] http://people.kmi.open.ac.uk/motta/papers/SpeciaMotta_ESWC-2007_Final.pdf
> [3] https://fedcsis.org/proceedings/2011/pliks/237.pdf
> 
>

Received on Sunday, 18 January 2015 18:52:10 UTC