Re: triple Indexing for Apps like Cimba from Timothy Holborn on 2015-01-18 (public-lod@w3.org from January 2015)

From: Timothy Holborn <timothy.holborn@gmail.com>
Date: Mon, 19 Jan 2015 08:20:49 +1100
To: Brent Shambaugh <brent.shambaugh@gmail.com>, Melvin Carvalho <melvincarvalho@gmail.com>
Cc: Andrei Sambra <andrei.sambra@gmail.com>, Henry Story <henry.story@bblfish.net>, "public-lod@w3.org Data" <public-lod@w3.org>, "public-webid@w3.org" <public-webid@w3.org>, Read-Write-Web <public-rww@w3.org>, Kingsley Idehen <kidehen@openlinksw.com>, Alexandre Bertails <bertails@w3.org>, Joe Presbrey <presbrey@gmail.com>, Tim Berners-Lee <timbl@w3.org>
Message-ID: <CAM1Sok0kmTMK0Mr42uMhZ3z5=OrVwg9+TFQp9M=aYxD28u=_4Q@mail.gmail.com>

Interesting..

I wonder about using a DHT perhaps, though I'm not sure about the likely size...

Webizen is a service[0]/repo[1]

Assuming RWW accounts are clustered (i.e. provider / subdomains, et al.),
perhaps the base install is pointed at a look-up service, the way a machine
is pointed at a time server...?  There is no point decentralising at the
account level.

Equally, one might consider that the server would index its own record,
and perhaps a relationship graph out to some variable integer depth.
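
A minimal sketch of that idea, assuming the relationship graph is bounded by a
configurable integer hop count. The servers, triples, and the names
TRIPLES_BY_SERVER, origin and build_index are all hypothetical, and a real
server would fetch remote documents rather than read a dict:

```python
from collections import deque

# Hypothetical toy data: triples keyed by the server (origin) that hosts them.
TRIPLES_BY_SERVER = {
    "https://alice.example": [
        ("https://alice.example/#me", "foaf:knows", "https://bob.example/#me"),
    ],
    "https://bob.example": [
        ("https://bob.example/#me", "foaf:knows", "https://carol.example/#me"),
    ],
    "https://carol.example": [
        ("https://carol.example/#me", "foaf:name", "Carol"),
    ],
}


def origin(term):
    """Return scheme + host of an http(s) IRI, or None for anything else."""
    if not term.startswith(("http://", "https://")):
        return None
    scheme, rest = term.split("://", 1)
    return scheme + "://" + rest.split("/", 1)[0]


def build_index(home, max_hops):
    """Index the home server's triples, then follow links breadth-first
    to other servers, going at most max_hops servers away from home."""
    index = {}  # subject -> list of (predicate, object)
    seen = {home}
    queue = deque([(home, 0)])
    while queue:
        server, hops = queue.popleft()
        for s, p, o in TRIPLES_BY_SERVER.get(server, []):
            index.setdefault(s, []).append((p, o))
            target = origin(o)
            if target and target not in seen and hops < max_hops:
                seen.add(target)
                queue.append((target, hops + 1))
    return index
```

With max_hops set to 0 the server indexes only its own record; each increment
pulls in one more ring of linked servers.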

Melvin's been dealing with decentralised block-chain storage.  I imagine
this is a similar challenge.

[0] http://webizen.org/
[1] https://github.com/linkeddata/webizen

Tim.H.

On 19 January 2015 at 04:18, Brent Shambaugh <brent.shambaugh@gmail.com>
wrote:

> Andrei (and others in the reply all?),
>
> Last year you gave a talk about cimba.co at MIT. During the Q&A there was
> some discussion about what sort of index or triple retrieval mechanism
> there would be. Sandro Hawke put up the talk, which I linked to here [0]. I
> was wondering if you came up with something.
>
> Thanks for your time.
>
> My thoughts:
>
> From what I have read, it is difficult to index everything. The best you
> can do is index triples that are "important" and that will eventually lead
> you to less important triples that you might want.
>
> Perhaps this is accomplished by some form of semantic clustering? Perhaps
> this clustering is accomplished by some sort of distributed RDF store, such
> as Swarm Linda [1]. Or perhaps it is accomplished by indexing only the
> names of linked data containers, each with some sort of description of
> what it is about. Or perhaps collections, which seem to have less
> structure defined about what they are about and can exist (iirc) at
> multiple network nodes with different ownership, are described in some way
> and cleaned up to be more queryable using swarm intelligence provided by
> Swarm Linda, or something similar, like building a folksonomy with Twitter
> tags [2]. I might need to compare these more, but it seems you are looking
> at semantic and syntactic similarities, where the semantic similarities need
> some sort of global reference to make things more manageable/possible.
>
> For the index, you need either a centralized or a decentralized index. If
> you want to be a purist about decentralization, even YaCy won't do, since
> there are 4 nodes that are not decentralized [3]. Not knowing much, I
> suspect there may be times when you want a centralized index: perhaps P2P
> would introduce too much latency and use too much bandwidth in the network.
> And perhaps sometimes you want P2P because you are constructing a mesh
> network, where you might even want local versions of some ontologies
> because you are closed off for some reason.
>
> [0] http://adistributedeconomy.blogspot.com/2014/12/links-to-building-social-applications.html?m=1
> [1] http://www.mi.fu-berlin.de/inf/publications/techreports/tr2009/B-09-04/TR-B-09-04.pdf?1346662692
> [2] http://people.kmi.open.ac.uk/motta/papers/SpeciaMotta_ESWC-2007_Final.pdf
> [3] https://fedcsis.org/proceedings/2011/pliks/237.pdf

Received on Sunday, 18 January 2015 21:21:18 UTC