- From: Brent Shambaugh <brent.shambaugh@gmail.com>
- Date: Sun, 18 Jan 2015 11:18:11 -0600
- To: Andrei Sambra <andrei.sambra@gmail.com>
- Cc: Henry Story <henry.story@bblfish.net>, "public-lod@w3.org Data" <public-lod@w3.org>, "public-webid@w3.org" <public-webid@w3.org>, Read-Write-Web <public-rww@w3.org>, Kingsley Idehen <kidehen@openlinksw.com>, Alexandre Bertails <bertails@w3.org>, Joe Presbrey <presbrey@gmail.com>, Tim Berners-Lee <timbl@w3.org>
Andrei (and others in the reply all?),

Last year you gave a talk about cimba.co at MIT. During the Q&A there was some discussion about what sort of index or triple retrieval mechanism there would be. Sandro Hawke put up the talk, which I linked to here [0]. I was wondering if you came up with something. Thanks for your time.

My thoughts:

From what I have read, it is difficult to index everything. The best you can do is index the triples that are "important" and that will eventually lead you to the less important triples you might want. Perhaps this is accomplished by some form of semantic clustering?

Perhaps this clustering is accomplished by some sort of distributed RDF store, such as Swarm Linda [1]. Or perhaps it is accomplished by indexing only the names of linked data containers, along with some sort of description of what they are about. Or perhaps collections, which seem to have less structure defined about what they contain and can exist (iirc) at multiple network nodes with different ownership, are described in some way and cleaned up to be more queryable using the swarm intelligence provided by Swarm Linda, or something similar, like building a folksonomy with Twitter tags [2]. I might need to compare these more, but it seems you are looking at semantic and syntactic similarities, where the semantic similarities need some sort of global reference to make things more manageable/possible.

For the index you need either a centralized index or a decentralized one. If you want to be a purist about decentralization, even YaCy won't do, since there are 4 nodes that are not decentralized [3]. I don't know much here, but there may be times when you want a centralized index. Perhaps P2P would introduce too much latency and use too much bandwidth in the network. And perhaps sometimes you want P2P because you are constructing a mesh network, where you might even want local versions of some ontologies because you are closed off for some reason.
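To make the container-name idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not anything cimba.co or Swarm Linda actually does: the class name ContainerIndex, its methods, and the example.org URIs are all hypothetical. The index stores only a short description per container, and a lookup returns container URIs that a client would then dereference to reach the triples inside.

```python
from collections import defaultdict


class ContainerIndex:
    """Toy index from description keywords to container URIs (hypothetical)."""

    def __init__(self):
        # keyword -> set of container URIs whose description mentions it
        self._by_keyword = defaultdict(set)

    def add(self, container_uri, description):
        # Index only the container's description, not the triples inside it.
        for word in description.lower().split():
            self._by_keyword[word].add(container_uri)

    def lookup(self, query):
        # Return the containers whose description contains every query word.
        words = query.lower().split()
        if not words:
            return set()
        hits = [self._by_keyword.get(word, set()) for word in words]
        return set.intersection(*hits)


# Assumed example data; these URIs are placeholders.
index = ContainerIndex()
index.add("https://example.org/alice/posts/", "microblog posts by alice")
index.add("https://example.org/bob/photos/", "photos by bob")

matches = index.lookup("posts alice")
print(matches)
```

Whether such an index lives on one server or is spread across peers is a separate choice; the sketch keeps it in memory only to show how small the indexed surface can be compared to indexing every triple.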
[0] http://adistributedeconomy.blogspot.com/2014/12/links-to-building-social-applications.html?m=1
[1] http://www.mi.fu-berlin.de/inf/publications/techreports/tr2009/B-09-04/TR-B-09-04.pdf?1346662692
[2] http://people.kmi.open.ac.uk/motta/papers/SpeciaMotta_ESWC-2007_Final.pdf
[3] https://fedcsis.org/proceedings/2011/pliks/237.pdf
Received on Sunday, 18 January 2015 17:18:47 UTC