- From: Brent Shambaugh <brent.shambaugh@gmail.com>
- Date: Tue, 7 Apr 2015 18:52:39 -0500
- To: Adrian Hope-Bailie <adrian@hopebailie.com>
- Cc: David Nicol <davidnicol@gmail.com>, Melvin Carvalho <melvincarvalho@gmail.com>, Anders Rundgren <anders.rundgren.net@gmail.com>, Web Payments <public-webpayments@w3.org>
- Message-ID: <CACvcBVpNfp6ieRTRWr-BrhKcPV8hh-aX9-z09aAXM95QQ0Enxw@mail.gmail.com>
Sebastian Koske's Thesis may be found here:
http://www.mi.fu-berlin.de/inf/publications/techreports/tr2009/B-09-04/TR-B-09-04.pdf?1346662692
("Swarm Approaches for Semantic Triple Clustering and Retrieval in
Distributed RDF-Spaces")

-Brent Shambaugh

Website: bshambaugh.org

On Tue, Apr 7, 2015 at 6:49 PM, Brent Shambaugh <brent.shambaugh@gmail.com>
wrote:

>
> -Brent Shambaugh
>
> Website: bshambaugh.org
>
> On Tue, Apr 7, 2015 at 5:27 PM, Adrian Hope-Bailie <adrian@hopebailie.com>
> wrote:
>
>> I don't think availability of suitable technology is the problem.
>> There are numerous options and numerous deployments of these.
>> That is exactly the problem.
>>
>> A discovery protocol must either pick one datastore or pick many
>> datastores and search them all.
>> If it searches many of these datastores for the data it is trying to
>> find, what order does it follow, and does it stop when it finds its
>> first match, or does it search them all and then have some rules for
>> picking the most correct match?
>>
>> These are hard problems which today are glossed over by the
>> recommendation to "use telehash".
>>
>> Any clever ideas about how this can be overcome?
>>
>
> I haven't implemented these sorts of things. What immediately comes to
> mind:
>
> (1) swarm intelligence and
>
> (2) taking advantage of the semantic nature of the data for clustering.
>
> However, I will depart from this for a second.
>
> Telehash adapts the Kademlia DHT. According to Wikipedia, "Kademlia uses
> a "distance" calculation between two nodes. This distance is computed as
> the exclusive or <http://en.wikipedia.org/wiki/Exclusive_or> of the two
> node IDs, taking the result as an integer number
> <http://en.wikipedia.org/wiki/Integer>."
> (http://en.wikipedia.org/wiki/Kademlia)
>
> From https://github.com/telehash/telehash.org/blob/master/v2/dht.md :
>
> "Telehash adapts the Kademlia
> <https://github.com/telehash/telehash.org/blob/master/v2/references.md>
> Distributed Hash Table for its peer discovery. A "peer" in this document
> is a single application instance, one unique hashname.
>
> Unlike the original Kademlia paper that was a key-value store, there is
> no arbitrary data stored in the DHT. Peers query the DHT purely to
> locate other peers, independent of IP or other transient network
> identifiers. Telehash also departs from Kademlia by using SHA2 256-bit
> hashes (rather than SHA1 160-bit hashes).
>
> Like any DHT, telehash peers cooperatively store all network information
> while minimizing per-peer costs. Derived from Kademlia's hash-based
> addressing and distance calculations, the average number of "nearby"
> peers will grow logarithmically compared to the total global number of
> peers. Peers then attempt to keep track of all of the closest peers, and
> progressively fewer of the farther away peers. This pattern minimizes
> "degrees of separation" between peers while also minimizing the number
> of other peers each individual peer must keep track of.
>
> Like Kademlia, telehash measures distance with a bitwise XOR metric
> which divides the address space into 256 possible partitions, also
> called k-buckets. Every peer when compared will have a bucket value
> based on the bit that differs: if the first bit is different the bucket
> would be 255, and if the entire first byte is the same the bucket for
> that peer would be 247."
>
> For (2) I would like to find out whether using semantic information from
> linked data would be useful instead of a bitwise XOR metric.
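>
> To make the bucket arithmetic above concrete, here is a minimal Python
> sketch of the XOR distance and k-bucket calculation for 256-bit
> hashnames. This is only my own illustration, not code from telehash;
> the function names and seed values are made up.
>
>   import hashlib
>
>   def hashname(seed: bytes) -> int:
>       # Illustrative 256-bit node ID: SHA-256 of some seed material.
>       return int.from_bytes(hashlib.sha256(seed).digest(), "big")
>
>   def xor_distance(a: int, b: int) -> int:
>       # Kademlia distance: XOR of the two IDs, read as an integer.
>       return a ^ b
>
>   def bucket_index(a: int, b: int) -> int:
>       # Position of the highest differing bit: 255 if the very first
>       # bit differs, 247 if the whole first byte matches but the ninth
>       # bit differs, and so on. Returns -1 for identical IDs.
>       return xor_distance(a, b).bit_length() - 1
>
>   me = hashname(b"peer-A")
>   other = hashname(b"peer-B")
>   print(bucket_index(me, other))  # closer peers land in lower buckets
>
> So the "bucket" is just the index of the most significant bit at which
> the two hashnames disagree, which is why a peer that differs only in the
> last bits of its hashname counts as very close.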
>
> INGA uses semantic information in four layers:
>
> "A peer responds to a query by providing an answer matching the query or
> by forwarding the query to relevant remote peers. The local peer
> determines the relevance of a remote peer based on a personal semantic
> shortcut index. The index is created and maintained in a lazy manner,
> i.e., by analyzing the queries initiated by the local peer and by
> analyzing the queries that are routed through the local peer. INGA
> creates shortcuts on four layers: The content provider layer contains
> shortcuts to remote peers which have successfully answered a query; the
> recommender layer stores information about remote peers who have issued
> a query; the bootstrapping layer maintains shortcuts to well connected
> remote peers; and the network layer connects to peers on an underlying
> default network."
>
> A. Loser et al., Semantic Social Overlay Networks
> (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.7668&rep=rep1&type=pdf)
>
> This leads to a question of using linked data as shortcuts to other
> peers. How well would it fit into this model? (A rough sketch of such a
> shortcut index follows below.)
>
> A later part of the paper provides a start:
>
> "Conjunctive queries. Each query may include several predicates, e.g.
> Select all resources that belong to the topic semantic web and to the
> topic p2p. Using a common topic hierarchy this query can be rewritten as
> Find any resource having topics /computer/web/semanticweb and
> /computer/distributed/TourismTechnology. An exact match approach routes
> a query only to a peer that matches all predicates of the query using a
> simple exact match paradigm."
>
> Considering (1), swarm intelligence, I am reminded of Sebastian Koske's
> Thesis. SwarmLinda, mentioned on pages 34-35, allows for
> self-organization into clusters.
>
> The next sections are summarized on page 40:
>
> "In the next sections, swarm-based approaches are introduced, which
> provide support for typed templates (allowing typed triple retrieval),
> as they cluster statements semantically and form thematically confined
> areas within the Triple Space."
>
> Could you combine both? It seems you would want to cluster similar
> things while providing hints at what might be in other places. Then you
> could add a DHT to this for storage if you wanted.
>
> For the record, SwarmLinda uses a tuple space.
> (http://en.wikipedia.org/wiki/Tuple_space)
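>
> To give that a concrete shape, here is a rough Python sketch of a
> four-layer shortcut index along the lines INGA describes. This is only
> my own illustration, not code from the paper; the class name, the
> ranking order, and the topic strings are all made up.
>
>   from collections import defaultdict
>
>   LAYERS = ("content_provider", "recommender", "bootstrapping", "network")
>
>   class ShortcutIndex:
>       def __init__(self):
>           # topic -> layer -> set of peer ids, filled in lazily from
>           # observed queries and answers
>           self.shortcuts = defaultdict(
>               lambda: {layer: set() for layer in LAYERS})
>
>       def record_answer(self, topic, peer):
>           # peer successfully answered a query on this topic
>           self.shortcuts[topic]["content_provider"].add(peer)
>
>       def record_query(self, topic, peer):
>           # peer issued (or routed) a query on this topic
>           self.shortcuts[topic]["recommender"].add(peer)
>
>       def candidates(self, topic):
>           # prefer content providers, then recommenders, then fall back
>           # to bootstrapping / default-network peers
>           entry = self.shortcuts.get(topic, {})
>           for layer in LAYERS:
>               if entry.get(layer):
>                   return layer, entry[layer]
>           return "network", set()
>
>   idx = ShortcutIndex()
>   idx.record_answer("/computer/web/semanticweb", "peer-42")
>   print(idx.candidates("/computer/web/semanticweb"))
>
> The interesting part for linked data would be what counts as a "topic"
> here: a class or predicate URI from the data itself could serve as the
> shortcut key instead of a plain topic string.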
>
>> On 7 April 2015 at 11:22, David Nicol <davidnicol@gmail.com> wrote:
>>
>>> use http://www.libtorrent.org/dht_store.html to store a verified
>>> ledger. Start by adapting the BTC blockchain to dht_store access.
>>> Devise a mechanism for trusting providers of cached ledger query
>>> responses.
>>>
>>> > What is missing is a decentralised data store that can serve as the
>>> > registry for these identities. The Credentials CG has proposed
>>> > Telehash as this data-store.
>>> > The challenge is that one then has to be explicit in defining the
>>> > discovery protocol as to which decentralised data store to use.
>>>
>>> > If someone proposed the namecoin block-chain as an alternative how
>>> > do we decide which to use?
>>> > Who will be the stewards of this decentralised data store?
>>> > Is there an architecture for this data store that would be
>>> > rubber-stamped by the W3C as a cornerstone for dependent
>>> > recommendations?
>>> > (Here I am trying to think of an architecture that incentivises
>>> > participants to maintain the network assuming that financial
>>> > incentives aren't practical)
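As an aside on David's dht_store suggestion above: here is a very rough
Python sketch of what one signed, verifiable ledger entry keyed into a
DHT might look like. This is only my own illustration of the general
idea, not the libtorrent API; it assumes PyNaCl for the signature and a
plain dict standing in for the DHT.

  import hashlib
  import json
  from nacl.signing import SigningKey, VerifyKey  # assumes PyNaCl

  def put_ledger_entry(dht, signing_key, seq, payload):
      # Store one signed entry under the hash of the signer's public
      # key, loosely in the spirit of a BEP 44 mutable item.
      value = json.dumps({"seq": seq, "payload": payload},
                         sort_keys=True).encode()
      pubkey = bytes(signing_key.verify_key)
      sig = signing_key.sign(value).signature
      key = hashlib.sha256(pubkey).hexdigest()   # DHT lookup key
      dht[key] = {"pubkey": pubkey, "sig": sig, "value": value}
      return key

  def verify_entry(entry):
      # Raises if the signature does not match; this is where a cache of
      # ledger query responses could be checked for trustworthiness.
      VerifyKey(entry["pubkey"]).verify(entry["value"], entry["sig"])

  dht = {}                      # toy in-memory stand-in for the DHT
  sk = SigningKey.generate()
  key = put_ledger_entry(dht, sk, seq=1, payload={"balance": 42})
  verify_entry(dht[key])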
Received on Tuesday, 7 April 2015 23:53:06 UTC