- From: Dan Brickley <danbri@w3.org>
- Date: Sat, 3 Feb 2001 09:50:48 -0500 (EST)
- To: <tech@freenetproject.org>
- cc: <www-rdf-interest@w3.org>
(changed the subject line, previous one was mostly mailer info accretions. RDF folk, context is discussion about searching in Freenet, see http://www.freenetproject.org/ )

(I'm cc:'ing the RDF interest group list since I've been meaning to sketch a Freenet / RDF use case for some time. Please trim cc:'s in followups as appropriate)

On Sat, 3 Feb 2001, Tavin Cole wrote:

> > > Three words: cvs checkout Freenet
> >
> > So rather than begin in a vacuum, I'd like to know where other folks are
> > up to with Freenet search. Specifically I'd like to back up some of my RDF
> > rhetoric[1] with some concrete designs, but don't want to re-invent any
> > wheels. The FAQ doesn't point to any detailed proposals. 'cvs checkout
> > Freenet' isn't itself enough, if we want to do this right.
>
> well, there was an implied 'cvs commit' there.. ;^)

:)

> > Also, is it considered 'polite' to spider the current Freenet web? I
> > presume spiders/robots will be common eventually, but I recall some
> > comment on one of the Freenet index pages that suggested such activities
> > might be frowned upon at this stage of Freenet's development...
>
> I believe the problem was that spidering robots were crippling nodes through
> sheer volume of requests, each different one asking for the same damn stuff..
> or maybe it was something else.. ??
>
> Anyway, what would be really cool IMHO would be a distributed spidering
> system that spread the request load across lots of nodes and involved
> the cooperation of spidering daemons across many computers, each spidering
> a different section of the keyspace.

Yes, that's kind of the model that's been floating around in Web indexing circles for 5-6 years without really having yet taken off, the (now largely deceased) Harvest system being one of the better known efforts. I'm not really sure why it didn't succeed, since the decentralist, collaborative indexing model seems to have a lot going for it architecturally.
I fear it was lack of obvious business models that led to these systems getting switched off. There was a handy overview of these efforts in Scientific American a few years ago, online c/o http://www.sciam.com/0397issue/0397lynch.html

One surviving Harvest-based system is Dave Beckett's ACDC service, see http://acdc.search.ac.uk/ -- perhaps Dave might have some views on why Harvest's gone the way of Gopher... Also the DESIRE project did a bunch of work on a European web index composed of national indices (eg. see ancient report at http://www.lub.lu.se/desire/radar/reports/D3.12/) and others have built subject-specific search engines. Yet the dominant model in Web search still seems to be "build a huge database of everything on the Web ever". Even aside from P2P, things seem to be drifting toward more subject or regional search though, eg. the cutesy screenscrape-based search in Mozilla / Netscape 6. (http://sherlock.mozdev.org/)

But now I'm wondering whether the kinds of ways Web indexes have carved up the indexing job (typically by region or by topic) make sense in Freenet. Certainly the regional slant seems odd. By-topic appeals, though we're left with the bootstrapping problem of knowing what the pages/objects are about before figuring out whether to index them.

- - -

So rather than speculatively ramble on, I've a concrete metadata question / proposal. (Freenet as a queryable distributed hypertext database?)

Suppose I wanted to find resources in Freenet that stand in a certain named relationship (eg. critique, version, sub-section, review, endorsement...) to some other resource in Freenet, and I know the key for the first resource. Assume I use URI names for these relations (as RDF does), where the URI for the relation might be a freenet URI (resolving to a definition of that relation) or some other URI, eg. http://xmlns.com/example/#isCriticalReviewOf.

So, freenet:key1 is already in Freenet.
Then over time other people write critical reviews of the first resource, and write those into Freenet too. So we have:

  freenet:key2 --- ex:isCriticalReviewOf --> freenet:key1
  freenet:key3 --- ex:isCriticalReviewOf --> freenet:key1

...as the two factoids that we might want to use when doing certain kinds of search.

I'm trying to think of a way one might use the existing Freenet infrastructure to ask Freenet questions couched in these terms. I know how to use non-Freenet RDF tools for this sort of thing, but that's not the question at hand.

Say we had our hands on key2 and key3. Then it'd be easy to find out the URI for key1, since inline or associated metadata for key2 and key3 could tell us this. But say someone was looking at key1, and wanted to find critical reviews of it (or other things related in various name-able ways). How to ask Freenet for resources that match that search?

Half-baked proposal:

(1) take the relation identifier and the resource we're concerned with, and put them through some function to generate a unique number representing the combination of some identified resource with some identified relation.

(2) (the vague bit) Use that computed value as a common, decentralised strategy for pointing back into Freenet. Anybody who has those two URIs to hand (eg. a resource and what they want to know about it) would also have enough to ask that question of Freenet. BTW I'm a little hazy on the current sub-spaces mechanism; please excuse the handwaving.

In the simplest case, one might use this as a starting point for an uncontrolled and ever-growing list of Freenet resources which contain further metadata about the (alleged!) relation. The first person to use this would get computed-value_1, the second would find that key already taken and use computed-value_2, and so on.

Usage walkthrough: party A creates a resource ("Dan's Lemonade Cancer Cures") and writes it into the Freenet system under freenet:key-1.
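As an aside, here's a rough sketch in Python of how steps (1) and (2) might hang together: deriving a shared key sequence from the two URIs, probing _1, _2, _3... for the first free slot at publish time, and walking the same suffixes at browse time. The hash choice, the "freenet:meta-..." key syntax, and the client get/put interface are all my assumptions for illustration, not anything in the real Freenet codebase.

```python
import hashlib

# Illustrative only: key syntax and the client get/put API are assumptions.

def relation_key(relation_uri: str, resource_uri: str, n: int) -> str:
    """Derive the n-th shared key for a (relation, resource) pair.

    Anyone holding the same two URIs computes the same key sequence.
    """
    digest = hashlib.sha1(f"{relation_uri} {resource_uri}".encode()).hexdigest()
    return f"freenet:meta-{digest}_{n}"

def publish_metadata(client, relation_uri, resource_uri, rdf_doc, max_probe=1000):
    """Probe _1, _2, _3... for the first free slot, insert there, return the key."""
    for n in range(1, max_probe + 1):
        key = relation_key(relation_uri, resource_uri, n)
        if client.get(key) is None:   # assumed client API
            client.put(key, rdf_doc)
            return key
    raise RuntimeError("bag full; suffix-probing doesn't scale past this point")

def fetch_metadata(client, relation_uri, resource_uri, max_probe=1000):
    """Browse-time lookup: collect entries until the first gap in the suffixes."""
    found = []
    for n in range(1, max_probe + 1):
        doc = client.get(relation_key(relation_uri, resource_uri, n))
        if doc is None:
            break
        found.append(doc)
    return found
```

One obvious weakness of the sketch (which question two below is really about) is the linear probing: both writers and readers pay O(n) lookups for a bag of n entries, and a malicious party could squat the low-numbered slots.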
Party B creates a devastating critique of that document; in it goes under freenet:key-2.

Party C decides (after consulting medical opinion) that B is right, and takes the time to express this judgement in (signed) XML/RDF, and writes it into Freenet under the first _1, _2, _3 suffixed key based on a computed value derived from 'http://xmlns.com/example/critiquedBy' and 'freenet:key-1'.

Party D has some health concerns, turns to the (by now Freenet-rewired ;) Web, and finds the original document (freenet:key-1). Being skeptical, party D presses the "Oh yeah" button on his/her browser, which computes a list of values based on one or more relevant Web-identifiable relations (eg: critique, disclaimer, errata) and uses those to retrieve relationship metadata from Freenet; by interpreting this, the browser can provide some context to better help D figure out whether to trust A's document.

C's resource would typically contain Dublin Core descriptions of the main two documents, an assertion that they stand in one or more relationships, and whatever else C wanted to say. RDF's good like that. But the main thing is the relationship and the title/description/etc stuff.

Questions: I sketch this totally in terms of browse-time querying. Is it reasonable to expect performance from Freenet that would make such a second (and third... and...) lookup feasible? If not, how best to cache/precompute/harvest the meta-information? (the question we began with!)

Secondly, are there any better mechanisms than appending _1 _2 _3 to get/put items from an ever-growing bag of entries in Freenet, sticking to the constraint that the key needs to be derivable from a pair of URIs?

Hoping I manage to make sense on both Freenet and RDF Interest mailing lists simultaneously,

Dan

-- 
http://purl.org/net/danbri/
Received on Saturday, 3 February 2001 09:50:49 UTC