Freenet, distributed search and simple RDF queries

(changed the subject line, previous one was mostly mailer info
accretions. RDF folk, context is discussion about searching in Freenet,
see http://www.freenetproject.org/ )

(I'm cc:'ing the RDF interest group list since I've been meaning to sketch
a Freenet / RDF use case for some time. Please trim cc:'s in followups as
appropriate)

On Sat, 3 Feb 2001, Tavin Cole wrote:

> > > Three words:  cvs checkout Freenet
> >
> > So rather than begin in a vacuum, I'd like to know where other folks are
> > up to with Freenet search. Specifically I'd like to back up some of my RDF
> > rhetoric[1] with some concrete designs, but don't want to re-invent any
> > wheels. The FAQ doesn't point to any detailed proposals. 'cvs checkout
> > Freenet' isn't itself enough, if we want to do this right.
>
> well, there was an implied 'cvs commit' there..  ;^)

:)

> > Also, is it considered 'polite' to spider the current Freenet web? I
> > presume spiders/robots will be common eventually, but I recall some
> > comment on one of the Freenet index pages that suggested such activities
> > might be frowned upon at this stage of Freenet's development...
>
> I believe the problem was that spidering robots were crippling nodes through
> sheer volume of requests, each different one asking for the same damn stuff..
> or maybe it was something else.. ??
>
>
> Anyway, what would be really cool IMHO would be a distributed spidering
> system that spread the request load across lots of nodes and involved
> the cooperation of spidering daemons across many computers, each spidering
> a different section of the keyspace.


Yes, that's kind of the model that's been floating around in Web indexing
circles for 5-6 years without yet really taking off, the (now largely
deceased) Harvest system being one of the better-known efforts. I'm not
really sure why it didn't succeed, since the decentralist, collaborative
indexing model seems to have a lot going for it architecturally. I fear it
was the lack of obvious business models that led to these systems getting
switched off. There was a handy overview of these efforts in Scientific
American a few years ago, online c/o
http://www.sciam.com/0397issue/0397lynch.html
One surviving Harvest-based system is Dave Beckett's ACDC service, see
http://acdc.search.ac.uk/ -- perhaps Dave might have some views on why
Harvest's gone the way of Gopher...


Also the DESIRE project did a bunch of work on a European web
index composed of national indices (eg. see ancient report at
http://www.lub.lu.se/desire/radar/reports/D3.12/) and others have
built subject-specific search engines. Yet the dominant model in Web
search still seems to be "build a huge database of everything on
the Web ever". Even aside P2P, things seem to be drifting toward more
subject or regional search though, eg. the cutesy screenscrape-based
search in Mozilla / Netscape 6. (http://sherlock.mozdev.org/)

But now I'm wondering whether the kinds of ways Web indexes have carved up
the indexing job (typically by region or by topic) make sense in Freenet.
Certainly the regional slant seems odd, since Freenet keys have no
geography. Partitioning by topic appeals, though we're left with the
bootstrapping problem of knowing what the pages/objects are about before
deciding whether to index them.

  - - -

So rather than speculatively ramble on, I've a concrete metadata question
/ proposal.


(Freenet as a queryable distributed hypertext database?)

Suppose I wanted to find resources in Freenet that stand in a certain
named relationship (eg. critique, version, sub-section, review,
endorsement...) to some other resource in Freenet. And I know the key for
the first resource. Assume I use URI names for these relations (as RDF
does), where the URI for the relation might be a freenet URI (resolving to
a definition of that relation) or some other URI, eg.
http://xmlns.com/example/#isCriticalReviewOf.

So, freenet:key1 is already in Freenet. Then over time other people write
critical reviews of the first resource, and write those into Freenet too.

so we have:

freenet:key2 --- ex:isCriticalReviewOf --> freenet:key1
freenet:key3 --- ex:isCriticalReviewOf --> freenet:key1

...as the two factoids that we might want to use when doing certain kinds
of search.

I'm trying to think of a way one might use the existing Freenet
infrastructure to ask Freenet questions couched in these terms. I know how
to use non-Freenet RDF tools for this sort of thing, but that's not the
question at hand.

Say we had our hands on key2 and key3. Then it'd be easy to find out the
URI for key1 since inline or associated metadata for key2 and key3 could
tell us this. But say someone was looking at key1, and wanted to find
critical reviews of it (or other things related in various name-able
ways). How to ask Freenet for resources that match that search?

Half-baked proposal:

(1)
take the relation identifier and the resource we're concerned with, and
put them through some function to generate a unique number representing
the combination of some identified resource with some identified relation.

(2) (the vague bit)
Use that computed value as a common, decentralised strategy for pointing
back into Freenet. Anybody who has those two URIs to hand (eg. a resource
and what they want to know about it) would also have enough to ask that
question of Freenet. BTW I'm a little hazy on the current sub-spaces
mechanism; please excuse the handwaving. In the simplest case, one might
use this as a starting point for an uncontrolled and ever-growing list of
Freenet resources which contain further metadata about the (alleged!)
relation. The first person to use this would get computed-value_1, the
second would find that key already taken and use computed-value_2, and so
on. (There's a rough code sketch of this below.)
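
To pin that down a little, here's a minimal sketch in Python. The hashing
scheme (SHA-1 over the concatenated URIs) and the freenet_get/freenet_put
helpers are hypothetical stand-ins for whatever a real node's client API
offers -- not a description of Freenet's actual key types:

  import hashlib

  def relation_key(resource_uri, relation_uri, n):
      # Hash the (resource, relation) URI pair into a stable digest,
      # then append the _n suffix so the list of assertions can grow.
      pair = resource_uri + ' ' + relation_uri
      digest = hashlib.sha1(pair.encode('utf-8')).hexdigest()
      return 'freenet:%s_%d' % (digest, n)

  # Hypothetical node-API wrappers, stubbed with an in-memory dict so
  # the sketch is self-contained; a real client would talk to a node.
  _store = {}
  def freenet_get(key):
      return _store.get(key)
  def freenet_put(key, value):
      _store[key] = value

  def publish_assertion(resource_uri, relation_uri, rdf_doc):
      # Probe _1, _2, _3... until we hit a key nobody has claimed yet,
      # then insert our RDF document there.
      n = 1
      while freenet_get(relation_key(resource_uri, relation_uri, n)):
          n = n + 1
      freenet_put(relation_key(resource_uri, relation_uri, n), rdf_doc)

The point is that anyone starting from the same two URIs computes the same
digest, so readers and writers rendezvous on the same family of keys
without any central coordination.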

Usage walkthrough:

party A creates a resource ("Dan's Lemonade Cancer Cures") and writes it
into the Freenet system under freenet:key-1.

party B creates a devastating critique of that document; in it goes under
freenet:key-2.

party C decides (after consulting medical opinion) that B is right, and
takes the time to express this judgement in (signed) XML/RDF, and writes
it into Freenet under the first available _1, _2, _3 suffixed key based on
a computed value derived from 'http://xmlns.com/example/critiquedBy' and
'freenet:key-1'.
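
For concreteness, C's little document might look something like this
(illustrative only -- the ex: namespace is the made-up one from above,
dc: is Dublin Core, and the title of B's critique is invented for the
example):

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:ex="http://xmlns.com/example/">
    <rdf:Description rdf:about="freenet:key-1">
      <dc:title>Dan's Lemonade Cancer Cures</dc:title>
      <ex:critiquedBy rdf:resource="freenet:key-2"/>
    </rdf:Description>
    <rdf:Description rdf:about="freenet:key-2">
      <dc:title>Why Lemonade Cures Nothing</dc:title>
      <dc:creator>B</dc:creator>
    </rdf:Description>
  </rdf:RDF>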

party D has some health concerns, turns to the (by now Freenet-rewired ;)
Web, and finds the original document (freenet:key-1). Being skeptical,
party D presses the "Oh yeah" button on his/her browser, which computes a
list of values based on one or more relevant Web-identifiable relations
(eg: critique, disclaimer, errata) and uses those to retrieve relationship
metadata from Freenet; by interpreting this, the browser can provide some
context to help D figure out whether to trust A's document. C's resource
would typically contain Dublin Core descriptions of the main two
documents, an assertion that they stand in one or more relationships, and
whatever else C wanted to say. RDF's good like that. But the main thing is
the relationship and the title/description/etc stuff.
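
The "Oh yeah" button might boil down to something like this, reusing the
hypothetical relation_key() and freenet_get() from the earlier sketch:

  def fetch_related(resource_uri, relation_uris):
      # For each relation we care about (critique, disclaimer,
      # errata...), walk the _1, _2, _3... keys until a lookup misses,
      # collecting whatever RDF metadata documents turn up.
      found = []
      for relation_uri in relation_uris:
          n = 1
          while True:
              doc = freenet_get(relation_key(resource_uri,
                                             relation_uri, n))
              if doc is None:
                  break
              found.append(doc)
              n = n + 1
      return found

The browser would then parse whatever comes back as RDF, check any
signatures, and render titles and descriptions alongside A's document.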


Questions:

I've sketched this entirely in terms of browse-time querying. Is it
reasonable to expect performance from Freenet that would make such a
second (and third... and...) lookup feasible? If not, how best to
cache/precompute/harvest the meta-information? (the question we began
with!)

Secondly, are there any better mechanisms than appending _1 _2 _3 to
get/put items from an ever-growing bag of entries in Freenet, sticking to
the constraint that the key needs to be derivable from a pair of URIs?

Hoping I manage to make sense on both Freenet and RDF Interest mailing
lists simultaneously,

Dan


--
http://purl.org/net/danbri/
