backlinks / Re: bootstrapping decentralized sparql

From: Sandro Hawke <sandro@w3.org>
Date: Sun, 17 May 2009 11:21:37 -0400
To: Giovanni Tummarello <giovanni.tummarello@deri.org>
cc: Peter Ansell <ansell.peter@gmail.com>, "semantic-web@w3.org" <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
Message-ID: <15149.1242573697@ubehebe>

> we all like to think "p2p", distributed, etc.
> but the fact is that we love it too much, disregarding the basic
> economic reasons that underlie how the world (in fairness) works.
> 
> But let's put a constraint.
> 
> Let's imagine that we don't live forever and that the time one should
> work on a topic should be limited (e.g. 10 years is a good span, so I
> began in 2002, 3 years left). Don't you want to see some actual
> advantage delivered to the end user within this timeframe? I do, and
> very strongly.

Yes, I thought it a little ironic that you, of all people, were being
cast as a centralist.  (I'm sure no insult was intended by anyone, of
course.)  In practice, yes, we'd all love more decentralization, if we
could have it for free.... but sometimes it's impractically expensive.

Let me try to be more clear about my use case, though.  I am in no way
complaining about Google or Sindice; they are great.  But by their
nature (as I understand it, at least), they are not complete, and will
not be able to do one particular (important) thing I want.

I'd like to be able to run queries like this: tell me all showings of
Star Trek in Cambridge, MA, on 2009-05-17.  (I'm not talking about the
natural language part of that; I just want to be able to run the SPARQL
equivalent of that natural language query.)  And I really do want the
answer to be complete; if a showing is missing from my result set,
that's because that showing is not being properly published.  (Right
now, Google has a special mechanism, different from its normal search
engine, to handle this particular example, because it's so compelling.
I want something general, of course, that handles all queries -- not
just movie times.)

I think this is doable if by "properly published" we include the notion
of backlinking.  I propose this rule: whenever you publish some RDF, you
must notify all the backlink servers for all the URIs you use in your
content.  If you don't do that, your content will not be fully
searchable.  (In some cases, you will have to register a SPARQL
endpoint, instead of numerous graphs.  This is part of what makes this
feasible.)
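
To make the rule concrete, here is a minimal sketch in Python.  It is
only an illustration under stated assumptions, not a proposed interface:
the BacklinkServer class, the publish function, the prefix-based
registry, and all the example.* URIs are hypothetical.  In particular,
how a publisher discovers which backlink servers a URI's owner has
picked is not settled here; the sketch just assumes a lookup by URI
prefix.

```python
class BacklinkServer:
    """Hypothetical in-memory backlink server: URI -> sources that use it."""

    def __init__(self, name):
        self.name = name
        self.backlinks = {}

    def notify(self, uri, source):
        # Record that `source` (a graph or endpoint URL) uses `uri`.
        self.backlinks.setdefault(uri, set()).add(source)

    def sources_for(self, uri):
        # Every source that has registered a use of `uri`.
        return set(self.backlinks.get(uri, set()))


def uris_in(triples):
    """Collect every URI used as subject, predicate, or object."""
    return {
        term
        for triple in triples
        for term in triple
        if term.startswith(("http://", "https://"))
    }


def publish(source, triples, servers_for_uri):
    """Apply the rule: notify all backlink servers for all URIs used."""
    sent = 0
    for uri in uris_in(triples):
        for server in servers_for_uri(uri):
            server.notify(uri, source)
            sent += 1
    return sent


# Assumption: each URI owner has picked its backlink servers, and the
# publisher can look them up by URI prefix.
server_a = BacklinkServer("a")
server_b = BacklinkServer("b")
registry = {
    "http://films.example/": [server_a, server_b],
    "http://places.example/": [server_a],
}


def servers_for_uri(uri):
    for prefix, servers in registry.items():
        if uri.startswith(prefix):
            return servers
    return []  # nobody has registered backlink servers for this URI


triples = [
    ("http://cinema.example/showing/1",
     "http://films.example/film",
     "http://films.example/StarTrek"),
    ("http://cinema.example/showing/1",
     "http://places.example/city",
     "http://places.example/CambridgeMA"),
    ("http://cinema.example/showing/1",
     "http://films.example/date",
     "2009-05-17"),
]

count = publish("http://cinema.example/showings.rdf", triples, servers_for_uri)
star_trek_sources = server_a.sources_for("http://films.example/StarTrek")
```

A query engine that wants a complete answer about Star Trek showings
would ask a backlink server for the sources that mention the Star Trek
URI, then fetch or query each of them.  Anything that never sent its
notifications simply would not appear, which is the sense in which it is
not "properly published."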

So, I'm picturing a market for backlink servers.  Everyone minting URIs
for other people to use should pick some (probably two or three)
backlink servers.  They don't have to run the service themselves.  They
might or might not have to pay for the service, depending how the market
evolves.

It might be that Sindice comes to dominate this market; they (you)
probably have the best base technology to use for it at the moment.  But
the point is that if there is a market, and a standard interface, then
the service can probably be relied upon.

      -- Sandro
Received on Sunday, 17 May 2009 15:21:59 UTC
