Re: backlinks / Re: bootstrapping decentralized sparql from Kingsley Idehen on 2009-05-17 (public-lod@w3.org from May 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 17 May 2009 14:31:59 -0400
To: Sandro Hawke <sandro@w3.org>
CC: Giovanni Tummarello <giovanni.tummarello@deri.org>, Peter Ansell <ansell.peter@gmail.com>, "semantic-web@w3.org" <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
Message-ID: <4A10581F.5080702@openlinksw.com>

Sandro Hawke wrote:
>> we all like to think "p2p", distributed, etc.
>> but the fact is that we love it too much, disregarding the basic
>> economic reasons that underlie how the world (in fairness) works.
>>
>> But let's put a constraint.
>>
>> Let's imagine that we don't live forever and that the time one should
>> work on a topic should be limited (e.g. 10 years is a good span, so I
>> began in 2002, 3 years left). Don't you want to see some actual
>> advantage delivered to the end user within this timeframe? I do, and
>> very strongly.
>>     
>
> Yes, I thought it a little ironic that you, of all people, were being
> cast as a centralist.  (I'm sure no insult was intended by anyone, of
> course.)  In practice, yes, we'd all love more decentralization, if we
> could have it for free.... but sometimes it's impractically expensive.
>
> Let me try to be more clear about my use case, though.  I am in no way
> complaining about Google or Sindice; they are great.  But by their
> nature (as I understand it, at least), they are not complete, and will
> not be able to do one particular (important) thing I want.
>
> I'd like to be able to run queries like this: tell me all showings of
> Star Trek in Cambridge, MA, on 2009-05-17.  (I'm not talking about the
> natural language part of that; I just want to be able to run the SPARQL
> equivalent of that natural language query.)  And I really do want the
> answer to be complete; if a showing is missing from my result set,
> that's because that showing is not being properly published.  (Right
> now, Google has a special mechanism, different from its normal search
> engine, to handle this particular example, because it's so compelling.
> I want something general, of course, that handles all queries -- not
> just movie times.)
>
> I think this is doable if by "properly published" we include the notion
> of backlinking.  I propose this rule: whenever you publish some RDF, you
> must notify all the backlink servers for all the URIs you use in your
> content. 
Sandro,

Amen re. backlinks, and they should even exist where the source isn't 
RDF :-) This is basically what I mean by the "owl:shameAs" pattern, 
since in due course it "shames" the original data owner into considering 
structured data granularity by making impression opportunity costs 
palpable. It also provides attribution.
>  If you don't do that, your content will not be fully
> searchable.  (In some cases, you will have to register a SPARQL end
> point, instead of numerous graphs.  This is part of what makes this
> feasible.)
>   

Yes.
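
The rule quoted above (notify the backlink servers for every URI you use when you publish) can be sketched roughly as follows. This is only an illustrative in-memory model, not an existing service or protocol: the BacklinkServer class, the publish() and servers_for() helpers, the example.org / cinema-*.example URIs, and the prefix-based registry are all assumptions made for the sketch. Terms are treated as URIs when they start with "http", which is a simplification.

```python
from collections import defaultdict


class BacklinkServer:
    """Hypothetical in-memory backlink server: URI -> sources that use it."""

    def __init__(self, name):
        self.name = name
        self._backlinks = defaultdict(set)

    def notify(self, uri, source):
        # Record that `source` (a graph or SPARQL endpoint) mentions `uri`.
        self._backlinks[uri].add(source)

    def lookup(self, uri):
        # Return every source known to mention `uri`, in sorted order.
        return sorted(self._backlinks.get(uri, ()))


def uris_used(triples):
    """Collect every term that looks like a URI (simplification: starts with 'http')."""
    return {term for triple in triples for term in triple if term.startswith("http")}


def publish(source, triples, servers_for):
    """Notify all backlink servers for all URIs used in `triples`.

    `servers_for` is an assumed lookup from a URI to the backlink servers
    chosen by whoever minted that URI. Returns the number of notifications sent.
    """
    sent = 0
    for uri in sorted(uris_used(triples)):
        for server in servers_for(uri):
            server.notify(uri, source)
            sent += 1
    return sent


# Two backlink servers; each URI namespace has picked one or two of them.
server_a = BacklinkServer("a")
server_b = BacklinkServer("b")
registry = {
    "http://example.org/movies/": [server_a, server_b],
    "http://example.org/places/": [server_b],
}


def servers_for(uri):
    return [s for prefix, servers in registry.items()
            if uri.startswith(prefix) for s in servers]


def showing(cinema):
    # One showing of Star Trek in Cambridge, MA, published by `cinema`.
    subject = f"http://{cinema}.example/showing/1"
    return [
        (subject, "http://example.org/movies/shows", "http://example.org/movies/StarTrek"),
        (subject, "http://example.org/places/location", "http://example.org/places/CambridgeMA"),
        (subject, "http://example.org/movies/date", "2009-05-17"),
    ]


publish("http://cinema-one.example/data", showing("cinema-one"), servers_for)
publish("http://cinema-two.example/data", showing("cinema-two"), servers_for)

# Anyone can now ask a backlink server which sources mention Star Trek.
sources = server_a.lookup("http://example.org/movies/StarTrek")
```

In this model a query engine would fetch (or send SPARQL to) each entry in `sources`. A showing that never called publish() would simply be missing, which is the "not properly published" case described above.
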
> So, I'm picturing a market for backlink servers.  Everyone minting URIs
> for other people to use should pick some (probably two or three)
> backlink servers.  They don't have to run the service themselves.  They
> might or might not have to pay for the service, depending how the market
> evolves.
>   

So when I mention <http://lod.openlinksw.com/void/Dataset>, which is part 
of any <http://lod.openlinksw.com> (meaning: anyone will be able to make 
personal and service-specific variants in the cloud or in their own 
setup, etc.), plus discoverable SPARQL endpoints that expose these stats 
(which also include backlinks to original data sources), I hope the 
vision I espouse is a little clearer re. congruence to yours :-)

> It might be that Sindice comes to dominate this market; they (you)
> probably have the best base technology to use for it at the moment.  But
> the point is that if there is a market, and a standard interface, then
> the service can probably be relied upon.
>
>   
The market is too big for 1000 Googles. The network is scale-free, so no 
single entity can pull it off effectively; something will give. All we 
can do is build a federation that has user-configurable traversal paths 
(what happens when a user interacts with the Web, establishes a beachhead 
via a representation, and then beams SPARQL from there, covertly or 
overtly).

Put bluntly, the Google model is obsolete in this context, really :-)


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com

Received on Sunday, 17 May 2009 18:32:39 UTC