
Re: ANN: sameas.org

From: Dan Brickley <danbri@danbri.org>
Date: Thu, 04 Jun 2009 11:26:46 +0200
Message-ID: <4A279356.6000300@danbri.org>
To: Hugh Glaser <hg@ecs.soton.ac.uk>
CC: Richard Cyganiak <richard@cyganiak.de>, Semantic Web <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>, Ian Millard <icm@ecs.soton.ac.uk>
On 4/6/09 10:45, Hugh Glaser wrote:

>> Might it be possible to have the site offer
>> URIs, and some commitment they'll probably be around for a few years (or
>> somehow opensourced to collaborative maintenance if Southampton decide
>> not to maintain it later?).
> No :-)

Thanks for the clear answer :)

Am wondering now about data dumps or other ways of sharing / archiving 
the data...

> Thank you for answering this bit of Michael's post so well.
> Our strong view is that the solution to the problem of having all these URIs
> is not to generate another one.

I'm glad to hear that. Plucking figures from the air, 1 < n < 5 seems a 
manageable number of URIs to have in public play, per entity. At the 
moment I don't think many RDF toolkits deal particularly well with 
defragmenting descriptions via owl:sameAs, although I recall that at 
least Virtuoso offers some support for it.
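To make "defragmenting via owl:sameAs" concrete, here's a toy sketch 
(not what any particular toolkit does): take the sameAs pairs as 
equivalences, pick one canonical URI per equivalence class, and rewrite 
triples onto it. The URIs and triples below are illustrative only.

```python
def canonical_map(same_as_pairs):
    """Union-find over owl:sameAs pairs; returns {uri: canonical_uri}."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in same_as_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # pick the lexically smallest URI as canonical, deterministically
            parent[max(ra, rb)] = min(ra, rb)
    return {u: find(u) for u in parent}


def smush(triples, same_as_pairs):
    """Rewrite subjects and objects of (s, p, o) triples onto canonical URIs."""
    cmap = canonical_map(same_as_pairs)
    return {(cmap.get(s, s), p, cmap.get(o, o)) for s, p, o in triples}
```

With that, descriptions of #danbri scattered across several of the URIs 
listed further down would collapse onto a single subject before querying.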

> 				And I would say that with services of this
> type around, there is no reason. Use an existing one, or construct a new one
> and make sure it is known about.
> If you want permanent new URIs, try using okkam, and we will hope to have
> more of okkam in our system soon.

Ah, that was my other question :) thanks

>> The reason I go on about this topic first is I could see people very
>> easily relying on such services, and doing so for many millions of
>> identifiers.
>>
>> Another thought: take a look at Social Graph API from Google; this might
>> help with people identification - http://code.google.com/apis/socialgraph/
>>
>> eg. for me,
>>
>> http://sameas.org/html?uri=http%3A%2F%2Fdanbri.org%2Ffoaf.rdf%23danbri&x=9&y=15
>> gives:
>>
>> 1.http://danbri.livejournal.com/data/foaf
>> 2.http://danbri.org/foaf#danbri
>> 3.http://danbri.org/foaf.rdf#danbri
>> 4.http://downlode.org/Code/RDF/FOAF/foaf.rdf#danbri
>> 5.http://downlode.org/Metadata/FOAF/foaf.rdf#danbri
>> 6.http://my.opera.com/danbri/
>> 7.http://my.opera.com/danbri/xml/foaf#me
>> 8.http://my.opera.com/danbri/xml/foaf#danbri-
>> 9.http://www4.wiwiss.fu-berlin.de/dblp/resource/person/336851
>>
>>
>> vs Google's
>> http://socialgraph.apis.google.com/lookup?q=http%3A%2F%2Fdanbri.org%2Ffoaf.rdf%23danbri&fme=1&pretty=1&callback=
> Thanks.
> I guess the first question is, should I trust it?
> By the way, it seems that you are badly co-reffed in our system - sorry :-)
> (And I am not going to "fix" it, so we can see how the background systems
> run.)
> By the way, if you want the social/network graph, you can put the URI into
> http://www.rkbexplorer.com/network/
> Eg
> http://www.rkbexplorer.com/network/?uri=http://southampton.rkbexplorer.com/id/person-62ca72227cd42255eb0d8c37383eccf0-2e1762effd1839702bc077c652d57901
>> Another thought - is the whole system necessarily based on pre-loaded
>> data, or could sameas.org make some explorations of the Web "while you
>> wait"? eg. do a few searches via Yahoo BOSS or Google JSON API and parse
>> the results for same-as's.
> I would avoid this.
> For it to be a service of the kind that John would use, I think it needs to
> provide a guaranteed fast response (at least in the sense of no other
> unexpected dependencies).

Ok. Agree that at least the default needs to be fast. Having an 
&work_smarter_but_slower=1 option is still intriguing to me though.
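(Aside: the sameas.org lookup quoted above is just the entity URI 
percent-encoded into a query string; the x/y parameters in the example 
look like image-map click coordinates. A tiny helper for building such 
lookup URLs, using only the html endpoint shown above:)

```python
from urllib.parse import urlencode


def sameas_lookup_url(uri):
    """Build a sameas.org HTML lookup URL for the given entity URI.

    urlencode handles the percent-encoding of ':', '/', '#' etc.
    """
    return "http://sameas.org/html?%s" % urlencode({"uri": uri})
```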

>> Re "bad results" it's worth looking at what Google SGAPI does. They
>> distinguish between one sided claims vs reciprocations. If my homepage
>> has rel=me pointing to my youtube profile, that's one piece of evidence
>> they have a common owner; if the profile has similar markup pointing
>> back, that's even more reassuring....
> Ah yes, now that is a big topic. Several PhDs on trust and provenance to be
> done here. What is the provenance of each of the pairwise assertions, how
> does that contribute to the bundle, how do multiple assertions from
> different sources contribute? In fact, what is the calculus of all this?

Quite. I've been talking about it lately as "claim graph analytics", 
to emphasise firstly that each triple comes from some perspective, but 
also that the big picture will emerge from seeing which of these 
datasets support each other, which contradict each other, etc. Even if 
everyone agrees on common URIs for everything, this is a hairy 
problem. If the different layers don't even agree on common URIs, 
that's hairy * hairy... So we end up with two strands of difficulty: 
first, looking across the different datasources to figure out what 
consensus there is about the identifiers; then, once we've done all we 
can on the identifier front, figuring out what to make of the actual 
content from these different sources.
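As a toy version of one corner of this, here's the SGAPI-style 
distinction mentioned earlier in pseudocode-ish Python: a reciprocated 
link (A claims sameAs B and B claims sameAs A) counts as stronger 
evidence than a one-sided claim. The scores (1 vs 3) are made up 
purely for illustration; a real calculus is exactly the open question.

```python
def score_claims(claims):
    """claims: iterable of (source_page, target_uri) rel=me-style links.

    Returns {frozenset({a, b}): score}, scoring reciprocated pairs
    higher than one-sided assertions.
    """
    asserted = set(claims)
    scores = {}
    for a, b in asserted:
        pair = frozenset((a, b))
        if pair in scores:
            continue
        reciprocated = (b, a) in asserted
        scores[pair] = 3 if reciprocated else 1  # arbitrary toy weights
    return scores
```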

(Thinking about this stuff, I can't see much to support the view that we 
don't need inference on the SW. That said, much of the work on inference 
isn't yet addressing these kinds of problems.)

cheers,

Dan
Received on Thursday, 4 June 2009 09:27:30 UTC
