Re: Linked data sets for evaluating interlinking?

Thanks to all who have mentioned other datasets - more fodder for sameAs.org! :-)

Cristina,
A separate message from me about datasets.
As the maintainer of sameAs.org, I have quite a lot of such datasets in quite convenient forms (only the links) :-)
So, for example, if you wanted Adrian's data, then I can give it to you.
(I have queried the SPARQL endpoint to put stuff in sameAs.org. Both owl:sameAs and skos:exactMatch.)
I have lots of bibliographic ones, especially national libraries, who have often sent me the data.
(British, German, US, Japanese, Norwegian, French, Spanish, Hungarian … as best I recall.)
I also have the VIAF data.
This is all aggregated in http://sameas.org/store/kelle/ and other stuff is kept in some sameAs stores - see http://sameas.org/store/
Freebase is an interesting one (that is Google Graph, and they send me their data.)
LATC has been mentioned, and I have a store with that data.
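
For anyone who wants to pull the same kind of link data straight from a SPARQL endpoint (as mentioned above for owl:sameAs and skos:exactMatch), here is a rough sketch. It is not the code used for sameAs.org, only an illustration: the function names and the default limit are invented for the example, and it assumes the endpoint supports SPARQL 1.1 (VALUES) and returns the standard SPARQL JSON results format. Fetching the results over HTTP is left out.

```python
from typing import Dict, Iterable, List, Tuple

# The two link predicates named in the message.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def build_link_query(predicates: Iterable[str] = (SAME_AS, EXACT_MATCH),
                     limit: int = 1000) -> str:
    """Build a SELECT query returning (subject, predicate, object) link triples.

    Hypothetical helper: the query shape is an assumption, not a known recipe.
    """
    values = " ".join(f"<{p}>" for p in predicates)
    return (
        "SELECT ?s ?p ?o WHERE { "
        f"VALUES ?p {{ {values} }} "
        "?s ?p ?o . FILTER(isIRI(?o)) } "
        f"LIMIT {limit}"
    )


def links_from_results(results: Dict) -> List[Tuple[str, str, str]]:
    """Extract link triples from a parsed SPARQL JSON results document."""
    return [
        (b["s"]["value"], b["p"]["value"], b["o"]["value"])
        for b in results["results"]["bindings"]
    ]
```

The triples that come back are "only the links", which is the convenient form for comparing against the output of an interlinking tool.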

Also, Rob Warren is spot on!
owl:differentFrom is your friend.
It tells you which resources might have been considered the same, but where further work has shown that the system was wrong.
In some sense it gives you upper and lower bounds on precision/recall.
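
One way to read the "bounds" remark, sketched in Python. This is an interpretation, not a description of how sameAs.org or differentfrom.org work: candidate links that appear in the known-same set count as confirmed right, those in the known-different set count as confirmed wrong, and everything else is unknown. Treating the unknowns as all wrong gives a lower bound on precision; treating them as all right gives an upper bound. The function names and the example identifiers are made up.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def normalise(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Treat links as unordered pairs, so (a, b) and (b, a) compare equal."""
    return {frozenset(p) for p in pairs}


def precision_bounds(candidates: Iterable[Tuple[str, str]],
                     same: Iterable[Tuple[str, str]],
                     different: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the precision of the candidate links."""
    c, s, d = normalise(candidates), normalise(same), normalise(different)
    if not c:
        raise ValueError("no candidate links to evaluate")
    confirmed_right = len(c & s)   # candidates backed by a sameAs-style link
    confirmed_wrong = len(c & d)   # candidates contradicted by differentFrom
    lower = confirmed_right / len(c)            # unknowns counted as wrong
    upper = (len(c) - confirmed_wrong) / len(c)  # unknowns counted as right
    return lower, upper


# Hypothetical identifiers, for illustration only.
candidates = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
same = [("b1", "a1"), ("a2", "b2")]
different = [("a3", "b3")]
print(precision_bounds(candidates, same, different))  # (0.5, 0.75)
```

The more differentFrom data you have, the narrower the gap between the two numbers. The same idea can be applied to recall, but only relative to the links you already know about.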

It so happens (!) that I also run http://differentfrom.org where I gather such data.
Again, Freebase have given me their regression test for asserting sameness, and I have a store with that in.
And LATC published their similar data, and I have put it in a store.

I hope that helps - ask me for data if you need it, although I hope you can be as specific as possible.
(If I don't have it, I may well decide to harvest it to put in sameas.org.)

Best
Hugh

On 26 Aug 2013, at 12:04, Adrian Stevenson <adrian.stevenson@manchester.ac.uk> wrote:

> Hi All
> 
> As part of the LOCAH and Linking Lives projects, the latter in particular, we've been doing a lot of this auto and manual linking work, mainly to VIAF and DBpedia, with some links to things like LCSH and Geonames. We've been doing a lot of work just recently in fact, and we've published a blog post on this that's picked up quite a bit of interest - http://archiveshub.ac.uk/blog/2013/08/hub-viaf-namematching/. We haven't published our latest run of data yet, but we hope to finish this soon. It'll probably still be about a month or so as a few of us are on holiday soon.
> 
> We do have quite a few links done semi-automatically in our existing data set, accessible via http://data.archiveshub.ac.uk, but as I say we are updating this, so I'd suggest not taking the URIs and data available there as the final word.
> 
> A good example is http://data.archiveshub.ac.uk/page/person/nra/webbmarthabeatrice1858-1943socialreformer
> 
> Project URIs:
> http://archiveshub.ac.uk/locah/
> http://archiveshub.ac.uk/linkinglives/
> 
> Adrian
> _____________________________
> Adrian Stevenson
> Senior Technical Innovations Coordinator
> Mimas, The University of Manchester
> Devonshire House, Oxford Road
> Manchester M13 9QH
> 
> Email: adrian.stevenson@manchester.ac.uk
> Tel: +44 (0) 161 275 6065
> http://www.mimas.ac.uk
> http://www.twitter.com/adrianstevenson
> http://uk.linkedin.com/in/adrianstevenson/
> 
> On 22 Aug 2013, at 16:06, Cristina Sarasua wrote:
> 
>> Hi, 
>> 
>> I am looking for pairs of linked data sets that can be used as a gold standard for evaluations. I would need pairs of data sets which have been manually linked, or data sets which have been (semi-)automatically linked with interlinking tools and afterwards reviewed (to include the links which are not identified by tools). I have looked into the DataHub catalogue and queried VoID descriptions, but unfortunately the information about how the interlinking process was carried out is often missing.
>> 
>> Apart from the data sets which have been used in the OAEI-instance matching track, could anyone recommend (based on past experience) good data sets for evaluating data interlinking processes?
>> 
>> Thanks in advance.
>> 
>> Kind regards, 
>> 
>> Cristina
>> -- 
>> Cristina Sarasua
>> 
>> Institute for Web Science and Technologies (WeST)
>> 
>> Universität Koblenz-Landau
>> Universitätsstraße 1
>> 56070 Koblenz
>> Germany
>> 
>> e: csarasua@uni-koblenz.de
>> 
>> p: +49 261 287 2772
>> f: +49 261 287 100 2772
>> w: http://west.uni-koblenz.de
> 
> 

Received on Monday, 26 August 2013 20:21:49 UTC