Re: Linked data sets for evaluating interlinking?

> Thanks to all who have mentioned other datasets - more fodder for
> sameAs.org! :-)
>
> Cristina,
> A separate message from me about datasets.
> As the maintainer of sameAs.org, I have quite a lot of such datasets in
> quite convenient forms (only the links) :-)
> So, for example, if you wanted Adrian's data, then I can give it to you.
> (I have queried the SPARQL endpoint to put its owl:sameAs and
> skos:exactMatch links into sameAs.org.)
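Link-only dumps of this kind are straightforward to process. As a minimal sketch, assuming the links arrive as N-Triples lines whose subject and object are both IRIs, the owl:sameAs and skos:exactMatch pairs can be pulled out with the Python standard library alone. The function name extract_links and the example.org/example.net IRIs are hypothetical, and this is a simple line filter, not a full N-Triples parser.

```python
import re

# Full IRIs of the two linking predicates mentioned in the thread.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
LINK_PREDICATES = {OWL_SAME_AS, SKOS_EXACT_MATCH}

# Assumption: N-Triples input, one "<s> <p> <o> ." triple per line.
# Literals, blank nodes and escaped characters are deliberately not handled.
TRIPLE = re.compile(r"^\s*<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def extract_links(lines):
    """Hypothetical helper: return (subject, object) pairs joined by a link predicate."""
    links = set()
    for line in lines:
        match = TRIPLE.match(line)
        if match is None:
            continue  # comments, blank lines, literal-valued triples
        subject, predicate, obj = match.groups()
        if predicate in LINK_PREDICATES:
            links.add((subject, obj))
    return links


if __name__ == "__main__":
    sample = [
        "<http://example.org/a> <http://www.w3.org/2002/07/owl#sameAs> <http://example.net/x> .",
        "<http://example.org/b> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://example.net/y> .",
        '<http://example.org/c> <http://www.w3.org/2000/01/rdf-schema#label> "C" .',
    ]
    for pair in sorted(extract_links(sample)):
        print(pair)
```

The third sample line is skipped because its object is a literal rather than an IRI, so only the two link triples survive.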
> I have lots of bibliographic ones, especially from national libraries,
> which have often sent me the data.
> (British, German, US, Japanese, Norwegian, French, Spanish, Hungarian … as
> best I recall.)
> I also have the VIAF data.
> This is all aggregated in http://sameas.org/store/kelle/ and other stuff
> is kept in some sameAs stores - see http://sameas.org/store/
> Freebase is an interesting one (that is, Google Graph, and they send me
> their data).
> LATC has been mentioned, and I have a store with that data.
>
> Also, Rob Warren is spot on!
> owl:differentFrom is your friend.
> It can be used to find resources that a system might have considered the
> same, but where further work has shown that the system was wrong.
> In some sense it gives you upper and lower bounds on precision/recall.
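One way to read the precision half of this remark, sketched under stated assumptions: given a set of reviewed, known-correct links and a set of owl:differentFrom pairs, a tool's candidate links that appear in neither set are undecided. Counting every undecided link as wrong gives a lower bound on precision; counting every one as right gives an upper bound. The sketch treats links as unordered pairs, because owl:sameAs and owl:differentFrom are symmetric, and assumes the two reference sets do not contradict each other. The function name precision_bounds and the ex: identifiers are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Pair = Tuple[str, str]


def _unordered(pairs: Iterable[Pair]) -> Set[FrozenSet[str]]:
    """owl:sameAs and owl:differentFrom are symmetric, so ignore pair order."""
    return {frozenset(pair) for pair in pairs}


def precision_bounds(
    candidates: Iterable[Pair],
    confirmed: Iterable[Pair],
    refuted: Iterable[Pair],
) -> Tuple[float, float]:
    """Hypothetical helper: (lower, upper) bounds on the precision of candidates.

    confirmed: pairs known to denote the same thing (reviewed sameAs links)
    refuted:   pairs known to differ (owl:differentFrom statements)
    Assumes confirmed and refuted are disjoint.
    """
    cand = _unordered(candidates)
    if not cand:
        raise ValueError("no candidate links to evaluate")
    known_right = len(cand & _unordered(confirmed))
    known_wrong = len(cand & _unordered(refuted))
    lower = known_right / len(cand)                # undecided links counted as wrong
    upper = (len(cand) - known_wrong) / len(cand)  # undecided links counted as right
    return lower, upper


if __name__ == "__main__":
    candidates = [("ex:a", "ex:x"), ("ex:b", "ex:y"), ("ex:c", "ex:z"), ("ex:d", "ex:w")]
    confirmed = [("ex:x", "ex:a")]  # same link, written the other way round
    refuted = [("ex:b", "ex:y")]
    print(precision_bounds(candidates, confirmed, refuted))  # (0.25, 0.75)
```

Here one candidate is confirmed, one is refuted and two are undecided, so precision lies somewhere between 0.25 and 0.75.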
>
> It so happens (!) that I also run http://differentfrom.org where I gather
> such data.
> Again, Freebase have given me their regression test for asserting
> sameness, and I have a store with that in.
> And LATC published their similar data, and I have put it in a store.
>
> I hope that helps - ask me for data if you need it, although I hope you
> can be as specific as possible.
> (If I don't have it, I may well decide to harvest it to put in
> sameas.org.)
>
> Best
> Hugh
>
Thanks a lot for all the references that I received.


Best,
Cristina
> On 26 Aug 2013, at 12:04, Adrian Stevenson <adrian.stevenson@manchester.ac.uk> wrote:
>
>> Hi All
>>
>> As part of the LOCAH and Linking Lives projects, the latter in
>> particular, we've been doing a lot of this automatic and manual linking
>> work, mainly to VIAF and DBpedia, with some links to things like LCSH
>> and GeoNames. We've been doing a lot of this work just recently, in fact,
>> and we've published a blog post about it that has picked up quite a bit
>> of interest: http://archiveshub.ac.uk/blog/2013/08/hub-viaf-namematching/.
>> We haven't published our latest run of data yet, but we hope to finish
>> it soon. It will probably still take about a month, as a few of us are
>> going on holiday.
>>
>> We do have quite a few semi-automatically created links in our existing
>> data set, accessible via http://data.archiveshub.ac.uk, but, as I say,
>> we are updating it, so I'd suggest not taking the URIs and data available
>> there as the final word.
>>
>> A good example is
>> http://data.archiveshub.ac.uk/page/person/nra/webbmarthabeatrice1858-1943socialreformer
>>
>> Project URIs:
>> http://archiveshub.ac.uk/locah/
>> http://archiveshub.ac.uk/linkinglives/
>>
>> Adrian
>> _____________________________
>> Adrian Stevenson
>> Senior Technical Innovations Coordinator
>> Mimas, The University of Manchester
>> Devonshire House, Oxford Road
>> Manchester M13 9QH
>>
>> Email: adrian.stevenson@manchester.ac.uk
>> Tel: +44 (0) 161 275 6065
>> http://www.mimas.ac.uk
>> http://www.twitter.com/adrianstevenson
>> http://uk.linkedin.com/in/adrianstevenson/
>>
>> On 22 Aug 2013, at 16:06, Cristina Sarasua wrote:
>>
>>> Hi,
>>>
>>> I am looking for pairs of linked data sets that can be used as a gold
>>> standard for evaluations. I would need pairs of data sets which have
>>> been manually linked, or data sets which have been (semi-)automatically
>>> linked with interlinking tools and afterwards reviewed (so that links
>>> not identified by the tools are also included). I have looked into the
>>> DataHub catalogue and queried VoiD descriptions, but unfortunately the
>>> information about how the interlinking process was carried out is often
>>> missing.
>>>
>>> Apart from the data sets which have been used in the OAEI instance
>>> matching track, could anyone recommend (based on past experience) good
>>> data sets for evaluating data interlinking processes?
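A common way to use such a gold standard is to score a tool's output by precision, recall and F1, where links the tools missed count against recall. A minimal Python sketch, assuming both sides are sets of directed (first data set IRI, second data set IRI) pairs; the function name evaluate and the ex: identifiers are hypothetical.

```python
from typing import Iterable, Tuple

Link = Tuple[str, str]  # (IRI in the first data set, IRI in the second)


def evaluate(found: Iterable[Link], gold: Iterable[Link]) -> Tuple[float, float, float]:
    """Hypothetical helper: score a tool's links against a reviewed gold standard.

    Returns (precision, recall, f1). Gold links the tool missed lower recall;
    links the tool produced that are not in the gold standard lower precision.
    """
    found_set, gold_set = set(found), set(gold)
    true_positives = len(found_set & gold_set)
    precision = true_positives / len(found_set) if found_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0  # avoid dividing by zero
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("ex:a", "ex:x"), ("ex:b", "ex:y"), ("ex:c", "ex:z"), ("ex:d", "ex:w")}
    found = {("ex:a", "ex:x"), ("ex:b", "ex:y"), ("ex:e", "ex:v")}
    print(evaluate(found, gold))
```

In the example the tool finds two of the four gold links plus one spurious link, giving precision 2/3, recall 1/2 and F1 4/7.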
>>>
>>> Thanks in advance.
>>>
>>> Kind regards,
>>>
>>> Cristina
>>> --
>>> Cristina Sarasua
>>>
>>> Institute for Web Science and Technologies (WeST)
>>>
>>> Universität Koblenz-Landau
>>> Universitätsstraße 1
>>> 56070 Koblenz
>>> Germany
>>>
>>> e: csarasua@uni-koblenz.de
>>>
>>> p: +49 261 287 2772
>>> f: +49 261 287 100 2772
>>> w: http://west.uni-koblenz.de
>>
>>
>
>
>

Received on Tuesday, 27 August 2013 17:40:45 UTC