Survey of OpenRefine reconciliation services

Hi all,

I have just made public a short survey of the existing reconciliation services and how their implementations relate to the techniques used in record linkage (the corresponding academic field):

https://arxiv.org/abs/1906.08092

This is the outcome of a project done with OpenCorporates to plan improvements to their own reconciliation service. The survey is therefore focused on the perspective of the service provider, rather than the end user, although the goal is of course to make the service more useful to data consumers. I hope this can help start the discussions (and give some definitions, as Ricardo suggested).

I have outlined a few suggestions, mostly around the issue of scoring, which reflect my general feeling about this API: at the moment, service providers are a bit clueless about how to score reconciliation candidates, and as a consequence users cannot really rely on these scores in general.

It would be great to also study what users actually do with these services, to better understand their workflows. I am not sure how to approach that, given that OpenRefine workflows are generally not published. One possibility would be to analyze the logs of existing reconciliation services (such as the one for Wikidata). Let me know if you are interested in that sort of project.

Any feedback is welcome, of course!

Cheers,

Antonin

Received on Thursday, 20 June 2019 16:13:59 UTC