Re: Survey of OpenRefine reconciliation services

Hi Antonin! Some feedback on the paper:

- services can and should score even in the absence of fields, based on the
importance and popularity of entities. E.g. a query for Paris or Rembrandt
should return the respective famous entities, not one of the hundred other
possible matches; inactive companies should be (and are) downgraded, etc.
- if a dataset has multiple labels per entity (e.g. VIAF has over 100 for
Cranach), the service should take the nature of each label into account
- VIAF has life years (and, if I'm not mistaken, gender), which are
important to use.
- It even has Occupation and Nationality, which however are part of the
label and not normalized. (Our service extracts them to separate fields)
- that H2020 project is not related to ERC
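
To make the first point concrete, here is a minimal sketch of scoring when
the query carries only a name. It is an illustration only, not how any
existing service works: the `Candidate` fields, the 0.7/0.3 weights, the 0.8
penalty for inactive entities and the popularity counts are all assumptions
made up for the example.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    label: str
    description: str
    popularity: int      # hypothetical signal, e.g. an inbound-link count
    active: bool = True  # hypothetical flag, e.g. company still trading


def name_similarity(query: str, label: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), label.lower()).ratio()


def score(query: str, cand: Candidate, max_popularity: int) -> float:
    """Blend name similarity with a popularity prior (assumed weights)."""
    sim = name_similarity(query, cand.label)
    # Log scaling keeps very famous entities from swamping everything else.
    prior = (
        math.log1p(cand.popularity) / math.log1p(max_popularity)
        if max_popularity > 0
        else 0.0
    )
    combined = 0.7 * sim + 0.3 * prior
    if not cand.active:
        combined *= 0.8  # assumed downgrade for inactive entities
    return combined


def rank(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates sorted from best to worst score."""
    max_pop = max((c.popularity for c in candidates), default=0)
    return sorted(candidates, key=lambda c: score(query, c, max_pop), reverse=True)


if __name__ == "__main__":
    # Popularity numbers below are invented for the example.
    candidates = [
        Candidate("Paris", "city in Texas", 500),
        Candidate("Paris", "capital of France", 1_000_000),
    ]
    for c in rank("Paris", candidates):
        print(c.label, "-", c.description)
```

With identical labels, the popularity prior is what breaks the tie, so the
capital comes out ahead of the homonym without any extra field in the query.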

On Thu, Jun 20, 2019, 18:14 Antonin Delpeuch <antonin@delpeuch.eu> wrote:

> Hi all,
>
> I have just made public a short survey of the existing reconciliation
> services and how their implementations relate to the techniques used in
> record linkage (the corresponding academic field):
>
> https://arxiv.org/abs/1906.08092
>
> This is the outcome of a project done with OpenCorporates to plan
> improvements to their own reconciliation service. The survey is therefore
> focused on the perspective of the service provider, rather than the end
> user, although the goal is of course to make the service more useful to
> data consumers. I hope this can help start the discussions (and give some
> definitions, as Ricardo suggested).
>
> I have outlined a few suggestions, mostly around the issue of scoring,
> which reflect my general feeling about this API: at the moment service
> providers are a bit clueless about how to score reconciliation candidates,
> and as a consequence users cannot really rely on them in general.
>
> It would be great to also study what users actually do with these
> services, to better understand their workflows. I am not sure how to
> approach that, given that OpenRefine workflows are generally not
> published. One possibility would be to analyze the logs of existing
> reconciliation services (such as the one for Wikidata). Let me know if you
> are interested in that sort of project.
>
> Any feedback is welcome of course!
>
> Cheers,
>
> Antonin

Received on Tuesday, 25 June 2019 13:45:26 UTC