- From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
- Date: Tue, 25 Jun 2019 16:44:52 +0300
- To: Antonin Delpeuch <antonin@delpeuch.eu>
- Cc: public-reconciliation@w3.org
- Message-ID: <CAMv+wg71Kz1DA4aqomMfswj3ABT8SZahFyqh3SPKk_e-VBFp_w@mail.gmail.com>
Hi Antonin!

Some feedback on the paper:

- Services can and should score even in the absence of fields, based on the importance and popularity of entities. E.g. a query for "Paris" or "Rembrandt" should return the respective famous entities, not one of the hundred other possible matches; inactive companies should be (and are) downgraded, etc.
- If a dataset has multiple labels per entity (e.g. VIAF has over 100 for Cranach), the service should take label nature into account.
- VIAF has life years (and, if I'm not mistaken, gender), which are important to use.
- It even has Occupation and Nationality, which however are part of the label and not normalized. (Our service extracts them to separate fields.)
- That H2020 project is not related to ERC.

On Thu, Jun 20, 2019, 18:14 Antonin Delpeuch <antonin@delpeuch.eu> wrote:

> Hi all,
>
> I have just made public a short survey of the existing reconciliation
> services and how their implementations relate to the techniques used in
> record linkage (the corresponding academic field):
>
> https://arxiv.org/abs/1906.08092
>
> This is the outcome of a project done with OpenCorporates to plan
> improvements to their own reconciliation service. The survey is therefore
> focused on the perspective of the service provider, rather than the end
> user, although the goal is of course to make the service more useful to
> data consumers. I hope this can help start the discussions (and give some
> definitions, as Ricardo suggested).
>
> I have outlined a few suggestions, mostly around the issue of scoring,
> which reflect my general feeling about this API: at the moment service
> providers are a bit clueless about how to score reconciliation candidates,
> and as a consequence users cannot really rely on them in general.
>
> It would be great to also study what users actually do with these
> services, to better understand their workflows. I am not sure how to
> approach that, given that OpenRefine workflows are generally not
> published. One possibility would be to analyze the logs of existing
> reconciliation services (such as the one for Wikidata). Let me know if you
> are interested in that sort of project.
>
> Any feedback is welcome of course!
>
> Cheers,
>
> Antonin
Received on Tuesday, 25 June 2019 13:45:26 UTC