- From: Alan Buxton <alan.buxton@opencorporates.com>
- Date: Wed, 26 Jun 2019 14:57:42 +0200
- To: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
- Cc: Thad Guidry <thadguidry@gmail.com>, public-reconciliation@w3.org
- Message-ID: <CAEBk1NtH0Ec4Xr8Jac38r2K7kMudapaVoWid=miPU5kT4sogeA@mail.gmail.com>
This is a really interesting discussion. I like the idea of the user
providing some kind of input about how they want to score things. Different
domains will have different views on importance/relevance.

On Wed, 26 Jun 2019, 10:57 Vladimir Alexiev, <vladimir.alexiev@ontotext.com>
wrote:

> In my experience, a user often needs to provide some "context" or global
> parameter to reconciliation.
> Eg local country (for place reconciliation), jurisdiction (for company
> reconciliation), etc.
>
> The current protocol doesn't support that.
> So eg OpenCorporates recon takes jurisdiction as part of the base service
> URL.
> "Global parameters" is a promising extension of the protocol.
>
> On Tue, Jun 25, 2019 at 5:30 PM Thad Guidry <thadguidry@gmail.com> wrote:
>
>> Vladimir,
>>
>> Do you think that "importance" and "what is popular" should be
>> controllable by the requesting user?
>>
>> What if my domain I'm working in is "Criminal activity" ... do you agree
>> that the importance or popularity scoring can now be skewed and properly
>> should be?
>>
>> *Thad Query*
>> Subject (query item): Rembrandt
>> Domain (proposed enhancement): Crime
>>
>> Results:
>> 90% https://www.wikidata.org/wiki/Q2246489 - The Storm on the Sea of Galilee
>> 89% https://www.wikidata.org/wiki/Q661378 - The Anatomy Lesson of Dr. Nicolaes Tulp
>> 55% https://www.wikidata.org/wiki/Q5598 - Rembrandt (creator)
>>
>> *Vladimir Query*
>> Subject (query item): Rembrandt
>> Domain (proposed enhancement):
>>
>> Results (with no Domain field specified):
>> 95% https://www.wikidata.org/wiki/Q5598 - Rembrandt (creator)
>> 90% https://www.wikidata.org/wiki/Q990960 - Rembrandt (city in Buena Vista County, Iowa, United States)
>>
>> Thad
>> https://www.linkedin.com/in/thadguidry/
>>
>> On Tue, Jun 25, 2019 at 8:45 AM Vladimir Alexiev
>> <vladimir.alexiev@ontotext.com> wrote:
>>
>>> Hi Antonin!
>>> Some feedback on the paper:
>>>
>>> - services can and should score even on the absence of fields, based on
>>>   the importance and popularity of entities. Eg a Paris or Rembrandt should
>>>   return the respective famous entities, not one of the hundred other
>>>   possible matches; inactive companies should (and are) downgraded, etc
>>> - if a dataset has multiple labels per entity (e.g VIAF has over 100 for
>>>   Cranach), it should take label nature into account
>>> - VIAF has life years (and if I'm not mistaken gender), which are
>>>   important to use.
>>> - It even has Occupation and Nationality, which however are part of the
>>>   label and not normalized. (Our service extracts them to separate fields)
>>> - that H2020 project is not related to ERC
>>>
>>> On Thu, Jun 20, 2019, 18:14 Antonin Delpeuch <antonin@delpeuch.eu>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have just made public a short survey of the existing reconciliation
>>>> services and how their implementations relate to the techniques used in
>>>> record linkage (the corresponding academic field):
>>>>
>>>> https://arxiv.org/abs/1906.08092
>>>>
>>>> This is the outcome of a project done with OpenCorporates to plan
>>>> improvements to their own reconciliation service. The survey is therefore
>>>> focused on the perspective of the service provider, rather than the end
>>>> user, although the goal is of course to make the service more useful to
>>>> data consumers. I hope this can help start the discussions (and give some
>>>> definitions, as Ricardo suggested).
>>>>
>>>> I have outlined a few suggestions, mostly around the issue of scoring,
>>>> which reflect my general feeling about this API: at the moment service
>>>> providers are a bit clueless about how to score reconciliation candidates,
>>>> and as a consequence users cannot really rely on them in general.
>>>>
>>>> It would be great to also study what users actually do with these
>>>> services, to better understand their workflows. I am not sure how to
>>>> approach that, given that OpenRefine workflows are generally not
>>>> published. One possibility would be to analyze the logs of existing
>>>> reconciliation services (such as the one for Wikidata). Let me know if
>>>> you are interested in that sort of project.
>>>>
>>>> Any feedback is welcome of course!
>>>>
>>>> Cheers,
>>>>
>>>> Antonin
>>>>
>
> --
> Vladimir Alexiev, PhD, PMP
> Chief Data Architect
> Sirma AI, trading as Ontotext: https://www.ontotext.com, LinkedIn
> <https://www.linkedin.com/company-beta/208070>, Twitter
> <https://twitter.com/ontotext>, Rate GraphDB
> <http://www.capterra.com/database-management-software/reviews/157533/Graph%20DB/Ontotext/new>
> Email: vladimir.alexiev@ontotext.com, skype:valexiev1
> Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
> Calendar:
> https://www.google.com/calendar/embed?src=vladimir.alexiev@ontotext.com
> Publications and CV: https://github.com/VladimirAlexiev/my
Received on Wednesday, 26 June 2019 12:59:04 UTC