- From: Antonin Delpeuch <antonin@delpeuch.eu>
- Date: Sat, 29 Jun 2019 10:01:44 +0200
- To: public-reconciliation@w3.org
- Message-ID: <7b82ad62-ab01-1be3-1873-75de6e1ed2f3@delpeuch.eu>
Hi Vladimir,

On 6/26/19 10:56 AM, Vladimir Alexiev wrote:
> In my experience, a user often needs to provide some "context" or
> global parameter to reconciliation.
> E.g. local country (for place reconciliation), jurisdiction (for company
> reconciliation), etc.
>
> The current protocol doesn't support that.
> So e.g. OpenCorporates recon takes jurisdiction as part of the base
> service URL.
> "Global parameters" is a promising extension of the protocol.

I am not sure what you mean here: the protocol does support adding
properties to give some context to each reconciliation query.

OpenRefine's interface does not make it easy to add such global
properties: you have to create a new column containing the constant
property value. But in my opinion this is a limitation of OpenRefine
rather than of the protocol. For instance, the jurisdiction that
OpenCorporates accepts in the base service URL can also be specified
as a property, which gives the same results.

Or did I misunderstand your suggestion?

Antonin

>
> On Tue, Jun 25, 2019 at 5:30 PM Thad Guidry <thadguidry@gmail.com> wrote:
>
> > Vladimir,
> >
> > Do you think that "importance" and "what is popular" should be
> > controllable by the requesting user?
> >
> > What if the domain I'm working in is "Criminal activity"... do you
> > agree that the importance or popularity scoring can now be skewed
> > and properly should be?
> >
> > *Thad Query*
> > Subject (query item): Rembrandt
> > Domain (proposed enhancement): Crime
> >
> > Results:
> > 90% https://www.wikidata.org/wiki/Q2246489 - The Storm on the Sea of Galilee
> > 89% https://www.wikidata.org/wiki/Q661378 - The Anatomy Lesson of Dr. Nicolaes Tulp
> > 55% https://www.wikidata.org/wiki/Q5598 - Rembrandt (creator)
> >
> > *Vladimir Query*
> > Subject (query item): Rembrandt
> > Domain (proposed enhancement):
> >
> > Results (with no Domain field specified):
> > 95% https://www.wikidata.org/wiki/Q5598 - Rembrandt (creator)
> > 90% https://www.wikidata.org/wiki/Q990960 - Rembrandt (city in Buena Vista County, Iowa, United States)
> >
> > Thad
> > https://www.linkedin.com/in/thadguidry/
> >
> > On Tue, Jun 25, 2019 at 8:45 AM Vladimir Alexiev <vladimir.alexiev@ontotext.com> wrote:
> >
> > > Hi Antonin! Some feedback on the paper:
> > >
> > > - services can and should score even in the absence of fields,
> > >   based on the importance and popularity of entities. E.g. a query
> > >   for Paris or Rembrandt should return the respective famous entities,
> > >   not one of the hundred other possible matches; inactive companies
> > >   should be (and are) downgraded, etc.
> > > - if a dataset has multiple labels per entity (e.g. VIAF has
> > >   over 100 for Cranach), it should take label nature into account
> > > - VIAF has life years (and, if I'm not mistaken, gender), which
> > >   are important to use.
> > > - It even has Occupation and Nationality, which however are
> > >   part of the label and not normalized. (Our service extracts
> > >   them to separate fields)
> > > - that H2020 project is not related to ERC
> > >
> > > On Thu, Jun 20, 2019, 18:14 Antonin Delpeuch <antonin@delpeuch.eu> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have just made public a short survey of the existing
> > > > reconciliation services and how their implementations
> > > > relate to the techniques used in record linkage (the
> > > > corresponding academic field):
> > > >
> > > > https://arxiv.org/abs/1906.08092
> > > >
> > > > This is the outcome of a project done with OpenCorporates
> > > > to plan improvements to their own reconciliation service.
> > > > The survey is therefore focused on the perspective of the
> > > > service provider, rather than the end user, although the
> > > > goal is of course to make the service more useful to data
> > > > consumers. I hope this can help start the discussions (and
> > > > give some definitions, as Ricardo suggested).
> > > >
> > > > I have outlined a few suggestions, mostly around the issue
> > > > of scoring, which reflect my general feeling about this
> > > > API: at the moment service providers are a bit clueless
> > > > about how to score reconciliation candidates, and as a
> > > > consequence users cannot really rely on the scores in general.
> > > >
> > > > It would be great to also study what users actually do
> > > > with these services, to better understand their workflows.
> > > > I am not sure how to approach that, given that OpenRefine
> > > > workflows are generally not published. One possibility would
> > > > be to analyze the logs of existing reconciliation services
> > > > (such as the one for Wikidata). Let me know if you are
> > > > interested in that sort of project.
> > > >
> > > > Any feedback is welcome of course!
> > > >
> > > > Cheers,
> > > >
> > > > Antonin
>
> --
> Vladimir Alexiev, PhD, PMP
> Chief Data Architect
> Sirma AI, trading as Ontotext: https://www.ontotext.com, LinkedIn
> <https://www.linkedin.com/company-beta/208070>, Twitter
> <https://twitter.com/ontotext>, Rate GraphDB
> <http://www.capterra.com/database-management-software/reviews/157533/Graph%20DB/Ontotext/new>
> Email: vladimir.alexiev@ontotext.com, skype:valexiev1
> Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
> Calendar: https://www.google.com/calendar/embed?src=vladimir.alexiev@ontotext.com
> Publications and CV: https://github.com/VladimirAlexiev/my
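To illustrate Antonin's point that a "global parameter" can already be expressed with per-query properties, here is a minimal sketch. It assumes the batch format of the reconciliation API as implemented by OpenRefine: a JSON object of named queries sent as a single form-encoded `queries` parameter, where each query may carry `properties` entries of the form `{"pid": ..., "v": ...}`. The helper names `build_batch` and `encode_batch`, the type `company`, and the property id `jurisdiction_code` are hypothetical placeholders, not confirmed OpenCorporates identifiers.

```python
import json
from urllib.parse import urlencode


def build_batch(names, global_props, type_id=None):
    """Build a reconciliation batch in which every query carries the same
    constant properties, emulating a "global parameter"."""
    # One property entry per (property id, value) pair, shared by all queries.
    shared = [{"pid": pid, "v": value} for pid, value in global_props.items()]
    queries = {}
    for i, name in enumerate(names):
        # Copy each entry so that queries do not share mutable dicts.
        query = {"query": name, "properties": [dict(p) for p in shared]}
        if type_id is not None:
            query["type"] = type_id
        queries[f"q{i}"] = query
    return queries


def encode_batch(queries):
    """Form-encode the whole batch as the single "queries" parameter."""
    return urlencode({"queries": json.dumps(queries)})


if __name__ == "__main__":
    # Hypothetical type and property ids, for illustration only.
    batch = build_batch(["Acme Ltd", "Widgets plc"],
                        {"jurisdiction_code": "gb"},
                        type_id="company")
    print(json.dumps(batch, indent=2))
    print(encode_batch(batch))
```

Because the protocol has no batch-level parameter, the constant property is repeated in every query. This is what the extra constant column achieves in OpenRefine, and it is the redundancy a "global parameters" extension would remove.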
Received on Saturday, 29 June 2019 08:02:10 UTC