Re: Reconciling without names from Vladimir Alexiev on 2020-10-18 (public-reconciliation@w3.org from October 2020)

From: Vladimir Alexiev <vladimir.alexiev@ontotext.com>
Date: Sun, 18 Oct 2020 12:16:21 +0300
To: Antonin Delpeuch <antonin@delpeuch.eu>
Cc: public-reconciliation@w3.org
Message-ID: <CAMv+wg7rNJr7stUw1jfzOVccYEEexK9r_+oVdRFhccdCwjDXHw@mail.gmail.com>
I agree that recon by ID is an important special case. I think it should be
handled just like the other extra properties, but with a flag indicating
that it's an ID, so a strong, exact match is expected.
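A minimal Python sketch of how a service might honour such a flag. The `"identifier": True` key on a property entry, the `match_candidate` function and the scores are all hypothetical; the current spec defines no such flag.

```python
from typing import Any, Dict, List, Tuple

def match_candidate(candidate: Dict[str, List[str]],
                    properties: List[Dict[str, Any]]) -> Tuple[float, bool]:
    """Score one candidate's property values against a query's properties.

    Hypothetical convention: an entry with "identifier": True is compared
    exactly; a hit is a strong match (score 100, match=True) and a miss
    rules the candidate out. Other entries only add to a weak score.
    """
    score = 0.0
    for prop in properties:
        values = candidate.get(prop["pid"], [])
        if prop.get("identifier"):
            if prop["v"] in values:
                return 100.0, True   # exact identifier hit: strong match
            return 0.0, False        # identifier given but not matched
        if prop["v"] in values:
            score += 10.0            # ordinary property agreement
    return score, False

query_props = [{"pid": "P3500", "v": "53616", "identifier": True}]
print(match_candidate({"P3500": ["53616"]}, query_props))  # (100.0, True)
```

The design choice is that an identifier is decisive in both directions, while ordinary properties stay fuzzy evidence, as with other extra properties today.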

On Fri, Oct 16, 2020, 16:18 Antonin Delpeuch <antonin@delpeuch.eu> wrote:

> Hi Tom,
>
> Thanks a lot! If you remember how namespaced services worked in
> OpenRefine, it could be useful to devise how to offer the same sort of
> functionality via the standard reconciliation API.
>
> For me, the main hurdle is that reconciliation data must be stored in a
> certain column, so if we have multiple columns for properties but no
> column for the name, where should we store this information?
>
> Best,
>
> Antonin
>
> On 11/10/2020 22:14, Tom Morris wrote:
> > A search string definitely shouldn't be mandatory. Omitting the
> > parameter or specifying the empty string for it both seem to be fine
> > solutions.
> >
> > For historical context, the type of strong identifier reconciliation
> > that's being discussed here is what was called a "Namespaced"
> > reconciliation service, which had a separate endpoint, because
> > Freebase didn't use properties to store identifiers. That's why you
> > don't see this use case handled by the standard reconciliation
> > endpoint.
> >
> > In Freebase, each entity just had an arbitrary number of identifiers,
> > which could be used interchangeably.
> > Freebase namespaces are described here:
> >
> > https://web.archive.org/web/20160526012918/http://wiki.freebase.com:80/wiki/Namespace
> > but they're basically just a hierarchical path-based namespace tree
> > rooted at /. Some of the interesting namespaces were things like:
> > - /wikipedia/en (URL slug)
> > - /wikipedia/en_id (numeric id)
> > - /authority/imdb
> > - /authority/musicbrainz
> >
> > Tom
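To illustrate Tom's description of a path-based namespace tree, here is a small Python sketch that splits a full path into namespace and key (assuming the last segment is the key) and looks it up in an in-memory index. The index contents (`Example_Article`, `12345`, `m.placeholder`) are made-up placeholders, not Freebase data, and `split_path`/`resolve` are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index: (namespace, key) -> entity id. Several keys in different
# namespaces can point to the same entity, since identifiers were interchangeable.
INDEX: Dict[Tuple[str, str], str] = {
    ("/wikipedia/en", "Example_Article"): "m.placeholder",
    ("/wikipedia/en_id", "12345"): "m.placeholder",
    ("/authority/musicbrainz", "placeholder-mbid"): "m.placeholder",
}

def split_path(path: str) -> Tuple[str, str]:
    """Split '/a/b/key' into ('/a/b', 'key'), treating the last segment as the key."""
    namespace, _, key = path.rpartition("/")
    if not path.startswith("/") or not namespace or not key:
        raise ValueError(f"not a namespaced key: {path!r}")
    return namespace, key

def resolve(path: str) -> Optional[str]:
    """Return the entity id for a full namespaced path, or None if unknown."""
    return INDEX.get(split_path(path))

print(resolve("/wikipedia/en_id/12345"))  # m.placeholder
```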
> >
> > On Wed, Oct 7, 2020 at 9:19 AM Antonin Delpeuch <antonin@delpeuch.eu> wrote:
> >> Hi all,
> >>
> >> I realized today that as far as the specs are concerned, it's not too
> >> hard to support this use case, so I gave it a go:
> >>
> >> https://github.com/reconciliation-api/specs/pull/53
> >>
> >> This just makes the textual "query" field optional (but requires at
> >> least a property if it is not supplied).
> >>
> >> Best,
> >>
> >> Antonin
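A minimal Python sketch of the validation rule described in this message, also treating an omitted `query` the same as an empty string, as Tom suggests. The name `validate_query` is hypothetical; the pull request itself is the authoritative wording.

```python
from typing import Any, Dict

def validate_query(q: Dict[str, Any]) -> None:
    """Raise ValueError unless q has a non-empty "query" or at least one property."""
    text = q.get("query", "")  # an omitted "query" is treated like ""
    if not isinstance(text, str):
        raise ValueError('"query" must be a string')
    if text == "" and not q.get("properties"):
        raise ValueError('a query needs a non-empty "query" or at least one property')

# Accepted: an identifier property alone, with no name at all.
validate_query({"properties": [{"pid": "P3500", "v": "53616"}]})
```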
> >>
> >> On 11/11/2019 18:40, Shaw, Ryan wrote:
> >>> Re: reconciling without names, we have another use case, one that does
> >>> not involve identifiers.
> >>>
> >>> The PeriodO project enables reconciliation against a list of scholarly
> >>> definitions of historical periods.[1] Some people want to reconcile a list
> >>> of period names plus additional qualifiers (spatial coverage and temporal
> >>> extent), which works with the current Refine model. But others have only
> >>> those "additional" qualifiers, and no period names. Currently we can't
> >>> support them without some hack of the kind Antonin describes.
> >>>
> >>> Ryan
> >>>
> >>> [1] https://github.com/periodo/periodo-reconciler#how-reconciliation-works
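As a hypothetical illustration of Ryan's case, a Python sketch that builds a name-less query from qualifiers alone, which only works with a service that accepts queries without a `query` field. The property identifiers `spatialCoverage`, `start` and `stop` and their values are placeholders, not necessarily what the PeriodO reconciler uses; `build_query` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional

def build_query(name: Optional[str], qualifiers: Dict[str, str]) -> Dict[str, Any]:
    """Build one reconciliation query; "query" is set only when a name exists."""
    q: Dict[str, Any] = {
        "properties": [{"pid": pid, "v": v} for pid, v in qualifiers.items()]
    }
    if name:
        q["query"] = name
    return q

# Placeholder property ids and values for a row that has no period name.
row = {"spatialCoverage": "Italy", "start": "-0800", "stop": "-0500"}
print(json.dumps(build_query(None, row)))
```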
> >>>
> >>>> On Nov 10, 2019, at 6:45 AM, Antonin Delpeuch <antonin@delpeuch.eu> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I would like to discuss one use case that I think the current API does
> >>>> not serve well.
> >>>>
> >>>> The reconciliation API assumes that reconciliation queries contain at
> >>>> least a name (as the basic search query). It is then possible to add other
> >>>> constraints (type, properties). However, in many situations the user does
> >>>> not have any name to supply.
> >>>>
> >>>> The most common situation where this is a problem is when the user has
> >>>> access to a unique identifier, supported by the reconciliation service
> >>>> as a property. Supplying such a unique identifier as a property in a
> >>>> reconciliation query should be enough to identify the candidates, but in
> >>>> the current API a name also has to be provided.
> >>>>
> >>>> Currently, for Wikidata reconciliation, unique identifiers have priority
> >>>> over the search query, so if you find yourself in this situation you can
> >>>> use any random gibberish as the name. If there is an exact match via some
> >>>> unique identifier that you supplied, the name will be ignored.
> >>>>
> >>>> Example:
> >>>>
> >>>> {"query":"a347682ebf327cbd37e834","properties":[{"pid":"P3500","v":"53616"}]}
> >>>>
> >>>> will return the item with Ringgold identifier "53616" even if the query
> >>>> has nothing to do with its name.
> >>>>
> >>>> However, that is not very intuitive or user-friendly, so in a future
> >>>> version of the protocol I would be interested in making this sort of
> >>>> query more natural.
> >>>>
> >>>> Any thoughts about how it should work?
> >>>>
> >>>> Cheers,
> >>>> Antonin
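The Wikidata behaviour Antonin describes (identifiers first, name ignored on an exact hit) can be sketched in Python as follows. This is an illustrative toy, not the actual Wikidata reconciliation code: `UNIQUE_ID_PROPERTIES`, `ENTITIES`, `reconcile` and the item `Q_PLACEHOLDER` are hypothetical, and the fallback name matching is a naive substring test.

```python
from typing import Any, Dict, List

# Hypothetical: properties this toy service treats as unique identifiers.
UNIQUE_ID_PROPERTIES = {"P3500"}  # Ringgold ID, as in the example query

# Hypothetical store: item id -> label and property values (placeholders).
ENTITIES: Dict[str, Dict[str, Any]] = {
    "Q_PLACEHOLDER": {"label": "Placeholder institution",
                      "claims": {"P3500": ["53616"]}},
}

def reconcile(query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return candidates, giving unique identifiers priority over the name."""
    # 1. An exact unique-identifier hit wins outright; the name is ignored.
    for prop in query.get("properties", []):
        if prop["pid"] not in UNIQUE_ID_PROPERTIES:
            continue
        for qid, entity in ENTITIES.items():
            if prop["v"] in entity["claims"].get(prop["pid"], []):
                return [{"id": qid, "name": entity["label"],
                         "score": 100, "match": True}]
    # 2. Otherwise fall back to naive substring matching on the label.
    text = query.get("query", "").lower()
    return [{"id": qid, "name": e["label"], "score": 50, "match": False}
            for qid, e in ENTITIES.items()
            if text and text in e["label"].lower()]

print(reconcile({"query": "a347682ebf327cbd37e834",
                 "properties": [{"pid": "P3500", "v": "53616"}]}))
```

With `query` optional, the same call works without the gibberish string: `reconcile({"properties": [{"pid": "P3500", "v": "53616"}]})`.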
> >>>>
Received on Sunday, 18 October 2020 09:16:49 UTC