Re: Data extension protocol and linking to other reconciliation services

On 26/06/2020 20:40, Tom Morris wrote:
>
> On Wed, Jun 24, 2020 at 9:05 AM Antonin Delpeuch <antonin@delpeuch.eu
> <mailto:antonin@delpeuch.eu>> wrote:
>
>
>     At the moment, the Wikidata reconciliation service is not aware of
>     that:
>     when it returns GND ids, it returns them as bare strings. The user
>     must
>     know that there is a GND service to extend their data further.
>
>
> Why wouldn't it return (at least optionally) URIs using formatter URI
> for RDF resource
> <https://www.wikidata.org/wiki/Property:P1921> property for the GND
> property <https://www.wikidata.org/wiki/Property:P227> instead of just
> bare strings? That would allow the client to dereference them for more
> information or search for services which had additional information.

It would certainly be possible to add an option to return RDF URIs for
fetched identifiers. But obtaining data by fetching RDF from individual
URIs is not very efficient or user-friendly. As far as OpenRefine is
concerned, I think it would be much easier for users to get reconciled
cells that they can reuse directly with the data extension service.
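
To make the option concrete, here is a rough sketch of how a service could
turn a bare identifier into an RDF URI using a formatter URI with a "$1"
placeholder. The function name is made up for illustration, and the GND
formatter URI and the sample identifier below are assumptions; the
authoritative formatter value is whatever P1921 holds on the GND property.

```python
def rdf_uri_for_identifier(formatter_uri: str, identifier: str) -> str:
    """Substitute an identifier into a formatter URI that uses a "$1" placeholder."""
    if "$1" not in formatter_uri:
        raise ValueError("formatter URI has no $1 placeholder")
    return formatter_uri.replace("$1", identifier)


# Assumed formatter URI for the GND ID property (P227); check P1921 on
# Wikidata for the actual value before relying on it.
GND_FORMATTER_URI = "https://d-nb.info/gnd/$1"

if __name__ == "__main__":
    # "118540238" is only a sample identifier string.
    print(rdf_uri_for_identifier(GND_FORMATTER_URI, "118540238"))
```

Note that this only produces a URI per cell: the client would still have to
dereference each one individually to get further data.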

>
>     I am thinking that it would be nice if the Wikidata service could
>     announce the GND service in the column metadata it returns, such that
>     the resulting column is automatically considered as reconciled to that
>     service when it is fetched in OpenRefine.
>
>
> Although Wikidata currently has a list of third-party formatter URL
> <https://www.wikidata.org/wiki/Property:P3303>, which are semantically
> similar to a reconciliation service, it fundamentally seems like the
> wrong place to serve as a reconciliation service registry.

With this proposal there would not need to be a reconciliation service
registry: each service would be free to reference related reconciliation
services. For the Wikidata reconciliation service, the correspondence
between Wikidata identifier properties and foreign reconciliation
endpoints could be stored in Wikidata itself, but would not have to be
(the reconciliation service could fetch it from its own configuration
file, or any other source).
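
As a sketch of what I have in mind (nothing here is in the current specs):
the service keeps a mapping from identifier properties to related
reconciliation endpoints and adds it to the column metadata of a data
extension response. The "service" field name, the mapping and the GND
endpoint URL are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical configuration: identifier property -> reconciliation endpoint
# for the corresponding database. The GND endpoint URL is an assumption.
RELATED_SERVICES: Dict[str, str] = {
    "P227": "https://lobid.org/gnd/reconcile",
}


def annotate_columns(meta: List[dict],
                     related: Dict[str, str] = RELATED_SERVICES) -> List[dict]:
    """Return a copy of the column metadata, announcing related services.

    Each column is a dict with at least an "id" key. When a related endpoint
    is configured for that property, it is added under a hypothetical
    "service" key; other columns are returned unchanged.
    """
    annotated = []
    for column in meta:
        column = dict(column)  # shallow copy, so the input is left untouched
        endpoint = related.get(column.get("id"))
        if endpoint is not None:
            column["service"] = endpoint
        annotated.append(column)
    return annotated


if __name__ == "__main__":
    meta = [{"id": "P227", "name": "GND ID"}, {"id": "P569", "name": "date of birth"}]
    print(annotate_columns(meta))
```

A client such as OpenRefine could then mark the fetched column as reconciled
to the announced endpoint, without consulting any central registry.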

What would be the right place to host a reconciliation service registry,
in your opinion?

>
>     Currently, reconciliation services can already return reconciled
>     values
>     in the data extension protocol, but it is assumed that these are
>     reconciled values for the same service. They can announce which
>     type of
>     entities this corresponds to, but that is assumed to be a type in the
>     same reconciliation service.
>
>
> That seems like a bad assumption to bake in. Types should be global
> (ie URIs), not associated with a single reconciliation service.

Entities, properties and types have been service-specific so far (at
least that is how I understand the specs we have so far). We can change
this, but that is a distinct discussion I think.

>
>     We could potentially extend that metadata to include a reconciliation
>     service URI, such that the resulting column would be reconciled to
>     this
>     other service. It would provide a sort of federation mechanism between
>     reconciliation endpoints.
>
>
> It seems like the federation should be based on services advertising
> things that they know about themselves, not asserting things about
> others that they don't control. If I've got full URIs instead of raw
> strings, then I should be able to figure out where to go for more
> information.

Are you proposing to replace the data extension service with something
entirely based on dereferencing RDF URIs? It seems complicated to me to
replicate the existing user experience if we go down that road…

Antonin

Received on Saturday, 27 June 2020 15:52:56 UTC