Re: Meaning of identifierSpace and schemaSpace from Antonin Delpeuch on 2019-10-01 (public-reconciliation@w3.org from October 2019)

From: Antonin Delpeuch <antonin@delpeuch.eu>
Date: Tue, 1 Oct 2019 09:55:09 +0200
To: David Huynh <dfhuynh@gmail.com>, Thad Guidry <thadguidry@gmail.com>
Cc: David Newbury <DNewbury@getty.edu>, "public-reconciliation@w3.org" <public-reconciliation@w3.org>
Message-ID: <7f523269-7896-3419-60a8-103893c23172@delpeuch.eu>
Hi both,

This is extremely useful, David H. and David N.! Thanks a lot to you both.

To add to what Thad wrote, the identifierSpace and schemaSpace are
currently used in OpenRefine as follows. When you reconcile a column to
a service, the identifierSpace and schemaSpace are stored in the column
metadata (along with the service URL). These attributes can be relied on
by other operations, although off the top of my head I cannot recall any
operation doing that. The only place I am aware of where they are used is
the Wikidata extension, which needs to check that a given column is
reconciled to Wikidata to allow its use in a Wikidata schema. This is
done by checking for the identifierSpace and schemaSpace on column
metadata. This means that it is possible for users to use another
Wikidata reconciliation service (not necessarily the one I run at
https://tools.wmflabs.org/openrefine-wikidata/ ) and still be able to
use the reconciliation results seamlessly in the Wikidata extension,
assuming they use the same identifierSpace and schemaSpace.
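
To make the check described above concrete, here is a minimal sketch in
Python. It is not the Wikidata extension's actual code (OpenRefine is
written in Java). The class name ReconConfig, the function name, and the
two URI constants are assumptions made for illustration; the real values
should be read from the service's manifest.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for illustration only; read the real ones from the
# manifest of the reconciliation service you are using.
WIKIDATA_IDENTIFIER_SPACE = "http://www.wikidata.org/entity/"
WIKIDATA_SCHEMA_SPACE = "http://www.wikidata.org/prop/direct/"


@dataclass(frozen=True)
class ReconConfig:
    """Hypothetical stand-in for the reconciliation metadata stored on a column."""
    service_url: str
    identifier_space: str
    schema_space: str


def is_reconciled_to_wikidata(config: Optional[ReconConfig]) -> bool:
    """Return True if the column metadata carries the Wikidata spaces.

    The service URL is deliberately ignored: any service declaring the
    same identifierSpace and schemaSpace is treated as compatible.
    """
    if config is None:  # the column was never reconciled
        return False
    return (
        config.identifier_space == WIKIDATA_IDENTIFIER_SPACE
        and config.schema_space == WIKIDATA_SCHEMA_SPACE
    )
```

Because only the two spaces are compared, a column reconciled against a
different endpoint that declares the same spaces passes the check, while
a column reconciled to a service with other spaces does not.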

Antonin

On 01/10/2019 04:13, David Huynh wrote:
> If I recall correctly, the original intention is to provide a hook not
> for extensibility per se, but for future formalization.
>
> Freebase IDs did not follow the URI syntax, and in order to also
> accommodate other data stores' ID schemes (which don't necessarily
> follow the URI syntax, either), I left the syntax for instance IDs and
> type IDs unspecified in the API, meaning "anything goes." So,
> "/people/person" is a fine schema ID and "/m/29xy1" is a fine instance
> ID even if they don't have the URI syntax.
>
> However, I didn't want to leave their syntax entirely "anything goes",
> either, so I introduced those 2 fields identifierSpace and schemaSpace
> as references to whatever resources can specify what the syntax
> for the instance IDs and type IDs is. Values filling those 2 fields
> are required to be URIs.
>
> I think at this time, the Recon API can be refined further to
> formalize what those URIs should provide, e.g., some XML documents
> describing the syntax of instance IDs and type IDs. I did not really
> know what those URIs should provide 10 years ago, but perhaps now, the
> community has had enough experience with many existing recon services
> that we can do this formalization.
>
> Thanks,
>
> David
>
> On Mon, Sep 30, 2019 at 10:09 AM Thad Guidry <thadguidry@gmail.com> wrote:
>
>     Hi David Newbury,
>
>     OpenRefine does not process these IDs currently.  But users are
>     certainly able to do things with the fields, even if they manually
>     form GET requests to Recon endpoints.  I am not sure how much
>     metadata is exposed currently that users are able to interact with
>     in OpenRefine for Recon services.  "cell" in a Recon object shows
>     all the fields currently exposed.  Perhaps Antonin can summarize
>     that for us.
>
>     The idea of the IDs for these 2 fields is that the IDs sometimes
>     hold information...like metadata in a way.  They become useful I
>     would say for collaborative efforts sometimes, as is often the
>     case with Linked Data.
>     Bits of info and hints of categories, domains, general
>     classifications sometimes show up in those URI or IDs, as they did
>     with Freebase in particular.
>
>     My definitions for these 2 fields in Reconciliation API would
>     begin to look like this...
>
>     schemaSpace :  A URI or ID that represents a group of entities by
>     some Domain or Schema, ex: "/usa/people/person", or
>     "006_electronic_journals" or simply "football"
>
>     identifierSpace: A URI or ID that represents a group??? of
>     entities by some ???
>
>     Thad
>     https://www.linkedin.com/in/thadguidry/
>
>
>     On Mon, Sep 30, 2019 at 10:26 AM David Newbury <DNewbury@getty.edu> wrote:
>
>         Hi all—been lurking here a bit and have been encouraged to
>         jump in and participate.
>
>          
>
>         I agree with the “Why” question.  At the Getty, we’re using
>         links to our documentation for our URL structure and our type
>         structure, but that’s pretty arbitrary. (Our endpoint is at
>         http://services.getty.edu/vocab/reconcile).  We had a
>         discussion about what to put there, and in the absence of a
>         rationale, we put what we thought would be most helpful to a
>         user trying to understand our schema space.
>
>          
>
>         The questions I would ask, to help understand what goes here,
>         would be: Does OpenRefine actually process these IDs in any
>         way?  Are we aware of any client that does?  And is the point
>         here to just provide data hooks for extensibility, or is there
>         anything else it does?
>
>          
>
>         — David 
>
>          
>
>         David Newbury
>
>         Enterprise Software Architect, Getty Digital
>
>          
>
>         Email: dnewbury@getty.edu
>
>         Phone: (310) 440-6116
>
>          
>
Received on Tuesday, 1 October 2019 07:55:39 UTC