Re: Meaning of identifierSpace and schemaSpace from Shaw, Ryan on 2019-10-01 (public-reconciliation@w3.org from October 2019)

From: Shaw, Ryan <ryanshaw@unc.edu>
Date: Tue, 1 Oct 2019 17:18:01 +0000
To: David Newbury <DNewbury@getty.edu>
CC: Antonin Delpeuch <antonin@delpeuch.eu>, David Huynh <dfhuynh@gmail.com>, Thad Guidry <thadguidry@gmail.com>, "public-reconciliation@w3.org" <public-reconciliation@w3.org>
Message-ID: <9D86E550-7C22-4D6D-9668-39F80320672C@unc.edu>
If these descriptions are accurate, then they seem to correspond roughly to:

void:uriSpace <http://rdfs.org/ns/void#uriSpace> ("A URI that is a common string prefix of all the entity URIs in a void:Dataset") 

void:class <http://rdfs.org/ns/void#class> ("The rdfs:Class that is the rdf:type of all entities in a class-based partition [of a void:Dataset]")
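Under this reading, an identifierSpace would behave like a void:uriSpace: a plain string prefix shared by every entity URI a service returns. Below is a minimal Python sketch of what that implies for a client. It assumes the prefix interpretation, and the helper names are hypothetical, not part of the Reconciliation API or VoID.

```python
# Hypothetical helpers: treat an identifierSpace like a void:uriSpace,
# i.e. a plain string prefix shared by every entity URI in the dataset.

def in_uri_space(entity_uri: str, uri_space: str) -> bool:
    """True if entity_uri starts with the uri_space prefix."""
    return entity_uri.startswith(uri_space)


def local_id(entity_uri: str, uri_space: str) -> str:
    """Strip the uri_space prefix to recover the service-local identifier."""
    if not in_uri_space(entity_uri, uri_space):
        raise ValueError(f"{entity_uri!r} is not under {uri_space!r}")
    return entity_uri[len(uri_space):]


WIKIDATA_SPACE = "http://www.wikidata.org/entity/"

print(in_uri_space("http://www.wikidata.org/entity/Q42", WIKIDATA_SPACE))  # True
print(local_id("http://www.wikidata.org/entity/Q42", WIKIDATA_SPACE))      # Q42
```

Plain prefix matching mirrors the VoID definition quoted above ("a common string prefix"). It does not try to resolve or normalize URIs.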

> On Oct 1, 2019, at 10:03 AM, David Newbury <DNewbury@getty.edu> wrote:
> 
> Would it be useful to think of them as such?
>  
> identifierSpace: This property defines the base URL under which results returned by the reconciliation service are located. For example, all services that reconcile against Wikidata should use http://www.wikidata.org/entity/ as their identifierSpace, regardless of who hosts that reconciliation service.
>  
> schemaSpace: This property defines the URL that identifies the type of entity returned by the reconciliation service. For example, the Getty AAT returns SKOS concepts, so its schemaSpace is defined as http://www.w3.org/2004/02/skos/core#Concept.
>  
> To me, this differentiates between the two nicely: identifierSpace is about defining the data provider/authority and allows for extension points based on that provider; schemaSpace defines the type of result, so all SKOS entities (or CRM, or schema.org, or Wikidata entities) can be treated similarly, regardless of provider.
>  
> Happy for these to be wrong, of course, but given this conversation it’s how I’m now thinking about them.
>  
> — David 
>  
> David Newbury
> Enterprise Software Architect, Getty Digital
>  
> Email: dnewbury@getty.edu
> Phone: (310) 440-6116
>  
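Here is a rough Python sketch of how a client might use the two fields as defined in David N.'s message. Services that share an identifierSpace are treated as the same provider, and services that share a schemaSpace are treated as returning the same kind of entity. The service names and all example.org URIs are hypothetical placeholders; only the Wikidata and SKOS URIs come from the definitions themselves.

```python
from collections import defaultdict

SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"

# Hypothetical service manifests; names and example.org values are placeholders.
services = [
    {"name": "wikidata-host-a", "identifierSpace": "http://www.wikidata.org/entity/",
     "schemaSpace": "https://example.org/placeholder-schema"},
    {"name": "wikidata-host-b", "identifierSpace": "http://www.wikidata.org/entity/",
     "schemaSpace": "https://example.org/placeholder-schema"},
    {"name": "thesaurus-x", "identifierSpace": "https://example.org/thesaurus-x/",
     "schemaSpace": SKOS_CONCEPT},
    {"name": "thesaurus-y", "identifierSpace": "https://example.org/thesaurus-y/",
     "schemaSpace": SKOS_CONCEPT},
]


def group_by(services, field):
    """Map each distinct value of `field` to the names of services declaring it."""
    groups = defaultdict(list)
    for service in services:
        groups[service[field]].append(service["name"])
    return dict(groups)


by_provider = group_by(services, "identifierSpace")  # same authority, any host
by_type = group_by(services, "schemaSpace")          # same result type, any provider

print(by_provider["http://www.wikidata.org/entity/"])  # ['wikidata-host-a', 'wikidata-host-b']
print(by_type[SKOS_CONCEPT])                           # ['thesaurus-x', 'thesaurus-y']
```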
> From: Antonin Delpeuch <antonin@delpeuch.eu>
> Date: Tuesday, October 1, 2019 at 12:55 AM
> To: David Huynh <dfhuynh@gmail.com>, Thad Guidry <thadguidry@gmail.com>
> Cc: David Newbury <DNewbury@getty.edu>, "public-reconciliation@w3.org" <public-reconciliation@w3.org>
> Subject: Re: Meaning of identifierSpace and schemaSpace
>  
> Hi both,
> 
> This is extremely useful David H. and David N.! Thanks a lot both.
> 
> To add to what Thad wrote, OpenRefine currently uses the identifierSpace and schemaSpace as follows. When you reconcile a column to a service, the identifierSpace and schemaSpace are stored in the column metadata (along with the service URL). Other operations can rely on these attributes, although off the top of my head I cannot recall any operation doing that. The only place I am aware of where they are used is the Wikidata extension, which needs to check that a given column is reconciled to Wikidata before allowing its use in a Wikidata schema. It does this by checking the identifierSpace and schemaSpace in the column metadata. This means that users can use another Wikidata reconciliation service (not necessarily the one I run at https://tools.wmflabs.org/openrefine-wikidata/ ) and still use the reconciliation results seamlessly in the Wikidata extension, provided that service uses the same identifierSpace and schemaSpace.
> 
> Antonin
> 
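The column check Antonin describes might look roughly like this in Python. This is a hypothetical sketch, not OpenRefine's actual (Java) code. The ReconConfig class, the accepts_column function, and the example.org URLs are made up for illustration. The schemaSpace value is a placeholder, not the one real Wikidata services use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconConfig:
    """Hypothetical stand-in for the reconciliation metadata stored on a column."""
    service_url: str
    identifier_space: str
    schema_space: str


def accepts_column(cfg: ReconConfig, identifier_space: str, schema_space: str) -> bool:
    """Accept a column when both spaces match; the service URL is deliberately ignored."""
    return cfg.identifier_space == identifier_space and cfg.schema_space == schema_space


WIKIDATA_IDS = "http://www.wikidata.org/entity/"
SCHEMA_PLACEHOLDER = "https://example.org/placeholder-schema"  # not the real value

col_a = ReconConfig("https://tools.wmflabs.org/openrefine-wikidata/", WIKIDATA_IDS, SCHEMA_PLACEHOLDER)
col_b = ReconConfig("https://example.org/another-wikidata-recon", WIKIDATA_IDS, SCHEMA_PLACEHOLDER)
col_c = ReconConfig("https://example.org/some-thesaurus", "https://example.org/thesaurus/", SCHEMA_PLACEHOLDER)

for col in (col_a, col_b, col_c):
    print(col.service_url, accepts_column(col, WIKIDATA_IDS, SCHEMA_PLACEHOLDER))
```

Because the service URL plays no part in the comparison, a column reconciled through any host declaring the same two spaces is accepted. That is the portability described in Antonin's message.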
> On 01/10/2019 04:13, David Huynh wrote:
> If I recall correctly, the original intention is to provide a hook not for extensibility per se, but for future formalization.
>  
> Freebase IDs did not follow the URI syntax, and in order to also accommodate other data stores' ID schemes (which don't necessarily follow the URI syntax, either), I left the syntax for instance IDs and type IDs unspecified in the API, meaning "anything goes." So, "/people/person" is a fine schema ID and "/m/29xy1" is a fine instance ID even if they don't have the URI syntax.
>  
> However, I didn't want to leave their syntax entirely "anything goes" either, so I introduced the two fields identifierSpace and schemaSpace as references to whatever resources can specify the syntax of the instance IDs and type IDs. Values filling those two fields are required to be URIs.
>  
> I think at this time, the Recon API can be refined further to formalize what those URIs should provide, e.g., some XML documents describing the syntax of instance IDs and type IDs. I did not really know what those URIs should provide 10 years ago, but perhaps now, the community has had enough experience with many existing recon services that we can do this formalization.
>  
> Thanks,
>  
> David
>  
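The split David H. describes can be sketched in a few lines of Python. The two space fields are expected to be URIs, while instance and type IDs are opaque strings. The URI check is a rough assumption: it only tests for a scheme using urllib.parse and is not a full RFC 3986 validator. The example.org values are placeholders.

```python
from urllib.parse import urlparse


def looks_like_absolute_uri(value: str) -> bool:
    """Rough check (an assumption, not full RFC 3986 validation): require a scheme."""
    return bool(urlparse(value).scheme)


def bad_space_fields(manifest: dict) -> list:
    """Return the names of space fields that are missing or not URI-like."""
    return [field for field in ("identifierSpace", "schemaSpace")
            if not looks_like_absolute_uri(manifest.get(field, ""))]


# Freebase-style IDs are fine as instance/type IDs ("anything goes")...
instance_id, type_id = "/m/29xy1", "/people/person"
# ...but would not be acceptable as space values, since they have no URI scheme.
print(looks_like_absolute_uri(instance_id), looks_like_absolute_uri(type_id))  # False False

manifest = {"identifierSpace": "https://example.org/ids",
            "schemaSpace": "https://example.org/types"}
print(bad_space_fields(manifest))                   # []
print(bad_space_fields({"identifierSpace": "/m"}))  # ['identifierSpace', 'schemaSpace']
```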
> On Mon, Sep 30, 2019 at 10:09 AM Thad Guidry <thadguidry@gmail.com> wrote:
> Hi David Newbury,
>  
> OpenRefine does not currently process these IDs, but users are certainly able to do things with the fields, even if that means manually forming GET requests to Recon endpoints. I am not sure how much of the Recon service metadata is currently exposed for users to interact with in OpenRefine. "cell" in a Recon object shows all the fields currently exposed. Perhaps Antonin can summarize that for us.
>  
> The idea behind the IDs for these two fields is that the IDs sometimes hold information, a kind of metadata. I would say they sometimes become useful for collaborative efforts, as is often the case with Linked Data.
> Bits of information and hints of categories, domains, and general classifications sometimes show up in those URIs or IDs, as they did with Freebase in particular.
>  
> My definitions for these two fields in the Reconciliation API would begin to look like this:
>  
> schemaSpace :  A URI or ID that represents a group of entities by some Domain or Schema, ex: "/usa/people/person", or "006_electronic_journals" or simply "football"
>  
> identifierSpace: A URI or ID that represents a group??? of entities by some ???
>  
> Thad 
> https://www.linkedin.com/in/thadguidry/

>  
>  
> On Mon, Sep 30, 2019 at 10:26 AM David Newbury <DNewbury@getty.edu> wrote:
> Hi all—been lurking here a bit and have been encouraged to jump in and participate.
>  
> I agree with the “Why” question. At the Getty, we’re using links to our documentation for our URL structure and our type structure, but that’s pretty arbitrary. (Our endpoint is at http://services.getty.edu/vocab/reconcile). We had a discussion about what to put there, and in the absence of a rationale, we put what we thought would be most helpful to a user trying to understand our schema space.
>  
> The questions I would ask to help understand what goes here are: Does OpenRefine actually process these IDs in any way? Are we aware of any client that does? And is the point here just to provide data hooks for extensibility, or do these fields do anything else?
>  
>  
>  
> — David 
>  
> David Newbury
> Enterprise Software Architect, Getty Digital
>  
> Email: dnewbury@getty.edu
> Phone: (310) 440-6116
>  
> 
>  
Received on Tuesday, 1 October 2019 17:18:25 UTC