Re: Meaning of identifierSpace and schemaSpace

Hi all—been lurking here a bit and have been encouraged to jump in and participate.

I agree with the “Why” question. At the Getty, we’re using links to our documentation for our URL structure and our type structure, but that’s pretty arbitrary. (Our endpoint is at http://services.getty.edu/vocab/reconcile.) We had a discussion about what to put there, and in the absence of a rationale, we put what we thought would be most helpful to a user trying to understand our schema space.

The questions I would ask, to help understand what goes here, would be: Does OpenRefine actually process these IDs in any way? Are we aware of any client that does? And is the point here just to provide data hooks for extensibility, or is there anything else it does?



— David

David Newbury
Enterprise Software Architect, Getty Digital

Email: dnewbury@getty.edu
Phone: (310) 440-6116

From: David Huynh <dfhuynh@gmail.com>
Date: Thursday, September 26, 2019 at 4:34 PM
To: Antonin Delpeuch <antonin@delpeuch.eu>
Cc: Thad Guidry <thadguidry@gmail.com>, "public-reconciliation@w3.org" <public-reconciliation@w3.org>
Subject: Re: Meaning of identifierSpace and schemaSpace
Resent-From: <public-reconciliation@w3.org>
Resent-Date: Thursday, September 26, 2019 at 4:33 PM

Hi Antonin,

That sounds good to me, though the reader might still ponder, "why?" So maybe another sentence or two would provide a rationale. We could also note that while identifierSpace and schemaSpace are URIs, the IDs in those spaces (such as Freebase IDs) may not be URIs.

Best,

David


On Thu, Sep 26, 2019 at 12:03 AM Antonin Delpeuch <antonin@delpeuch.eu> wrote:

Hi David,

Thank you very much for chiming in!

In our specs, I think it would be useful if we could formulate the definition of these parameters without any reference to Freebase. So perhaps something fairly generic, such as "This pair of URIs defines the identifier scheme used by the reconciliation service.", or something along these lines?

Antonin
On 26/09/2019 03:50, David Huynh wrote:
Thad et al.,

My memory is also fading, but as far as I can recall, the intention is that, since Freebase's identifier scheme (for both instances and schemas) is only one possible way of formulating identifiers, we have to allow for other identifier schemes in order to make the reconciliation API extensible and future-proof. identifierSpace and schemaSpace specify which schemes instance identifiers and schema identifiers follow, so that software like OpenRefine can programmatically process those identifiers. For example, given a type ID in Freebase's schema ID space, e.g. /people/person, OpenRefine can remove the last segment, including the last slash /, and infer the ID of the domain containing that type, namely /people. Other identifier schemes may or may not have such path-based syntax.
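
A minimal sketch of the path-based inference described above, in Python. The function name parent_domain is hypothetical and this is not OpenRefine's actual code; it only illustrates what a client could do once it knows that IDs follow Freebase's path-style scheme.

```python
def parent_domain(type_id: str) -> str:
    """Drop the last path segment (and its slash) from a path-style type ID.

    Hypothetical helper for illustration only; it assumes Freebase-style
    IDs such as "/people/person".
    """
    head, sep, _last_segment = type_id.rpartition("/")
    if not sep or not head:
        # No slash at all, or only a leading slash: there is no containing domain.
        raise ValueError(f"no containing domain can be inferred from {type_id!r}")
    return head


print(parent_domain("/people/person"))  # /people
```

An ID from a scheme without path syntax (a bare identifier with no slashes, say) raises ValueError here, which is the point: the inference is only safe when the declared schema space tells the client that the syntax applies.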

Let me know if that makes sense.

Thanks,

David

On Wed, Sep 25, 2019 at 4:57 AM Thad Guidry <thadguidry@gmail.com> wrote:
David,

(sorry for pulling you into this convo)

Do you happen to have any additional knowledge or clarification for the definitions of identifierSpace and schemaSpace in Reconciliation? (my memory is fading fast)

Thad
https://www.linkedin.com/in/thadguidry/



On Tue, Sep 24, 2019 at 9:07 AM Thad Guidry <thadguidry@gmail.com> wrote:
Historically in Freebase the Types themselves were grouped under Domains sometimes.

We had Domains like /business and /music etc.
and each Type in the Domain (like "artist") had an ID that showed which domain it was under like this Type "/music/artist".

Then the more general concept was namespaces and keys
https://developers.google.com/freebase/guide/basic_concepts


Wikidata Items directly translate to Freebase Topic MIDs (like /m/0dgw9r, https://tools.wmflabs.org/freebase/m/0dgw9r)

Old MQL, but this might help you understand: when you inserted a new value into Freebase, you would get back the identifier MID of the triple, in this case upon inserting the key value "eol" into the "/biology" namespace (which is essentially a Domain):

[{
  "id":   "/m/0cnstys",
  "type": "/type/namespace",
  "key": [{
    "connect":   "insert",
    "namespace": "/biology",
    "value":     "eol"
  }]
}]

https://groups.google.com/forum/#!searchin/freebase-discuss/jeff$20prucher$20authority|sort:date/freebase-discuss/qiKeBzTMsks/rDtwPUNDClQJ

https://groups.google.com/forum/#!searchin/freebase-discuss/jeff$20prucher$20authority|sort:date/freebase-discuss/WdwYwZKSLtM/6Qc5EosrG28J

https://groups.google.com/forum/#!searchin/freebase-discuss/jeff$20prucher$20authority|sort:date/freebase-discuss/OoRk6BsxWyQ/jCXs-PSn3woJ

https://groups.google.com/forum/#!searchin/freebase-discuss/mql$20namespace|sort:date/freebase-discuss/KMQRKrcTYck/zPKBnyncOj4J

My initial reaction is that the schemaSpace is more about Domains and that the identifierSpace is about namespaces and keys (where we stored unique identifiers; this was the old soft key system).
Since Wikidata has no direct Types or Domains, there is nothing to translate there I think.

Thad
https://www.linkedin.com/in/thadguidry/



On Tue, Sep 24, 2019 at 5:15 AM Antonin Delpeuch <antonin@delpeuch.eu> wrote:
Hi all,

I have a question about the exact meaning of the identifierSpace and
schemaSpace fields in reconciliation services. This is one thing I would
like to document better in the specifications.

My understanding is that these are URIs which represent the type of
entities and properties used by the service. Can we have a more precise
definition than that?

The canonical value for both of these is
"http://rdf.freebase.com/ns/type.object.id", which was used in the
Freebase reconciliation service, but can of course no longer be accessed
(if it ever was? sometimes URIs are not meant to be resolved in a browser).

This canonical value is used by many other reconciliation services, even
if they do not use Freebase ids at all: for instance, the OCCRP endpoint
uses it. Is it desirable?

In the Wikidata service I have used "http://www.wikidata.org/entity/" as
identifierSpace and "http://www.wikidata.org/prop/direct/" as
schemaSpace. The idea behind this choice was that you can get an RDF URI
for identifiers and properties by concatenating their ids to these
prefixes. But that was totally a guess on my part. Perhaps I should have
used actual URIs there? Which ones?
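
A minimal sketch of the concatenation idea, in Python. The two prefixes are the ones quoted above; the helper name to_uri and the sample IDs Q42 and P31 are illustrative assumptions, not part of any specification.

```python
# Prefixes as declared by the Wikidata reconciliation service (quoted above).
IDENTIFIER_SPACE = "http://www.wikidata.org/entity/"
SCHEMA_SPACE = "http://www.wikidata.org/prop/direct/"


def to_uri(space: str, local_id: str) -> str:
    """Build a URI by appending a local ID to a space prefix.

    Hypothetical helper: it assumes the space is a plain prefix, which is
    the guess described in this message rather than a documented rule.
    """
    return space + local_id


print(to_uri(IDENTIFIER_SPACE, "Q42"))  # http://www.wikidata.org/entity/Q42
print(to_uri(SCHEMA_SPACE, "P31"))      # http://www.wikidata.org/prop/direct/P31
```

This only works for services whose spaces happen to be prefixes; a service that declares "http://rdf.freebase.com/ns/type.object.id" would not produce meaningful URIs this way.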

So, in short, I would like a precise definition of the identifierSpace
and schemaSpace fields, which would unambiguously inform implementers
about what value they should have in their own services.

Thanks,

Antonin




Received on Monday, 30 September 2019 15:26:26 UTC