Re: A better solution for legacy IDs?

Hi,

Trying to see if Karen's problem is the same as mine...

In Europeana we're also looking into getting "legacy" identifiers from our collection providers. For the moment we indeed have URIs (taking our LOD pilot as the example) of the form
http://data.europeana.eu/item/09405f/533F9A826CB038D02C05A9814CF97E5D1B49BBEE
or
http://data.europeana.eu/item/92056/BD9D5C6C6B02248F187238E9D7CC09EAF17BEA59
where 92056 is a collection ID and BD9D5C6C6B02248F187238E9D7CC09EAF17BEA59 is a hash generated from the record.
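As a rough illustration of that structure, the Python sketch below splits a pilot item URI into its collection ID and record hash. The regular expression is an assumption inferred only from the two example URIs (an alphanumeric collection ID followed by 40 hexadecimal digits). It is not a published Europeana pattern.

```python
import re

# Assumed pattern, inferred only from the two example URIs above:
# http://data.europeana.eu/item/<collection id>/<40 hex-digit hash>
ITEM_URI = re.compile(
    r"^http://data\.europeana\.eu/item/"
    r"(?P<collection>[0-9A-Za-z]+)/"
    r"(?P<hash>[0-9A-Fa-f]{40})$"
)


def split_item_uri(uri: str) -> tuple[str, str]:
    """Return (collection_id, record_hash) for a pilot item URI."""
    match = ITEM_URI.match(uri)
    if match is None:
        raise ValueError(f"not a recognised item URI: {uri}")
    return match.group("collection"), match.group("hash")


for uri in (
    "http://data.europeana.eu/item/09405f/533F9A826CB038D02C05A9814CF97E5D1B49BBEE",
    "http://data.europeana.eu/item/92056/BD9D5C6C6B02248F187238E9D7CC09EAF17BEA59",
):
    print(split_item_uri(uri))
```

Nothing in the second segment is meaningful to the provider, which is why an identifier that makes sense to them is wanted instead.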

The collection ID is not ideal, but if we want to change it, we can do it. What we really need our providers' help with is replacing the hash, which dates from a very long time ago and which many of us at Europeana are desperate to get rid of.

For this we'd like to get from our providers an identifier that makes sense to them, and which we'd use to create URIs/URLs. The process is a bit similar to what Herbert, Jeff and Tom describe, except that we're not asking for ready-made URIs. I mean, we'll ask for them, but we should also allow for local identifiers that are suitable for composing URIs (such as inventory numbers). This is in line with what ICOM recommends at [1].
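A minimal Python sketch of composing such a URI from a collection ID and a provider's local identifier. The base URI simply reuses the pilot's item namespace, and "INV-1234" and "Inv. 1923/45" are hypothetical inventory numbers. Neither reflects an actual Europeana minting policy.

```python
from urllib.parse import quote

# Hypothetical base: reuses the pilot's item namespace for illustration only.
BASE = "http://data.europeana.eu/item/"


def mint_item_uri(collection_id: str, local_id: str) -> str:
    """Compose an item URI from a collection ID and a provider's local
    identifier (e.g. an inventory number) instead of a record hash."""
    local_id = local_id.strip()
    if not local_id:
        raise ValueError("empty local identifier")
    # safe="" percent-encodes "/" and spaces too, so the local identifier
    # stays a single path segment.
    return BASE + quote(collection_id, safe="") + "/" + quote(local_id, safe="")


print(mint_item_uri("92056", "INV-1234"))
# http://data.europeana.eu/item/92056/INV-1234
print(mint_item_uri("92056", "Inv. 1923/45"))
# http://data.europeana.eu/item/92056/Inv.%201923%2F45
```

Percent-encoding the slash keeps an identifier such as "Inv. 1923/45" in one path segment. The sketch assumes local identifiers are unique within a collection, which only the provider can guarantee.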

dc:identifier could do it, in principle. But it is sometimes misused, and we often get several of them (all legitimate) for one object.
We may also look into bibo:ISBN and other domain-specific properties, but we know this would fail, given the variety of collections we have.

So for the moment we would just create our own sub-property of dc:identifier, e.g., europeana:localIdentifier. Of course if there's an alternative I'd be happy to consider it...
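To show why a sub-property is convenient, here is a toy Python sketch. Triples are plain tuples, europeana:localIdentifier lives in a hypothetical namespace (http://example.org/europeana#), and the identifier values are invented. Applying the RDFS rule for rdfs:subPropertyOf (rdfs7) lets a consumer that only knows dc:identifier still find the local identifier.

```python
DC = "http://purl.org/dc/elements/1.1/"
SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
# Hypothetical namespace: europeana:localIdentifier is not a published property.
EU = "http://example.org/europeana#"
ITEM = "http://data.europeana.eu/item/92056/BD9D5C6C6B02248F187238E9D7CC09EAF17BEA59"

# Toy graph of (subject, predicate, object) tuples; literals are plain strings.
graph = {
    (EU + "localIdentifier", SUBPROP, DC + "identifier"),
    (ITEM, EU + "localIdentifier", "INV-1234"),  # the value to build URIs from
    (ITEM, DC + "identifier", "OLD-CAT-77"),     # other legitimate identifiers
    (ITEM, DC + "identifier", "photo-neg-0045"),
}


def apply_subproperty_rule(triples):
    """Apply rdfs7 until nothing new is derived:
    (p subPropertyOf q) and (s p o) entail (s q o)."""
    result = set(triples)
    while True:
        supers = {(p, q) for (p, rel, q) in result if rel == SUBPROP}
        derived = {(s, q, o) for (s, p, o) in result for (sub, q) in supers if p == sub}
        if derived <= result:
            return result
        result |= derived


def values(triples, subject, prop):
    """Sorted objects of all (subject, prop, ?) triples."""
    return sorted(o for (s, p, o) in triples if s == subject and p == prop)


entailed = apply_subproperty_rule(graph)
print(values(graph, ITEM, DC + "identifier"))         # ['OLD-CAT-77', 'photo-neg-0045']
print(values(entailed, ITEM, DC + "identifier"))      # now also includes 'INV-1234'
print(values(entailed, ITEM, EU + "localIdentifier"))  # ['INV-1234']
```

The specific property still tells Europeana which value to compose URIs from, while generic consumers keep seeing an ordinary dc:identifier.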

The PILIN ontology that Tom found may include relevant properties, but its documentation does not really give those properties actual URIs. There was also the IRW ontology, but its mechanism for representing identifiers seems limited to URIs.

Cheers,

Antoine

[1] http://www.cidoc-crm.org/URIs_and_Linked_Open_Data.html


> On Dec 12, 2011, at 18:27, "Young,Jeff (OR)" <jyoung@oclc.org> wrote:
>
>> It's still reasonable to use "info" URIs (RFC 4452) despite the fact that new "namespaces" are no longer being considered:
>>
>> http://info-uri.info/registry/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc
>>
>> "info" URIs don't benefit from the HTTP protocol the way Linked Data "http" URIs do, but they can still be used in RDF to identify an rdf:Resource/owl:Thing.
>
> These kinds of non-HTTP identifiers are hugely unpopular now, as far as I understand, but the info URI scheme was created to address exactly the problem that Karen brings up. Just have a look at the introduction of the RFC that specifies it: http://www.ietf.org/rfc/rfc4452.txt
>
> Cheers
>
> Herbert Van de Sompel
>
>
>>
>> Jeff
>>
>>> -----Original Message-----
>>> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>>> Sent: Monday, December 12, 2011 8:01 PM
>>> To: public-lld@w3.org
>>> Subject: Re: A better solution for legacy IDs?
>>>
>>> Quoting Dan Brickley <danbri@danbri.org>:
>>>
>>>
>>>>
>>>> Can you expand on 'too large'? You can fit breathtaking amounts of
>>>> data on a USB stick - or Web site - these days. What kind of size are we
>>>> looking at? Is the problem admin/social (e.g. decentralization expected)
>>>> or technical or a mix?
>>>
>>> Dan, I didn't mean "large" in the "bytes" sense but in the sense of
>>> human effort to mint and maintain a unique property for each possible
>>> type of identifier. It just seems easier to me to have an "identifier"
>>> property (or graph) that is a single URI, but which takes the
>>> identifier as a value, along with a code giving the source/agency/etc.
>>> There are institution and organization codes that will probably cover
>>> most of the identifier-producing agencies. In non-linked data we often
>>> see things like "PMID:123456" or "eISSN:2344-8765". This would be the
>>> same, but would be an http URI. I realize that there isn't a great
>>> deal of overhead to minting a URI but my experience is that many folks
>>> will hesitate before doing so. Treating the legacy identifiers as
>>> values will probably get more uptake.
>>>
>>> Admittedly, the edge cases will not be well controlled and we'll get
>>> some identifiers that are expressed in more than one way. That happens
>>> now in the pre-LD world; we'll have to live with that. But at least
>>> having some agreement on a graph structure would be a step forward, IMO.
>>>
>>> So, Tom, I think that answers your question: I'm mainly looking for a
>>> property/graph that will take values, but I will look more closely at
>>> the Freebase schema. Is it possible to add to the Freebase identifier
>>> hierarchy "at will"? Are there limitations on who can mint a new
>>> property? And for the Freebase namespaces that refer to an http URI
>>> elsewhere (like the LC catalog numbers), where is the connection made
>>> to the URI? I couldn't find that link.
>>>
>>> Thanks,
>>>
>>> kc
>>>
>>>>
>>>> Dan
>>>>
>>>>> Has anyone developed and published a good "legacy identifier graph"
>>>>> that we could adopt? If not, would someone like to propose one?
>>>>>
>>>>> Thanks,
>>>>> kc
>>>>>
>>>>> --
>>>>> Karen Coyle
>>>>> kcoyle@kcoyle.net http://kcoyle.net
>>>>> ph: 1-510-540-7596
>>>>> m: 1-510-435-8234
>>>>> skype: kcoylenet
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
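The "info" option Jeff and Herbert describe can be sketched in Python as a mapping from a scheme code to an info URI of the form info:namespace/identifier (RFC 4452). The namespace table below is an assumption and should be checked against the info URI registry; codes with no namespace in this table, such as eISSN here, return None.

```python
from typing import Optional
from urllib.parse import quote

# Assumed scheme-code -> info namespace table; verify against the registry.
INFO_NAMESPACES = {
    "pmid": "pmid",
    "lccn": "lccn",
    "doi": "doi",
    "oclc": "oclcnum",
}


def to_info_uri(scheme: str, value: str) -> Optional[str]:
    """Return info:<namespace>/<identifier>, or None if the scheme is unmapped."""
    namespace = INFO_NAMESPACES.get(scheme.strip().lower())
    if namespace is None:
        return None
    # Keep "/" (e.g. inside DOIs) and percent-encode other unsafe characters.
    return f"info:{namespace}/{quote(value.strip(), safe='/')}"


print(to_info_uri("PMID", "123456"))      # info:pmid/123456
print(to_info_uri("DOI", "10.1000/182"))  # info:doi/10.1000/182
print(to_info_uri("eISSN", "2344-8765"))  # None
```

Such URIs can sit in the object position of an RDF triple, but, as Jeff notes, they do not get the benefits of HTTP dereferencing.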

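Karen's alternative, a single generic identifier property whose value carries both the identifier and a code for its source, might look like the Python sketch below, which turns prefixed strings such as "PMID:123456" into Turtle. The lid: vocabulary (http://example.org/legacy-id#, with identifier, value and scheme) is hypothetical and does not stand for any published graph.

```python
from dataclasses import dataclass

# Hypothetical vocabulary: one generic property instead of one per scheme.
LID = "http://example.org/legacy-id#"


@dataclass(frozen=True)
class LegacyId:
    scheme: str  # code for the issuing source/agency, e.g. "PMID"
    value: str   # the identifier exactly as issued


def parse_prefixed(text: str) -> LegacyId:
    """Split a string such as 'PMID:123456' at its first colon."""
    scheme, sep, value = text.partition(":")
    if not sep or not scheme.strip() or not value.strip():
        raise ValueError(f"expected SCHEME:VALUE, got {text!r}")
    return LegacyId(scheme.strip(), value.strip())


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_turtle(subject: str, ids: list[LegacyId]) -> str:
    """Emit one blank node per legacy identifier, holding value and scheme."""
    objects = [
        f"lid:identifier [ lid:value {literal(i.value)} ; lid:scheme {literal(i.scheme)} ]"
        for i in ids
    ]
    body = " ;\n    ".join(objects)
    return f"@prefix lid: <{LID}> .\n\n<{subject}>\n    {body} ."


ids = [parse_prefixed(s) for s in ("PMID:123456", "eISSN:2344-8765")]
print(to_turtle("http://example.org/article/1", ids))
```

Adding a new kind of identifier then needs only a new scheme code, not a new property, which is the saving in effort Karen describes; the cost, as she notes, is less control over how the same identifier gets written.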
Received on Tuesday, 13 December 2011 09:01:06 UTC