Re: A better solution for legacy IDs?

On Sun, Dec 11, 2011 at 1:50 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
> I keep running into the same problem in different projects: we've got a
> bunch of legacy identifiers, like ISBNs, PMIDs, OCLC numbers, etc. It's
> important to carry them in the linked data that we are creating, but the
> maintenance agencies haven't provided them with URIs. That means we need to
> keep the base identifier string along with something that, well, identifies
> the identifier. I know that BIBO has BIBO:ISBN, etc., but it's just not
> going to work to create a separate property for each one of these, the
> number of them is too large.
>
> Has anyone developed and published a good "legacy identifier graph" that we
> could adopt? If not, would someone like to propose one?

I'm not sure what you're asking for here.  Are you looking for an ontology
which can be used to describe the identifiers or an actual set of
identifiers from multiple sources which have been matched up with each
other?

The Freebase schema works by defining a set of namespaces to hold the
identifiers, one per type of identifier or authority, each with a few simple
properties, such as whether the identifier is unique and a URI template which
can be used to generate a link from the ID (usually to HTML, not RDF, but
still useful to link back to the original source).  These namespaces are
rooted at /authority, so, for example, the LC Subject Headings and Name
Authorities are under
/authority/us/gov/loc <http://www.freebase.com/inspect/authority/us/gov/loc>,
but the URIs are generated independently from this hierarchy, so they can
track any changes that are made at the Library of Congress.  If you're
looking to create an ontology to cover these types of identifiers, you
could do worse for a starting point.  It's proven to work fairly well.
There's a separate set of definitions at sameas.freebase.com which are used
to generate the sameAs triples on rdf.freebase.com, and which also have
properties for things like the authority in charge of assigning the
identifiers.
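
As a rough sketch of that namespace idea (my own illustration, not Freebase's
actual schema or API; the class, the namespace key and the example.org URL
template below are all made up for the example), each identifier type gets
one namespace record carrying a uniqueness flag and a URI template, and a
legacy ID is then just a (namespace, value) pair:

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class IdentifierNamespace:
    """One namespace per identifier type or authority (hypothetical model)."""
    key: str                            # e.g. a path rooted at /authority
    unique: bool                        # is there at most one ID per entity?
    uri_template: Optional[str] = None  # "{id}" marks where the ID goes

    def link_for(self, identifier: str) -> Optional[str]:
        """Expand the URI template for an ID, or None if there is no template."""
        if self.uri_template is None:
            return None
        return self.uri_template.replace("{id}", quote(identifier, safe=""))


def describe(registry: Dict[str, IdentifierNamespace],
             namespace_key: str, value: str) -> dict:
    """Pair a bare identifier string with the namespace that identifies it."""
    ns = registry[namespace_key]
    return {"namespace": ns.key, "value": value, "link": ns.link_for(value)}


# Illustrative registry; the key and the URL are placeholders, not real ones.
REGISTRY = {
    "/authority/example/isbn": IdentifierNamespace(
        key="/authority/example/isbn",
        unique=False,
        uri_template="http://example.org/isbn/{id}",
    ),
}

if __name__ == "__main__":
    print(describe(REGISTRY, "/authority/example/isbn", "9780000000002"))
```

The point of keeping the template on the namespace rather than baking it
into each identifier is that the link format can change without touching
the stored ID strings.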

A quick Google search also turned up the 2007-vintage PILIN Ontology for
Identifiers and Identifier Services
<http://www.pilin.net.au/Project_Documents/PILIN_Ontology/Ontology.htm>, but
I know nothing more about it.

If you're looking for an actual source of reconciled identifiers,
Freebase has a large collection of ISBN, LC NAF, VIAF, Wikipedia,
OpenLibrary, ISFDB, MusicBrainz (spoken word recordings) and other
identifiers (240 different
types <http://www.freebase.com/view/base/sameas/views/web_id_collection> of
IDs altogether).  The number and quality of links vary quite a bit across
entity types and identifier sources, but it'd probably be hard to beat for
diversity, plus it's more liberally licensed than proprietary databases
like those of the OCLC.  Of course, OpenLibrary's data is even more
liberally licensed than Freebase, but not as high quality as Freebase or
WorldCat, so the right set of tradeoffs will really depend on your
priorities.

If you weren't looking for either of these things, I apologize for taking
up your time!

Tom

Received on Monday, 12 December 2011 23:27:30 UTC