- From: Tom Morris <tfmorris@gmail.com>
- Date: Mon, 12 Dec 2011 18:27:01 -0500
- To: Karen Coyle <kcoyle@kcoyle.net>
- Cc: public-lld <public-lld@w3.org>
- Message-ID: <CAE9vqEH0P59Ma5N9aZk+eBygV4LRtzWAyPe=fDM+uff9c4CgOw@mail.gmail.com>
On Sun, Dec 11, 2011 at 1:50 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
> I keep running into the same problem in different projects: we've got a
> bunch of legacy identifiers, like ISBNs, PMIDs, OCLC numbers, etc. It's
> important to carry them in the linked data that we are creating, but the
> maintenance agencies haven't provided them with URIs. That means we need to
> keep the base identifier string along with something that, well, identifies
> the identifier. I know that BIBO has BIBO:ISBN, etc., but it's just not
> going to work to create a separate property for each one of these, the
> number of them is too large.
>
> Has anyone developed and published a good "legacy identifier graph" that we
> could adopt? If not, would someone like to propose one?

I'm not sure what you're asking for here. Are you looking for an ontology which can be used to describe the identifiers, or an actual set of identifiers from multiple sources which have been matched up with each other?

The way the Freebase schema works is to define a set of namespaces to hold the identifiers, one per type of identifier or authority, plus a few simple properties, such as whether the identifier is unique and a URI template which can be used to generate a link from the ID (usually to HTML, not RDF, but still useful for linking back to the original source). These namespaces are rooted at /authority, so, for example, the LC Subject Headings and Name Authorities are under /authority/us/gov/loc <http://www.freebase.com/inspect/authority/us/gov/loc>, but the URIs are generated independently from this hierarchy, so they can track any changes that are made at the Library of Congress.

If you're looking to create an ontology to cover these types of identifiers, you could do worse for a starting point. It's proven to work fairly well.
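As a rough illustration of the namespace model described above (not Freebase's actual schema or API), here is a minimal Python sketch. The class names, the "{id}" placeholder syntax, the example.org templates, the /authority/example/isbn key and the uniqueness flags are all assumptions made for illustration; only the /authority/us/gov/loc key comes from this message.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class IdentifierNamespace:
    """One namespace per identifier type or authority (hypothetical model)."""

    key: str           # position in the namespace hierarchy
    unique: bool       # whether an ID in this namespace names at most one entity
    uri_template: str  # link pattern with an "{id}" placeholder (assumed syntax)

    def link_for(self, identifier: str) -> str:
        # The link is built from the template alone, never from the key,
        # so the template can change without moving the namespace.
        return self.uri_template.replace("{id}", quote(identifier, safe=""))


@dataclass(frozen=True)
class LegacyIdentifier:
    """A base identifier string plus the namespace that identifies it."""

    namespace: IdentifierNamespace
    value: str

    def link(self) -> str:
        return self.namespace.link_for(self.value)


# Placeholder namespaces: the templates and flags are invented for this sketch.
LOC = IdentifierNamespace(
    key="/authority/us/gov/loc",
    unique=True,
    uri_template="http://example.org/loc/{id}",
)
ISBN = IdentifierNamespace(
    key="/authority/example/isbn",
    unique=False,
    uri_template="http://example.org/isbn/{id}",
)

ident = LegacyIdentifier(ISBN, "9780000000002")
print(ident.namespace.key, ident.value, ident.link())
```

The point of the design is that a new kind of identifier only needs a new namespace record, not a new property, and that the generated link depends only on the template.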
There's a separate set of definitions at sameas.freebase.com which are used to generate the sameAs triples on rdf.freebase.com, and which also have properties for things like the authority in charge of assigning the identifiers, etc.

A quick Google search also found the 2007 vintage PILIN Ontology for Identifiers and Identifier Services <http://www.pilin.net.au/Project_Documents/PILIN_Ontology/Ontology.htm>, but I know nothing more about it.

If you're looking for an actual source of reconciled identifiers, Freebase has a large collection of ISBN, LC NAF, VIAF, Wikipedia, OpenLibrary, ISFDB, MusicBrainz (spoken word recordings) and other identifiers (240 different types <http://www.freebase.com/view/base/sameas/views/web_id_collection> of IDs altogether). The number and quality of links vary quite a bit across entity types and identifier sources, but it'd probably be hard to beat for diversity, plus it's more liberally licensed than proprietary databases like those of the OCLC. Of course, OpenLibrary's data is even more liberally licensed than Freebase, but not as high quality as Freebase or WorldCat, so the right set of tradeoffs will really depend on your priorities.

If you weren't looking for either of these things, I apologize for taking up your time!

Tom
Received on Monday, 12 December 2011 23:27:30 UTC