Re: WorldCat Works Data

Thanks all for comments and questions.

As explained in the blog post<http://dataliberate.com/2014/02/oclc-preview-194-million-open-bibliographic-work-descriptions/> this is a preview release of what is the first step in an evolving process.  So in that light I will try to briefly answer some of the questions raised.


  *   Will there be links to individual ISBN/ISNI records?
     *   ISBN - ISBNs are attributes of manifestation [in FRBR terms] entities, and as such can be found in the already released WorldCat Linked Data.  As each work is linked to its related manifestation entities [by schema:workExample] they are therefore already linked to ISBNs.
     *   ISNI - ISNI is an identifier for a person, and as such an ISNI URI is a candidate for use in linking Works to other entity types.  VIAF URIs are another candidate for Person/Organisation entities which, as we have the data, we will be using.  No final decisions have been made as to which URIs we use, or whether to use multiple URIs for the same relationship.  Whether we use ISNI, VIAF, and DBpedia URIs for the same person, or just use one and rely on interconnection between the authoritative hubs, is a question still to be settled.
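
To illustrate the Work-to-ISBN path described above, here is a minimal sketch in Python.  The triples are hand-written placeholders (the example.org URIs and ISBN values are invented for illustration, not real WorldCat data), and the use of schema:isbn on the manifestation is an assumption about how the ISBN is expressed.

```python
SCHEMA = "http://schema.org/"

# Hypothetical triples (subject, predicate, object); placeholders only,
# not real WorldCat identifiers or ISBNs.
TRIPLES = [
    ("http://example.org/work/1", SCHEMA + "workExample", "http://example.org/manifestation/10"),
    ("http://example.org/work/1", SCHEMA + "workExample", "http://example.org/manifestation/11"),
    ("http://example.org/manifestation/10", SCHEMA + "isbn", "0000000000001"),
    ("http://example.org/manifestation/11", SCHEMA + "isbn", "0000000000002"),
]


def objects(triples, subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def isbns_for_work(triples, work_uri):
    """Follow schema:workExample from a work, then collect each manifestation's ISBNs."""
    isbns = []
    for manifestation in objects(triples, work_uri, SCHEMA + "workExample"):
        # Assumption: the ISBN is attached to the manifestation via schema:isbn.
        isbns.extend(objects(triples, manifestation, SCHEMA + "isbn"))
    return isbns
```

Calling isbns_for_work(TRIPLES, "http://example.org/work/1") walks both manifestations and gathers their ISBN values; the work itself never carries an ISBN.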

  *   Will your team be making use of ISTC?
Again it is still early for decisions in this area.  However, we would not expect to store the ISTC code as a property of Work.
ISTC is one of many work-based data sets, from national libraries and others, and it would be interesting to investigate processes for identifying sameAs relationships between them.

  *   I don't see anything that describes the criteria for "workness."
The definition of “workness” is more the result of several interdependent algorithmic decision processes than a simple set of criteria.  To a certain extent publishing the results as linked data was the easy (huh!) bit.  The efforts to produce these definitions and their relationships are the ongoing results of a research process that has been in motion for several years, to investigate and benefit from FRBR.  You can find more detail on this research here: http://www.oclc.org/research/activities/frbr.html?urlm=159763

As we move on from ‘preview release’ and establish this data, I expect to see some more blogging and other information explaining some of the background.

  *   Can you say more about how the stable identifiers will be managed?
You correctly identify the issue of maintaining identifiers as work groups split & merge.  This is one of the tasks our development team are currently working on as we move towards full release of this data over the coming weeks.  As I indicated in my blog post, there is a significant data refresh due, and from that point any changes will be handled correctly.

  *   Is there a bulk download available?
No, there is no bulk download available.  This is a deliberate decision for several reasons.
Firstly, this is Linked Data - its main benefits accrue from its canonical persistent identifiers and the relationships it maintains between other identified entities within a stable, yet changing, web of data.  WorldCat.org<http://WorldCat.org> is a live data set actively maintained and updated by the thousands of member libraries, data partners, and OCLC staff and processes. I would discourage reliance on local storage of this data, as it will rapidly evolve and a local copy will fall out of synchronisation with the source.  The whole point and value of persistent identifiers, which you would reference locally, is that they will dereference to the current version of the data.
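
As a rough sketch of that pattern in Python: keep only the identifier locally and dereference it when the description is needed.  The function names, the example.org URI used to try it out, and the text/turtle media type are assumptions for illustration; which serialisations are offered via content negotiation should be checked against the service itself.

```python
import urllib.request


def build_request(uri, media_type="text/turtle"):
    """Build an HTTP request for a persistent URI, asking for an RDF serialisation.

    The default media type is an assumption, not a statement of what is supported.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_current_description(uri, media_type="text/turtle", timeout=10):
    """Dereference the URI and return whatever the source currently serves.

    Nothing is cached here, so each call reflects the live data.
    Assumes the response body is UTF-8 encoded.
    """
    with urllib.request.urlopen(build_request(uri, media_type), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A local application would then store just the URI string alongside its own records and call fetch_current_description when it needs the description.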

  *   Where should bugs be reported?
Today, you can either use the comment link from the Linked Data Explorer or report them to data@oclc.org<mailto:data@oclc.org>.  We will be building on this as we move to full release.

  *   There appears to be something funky with the way non-existent IDs are handled.
You have spotted a defect!  The result of accessing a non-established URI should be that no triples are returned with that URI as subject.  How this is represented will differ between serialisations.
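
One reading of that intended behaviour, sketched in Python for a client that has already parsed a response into (subject, predicate, object) tuples.  The function name and the example.org URIs are hypothetical.

```python
def is_established(triples, uri):
    """Return True if at least one parsed triple has the given URI as its subject."""
    return any(subject == uri for subject, _predicate, _obj in triples)


# Hypothetical parsed response describing one established work.
parsed = [
    ("http://example.org/work/1", "http://schema.org/name", "A placeholder title"),
]
```

With this check, a client treats a URI as non-established when the response carries no triples about it, regardless of which serialisation was requested.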

  *   Defining what a “work” is has proven next to impossible in the commercial world - very true for the [often political] reasons you imply
OCLC make no broader claim to the definition of a WorldCat Work other than it is the result of applying the results of the FRBR and associated algorithms developed by OCLC Research to the vast collection of bibliographic data contributed, maintained, and shared by the OCLC member libraries and partners.

  *   Clarify for us in the video context
Currently the [FRBR] algorithms that are producing these Work definitions are operating mainly on the bibliographic material descriptions that underpin WorldCat.org<http://WorldCat.org>.  Over time there may be an opportunity to expand this to encompass other material types.  The generic capability of the Schema.org<http://Schema.org> CreativeWork Type, coupled with SchemaBibEx proposals for exampleOfWork and workExample, makes it simple to envisage the definition of [what are sometimes called] super-works which could associate all versions of a work from the original story through to the game-of-the-cartoon-of-the-movie-of-the-book.  Given time ;-)

  *   How might this intersect with the BIBFRAME model? - these work descriptions could be very useful as a bf:hasAuthority for a bf:Work.
The OCLC team monitor, participate in, and take account of many discussions - BIBFRAME, Schema.org<http://Schema.org>, SchemaBibEx, WikiData, etc. - where there are some obvious synergies in objectives, and differences in approach and levels of detail for different audiences. The potential for interconnection of datasets using sameAs, and other authoritative relationships such as you describe, is significant.  As the WorldCat data matures and other datasets are published, one would expect many initiatives to start interlinking bibliographic resources from many sources.


This has turned into a bit of an essay that may well become the basis for a blog post, but hopefully it has answered some of your questions.  Thanks for the interest.

~Richard

Received on Wednesday, 26 February 2014 10:50:24 UTC