Re: Open Library and RDF from Thomas Baker on 2010-08-16 (public-lld@w3.org from August 2010)

From: Thomas Baker <tbaker@tbaker.de>
Date: Sun, 15 Aug 2010 22:21:32 -0400
To: Karen Coyle <kcoyle@kcoyle.net>
Cc: "gordon@gordondunsire.com" <gordon@gordondunsire.com>, "Young,Jeff (OR)" <jyoung@oclc.org>, public-lld@w3.org
Message-ID: <20100816022132.GA4552@octavius>

On Sun, Aug 15, 2010 at 05:10:29PM -0700, Karen Coyle wrote:
>               The maximal ontological commitment definitely furthers  
> the sharing, and may be essential for it. In fact, libraries today are  
> organized in highly complex networks of data sharing that they depend  
> on for their survival in this difficult economic climate. Although it  
> is a bit of an exaggeration, I often say that libraries have achieved  
> an amazing efficiency -- that a book is published, then cataloged and  
> the data keyed once (usually by the relevant national library), and  
> every other library downloads that data into their catalog. There is  
> much more metadata re-use than metadata creation.

(I sometimes wonder if it is still optimally efficient, in
2010, to create lots of redundant copies of catalog records
in lots of local databases instead of just linking to a
central record, but that would be a different discussion...)

> I think we must start with that as our reality, and look at ways to  
> integrate library metadata into a wider universe without losing the  
> underlying precision of that data as it exists in library systems.  

Agreed. I'm not questioning the need for precise metadata.
My point is that precision can be attained in different ways.
One way is by defining a strongly specified ontology that
enforces the logical consistency of data.  It will enforce
that not just for data producers but also for data consumers.

Another way is by strongly controlling the consistency of
data when it is created -- e.g., with application profiles,
using criteria that can form the basis of syntactic validation,
quality control, and consistency checks (and of course with
training of the catalogers in the proper application of the
conceptual system).  However, for the data to be good and
consistent, it does not follow that the underlying vocabularies
themselves must necessarily carry heavy ontological baggage.
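
To illustrate the kind of creation-time check an application
profile can support, here is a minimal sketch in Python.  The
profile layout, the property names, and the validate() function
are all hypothetical, invented for this example; they do not
correspond to any existing profile or tool.  The point is only
that the constraints sit in the profile, while the property
names themselves carry no constraints at all.

```python
# Hypothetical application profile: the constraints live here,
# not in the vocabulary that defines the properties.
PROFILE = {
    "title":   {"required": True,  "repeatable": False},
    "creator": {"required": False, "repeatable": True},
    "issued":  {"required": True,  "repeatable": False},
}


def validate(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Check cardinality rules for every property the profile knows about.
    for prop, rule in profile.items():
        values = record.get(prop, [])
        if rule["required"] and not values:
            problems.append(f"missing required property: {prop}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"property not repeatable: {prop}")
    # Flag properties that the profile does not allow at all.
    for prop in record:
        if prop not in profile:
            problems.append(f"property not in profile: {prop}")
    return problems


# Illustrative records (placeholder values only).
good = {"title": ["An example title"], "creator": ["A. Author"], "issued": ["2010"]}
bad = {"title": ["First title", "Second title"], "subject": ["Example subject"]}

print(validate(good))
print(validate(bad))
```

A consumer who reuses the same properties under a looser (or
stricter) profile is free to do so; nothing in the record itself
forces this particular set of rules on them.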

> A second goal is to increase the interaction of library data with  
> other resources on the Web. This is one of the reasons why the  
> Metadata Registry created a hierarchy of properties, the highest level  
> of which are not bound by the FRBR entities. This allows data to be  
> exchanged without regard to strict FRBR definitions. The resulting  
> metadata, however, will still be more detailed than is desired (or  
> even understood) by non-library communities. Therefore I think we need  
> to work on defining classes and properties that can be used to  
> integrate library data to non-library, non-specialist resources and  
> services. FRBR and RDA jump right into the level of detail that the  
> library community relates to without looking at how those details  
> might fit into a larger picture. We need to work on developing that  
> picture.

I agree that this is the challenge, and a layered approach
sounds reasonable.  Is this the approach currently being followed
by the FR and RDA committees?

> What this all comes down to is that if we take the view that library  
> metadata must embrace different principles than it does today in order  
> for libraries to interact on the Web, then we've got a non-starter.  
> Library data is precise, meets particular needs, is principles based,  
> and is anything but arbitrary. 

As I see it, record sharing in the library world has been
based on the sort of validation that one might express in
an application profile, and the consistency of intellectual
approach embodied in those records has been ensured by
the training of experts in the proper application of
internationally recognized standards.  I do not see this
changing.

My point is that it is not necessarily strongly specified
ontologies that will buy that precision, whereas strongly
specified ontologies _will_ impose their ontological baggage
on any downstream consumers of that data.  Where should
the precision get defined and enforced -- in the process
of creating and validating data, or hard-wired into the
underlying vocabularies themselves?  Designing an
RDF vocabulary need not be like designing an XML schema --
the RDF approach offers different ways to separate underlying
semantics from specific constraints.
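
As a rough sketch of what "baggage" means for a downstream
consumer, consider the standard RDFS domain rule: if a property
has a declared rdfs:domain C, then anything used as the subject
of that property is inferred to be a C.  The Python below applies
that one rule by hand.  The ex: names and the infer_types()
function are hypothetical, made up for illustration; they are not
taken from FRBR, RDA, or any published vocabulary.

```python
RDF_TYPE = "rdf:type"

# Hypothetical vocabulary: a domain declaration hard-wired into the property.
DOMAINS = {"ex:hasTitle": "ex:Manifestation"}


def infer_types(triples, domains=DOMAINS):
    """Apply the RDFS domain rule to a set of (subject, property, object) triples.

    For each triple (s, p, o) where p has a declared domain C,
    add the triple (s, rdf:type, C).
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
    return inferred


# A consumer who only wanted to say "this thing has a title".
consumer_data = {("ex:thing1", "ex:hasTitle", "An example title")}

with_domain = infer_types(consumer_data)
without_domain = infer_types(consumer_data, domains={})

print(sorted(with_domain))
print(sorted(without_domain))
```

With the domain declared in the vocabulary, the consumer's
resource is typed as ex:Manifestation whether or not the consumer
intended that.  With no domain declared, the same statement
carries no such commitment, and a constraint of that kind could
instead be expressed in an application profile.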

My question is whether the FR and RDA process is considering
that some of the desired precision might be defined not in
the underlying vocabularies, but in application profiles that
use those vocabularies.  An approach which pushes some of the
precision into application profiles could provide flexibility
without sacrificing rigor.  Are application profiles (possibly
under a different name) an important part of the discussion?

Tom

-- 
Thomas Baker <tbaker@tbaker.de>
Received on Monday, 16 August 2010 02:22:12 UTC