RE: Question about MARCXML to Models transformation

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Sun, 06 Mar 2011 08:54:36 -0800
Message-ID: <20110306085436.20647fu7rgb48xn0@kcoyle.net>
To: "Young,Jeff (OR)" <jyoung@oclc.org>
Cc: public-lld@w3.org
Quoting "Young,Jeff (OR)" <jyoung@oclc.org>:

> I think Karen brings some nebulous issues into focus. Sorry if my
> thoughts are cryptic. I can try to clarify them if needed.

Thanks, Jeff. I have just a few comments...

> Riddle: When is a resource not a resource?
> Answer: When the modeler(s) declare it to be a property or set of
> properties instead.

This may not be what you intended, but I have noticed a big
difference between library data models (including FRBR) and how I see
non-librarians modeling the same data. If you look at BIBO [1] or
FABiO [2] you see that they have a large number of classes, but few
data properties. Library data tends to be modeled as a small number of
classes, but many data properties. For example, BIBO and FABiO have
numerous classes for types of texts: review article, academic article,
abstract, etc. MARC and RDA have the concept of "genre", which is a
data property with a controlled list of values. In library data
"genre" is not a thing.
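
To make the contrast concrete, here is a minimal sketch in Python that
uses plain tuples as triples. The "ex:" identifiers, the "ex:genre"
property and the Literal wrapper are all made up for illustration;
they are not taken from BIBO, FABiO, MARC or RDA.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to an identifier for a thing."""
    value: str


# Class-heavy style: the type of text is a class that the document belongs to.
class_style = [
    ("ex:doc1", "rdf:type", "ex:ReviewArticle"),
]

# Property-heavy style: the type of text is a string from a controlled list.
property_style = [
    ("ex:doc1", "ex:genre", Literal("review article")),
]


def linkable_objects(triples):
    """Return the objects that are identifiers, i.e. that could be described further."""
    return [o for (_s, _p, o) in triples if not isinstance(o, Literal)]
```

In the first style the object is an identifier that other statements
can point at or describe; in the second it is only a string, so
linkable_objects() finds nothing to link to.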

Somewhere there must be a reason why you would choose one of these  
methods over the other. Something like:
  - if it needs to have specific relationships to other things, then  
model it as a thing
  - if it needs to have specific properties to describe it, then model  
it as a thing

See what I mean?

I can't tell if the non-library folk have gone overboard with classes,
or if libraries have gone "underboard" with them. Surely there has to
be a functional way to determine the best level of definition for your
metadata.

[1] http://bibotools.googlecode.com/svn/bibo-ontology/trunk/doc/index.html
[2] http://purl.org/spar/fabio

> My feeling is that some of these "attributes" (owl:DatatypeProperty)
> SHOULD be modeled as relationships/associations instead
> (owl:ObjectProperty). For example, I think "publishers" should be
> modeled as a frbr:CorporateBody (or a subclass thereof) and "place of
> publication" should be modeled as frbr:Place.

This would require a change to the cataloging rules and a change to  
the meaning of those particular data elements. The library data  
elements do NOT represent the entities publisher and place of  
publication, even though they call them that. The library data  
elements are a transcription from the title page that creates a  
surrogate for the title page within the bibliographic record. The  
actual publisher and the actual place of publication are not recorded  
in library data. This is kind of a cheat, I have to say, because in  
fact these transcribed strings are often very good hints to the  
identity of the entities in question, but cataloging does not go that  
additional step to close the gap between the text and the entities.  
When the suggestion is made on lists with numerous catalogers in  
attendance there is quite a bit of push-back based on the perceived  
additional labor required. Of course, clever systems could reduce this  
labor through algorithms that make good guesses based on the text  
provided. And I suspect that publisher data may already have these as  
actual entities.
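
As a rough illustration of the "good guesses" idea, here is a minimal
sketch that matches a transcribed publisher string against a list of
known names using Python's standard difflib module. The authority
list, the "ex:org/..." identifiers and the function name are
hypothetical, and the 0.6 cutoff is just difflib's default, not a
tested threshold. A real system would need far more than string
similarity.

```python
import difflib

# Hypothetical authority list: normalized name -> made-up entity identifier.
AUTHORITY = {
    "oxford university press": "ex:org/oup",
    "cambridge university press": "ex:org/cup",
    "mit press": "ex:org/mitpress",
}


def guess_publisher(transcribed, cutoff=0.6):
    """Return the identifier of the closest known name, or None if nothing is close."""
    key = transcribed.strip().lower()
    matches = difflib.get_close_matches(key, list(AUTHORITY), n=1, cutoff=cutoff)
    return AUTHORITY[matches[0]] if matches else None
```

A call such as guess_publisher("Oxford Univ. Press") proposes a
candidate entity that a cataloger could confirm or reject, rather than
having to identify the publisher from scratch.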

This brings up yet another thing that always jumps out at me when I  
look at MARC data. The "DATA" aspect of MARC -- the many fixed fields  
and some of the 0XX code fields -- is in *addition to* the text  
generated based on the cataloging rules. In other words, the  
cataloging rules (AACR2) do NOT specify any data fields. Those came
into our realm solely through MARC. RDA does recognize that some
values may have datatypes (fairly few, though), and that identifiers  
MAY be used, but even RDA is expressed mainly as the creation of text  
(e.g. use "pages" instead of "p."). I'm rather stuck at the moment on  
how we will get from the description and guidance in the RDA text to  
something that represents linkable things. I guess this could become  
one of the issues for our list of issues: the need to move library  
data away from text and toward data objects.
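
A small sketch of what moving from text to data might look like for a
single element: pulling a typed page count out of an extent statement
written either way. The function name and the pattern are my own
assumptions for illustration; neither AACR2 nor RDA defines anything
like this.

```python
import re

# Matches a number followed by either the spelled-out or abbreviated unit.
_PAGES = re.compile(r"(\d+)\s*(?:pages|p\.)")


def page_count(extent):
    """Return the first arabic page count found in an extent string, else None."""
    match = _PAGES.search(extent)
    return int(match.group(1)) if match else None
```

Both "345 p." and "xii, 345 pages" yield the same integer, which is
the point: the value becomes comparable data instead of display text.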

kc




-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Sunday, 6 March 2011 16:55:12 GMT
