
Re: Question about MARCXML to Models transformation

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Sun, 06 Mar 2011 19:25:36 +0100
Message-ID: <4D73D1A0.1010808@few.vu.nl>
To: public-lld@w3.org
Hi Karen,

>> My feeling is that some of these "attributes" (owl:DatatypeProperty)
>> SHOULD be modeled as relationships/associations instead
>> (owl:ObjectProperty). For example, I think "publishers" should be
>> modeled as a frbr:CorporateBody (or a subclass thereof) and "place of
>> publication" should be modeled as frbr:Place.
> This would require a change to the cataloging rules and a change to the meaning of those particular data elements. The library data elements do NOT represent the entities publisher and place of publication, even though they call them that. The library data elements are a transcription from the title page that creates a surrogate for the title page within the bibliographic record. The actual publisher and the actual place of publication are not recorded in library data. This is kind of a cheat, I have to say, because in fact these transcribed strings are often very good hints to the identity of the entities in question, but cataloging does not go that additional step to close the gap between the text and the entities. When the suggestion is made on lists with numerous catalogers in attendance there is quite a bit of push-back based on the perceived additional labor required. Of course, clever systems could reduce this labor through algorithms that make good guesses based on the text provided. And I suspect that publisher data may already have these as actual entities.
> This brings up yet another thing that always jumps out at me when I look at MARC data. The "DATA" aspect of MARC -- the many fixed fields and some of the 0XX code fields -- is in *addition to* the text generated based on the cataloging rules. In other words, the cataloging rules (AACR2) do NOT specify any data fields. This came into our realm solely through MARC. RDA does recognize that some values may have datatypes (fairly few, though), and that identifiers MAY be used, but even RDA is expressed mainly as the creation of text (e.g. use "pages" instead of "p."). I'm rather stuck at the moment on how we will get from the description and guidance in the RDA text to something that represents linkable things. I guess this could become one of the issues for our list of issues: the need to move library data away from text and toward data objects.

Yes, and a crucial one! With such a move, library data would become much more useful to the linked data crowd.

Off the top of my head, I can foresee two scenarios:
1. Change cataloging behaviour, with catalogers actually connecting books and other resources to (identifiers of) real-world entities.
2. Keep cataloging behaviour as it is, but try to relate the text values (or authorities, or whatever name for proxies we want to come up with ;-) ) to real-world entities. A bit like what VIAF is trying to do, though starting from library data alone.

I understand that 2 would probably be more realistic for general concepts in thesauri. In fact, if you consider a "subject field", which can contain notions as varied and/or fuzzy as "peasants", "freedom", "middle ages", etc., there might not even be a workable solution.
But I'm really wondering why 1 would not be possible for fairly easily identifiable entities like places and persons. With some basic tools that use existing linked data sources like Geonames, when "Paris" is entered as the place of publication you could easily get a question like "did you mean http://sws.geonames.org/2988507/?" (with a better interface, of course), which the cataloger answers with yes or no.
That would not always work, especially for the Parises that are not the true one in my heart :-p. But in many cases it would still be useful, and would make good use of what catalogers are already doing, I believe. After all, the ones I have met were smart people; they must think of the place itself while filling in its name. That seems like a quick win. Or am I really wrong?
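To make the "did you mean" idea a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory gazetteer instead of a live Geonames lookup (no real Geonames API is shown), and the names GAZETTEER, PlaceOfPublication, candidates and reconcile are made up for illustration. Apart from the Paris URI quoted above, the example.org identifier is a placeholder for "another Paris", not a real Geonames ID.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical local gazetteer: normalized place-name string -> candidate URIs,
# best guess first. Only the first Paris URI comes from the message; the
# example.org entry is an invented placeholder for "another Paris".
GAZETTEER: Dict[str, List[str]] = {
    "paris": [
        "http://sws.geonames.org/2988507/",
        "http://example.org/place/another-paris",
    ],
}


@dataclass
class PlaceOfPublication:
    transcribed: str            # the string as transcribed from the title page
    uri: Optional[str] = None   # the real-world entity, once a cataloger confirms it


def candidates(name: str) -> List[str]:
    """Return candidate URIs for a transcribed place name (case-insensitive)."""
    return GAZETTEER.get(name.strip().lower(), [])


def reconcile(name: str, answer: Callable[[str], bool]) -> PlaceOfPublication:
    """Ask 'Did you mean <uri>?' for each candidate until the cataloger says yes.

    `answer` stands in for the user interface: it receives the question text
    and returns True (yes) or False (no). If every suggestion is declined, or
    there are none, only the transcription is kept.
    """
    record = PlaceOfPublication(transcribed=name)
    for uri in candidates(name):
        if answer(f"Did you mean {uri}?"):
            record.uri = uri
            break
    return record


if __name__ == "__main__":
    # A cataloger who accepts the first suggestion for "Paris".
    rec = reconcile("Paris", lambda question: True)
    print(rec.transcribed, rec.uri)
```

Note that the transcribed string is kept alongside the confirmed URI, so the title-page transcription Karen describes is not lost; the link to the entity is simply added when the cataloger says yes.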


Received on Sunday, 6 March 2011 18:25:04 GMT
