Re: Question on press use cases from Neubert Joachim on 2010-09-19 (public-lld@w3.org from September 2010)

From: Neubert Joachim <J.Neubert@zbw.eu>
Date: Sun, 19 Sep 2010 07:43:55 +0200
To: "Antoine Isaac" <aisaac@few.vu.nl>, "public-lld" <public-lld@w3.org>
Message-ID: <3A59BB6451C972429019B12996F92DAD02E4D5E8@frodo.zbw-nett.zbw-kiel.de>

Hi Antoine,

thank you for asking your question again, and sorry for not getting to you in the telecon. I'll happily try to explain the strange beast that is a press archive. (Having worked with press archives for more than 20 years, I may find them too familiar and self-evident, while RDA and FRBR sometimes still look quite strange to me ...)

I'm talking here about a classical newspaper clippings archive like the 20th Century Press Archives. This is different from Ed's use case, which deals with complete issues and pages of newspapers. In a clippings archive, you find large numbers of single clippings pasted onto loose sheets of paper, with the publication date and the newspaper title written or stamped onto the sheet. These sheets (and occasionally other material, such as companies' annual reports) were collected in thematic folders year after year, normally with the most recent clipping placed on top of the pile. In the past, there was no way to access clippings by author, by title, or by any other attribute. The folders were arranged simply alphabetically (wherever possible), or according to some kind of classification (we'll have to delve into this for the subject and wares archives). Generally, there is no card catalog, and there is no such concept as a "bibliographic unit". Creating a catalog record for every single clipping would have been much too expensive, and none was created for folders either. What mattered was the collection, and within the collection, the order of folders on the shelf and the order of clippings within each folder. (That's why OAI-ORE, with its quite generic concept of an aggregation, looked like a natural fit to me here.)
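
The collection/folder/clipping hierarchy described above can be sketched as nested OAI-ORE aggregations. This is a minimal illustration, not the archive's actual data model: the class names, the `to_ntriples` helper and all `example.org` URIs are hypothetical, and only the ore:aggregates, dcterms:title and dcterms:date properties are used. Note that ore:aggregates itself is unordered, so the shelf and pile order kept in the Python lists is not carried into the triples.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Namespace URIs for OAI-ORE and Dublin Core terms.
ORE = "http://www.openarchives.org/ore/terms/"
DCT = "http://purl.org/dc/terms/"


def literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


@dataclass
class Clipping:
    uri: str
    date: Optional[str] = None  # publication date; often not transcribed


@dataclass
class Folder:
    uri: str
    title: str
    # Clippings in pile order (most recent on top).
    clippings: List[Clipping] = field(default_factory=list)


@dataclass
class Collection:
    uri: str
    # Folders in shelf order (e.g. alphabetical).
    folders: List[Folder] = field(default_factory=list)


def to_ntriples(collection: Collection) -> List[str]:
    """Emit collection -> folder -> clipping as nested ore:aggregates triples.

    ore:aggregates is unordered: the list order above is NOT expressed here
    and would need additional modelling (hypothetical, not shown).
    """
    triples = []
    for folder in collection.folders:
        triples.append(f"<{collection.uri}> <{ORE}aggregates> <{folder.uri}> .")
        triples.append(f"<{folder.uri}> <{DCT}title> {literal(folder.title)} .")
        for clip in folder.clippings:
            triples.append(f"<{folder.uri}> <{ORE}aggregates> <{clip.uri}> .")
            if clip.date is not None:  # only emit dates we actually have
                triples.append(f"<{clip.uri}> <{DCT}date> {literal(clip.date)} .")
    return triples


if __name__ == "__main__":
    clip = Clipping("http://example.org/clipping/1", date="1931-05-02")
    folder = Folder("http://example.org/folder/a", "Example AG", [clip])
    for triple in to_ntriples(Collection("http://example.org/collection/press", [folder])):
        print(triple)
```

Clippings without a transcribed date simply produce no dcterms:date triple, which matches the sparse clipping-level metadata described in this message.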

Anyway, legacy metadata is almost non-existent. For some 20,000 clippings we now have additional metadata, but it was transcribed from the sheets during digitization, and I doubt that we can afford to add much more. Maybe someday users could add newspaper titles and publication dates for the clippings they actually work with, but this will remain very sparse, given the size of the complete archive with its 30 million documents.

The good news is that we *can* apply metadata at the folder level. We did this for the personal name authority identifier, and can therefore pull in data from the authority file and, via a DBpedia mapping, from the Linked Data Cloud. Doing the same for companies would be great. Conversely, we add a lot to the Cloud: these thematic folders are a unique source of historical knowledge and contemporary points of view on almost every issue that was publicly discussed in the 20th century.
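
Folder-level linking of this kind might look roughly like the sketch below. It assumes, hypothetically, that a folder is related to its authority record with dcterms:subject and that the DBpedia mapping is expressed as owl:sameAs links; the `folder_links` helper, the mapping table and the `example.org` URIs are illustrative only, not the archive's actual vocabulary or identifiers.

```python
from typing import Dict, List

DCT_SUBJECT = "http://purl.org/dc/terms/subject"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical mapping from personal name authority URIs to DBpedia resources.
AUTHORITY_TO_DBPEDIA: Dict[str, str] = {
    "http://example.org/authority/person/123": "http://dbpedia.org/resource/Example_Person",
}


def folder_links(folder_uri: str, authority_uri: str, mapping: Dict[str, str]) -> List[str]:
    """Link a folder to its authority record and, if mapped, the record to DBpedia."""
    triples = [f"<{folder_uri}> <{DCT_SUBJECT}> <{authority_uri}> ."]
    dbpedia_uri = mapping.get(authority_uri)
    if dbpedia_uri is not None:  # unmapped persons get only the authority link
        triples.append(f"<{authority_uri}> <{OWL_SAME_AS}> <{dbpedia_uri}> .")
    return triples
```

Because the link is made once per folder rather than per clipping, every clipping aggregated in that folder inherits the connection to the authority record and, through it, to DBpedia.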

In library land this kind of material will remain a very special case. But it is part of our cultural heritage, so I think we have to make it accessible with the best methods we can figure out.

Cheers, Joachim

-----Original Message-----
From: public-lld-request@w3.org [mailto:public-lld-request@w3.org] On Behalf Of Antoine Isaac
Sent: Saturday, 18 September 2010 16:42
To: public-lld
Subject: Question on press use cases

Hi Ed, Joachim,

I'm posting the question about your two use cases [1,2] that I could not really ask in last week's telecon [3].

The data published in your cases is pretty much Semantic Web-oriented, judging mostly by the vocabularies you use: DC, OAI-ORE, FOAF, EXIF, BIBO. There's some RDA/FRBR in [1], but not much. And [2] links to METS records, but rather as a side resource, not as a true linked data description.

I'm pretty happy with that situation myself; I trust this can already be really useful data as such. But with my LLD hat on I'd like to know more ;-)

So the question is whether the current situation results mainly from:
- a conscious choice to ignore part of the legacy data in the original data sources, in light of your scenarios' requirements?
- the excessive effort needed to move legacy data to linked data, given the resources you had?
- a lack of legacy data, i.e. you just converted all that you had?

Cheers,

Antoine

[1] http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Publishing_20th_Century_Press_Archives
[2] http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_NDNP
[3] http://www.w3.org/2005/Incubator/lld/minutes/2010/09/16-lld-minutes.html

Received on Sunday, 19 September 2010 05:44:31 UTC