Re: AW: Question on press use cases

Good points, Karen.

By the way there is one project on the topic,
At Europeana we also have some efforts to convert EAD (an archival standard) data to RDF, but as linked data is not (yet?) the focus of these, I will bother you with them only if you're interested ;-)



> Joachim,
> What you describe is fairly common in archives, although many of them
> have even less information: all they know is that they have someone's
> papers, and the number of boxes they occupy. So this kind of case is a
> good case study for all of that kind of archival material. The exciting
> thing about archives is that they often have material that does not
> exist anywhere else, so their resources are very valuable, but extremely
> hard to find. It would be great to show that linked data can help make
> archives more visible.
> kc
> Quoting Neubert Joachim:
>> Hi Antoine,
>> thank you for asking your question again, and sorry for not getting
>> you in the telecon. I'll happily try to explain the strange beast that is
>> press archives. (For me, working for more than 20 years with press
>> archives, it's maybe too familiar and self-evident - while RDA and
>> FRBR are sometimes still looking quite strange to me ...)
>> I'm talking here about a classical newspaper clippings archive like the
>> 20th Century Press Archives. This is different from Ed's use case,
>> which deals with complete issues and pages of the newspaper. In a
>> clippings archive, you find large numbers of single clippings pasted
>> on loose sheets of paper, with the publication date and the newspaper
>> title written or stamped onto the sheet. These sheets (and
>> occasionally other material, like annual reports of companies) were
>> collected in thematic folders, year after year, normally putting the
>> most current clipping on top of the pile. In the past, there was no
>> possibility to access clippings by author, by title, or by any other
>> attribute. The folders were arranged simply alphabetically (wherever
>> possible), or in some kind of classification (we'll have to delve into
>> this for the subject and wares archives). Generally, there exists no
>> card catalog, and there is no such concept as a "bibliographic unit".
>> It would have been much too expensive to capture one for every single
>> clipping, and the concept wasn't applied to folders either. What mattered was
>> the collection, and within the collection the order of folders on the
>> shelf and the order of clippings within the folder. (That's why
>> OAI-ORE with its quite generic concept of an aggregation looked like a
>> natural fit to me here.)
>> Anyway, legacy metadata is almost non-existent. For some 20,000
>> clippings we have additional metadata now, but this was transcribed
>> from the sheets in the process of digitizing the material, and I doubt
>> that it's affordable to add much more. Maybe users could someday add
>> newspaper titles and publication dates for clippings they actually
>> work with - but this will remain very sparse, given the size
>> of the complete archive with its 30 million documents.
>> The good news is that we *can* apply metadata to the folder level. We
>> did this for the personal name authority identifier, and can therefore
>> pull in data from there and - via a DBpedia mapping - from the Linked
>> Data Cloud. Doing the same for companies would be great. And the other
>> way around, we add a lot to the Cloud: these thematic folders are a
>> unique source of historical knowledge and contemporary points of view
>> about almost every issue that was discussed publicly in the 20th century.
>> In library land this kind of material will remain a very special case.
>> But it is part of our cultural heritage, so I think we have to make it
>> accessible with the best methods we can figure out.
>> Cheers, Joachim
>> -----Original Message-----
>> From: [] On
>> Behalf Of Antoine Isaac
>> Sent: Saturday, 18 September 2010 16:42
>> To: public-lld
>> Subject: Question on press use cases
>> Hi Ed, Joachim,
>> I'm posting the question on your two use cases [1,2] I could not
>> really ask in last week's telecon [3].
>> The data that is published in your cases is pretty much semantic
>> web-oriented, mostly looking at the vocabularies you use: DC, OAI-ORE,
>> FOAF, EXIF, BIBO. There's some RDA/FRBR at [1] but not much. And [2]
>> links to METS records, but rather as a side resource, not a true
>> linked data description.
>> I'm myself pretty happy with that situation--I trust this can be
>> really useful data as such already. But with my LLD hat on I'd like to
>> know more ;-)
>> So the question is whether the current situation results rather from:
>> - a conscious choice of ignoring part of the legacy data you had in
>> the original data sources, in the light of the requirements of your
>> scenarios?
>> - too great an effort needed to move legacy data to linked data,
>> considering the resources you had?
>> - the lack of legacy data--you just converted all that you had?
>> Cheers,
>> Antoine
>> [1]
>> [2]
>> [3]

Received on Sunday, 19 September 2010 18:22:37 UTC