Re: AW: Question on press use cases from Antoine Isaac on 2010-09-19 (public-lld@w3.org from September 2010)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Sun, 19 Sep 2010 20:22:00 +0200
To: Karen Coyle <kcoyle@kcoyle.net>
CC: public-lld <public-lld@w3.org>
Message-ID: <4C9654C8.5090905@few.vu.nl>
Good points, Karen.

By the way there is one project on the topic, http://blogs.ukoln.ac.uk/locah/.
In Europeana we also have some efforts on converting EAD (an archival standard) data to RDF, but as linked data is not (yet?) the focus of these, I will bother you with it only if you're interested ;-)

Cheers,

Antoine


> Joachim,
>
> What you describe is fairly common in archives, although many of them
> have even less information: all they know is that they have someone's
> papers, and the number of boxes they occupy. So this kind of case is a
> good case study for all of that kind of archival material. The exciting
> thing about archives is that they often have material that does not
> exist anywhere else, so their resources are very valuable, but extremely
> hard to find. It would be great to show that linked data can help make
> archives more visible.
>
> kc
>
> Quoting Neubert Joachim <J.Neubert@zbw.eu>:
>
>> Hi Antoine,
>>
>> thank you for asking your question again, and sorry for not getting
>> you in the telecon. I'll happily try to explain the strange beast
>> that is press archives. (For me, having worked with press archives
>> for more than 20 years, it's maybe too familiar and self-evident -
>> while RDA and FRBR still sometimes look quite strange to me ...)
>>
>> I'm talking here about a classical newspaper clippings archive like
>> the 20th Century Press Archives. This is different from Ed's use case,
>> which deals with complete issues and pages of the newspaper. In a
>> clippings archive, you find large numbers of single clippings pasted
>> on loose sheets of paper, with the publication date and the newspaper
>> title written or stamped onto the sheet. These sheets (and
>> occasionally other material, like annual reports of companies) were
>> collected in thematic folders, year over year, normally putting the
>> most current clipping on top of the pile. In the past, there was no
>> possibility to access clippings by author, by title, or by any other
>> attribute. The folders were arranged simply alphabetically (wherever
>> possible), or in some kind of classification (we'll have to delve into
>> this for the subject and wares archives). Generally, there exists no
>> card catalog, and there is no such concept as a "bibliographic unit".
>> It would have been much too expensive to capture one for every single
>> clipping, and it wasn't done for folders either. What mattered was
>> the collection, and within the collection the order of folders on the
>> shelf and the order of clippings within the folder. (That's why
>> OAI-ORE with its quite generic concept of an aggregation looked like a
>> natural fit to me here.)
>>
>> Anyway, legacy metadata is almost non-existent. For some 20,000
>> clippings we have additional metadata now, but this was transcribed
>> from the sheets in the process of digitizing the material, and I doubt
>> that it's affordable to add much more. Maybe users could someday add
>> newspaper titles and publication dates for clippings they actually
>> work with - but this will remain very sparse, given the size
>> of the complete archives with its 30 million documents.
>>
>> The good news is that we *can* apply metadata to the folder level. We
>> did this for the personal name authority identifier, and can therefore
>> pull in data from there and - via a DBpedia mapping - from the Linked
>> Data Cloud. Doing the same for companies would be great. And the other
>> way around, we add a lot to the Cloud: These thematic folders are a
>> unique source of historical knowledge and contemporary points of view
>> about almost every issue that was discussed publicly in the 20th century.
>>
>> In library land this kind of material will remain a very special case.
>> But it is part of our cultural heritage, so I think we have to make it
>> accessible with the best methods we can figure out.
>>
>> Cheers, Joachim
>>
>> -----Original Message-----
>> From: public-lld-request@w3.org [mailto:public-lld-request@w3.org] On
>> Behalf Of Antoine Isaac
>> Sent: Saturday, 18 September 2010 16:42
>> To: public-lld
>> Subject: Question on press use cases
>>
>> Hi Ed, Joachim,
>>
>> I'm posting the question on your two use cases [1,2] I could not
>> really ask in last week's telecon [3].
>>
>> The data that is published in your cases is pretty much semantic
>> web-oriented, mostly looking at the vocabularies you use: DC, OAI-ORE,
>> FOAF, EXIF, BIBO. There's some RDA/FRBR at [1] but not much. And [2]
>> links to METS records, but rather as a side resource, not a true
>> linked data description.
>>
>> I'm myself pretty happy with that situation--I trust this can be
>> really useful data as such already. But with my LLD hat on I'd like to
>> know more ;-)
>>
>> So the question is whether the current situation results rather from:
>> - a conscious choice of ignoring part of the legacy data you had in
>> the original data sources, in the light of the requirements of your
>> scenarios?
>> - too great an effort needed to move legacy data to linked data,
>> considering the resources you had?
>> - the lack of legacy data--you just converted all that you had?
>>
>> Cheers,
>>
>> Antoine
>>
>> [1]
>> http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Publishing_20th_Century_Press_Archives
>>
>> [2] http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_NDNP
>> [3]
>> http://www.w3.org/2005/Incubator/lld/minutes/2010/09/16-lld-minutes.html
>>
>>
>>
>
>
>
Received on Sunday, 19 September 2010 18:22:37 UTC