Re: AW: Question on press use cases from Karen Coyle on 2010-09-19 (public-lld@w3.org from September 2010)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Sun, 19 Sep 2010 07:49:38 -0700
To: Neubert Joachim <J.Neubert@zbw.eu>
Cc: Antoine Isaac <aisaac@few.vu.nl>, public-lld <public-lld@w3.org>
Message-ID: <20100919074938.5dy3g252m0wggko8@kcoyle.net>
Joachim,

What you describe is fairly common in archives, although many of them  
have even less information: all they know is that they have someone's  
papers, and the number of boxes they occupy. So this kind of case is a  
good case study for all of that kind of archival material. The  
exciting thing about archives is that they often have material that  
does not exist anywhere else, so their resources are very valuable,  
but extremely hard to find. It would be great to show that linked data  
can help make archives more visible.

kc

Quoting Neubert Joachim <J.Neubert@zbw.eu>:

> Hi Antoine,
>
> thank you for asking your question again, and sorry for not
> getting you in the telecon. I'll happily try to explain the strange
> beast that is the press archive. (For me, having worked with press
> archives for more than 20 years, it's maybe too familiar and
> self-evident - while RDA and FRBR still sometimes look quite strange
> to me ...)
>
> I'm talking here about a classical newspaper clippings archive like
> the 20th Century Press Archives. This is different from Ed's use case,
> which deals with complete issues and pages of the newspaper. In a
> clippings archive, you find large numbers of single clippings
> pasted on loose sheets of paper, with the publication date and the
> newspaper title written or stamped onto the sheet. These sheets (and
> occasionally other material, like annual reports of companies) were
> collected in thematic folders, year after year, normally putting the
> most current clipping on top of the pile. In the past, there was no
> possibility to access clippings by author, by title, or by any
> other attribute. The folders were simply arranged alphabetically
> (wherever possible), or in some kind of classification (we'll have
> to delve into this for the subject and wares archives). Generally,
> there exists no card catalog, and there is no such concept as a
> "bibliographic unit". It would have been much too expensive to
> record one for every single clipping, and the concept wasn't applied
> to folders either. What mattered was the collection, and within the
> collection the order of folders on the shelf and the order of
> clippings within the folder. (That's why OAI-ORE with its quite
> generic concept of an aggregation looked like a natural fit to me
> here.)
>
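
The folder-as-aggregation idea described above could be sketched roughly
as follows. This is only an illustration, not the archive's actual data
model: the example.org namespace, the folder and clipping identifiers, and
the "position" property are all hypothetical. Only the OAI-ORE terms
(ore:Aggregation, ore:aggregates) are taken from the published vocabulary.
OAI-ORE itself does not define ordering, so the pile order is recorded
here with an assumed helper property.

```python
# Minimal sketch: describe one thematic folder as an OAI-ORE aggregation
# of clippings, emitted as N-Triples lines. Standard library only.

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/pressarchive/"  # hypothetical namespace


def folder_triples(folder_id, clipping_ids):
    """Return N-Triples lines for a folder and its clippings.

    clipping_ids is expected in pile order, top of the pile first.
    """
    folder = f"<{EX}folder/{folder_id}>"
    lines = [f"{folder} <{RDF_TYPE}> <{ORE}Aggregation> ."]
    for position, clipping_id in enumerate(clipping_ids, start=1):
        clipping = f"<{EX}clipping/{clipping_id}>"
        # The folder aggregates the clipping (real ORE term).
        lines.append(f"{folder} <{ORE}aggregates> {clipping} .")
        # Pile order, using an assumed (non-ORE) property.
        lines.append(
            f'{clipping} <{EX}position> "{position}"^^<{XSD_INTEGER}> .'
        )
    return lines


if __name__ == "__main__":
    for line in folder_triples("f1", ["c3", "c2", "c1"]):
        print(line)
```

Note that nothing here requires per-clipping metadata such as author or
title: the aggregation and the order are the only statements made.
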
> Anyway, legacy metadata is almost nonexistent. For some 20,000
> clippings we have additional metadata now, but this was transcribed
> from the sheets in the process of digitizing the material, and I
> doubt that it's affordable to add much more. Maybe users could
> someday add newspaper titles and publication dates for clippings
> they actually work with - but this will remain very sparse,
> given the size of the complete archives with their 30 million
> documents.
>
> The good news is that we *can* apply metadata at the folder level.
> We did this for the personal name authority identifier, and can
> therefore pull in data from there and - via a DBpedia mapping -
> from the Linked Data Cloud. Doing the same for companies would be
> great. And the other way around, we add a lot to the Cloud: these
> thematic folders are a unique source of historical knowledge and
> contemporary points of view about almost every issue that was
> discussed publicly in the 20th century.
>
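
Folder-level linking of the kind described above might look like the
following sketch. It is an assumption-laden illustration, not the
archive's implementation: the folder and authority URIs, the mapping
table, and the DBpedia resource name are placeholders, and the choice of
dcterms:subject and owl:sameAs as linking properties is a guess at one
plausible modelling, not something stated in the message.

```python
# Minimal sketch: attach a person authority identifier to a folder and,
# where a mapping exists, connect that authority record to DBpedia.

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical mapping from authority URIs to DBpedia resources.
AUTHORITY_TO_DBPEDIA = {
    "http://example.org/authority/person/0001":
        "http://dbpedia.org/resource/Example_Person",
}


def link_folder(folder_uri, authority_uri, mapping=AUTHORITY_TO_DBPEDIA):
    """Return (subject, predicate, object) triples for one folder."""
    # The folder is about the person identified by the authority record.
    triples = [(folder_uri, DCTERMS_SUBJECT, authority_uri)]
    dbpedia_uri = mapping.get(authority_uri)
    if dbpedia_uri is not None:
        # Only assert the DBpedia link when the mapping table has one.
        triples.append((authority_uri, OWL_SAMEAS, dbpedia_uri))
    return triples


if __name__ == "__main__":
    for triple in link_folder(
        "http://example.org/pressarchive/folder/f1",
        "http://example.org/authority/person/0001",
    ):
        print(triple)
```

One statement per folder is enough to make every clipping inside it
reachable from the person's authority record, which fits the situation
where item-level metadata is unaffordable.
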
> In library land this kind of material will remain a very special
> case. But it is part of our cultural heritage, so I think we have to
> make it accessible with the best methods we can figure out.
>
> Cheers, Joachim
>
> -----Original Message-----
> From: public-lld-request@w3.org [mailto:public-lld-request@w3.org] On
> Behalf Of Antoine Isaac
> Sent: Saturday, 18 September 2010 16:42
> To: public-lld
> Subject: Question on press use cases
>
> Hi Ed, Joachim,
>
> I'm posting the question on your two use cases [1,2] that I could not
> really ask in last week's telecon [3].
>
> The data that is published in your cases is pretty much semantic   
> web-oriented, mostly looking at the vocabularies you use: DC,   
> OAI-ORE, FOAF, EXIF, BIBO. There's some RDA/FRBR at [1] but not   
> much. And [2] links to METS records, but rather as a side resource,   
> not a true linked data description.
>
> I'm myself pretty happy with that situation--I trust this can be   
> really useful data as such already. But with my LLD hat on I'd like   
> to know more ;-)
>
> So the question is whether the current situation results from:
> - a conscious choice to ignore part of the legacy data you had in
> the original data sources, in the light of the requirements of your
> scenarios?
> - too great an effort needed to move legacy data to linked data,
> considering the resources you had?
> - the lack of legacy data--you just converted all that you had?
>
> Cheers,
>
> Antoine
>
> [1] http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Publishing_20th_Century_Press_Archives
> [2] http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_NDNP
> [3] http://www.w3.org/2005/Incubator/lld/minutes/2010/09/16-lld-minutes.html
>
>
>



-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
Received on Sunday, 19 September 2010 14:50:17 UTC