- From: William Waites <william.waites@okfn.org>
- Date: Sun, 19 Sep 2010 17:08:38 +0100
- To: Antoine Isaac <aisaac@few.vu.nl>
- CC: public-xg-lld <public-xg-lld@w3.org>, David Smith <david@okapi.cc>, steven@okapi.cc
Filled in wearing my Okapi Consulting hat.

> ================================================================
> ================================================================
>
> === Name ===

Radio Station Archive Digitisation

> === Owner ===

Steven Morris <steven@okapi.cc>
David Smith <david@okapi.cc>
William Waites <ww@okapi.cc>

> === Background and Current Practice ===

Many radio stations have archives of audio programming going back many
years. In many cases they are not digitised and have little or
inconsistent metadata. From time to time there are efforts to take
these degrading tapes and other media and digitise them, a process
that necessarily also means creation and elaboration of the associated
metadata. Current practice for metadata creation and transcription is
often ad hoc, conforming to varying degrees to established library
methods.

> === Goal ===

(1) To have permanently accessible copies of archive audio recordings,
    indexed and searchable.
(2) To enable cross references to other events (particularly valuable
    where the audio in question is a news broadcast) and to enable
    federated searching both on these cross references and generally.

> === Target Audience ===

Scholars, journalists, diaspora communities, the general public.

> === Use Case Scenario ===

After the necessary arrangements and introductions, an expert arrives
at the Radio Mogadishu archives. They are to work with the elderly
archivist to create digital copies of the archives, which are stored
on 1/4 inch tape. The tapes are in varying states of readability, some
quite good, others having degraded beyond recovery. The archivist has
a catalogue or index system, but much information about the tapes and
their content is in his memory. Over the ensuing period, the expert
painstakingly creates digital copies of the tapes insofar as that is
possible.
As this is happening, they annotate the copies with such information
as is available from the catalogue system, the archivist, and a native
Somali speaker listening to the tapes should the expert not speak the
language. The resulting digital copies and their metadata (including
any metadata available for media that was not itself recoverable) are
then copied to a computer system for local use as well as transported
to a library in Europe for further dissemination.

> === Application of linked data for the given use case ===

The richer the information that can be collected during the already
painstaking digitisation process, the better. Not only would it be a
fascinating learning experience, as the expert will no doubt learn
much about the local situation and its history, and not only would the
boredom of mechanically digitising the content be alleviated, but the
results of the work would then be immediately available for linking to
related resources. Whilst reading an article online about the events
in a particular place, one could listen to contemporary local radio,
for example. The first benefit is the multiplication of the potential
impact of the effort that is put into the process.

Because of the close attention that can be paid whilst interviewing
the archivist and listening to the tapes, information can be gathered
at a deeper level than could easily be encoded according to the basic
set of metadata that might include dates of broadcast and
digitisation, some subject information, etc. The extensibility of RDF,
where new predicates can be defined when and as needed, makes it
possible to record this. A simple example might be to define a
sub-property of dc:identifier for the Radio Mogadishu Archives tape
labelling scheme. A more interesting example might be to annotate a
recording with information about the participants: is it an
interview? Who is the interviewer? The interviewee? The station
director at the time?
Some of this may require minting URIs for the people involved and
adding information about them. The incredibly valuable opportunity to
do historical research can quickly grow beyond the scope of merely
creating a catalogue of recordings. The second benefit is therefore
that more information can be collected and represented than would
normally be the case in this type of endeavour, and a valuable
opportunity is not wasted.

> === Problems and Limitations (optional) ===

 * Guidance for creating basic metadata about audio recordings (and
   copies of such) is lacking or contradictory. Between FRBR and flat
   Dublin Core there is a range of possibilities. Where do we put
   information about who did the digitisation? Who transcribed the
   metadata? When and where do we create Works and Manifestations?
   These are clearly questions of general applicability to library
   data, and there is some overlap with the Provenance WG as well.

 * There is no vocabulary that we are aware of for describing the
   state of source material. Is a coarse "readable",
   "partially-readable", "unrecoverable" sufficient? Is it worthwhile
   to try to represent which sections may be recoverable or not?

 * There might be questions of confidence. Suppose the label on a tape
   is difficult to read or itself degraded. Instead of writing "the
   tape has label X", one might want to write "the tape's label
   appears to be either X or Y". "Is that an 8 or a 3?" How do we
   preserve information about the transcriber's uncertainty?

 * Rich information, that is, propositions about the audio recording
   and how its participants relate to the world, is a much harder
   modelling problem. It is perhaps at least partially solved in
   principle by RDF, but having a framework that supports such things
   structurally is different from having the vocabularies required,
   and different still from having people *know* which vocabularies
   are required.
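
To make a few of the points above concrete, here is a rough Python
sketch that emits N-Triples for a single tape. It is only meant to
show the shape of the data, not to propose a vocabulary. Everything
under the http://example.org/ namespace (the tapeLabel, candidateLabel
and condition properties, the tape URI, and the sample label values)
is hypothetical and invented for illustration; only rdfs:subPropertyOf
and the Dublin Core identifier element are existing terms. It uses
plain tuples and the standard library rather than any RDF toolkit.

```python
# Hypothetical namespace: every term under EX is invented for illustration.
EX = "http://example.org/radio-archive/"
DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# Coarse condition values, taken from the question raised in the text.
CONDITIONS = ("readable", "partially-readable", "unrecoverable")


def uri(value):
    """Format a URI as an N-Triples term."""
    return "<" + value + ">"


def literal(value):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def describe_tape(tape_id, label_candidates, condition):
    """Return a list of (subject, predicate, object) triples for one tape.

    A single candidate is recorded with the (hypothetical) tapeLabel
    property; several candidates are recorded with a weaker
    (hypothetical) candidateLabel property, so that the transcriber's
    uncertainty is kept rather than silently resolved.
    """
    if condition not in CONDITIONS:
        raise ValueError("unknown condition: " + condition)
    if not label_candidates:
        raise ValueError("at least one label candidate is required")
    tape = uri(EX + "tape/" + tape_id)
    triples = [
        # The local labelling scheme as a sub-property of dc:identifier.
        (uri(EX + "tapeLabel"), uri(RDFS_SUBPROPERTY_OF), uri(DC_IDENTIFIER)),
        (tape, uri(EX + "condition"), literal(condition)),
    ]
    prop = "tapeLabel" if len(label_candidates) == 1 else "candidateLabel"
    for candidate in label_candidates:
        triples.append((tape, uri(EX + prop), literal(candidate)))
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples, one statement per line."""
    return "\n".join(" ".join(t) + " ." for t in triples)


if __name__ == "__main__":
    # "Is that an 8 or a 3?" (tape id and labels are made-up sample values)
    print(to_ntriples(describe_tape("0042", ["RM-18", "RM-13"],
                                    "partially-readable")))
```

Whether uncertainty is better modelled with a separate property like
this, with reification, or with named graphs is exactly the kind of
open question raised above; the sketch simply picks the simplest
option.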
-- 
William Waites <william.waites@okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948    Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
http://ordf.org/
Received on Sunday, 19 September 2010 16:10:20 UTC