Use Case: Radio Station Archive Digitisation

Filled in wearing my Okapi Consulting hat.

> ================================================================
> ================================================================
> 
> === Name ===

Radio Station Archive Digitisation

> === Owner ===

Steven Morris <steven@okapi.cc>
David Smith <david@okapi.cc>
William Waites <ww@okapi.cc>

> === Background and Current Practice ===

Many radio stations have archives of audio programming going back
many years. In many cases they are not digitised and have little
or inconsistent metadata. From time to time there are efforts to
take these degrading tapes and other media and digitise them, a
process that necessarily also means creation and elaboration of
the associated metadata.

Current practice for metadata creation and transcription is often
ad-hoc, conforming in various degrees to established library
methods.

> === Goal ===

(1) To have permanently accessible copies of archive audio recordings,
indexed and searchable.
(2) To enable cross references to other events (particularly valuable
where the audio in question is a news broadcast) and to enable
federated searching both on these cross references and generally.

> === Target Audience ===

Scholars, journalists, diaspora communities, the general public.

> === Use Case Scenario ===

After the necessary arrangements and introductions, an expert arrives
at the Radio Mogadishu archives. They are to work with the elderly
archivist to create digital copies of the archives, which are stored
on 1/4 inch tape. The tapes are in varying states of readability, some
quite good, others having degraded beyond recovery. The archivist
has a catalogue or index system but much information about the tapes
and their content is in his memory.

Over the ensuing period, the expert painstakingly creates digital
copies of the tapes insofar as that is possible. As this is happening,
they annotate the copies with such information as is available from
the catalogue system, the archivist, and a native Somali speaker
listening to the tapes should the expert not speak the language.

The resulting digital copies and their metadata (including any
metadata available for media that were not themselves recoverable) are
then copied to a computer system for local use as well as
transported to a library in Europe for further dissemination.

> === Application of linked data for the given use case ===

The richer the information that can be collected during the already
painstaking digitisation process, the better. Not only would it be
a fascinating learning experience as the expert will no doubt learn
much about the local situation and its history, and the boredom of
mechanically digitising the content would be alleviated, but the
results of the work would then be immediately available for linking
to related resources. Whilst reading an article online about the
events in a particular place, one could listen to contemporary local
radio, for example. The first benefit is the multiplication of
the potential impact of the effort that is put into the process.

Because of the close attention that can be paid whilst interviewing the
archivist and listening to the tapes, information can be gathered
at a deeper level than could easily be encoded according to the
basic set of metadata that might include dates of broadcast and
digitisation, some subject information, etc. The extensibility of
RDF, where new predicates can be defined as and when needed, makes
it possible to record this. A simple example might be to define a
sub property of dc:identifier for the Radio Mogadishu Archives tape
labelling scheme. A more interesting example might be to annotate a
recording with information about the participants: is it an
interview? Who is the interviewer? The interviewee? The station
director at the time? Some of this
may require minting URIs for the people involved and adding
information about them. The incredibly valuable opportunity to
do historical research can quickly grow beyond the scope of merely
creating a catalogue of recordings. The second benefit is therefore
that more information can be collected and represented than would
normally be the case in this type of endeavour and a valuable
opportunity is not wasted.
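
As a rough illustration of the two examples above, here is a minimal
sketch in plain Python that writes out Turtle for a tape label
property declared as a sub property of dc:identifier, plus a recording
annotated with its participants. Everything in the rma: namespace
(the example.org URI, tapeLabel, interviewer, interviewee) is made up
for illustration and is not an existing vocabulary; the slugs and
names passed in are placeholders, not real archive data.

```python
# Hypothetical namespace for the archive; NOT an established vocabulary.
RMA = "http://example.org/radio-mogadishu/"

PREFIXES = """\
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rma:  <%s> .
""" % RMA


def literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def describe_recording(slug, tape_label, broadcast_date, interviewer, interviewee):
    """Return Turtle text describing one recording.

    interviewer and interviewee are (slug, name) pairs; a URI is minted
    for each person under the hypothetical rma: namespace.
    """
    lines = [
        PREFIXES,
        "# Assumed property for the archive's own tape labelling scheme.",
        "rma:tapeLabel rdfs:subPropertyOf dc:identifier .",
        "",
        "rma:%s" % slug,
        "    rma:tapeLabel %s ;" % literal(tape_label),
        "    dc:date %s ;" % literal(broadcast_date),
        "    rma:interviewer rma:%s ;" % interviewer[0],
        "    rma:interviewee rma:%s ." % interviewee[0],
        "",
    ]
    for person_slug, name in (interviewer, interviewee):
        lines.append(
            "rma:%s a foaf:Person ; foaf:name %s ." % (person_slug, literal(name))
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_recording(
        "tape-0042", "placeholder-label", "1970-01-01",
        ("person-a", "Person A"), ("person-b", "Person B"),
    ))
```

Whether the participants are better attached to the recording itself
or to some event it records is exactly the kind of modelling question
raised below; the sketch takes the simplest option.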

> === Problems and Limitations (optional) ===

 * Guidance for creating basic metadata about audio recordings
   (and copies of such) is lacking or contradictory. Between FRBR
   and flat Dublin Core there is a range of possibilities. Where
   do we put information about who did the digitisation? Who
   transcribed the metadata? When and where do we create Works and
   Manifestations? These are clearly questions of general applicability
   to library data, and there is some overlap with the Provenance
   WG as well.

 * There is no vocabulary that we are aware of for describing the
   state of source material. Is a coarse "readable", "partially-
   readable", "unrecoverable" sufficient? Is it worthwhile to try
   to represent which sections may be recoverable or not?

 * There might be questions of confidence. Suppose the label on a
   tape is difficult to read or itself degraded. Instead of writing,
   "the tape has label X", one might want to write "the tape's label
   appears to be either X or Y". "Is that an 8 or a 3?" How do we
   preserve information about the transcriber's uncertainty?

 * Rich information (propositions about the audio recording and how
   its participants relate to the world) is a much harder modelling
   problem, perhaps at least partially solved in principle by RDF,
   but having a framework that supports such things structurally is
   different from having the vocabularies required and different
   still from having people *know* which vocabularies are required.
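
To make the questions about source condition and label confidence a
little more concrete, here is one possible shape for the data, again
as a plain Python sketch. The three condition values are the coarse
ones suggested above; the predicate names (rma:tapeLabel,
rma:possibleTapeLabel, rma:condition, rma:transcriberNote) are
invented for illustration, and no existing vocabulary is implied.

```python
from dataclasses import dataclass
from enum import Enum


class Condition(Enum):
    """Coarse state of the source material, as floated in the text."""
    READABLE = "readable"
    PARTIALLY_READABLE = "partially-readable"
    UNRECOVERABLE = "unrecoverable"


@dataclass(frozen=True)
class LabelReading:
    """What the transcriber could make out of a tape label."""
    candidates: tuple  # one entry if certain, several if ambiguous
    note: str = ""     # free-text remark, e.g. "Is that an 8 or a 3?"

    @property
    def is_certain(self):
        return len(self.candidates) == 1


def label_statements(tape, reading, condition):
    """Return (subject, predicate, object) tuples for one tape.

    A certain reading uses the (hypothetical) rma:tapeLabel predicate;
    an ambiguous one emits each candidate under a weaker, equally
    hypothetical rma:possibleTapeLabel predicate.
    """
    predicate = "rma:tapeLabel" if reading.is_certain else "rma:possibleTapeLabel"
    triples = [(tape, predicate, candidate) for candidate in reading.candidates]
    triples.append((tape, "rma:condition", condition.value))
    if reading.note:
        triples.append((tape, "rma:transcriberNote", reading.note))
    return triples


if __name__ == "__main__":
    reading = LabelReading(("X", "Y"), "label degraded")
    for triple in label_statements("rma:tape-0042", reading,
                                   Condition.PARTIALLY_READABLE):
        print(triple)
```

A weaker predicate is only one way to keep the uncertainty; attaching
the doubt to a separate statement about the reading would be another,
and the sketch does not settle which is preferable.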

-- 
William Waites           <william.waites@okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK

RDF Indexing, Clustering and Inferencing in Python
		http://ordf.org/

Received on Sunday, 19 September 2010 16:10:20 UTC