Re: collated text transcripts

From: Al Gilman <asgilman@iamdigex.net>
Date: Sun, 24 Nov 2002 15:04:58 -0500
To: Shawn Lawton Henry <shawn@uiaccess.com>
Cc: <w3c-wai-ig@w3.org>

At 08:29 PM 2002-11-23, Shawn Lawton Henry wrote:

>I am researching collated text transcripts (that provide text
>descriptions of visual information within a text transcript of sound for
>streaming video on the web). I have found very little - mostly just
>references in WAI documents and list archives.
>Do you know of any articles, publications, web pages, or such that
>address the issues surrounding collated text transcripts? Or any
>standards or guidelines - for example, guidelines that cover how to
>indicate visual descriptions, such as surrounded in brackets (like
>http://www-3.ibm.com/able/hprvideo.html), or preceded with "Visual
>Description:", or other?

** summary:

What you should do is a cross between best current practice for print
publication of screenplays and playscripts and the Daisy Book example for
virtualized rhetorical structure.  Keep the principles evident in the XAG
in view as you merge those two practices.

** details

1.  Models of best practice from the genre in legacy media

The DAISY consortium and NISO Standards committee that developed the ANSI/NISO
Z39.86 Digital Talking Book standard have always wanted to produce a module
for the production of scripts for dramatic works, and have had to console
themselves so far with the idea that XML makes modular increments easy and
that the standard as now ordained should be so extensible.  There may be a
drama DTD that was produced under ICADD.

But I think that you do need to think in terms of an eText adaptation of the
best practices for printed scripts for dramatic works -- whether it be stage
plays or screenplays.  To this end one of the references that should be
considered is as follows:

Go to the Chicago Manual of Style, e.g. ISBN 0-226-10389-7, and review all
the paragraphs cited in the index under 'plays.'

2. Operability: e.g. late binding to the scope of collation.

Here I am much assisted by the work done so far on the XML Accessibility
Guidelines (XAG).

Note how the qualitative conditioning information is handled in the Digital
Talking Book.  In particular the modal treatment of notes.  It is supposed
to be at user option how the note content is presented with respect to the
content of the continuous text flow.  It may be injected in full, concealed
entirely, or the places where notes pertain marked with sonicons and
inspection of the note material made available as a response to user action.
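
The three note modes described above can be sketched in a few lines of
Python.  This is only an illustration of the idea, not anything taken from
the Digital Talking Book standard: the names NoteMode, Segment and render,
the "[*]" marker standing in for a sonicon, and the sample text are all
assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class NoteMode(Enum):
    """Hypothetical user setting for how note content is presented."""
    INJECT = "inject"    # note text appears in full, in place
    CONCEAL = "conceal"  # note is left out of the flow entirely
    MARK = "mark"        # only a marker appears; text is fetched on request


@dataclass
class Segment:
    text: str
    is_note: bool = False


def render(segments, mode, marker="[*]"):
    """Return the continuous text flow for one presentation mode."""
    out = []
    for seg in segments:
        if not seg.is_note:
            out.append(seg.text)
        elif mode is NoteMode.INJECT:
            out.append(seg.text)
        elif mode is NoteMode.MARK:
            out.append(marker)
        # NoteMode.CONCEAL: the note contributes nothing to the flow
    return " ".join(out)


# Made-up sample content, for illustration only.
flow = [
    Segment("The door opens."),
    Segment("A note about the door would go here.", is_note=True),
    Segment("She walks in."),
]

print(render(flow, NoteMode.MARK))
```

The encoded content is the same in all three cases; only the mode, chosen
after the work leaves the author, changes what the user gets.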

In this way, you should, in the language of the rather abstract XAG
Checkpoint 2.1, "capture ... semantics in repurposable form": that is to say,
have the XML encoding of the content reflect what there is to understand
about the relationships involved, and let the specific binding of this model
into a user interface be separately decided depending on the capabilities of
the current delivery context.

In terms of the drama ontology, the voice of the describer is the voice of a
user-option elective addition to the dramatis personae.  The styling rules
are the same for this persona as for the others who always appear and speak:
the styling has to distinguish this persona from the other personae, not
intrude too much on the appreciation of what the persona says, and be in
character to the extent that the persona has a character that is part of the
sense or import of the work.  But in addition there is a mode control bit
that is controllable after the work leaves the author or server that
determines if this stream is to be included or not.

In that sense, a collated transcript is a print style binding for a
multi-voice dialog.  The universal form is not this print binding done in
'accessible hypertext' but rather a Daisy book with the multithreaded
rhetorical structure in fully semanticized encoding tailored to the needs of
screenplay and playscript representation.  The user should be able to select
the personae whose utterances are to be collated, and get a final form which
is a collated view of just the dialog between these speakers.  The indication
of who says what should be drawn from re-usable styling practices, and the
range of speakers caught up in the collation should be user-determined for
just this session-specific representation of the dialog.
If you have hired or licensed Walter Cronkite or James Earl Jones to be the
voice of the describer in the audio form of the description, this
information should be available because these personae are public figures
and naming them will give an idea of the character that the describer-voice
is supposed to reflect.
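
As a rough sketch of the user-selected collation described above, here is a
small Python example.  It assumes each utterance carries a start time and a
persona label; the Utterance type, the collate function, the upper-case
speaker-label style and the sample lines are all hypothetical and stand in
for whatever the semantic encoding actually provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    start: float   # seconds from the start of the work (assumed field)
    persona: str   # who is speaking, the describer included
    text: str


def collate(utterances, selected):
    """Return time-ordered script lines for only the selected personae."""
    chosen = [u for u in utterances if u.persona in selected]
    chosen.sort(key=lambda u: u.start)
    # One possible print-style binding: speaker label, colon, utterance.
    return [f"{u.persona.upper()}: {u.text}" for u in chosen]


# Made-up sample content, for illustration only.
script = [
    Utterance(0.0, "Describer", "A kitchen at dawn."),
    Utterance(2.5, "Anna", "You're up early."),
    Utterance(1.0, "Describer", "Ben stands at the window."),
    Utterance(4.0, "Ben", "Couldn't sleep."),
]

# The describer is just one more persona the user may include or leave out.
with_description = collate(script, {"Anna", "Ben", "Describer"})
dialog_only = collate(script, {"Anna", "Ben"})

for line in with_description:
    print(line)
```

The collated print form is computed per session from the selection; the
stored script is not changed by it.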

3.  See also the recently-abrogated FCC order on audio description for
guidance on what should be in a description track.


Incomplete references:

1.  Chicago Manual of Style

Notes to that last notation, which is a free adaptation of http: scheme
usage for queries: 'book' is an attempt to use a term which is proper to the
entire ISBN scheme and always refers to the root entity identified by an
ISBN identifier.
'index' is what this part/whole part of the book is actually called in the
text of the book.  It is a term lifted by copy from the natural language
binding given by the creator of the book to structures in the book.
'entry' is once again a notion from some universal model for recognizing the
logical structure of indices.  A rough definition is that it is the key or
key tuple by which the entries are sorted in the published list-form view of
the index relation.  There may be a standard term for this in the ISO
metadata repository standards.
'plays' is again a book-specific token, a content reference to the content
of the 'entry' field of the so-keyed list-element information structure in
the document-section index.

2.  Daisy Digital talking book

dc.identifier= "Z39.86-2002"




3.  XML Accessibility Guidelines: <http://www.w3.org/TR/xag>

4.  Accessible SMIL: <http://www.w3.org/TR/SMIL-access/>

5. FCC Order:

The original order:

The successful appeal from the order:


PPS:  Usage tip: script vs. transcript.

If it is reasonable to impute a higher authority to the audio or video form
than to the text form, then use 'transcript.'  In recording court
proceedings or meetings, a transcript is captured.  When verifying the text
transcript against an audio recording, the audio recording is regarded as
more authoritative as regards what was actually said.


If it is reasonable to impute a higher authority to the text form than the
audio or video form, then use 'script.'  In a produced radio show, the
actors follow the script.

sense 1 c (2) at

If the precedence with regard to authority is ambiguous or irrelevant, then
one will prefer 'script' where simple language is more important and
'transcript' where precise indication of function is more important.

For a movie or TV show, even when collating-in what is said by the
describer, just call it a script.  This best fits the fact that this
multi-voice composite is published or served under the aegis of one
authority, the corporate author for the publishing transaction.

On the other hand, use 'transcript' when the utterances are what was
actually said by one or more autonomous authorities as in a court proceeding
or a meeting.  The transcriber is attempting to construct a faithful capture
of what they chose to say, and there is no unified authority over what they
should say as there was in the previous case.

In this assistive application, the precision of 'transcript' is misleading
as it suggests that style practices for theatrical scripts should somehow
be different.  Presentation practices for collated scripts including 
track scripts are not significantly distinguishable from good practice for
scripts used in media production.

So in the current situation, calling the thing we are talking about a
"collated transcript" is consistent with the prior discourse *in accessibility
circles* but not the highest and best use of terms from first principles.

PPPS:  In search of a Dublin Core for rhetoric...

This kind of harmonizing exegesis is important so we can encode scripts for
media industry publications and transcripts for ad-hoc collaboration
sessions in a rhetorical model with a maximal common core.  [Jukka gave me
that delicious roleName 'harmonizing exegetic' for a major aspect of my
life's work and I have to use it...]


>Thanks much for any pointers or information!
>- Shawn
>Shawn Lawton Henry
>e-mail: Shawn@UIAccess.com
>phone: 608.243.1089
>about: www.uiaccess.com/profile.html
Received on Sunday, 24 November 2002 15:03:56 GMT