
Re: timed-text update and further discussion

From: Al Gilman <asgilman@iamdigex.net>
Date: Wed, 20 Feb 2002 09:40:21 -0500
Message-Id: <200202201440.JAA52582@smtp1.mail.iamworld.net>
To: <www-tt-tf@w3.org>
Cc: duerst@w3.org
[I have cropped the distribution.  Martin, feel free to forward if you believe
that i18n needs this information. - Al]

At 04:30 AM 2002-02-20 , Martin Duerst wrote:
>Hello Geoff,
>
>Here are requirements for a timed text format from an
>internationalization/localization point of view:
>
>- Make sure character encoding issues,... are cleanly solved
>   (this is okay if the format is XML-based).
>- Address basic bidirectional issues (e.g. use the HTML solution).
>- Allow ruby markup (used a lot on Japanese television, in particular
>   for children and programs for people with hearing challenges).
>   See http://www.w3.org/TR/ruby; probably simple ruby markup could be
>   enough.
>- Mark up the language of the text (i.e. xml:lang).
>- Allow extensibility for similar cases as the three above.

What follows is a rapid runthrough of some practices that roughly define my
current running estimate of an application profile I call "accessible text." 
These ideas should go in the hopper for consideration as regards "what [range
of] text profile[s] are we adding time to?"

-- begin scenario

The overall plan takes a dictionary, I think.  Rough runthrough follows.

This is "spell checking on steroids" as required to produce accessible texts.

A regexp produces 'lexemes,' that is to say, tokens that match a given lexical
production.

The regexp is saved so that it can be explained to recipients of the resulting
markup.
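If the lexing step above were sketched in, say, Python (the particular lexical
production and the function name here are illustrative assumptions, not part
of the proposal):

```python
import re

# Illustrative lexical production: a run of letters, optionally joined by
# internal apostrophes or hyphens.  The pattern itself is kept around so it
# can later be shipped alongside the markup as an explanation of how the
# text was tokenized.
LEXEME_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def lex(text):
    """Return the lexemes: every token matching the production, in order."""
    return LEXEME_RE.findall(text)

# Usage: lex("well-known term") yields the tokens of the phrase.
tokens = lex("well-known term")
```

Any real "accessible text" checker would of course need a production tuned to
the language and domain of the document; this one is only the simplest word
shape.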

The tokens are checked against a) intra-document symbol definitions: sometimes
these are called symbols, sometimes abbreviations or acronyms, and sometimes
they are just Terms of Art that are known as 'glossary entries.'  Two-lexeme
glossary entries are, I suppose, a special case that will need to be checked
for in some way other than simply string-matching single lexemes.
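One way to handle the two-lexeme special case is to scan token pairs before
single tokens, so a phrase entry wins over its component words.  A sketch
under that assumption (the glossary contents and function name are
illustrative):

```python
def find_glossary_hits(lexemes, glossary):
    """Return (start_index, matched_key) pairs, preferring two-lexeme
    glossary entries over single-lexeme ones at each position."""
    hits = []
    i = 0
    while i < len(lexemes):
        pair = " ".join(lexemes[i:i + 2]).lower()
        if i + 1 < len(lexemes) and pair in glossary:
            hits.append((i, pair))
            i += 2                      # consume both lexemes of the phrase
        elif lexemes[i].lower() in glossary:
            hits.append((i, lexemes[i].lower()))
            i += 1
        else:
            i += 1
    return hits

# Usage: "Timed Text" matches as a phrase; "Ruby" as a single term.
glossary = {"timed text": "...", "ruby": "..."}
hits = find_glossary_hits(["Timed", "Text", "uses", "Ruby"], glossary)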

In any case, if there is an intra-doc definition for this lexeme, good. 
However you are handling the dictionary stuff vs. the body text, check that the
rules are followed [e.g. first occurrence in some form, or whatever...] and we
are done.

If there is no intra-document definition, check [with a given lexical database;
save Dublin Core identification of that database...] to see if the lexeme is a
term in the common vernacular.

[Reading level rules go in here, if you are using them.  Some texts will treat
terms above a given reading level threshold as if Terms of Art and include
their definitions in the intra-document glossary.]

If it is a common term "in the dictionary," we are done.

If all those fail, query the human author with "what is this?" and options to
mark it a typo, make it a Term of Art by adding a Glossary definition, etc.
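The whole checking chain might be summarized like so (a minimal sketch; the
category labels, the stand-in word list for the external lexical database, and
the function name are all assumptions for illustration):

```python
def classify(lexeme, intra_doc_glossary, common_vocabulary):
    """Route a lexeme through the chain: intra-document definitions first,
    then the common-vocabulary database, then back to the author.
    Reading-level rules, if used, would sit between the last two steps."""
    key = lexeme.lower()
    if key in intra_doc_glossary:
        return "defined-in-document"
    if key in common_vocabulary:
        return "common-term"
    return "ask-author"   # typo?  or a Term of Art needing a definition?

# Usage: only the unknown word falls through to the author.
gloss = {"ruby"}
vocab = {"the", "format", "uses", "markup"}
results = [classify(tok, gloss, vocab)
           for tok in ["Ruby", "markup", "frobnicate"]]
```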

There is some markup we need in all this.

The point is that the glossary-assisted reading may _not_ key off explicit XML
marks but rather presume that the assistive tech is able to lex the text into
lexemes the way we did above, with only semantic markings on the glossary to
identify it as such.

But there may be markup/metadata for pronunciation, for explanation, and for
"keep together" when two or more lexemes form a single token [currently
indicated with the infix &nbsp; operator which, if XML religion is followed,
should be retired in favor of a <term class="noBreak"> mark].

-- end scenario

Al

>
>- Allow for parallel languages, in different documents or in the
>   same document (e.g. SMIL/SVG <switch>). Allow different languages
>   to cut the text differently where necessary (with different timing).
>- Allow for mixed-language text and parallel texts in more than one
>   different language.
>
>I have copied the I18N IG, maybe they have other ideas.
>
>Regards,    Martin.
>
>At 16:59 02/02/19 -0500, geoff freed wrote:
>
>>After a period of inactivity, it's time to move the timed-text format 
>>discussion along.  To this end, I would like to invite people to send any 
>>remaining timed-text requirements to the list.  These requirements will be 
>>added to the existing document, available at
>>
>>http://www.w3.org/AudioVideo/Group/timetext.html
>>
>>The W3C has not yet granted us working-group status, but gathering 
>>requirements at this stage will save us time later, when (assuming) the 
>>W3C gives us the go-ahead to begin the real work.  All requests will be 
>>added to the document referenced above.  Since we're primarily interested 
>>in gathering everyone's ideas, we won't yet be wrestling with what is vs. 
>>what is not a valid request or requirement.  That sort of fun will begin later.
>>
>>In addition, the Web Accessibility Initiative's Protocols and Formats 
>>Working Group (WAI/PF), which is currently hosting the timed-text 
>>discussion, will be holding a meeting in Boston, MA, on April 8 and 9, 
>>2002.  We'd like to have a teleconference with all timed-text participants 
>>during that time, if possible.  The teleconference should last 
>>approximately two hours.  Are people available those days?  Please send me 
>>a note off-list (geoff_freed@wgbh.org) to let me know your availability, 
>>and I'll coordinate a date and time.
>>
>>Finally, for the purpose of creating a baseline, next week I will post a 
>>document detailing what formats already exist for displaying timed text.
>>
>>Thanks.
>>Geoff Freed
>>CPB/WGBH National Center for Accessible Media (NCAM)
>>WGBH Educational Foundation
>>geoff_freed@wgbh.org
>  
Received on Wednesday, 20 February 2002 09:40:27 EST
