- From: Al Gilman <asgilman@iamdigex.net>
- Date: Wed, 20 Feb 2002 09:40:21 -0500
- To: <www-tt-tf@w3.org>
- Cc: duerst@w3.org
[I have cropped the distribution. Martin, feel free to forward if you
believe that i18n needs this information. - Al]

At 04:30 AM 2002-02-20 , Martin Duerst wrote:
>Hello Geoff,
>
>Here are requirements for a timed text format from an
>internationalization/localization point of view:
>
>- Make sure character encoding issues, ... are cleanly solved
>  (this is okay if the format is XML-based).
>- Address basic bidirectional issues (e.g. use the HTML solution).
>- Allow ruby markup (used a lot on Japanese television, in particular
>  for children and programs for people with hearing challenges).
>  See http://www.w3.org/TR/ruby; probably simple ruby markup could be
>  enough.
>- Mark up the language of the text (i.e. xml:lang).
>- Allow extensibility for cases similar to the three above.

What follows is a rapid run-through of some practices that roughly define
my current running estimate of an application profile I call "accessible
text." These ideas should go in the hopper for consideration as regards
"what [range of] text profile[s] are we adding time to?"

-- begin scenario

The overall plan takes a dictionary, I think. A rough run-through follows.
This is "spell checking on steroids" as required to produce accessible
texts.

A regexp produces 'lexemes,' that is to say, tokens that match a given
lexical production. The regexp is saved for explanation to recipients of
the markup that results in the end.

The tokens are checked against a) intra-document symbol definitions:
sometimes these are called symbols, sometimes abbreviations or acronyms,
and sometimes they are just Terms of Art that are known as 'glossary
entries.' Two-lexeme glossary entries are, I suppose, a special case that
will need to be checked for in another way than simply string-matching
lexemes. In any case, if there is an intra-document definition for this
lexeme, good. However you are handling the dictionary stuff vs. the body
text, check that the rules are followed [e.g. first occurrence in some
form, or whatever...] and we are done.

If there is no intra-document definition, check [with a given lexical
database; save Dublin Core identification of that database...] to see if
the lexeme is a term in the common vernacular. [Reading-level rules go in
here, if you are using them. Some texts will treat terms above a given
reading-level threshold as if they were Terms of Art and include their
definitions in the intra-document glossary.] If it is a common term "in
the dictionary," we are done.

If all those fail, query the human author "what is this?" with options to
make it a typo, a Term of Art by putting in a Glossary definition, etc.

There is some markup we need in all this. The point is that
glossary-assisted reading may _not_ key off explicit XML marks but rather
presume that the assistive technology is able to lex the text into
lexemes the way we did above, and just have semantic markings on the
glossary to identify it as that. But there may be markup/metadata for
pronunciation, for explanation, and for "keep together" when two or more
lexemes are a single token [currently indicated with the infix operator,
which, if XML religion is followed, should be retired in favor of a
<term class="noBreak"> mark].

-- end scenario

Al

>
>- Allow for parallel languages, in different documents or in the
>  same document (e.g. SMIL/SVG <switch>). Allow different languages
>  to cut the text differently where necessary (with different timing).
>- Allow for mixed-language text and parallel texts in more than one
>  different language.
>
>I have copied the I18N IG; maybe they have other ideas.
>
>Regards,   Martin.
>
>At 16:59 02/02/19 -0500, geoff freed wrote:
>
>>After a period of inactivity, it's time to move the timed-text format
>>discussion along. To this end, I would like to invite people to send any
>>remaining timed-text requirements to the list. These requirements will be
>>added to the existing document, available at
>>
>>http://www.w3.org/AudioVideo/Group/timetext.html
>>
>>The W3C has not yet granted us working-group status, but gathering
>>requirements at this stage will save us time later, when (assuming) the
>>W3C gives us the go-ahead to begin the real work. All requests will be
>>added to the document referenced above. Since we're primarily interested
>>in gathering everyone's ideas, we won't yet be wrestling with what is vs.
>>what is not a valid request or requirement. That sort of fun will begin later.
>>
>>In addition, the Web Accessibility Initiative's Protocols and Formats
>>Working Group (WAI/PF), which is currently hosting the timed-text
>>discussion, will be holding a meeting in Boston, MA, on April 8 and 9,
>>2002. We'd like to have a teleconference with all timed-text participants
>>during that time, if possible. The teleconference should last
>>approximately two hours. Are people available those days? Please send me
>>a note off-list (geoff_freed@wgbh.org) to let me know your availability,
>>and I'll coordinate a date and time.
>>
>>Finally, for the purpose of creating a baseline, next week I will post a
>>document detailing what formats already exist for displaying timed text.
>>
>>Thanks.
>>Geoff Freed
>>CPB/WGBH National Center for Accessible Media (NCAM)
>>WGBH Educational Foundation
>>geoff_freed@wgbh.org
>
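[Editor's note: the glossary-checking cascade in Al's scenario above could be
sketched roughly as follows. This is an illustrative sketch only; the regexp,
the glossary shape, and the in-memory vernacular set (a stand-in for the
external lexical database the scenario mentions) are all assumptions, not
part of any existing tool.]

```python
import re

# Sketch of the "spell checking on steroids" cascade from the scenario:
# lex the text into lexemes, then check each one against
#   a) intra-document glossary definitions,
#   b) a common-vernacular lexical database (here just a set of words),
# and collect whatever remains for the human author to resolve.

# The lexical production is saved so it can later be explained to
# recipients of the resulting markup.
LEXEME_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")

def classify_lexemes(text, glossary, vernacular):
    """Return (defined, common, unknown) sets of lexemes.

    glossary   -- intra-document definitions (symbols, acronyms, Terms of Art)
    vernacular -- stand-in for an external lexical database
    """
    defined, common, unknown = set(), set(), set()
    for token in LEXEME_RE.findall(text):
        word = token.lower()
        if word in glossary:        # a) intra-document definition exists: good
            defined.add(word)
        elif word in vernacular:    # b) a common term "in the dictionary": done
            common.add(word)
        else:                       # c) ask the author: typo? new Term of Art?
            unknown.add(word)
    return defined, common, unknown

if __name__ == "__main__":
    glossary = {"ruby": "annotation markup for East Asian text"}
    vernacular = {"the", "text", "uses", "markup", "and"}
    d, c, u = classify_lexemes(
        "The text uses ruby markup and xyzzy", glossary, vernacular)
    print(sorted(d), sorted(c), sorted(u))
```

Note that, as the scenario points out, multi-lexeme glossary entries would
need phrase matching rather than the single-token lookup shown here.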
Received on Wednesday, 20 February 2002 09:40:27 UTC