words, lines, etc. [was RE: TT and subtitling/captioning - separating...] from Al Gilman on 2003-08-11 (public-tt@w3.org from August 2003)

From: Al Gilman <asgilman@iamdigex.net>
Date: Mon, 11 Aug 2003 15:10:57 -0400
To: public-tt@w3.org
Message-Id: <5.1.0.14.2.20030811135827.022854b0@pop.iamdigex.net>
1. SSML has an interest in tokenization policy[1].  If TT-AF wants to do
selection based on indexing over un-marked tokens the tokenization policies
here and there should be aligned.
[I think TT-AF wants to avoid this one for one round;  I think when the dust
settles on SSML 1.0 we will find Voice has, too.]

2. Lines in poetry should be entified, i.e. marked with appropriate
container elements, and not separated with separators comparable to html:br.
  The Scooby theme song example is poetry, IMHO.

Outside of poetry and similar domains where the line division is proper
content, we do need to have some way to address the post-flow lines in the
presentation space.  The presentation spaces for [subtitles and captions]
are confined enough so that we can't ignore these divisions (as John has said).
I don't have a suggestion or precedent for how to approach this

3. Line-break-suppression is still something that may be considered content.
[Keep-together is a common theme in the content-borne formatting hints in
experience in the device-independent-authoring community.
http://lists.w3.org/Archives/Public/www-archive/2003Jun/0036.html ]
Again, XML container elements are to be preferred over separators.  The
infix &nbsp; character has been so abused in HTML practice as to beg to be
superceded by markup wrappings for proper names and other compound terms
that should resist breaking over lines.

At 12:30 PM 2003-08-11, Johnb@screen.subtitling.com wrote:
>Glenn,
>
><GA>
>We would have to extend XPointer to handle ranges that involve words or 
>lines. At present, ranges demarcate their endpoints using character 
>offsets or element offsets, e.g., from 2nd to 4th character, from before 
>element E to after element F, etc.
>
><JB>
>I'm not sure I see that this is a problem - since we wouldn't be using the 
>spans in this instance...
>
><p id="p1">
>Scooby dooby doo where are you?
>we've got some work to do now
>Scooby dooby doo, where are you?
>we need some help from you now
>come on Scooby doo, I see you
>pretending we've got a slither
>you're not fooling me, cause I can see
>the way you shake and shiver
></p>

This example is poetry, not streaming prose.
So the line divisions are content, not simply presentation.

Appropriate content markup could be
><stanza id="s1">
><line>Scooby dooby doo where are you?</line>
><line>we've got some work to do now</line>
><line>Scooby dooby doo, where are you?</line>
><line>we need some help from you now</line>
><line>come on Scooby doo, I see you</line>
><line>pretending we've got a slither</line>
><line>you're not fooling me, cause I can see</line>
><line>the way you shake and shiver</line>
></stanza>

In that case no extensions on XPath are involved.

Al

[1] In answers to Last Call comments, the SSML Editor has said

<quote
cite="http://lists.w3.org/Archives/Public/www-voice/2003AprJun/0050.html">

[60] Rejected.  This is a tokenization issue.  Tokens in SSML are
delimited both by white space and by SSML elements.  You can write
a word as two separate words and it will have a break, you can
insert an SSML element, or you can use stress marks externally.
For Asian languages with characters without spaces to delimit
words, if you insert SSML elements it automatically creates a
boundary between words.  You can use a similar approach for German,
e.g. with "Fussbalweltmeisterschaft".  If you insert a <break> in
the middle it actually splits the word, but that's probably what
you wanted:  Fussbal<break>weltmeisterschaft.
If you wish to insert prosodic controls, that would be handled
better via an external lexicon which can provide stress markers,
etc.

</quote>

On the other hand, I couldn't find any basis for this assertion in
the Last Call document itself.

<quote
cite="http://www.w3.org/TR/speech-synthesis/#S3.3">

    The W3C Voice Browser Working Group is developing the Pronunciation
    Lexicon Markup Language [LEX]. The specification will address the
    matching process between words and lexicon entries <snip/>

</quote>

PS:  Character offsets may prove problematical unless and until
the early normalization provisions of the Character Model for the
World Wide Web are generally honored in practice.

  http://www.w3.org/TR/charmod/#sec-Normalization

But it probably makes more sense for the TT-AF using community to work
on implementing early uniform normalization per the above than to invent
workarounds to cure the effects of its absence.

>
>so presumably:
>
><cue select="#xpointer(p1/range(1.0, 1.31))" use="a2" dur="1"/>
><cue select="#xpointer(p1/range(1.32, 1.61))" use="a2" dur="1"/>
><cue select="#xpointer(p1/range(1.62, 1.91))" use="a2" dur="1"/>
>
>and so on
>
>  - would select lines from the paragraph.
>But agreed - this is not quite as elegant as a word / line / character / 
>paragraph selector.
>
><GA>
>Doing a line selector is problematic unless it is based strictly on forced 
>line breaks in content, which would require the authoring system to 
>predetermine line breaks and which would not work well if UA or device 
>could change fonts or layout region. It might be possible to introduce a 
>form of "pseudo" selector that makes selection based on units that are not 
>determinable by lexical content. CSS provides such selectors.
>
><JB> I'd definitely prefer to avoid hard line breaks - they would tie the 
>content to a specific layout. Within subtitling / captioning, a line break 
>has a greater significance than within 'bulk text' - since there may be an 
>inferred change of speaker etc.....
>
>I think the issues that arise when the "UA or device could change fonts or 
>layout region" are within what I am calling the "temporal flow " model.
>That is - I see two distinct modes here:
>
>1) an explicit 'knife and forked' model - where content is without inline 
>**style** markup - but perhaps uses some form of line markup to support 
>selection. This mode is 'author driven' i.e. the author is making choices 
>based on expected delivery interfaces.
>
>2) a relaxed model - where content is without **any** inline markup. In 
>this model - the "temporal flow" attributes control how much content is in 
>the region and what happens if it overflows. This supports device 
>independence much more flexibly. So this example might produce pop on 
>subtitles....
>
><style>
>   p { display : none; color: blue; temporal-overflow: auto; add-interval: 
> 1s;  read-interval: 3s; region-full-clear: all; region-fill-mode: all}
></style>
>
><cue select="id(p1)" dur="50s"/>
>
><p id="p1">
>Scooby dooby doo where are you?
>we've got some work to do now
>Scooby dooby doo, where are you?
>we need some help from you now
>come on Scooby doo, I see you
>pretending we've got a slither
>you're not fooling me, cause I can see
>the way you shake and shiver
></p>
>
>regards
>John Birch
>
>The views and opinions expressed are the author's own and do not necessarily
>reflect the views and opinions of Screen Subtitling Systems Limited.
Received on Monday, 11 August 2003 15:32:45 UTC