RE: [tt-af-1-0-req] Some (late) comments on the requirements from Johnb@screen.subtitling.com on 2004-01-19 (public-tt@w3.org from January 2004)

From: <Johnb@screen.subtitling.com>
Date: Mon, 19 Jan 2004 14:56:09 -0000
To: bert@w3.org
Cc: public-tt@w3.org
Message-ID: <11E58A66B922D511AFB600A0244A722E9EE6ED@NTMAIL>
Bert (et al.),

I read your comments with some interest, having just re-read the
requirements document myself following the recent posting of minutes
on the list.

I would like to add my observations to yours, in anticipation of
feedback from the TT AF WG on what is intended by certain statements in
the requirements document.

Bert Bos wrote: (on 15 January 2004 19:10)

> Based on:
> 
>     http://www.w3.org/TR/2003/WD-tt-af-1-0-req-20030915
> 
> * 1.2 System model
> 
> How about a model of the timed text itself? processing, timing,
> structure.
> 
> * S0000
> 
> What does captioning need, precisely? Color, fonts, font size,
> indents, bullets, images, positioning, timing, font styles,
> underlining, blinking, text shadow, background, transparency,
> sections/groups, repeating blocks, tabulation, right alignment,
> centering, vertical text, real-time authoring...
> 
> * S001
> 
> Ditto

This is not strictly a comment... I work in the subtitling/captioning
industry (see sig), so I am perhaps vaguely qualified to answer this.

Captioning (or subtitling) audio involves:

Text content.

Timing accurate to the frame/field (including synchronisation to video
frames). Note that this is not the same as a duration or an offset from
the start, because captioned video material may be discontinuous (e.g.
ad breaks).

Basic colours (for text, a 16-colour model is sufficient).

Font selection (fonts for captions are quite restricted due to
resolution and interlace issues).

Background colour and box styles (boxing refers to how the background
fits around the text). E.g. a 'stripe' is a horizontal background across
the entire display; a 'box' is a background starting just before the
first character on a line and ending just after the last character.
Other styles are possible, e.g. 'word box'. The 'leading' between the
lines of a subtitle may or may not be filled with background.

Italics selection (e.g. italics are used to represent lyrics, or to
indicate stress or intonation).

Typically underlining and blinking are NOT used.

Outline and shadow effects on glyphs are used, but the outline is
typically a black border around a coloured (filled) glyph, and serves to
accentuate the glyph against a varying video background (i.e. it is used
when the background is transparent).

Transparency is used for the background. I have never seen reversed text
(i.e. a solid background with transparent text).

Positioning can be quite complex. Captions can steal space from outside
the safe area: non-speech characters (speaker-change marks '-' or music
marks '#') are often positioned outside the 'safe area', which gives
more space for the caption text. Captions may be centred, left-aligned
or right-aligned, and this may vary from caption to caption, often to
match the speaker's on-screen position.

Vertical text has yet more rigorous demands, but is typically produced
as graphics that are burnt over the video.

Finally, although a distinction is (rightly) made between captions and
subtitles, in terms of the system requirements for their display there
is very little difference. Captions may use more display features than
subtitles, as captions carry non-speech information as text, and this
may be rendered using colours or styles that would not normally be used
for speech-related text.
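
A minimal Python sketch of why frame-accurate timing is not the same as an offset from the start, assuming 25 frame/s non-drop-frame timecode and a made-up ad break; `timecode_to_frames` and `playout_frame` are hypothetical helpers, not part of any TTAF draft. A caption keyed to the timecode of its video frame stays in sync wherever the break falls, whereas a caption keyed to a plain offset from the start would appear three minutes early once the break is inserted.

```python
FPS = 25  # assumption: 25 frame/s (PAL) non-drop-frame timecode; NTSC drop-frame is not handled


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert an 'HH:MM:SS:FF' timecode into an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not 0 <= ff < fps:
        raise ValueError(f"frame field {ff} out of range at {fps} frame/s")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def playout_frame(programme_frame: int, start_frame: int,
                  breaks: list[tuple[int, int]]) -> int:
    """Frames elapsed since playout began when a given programme frame is shown.

    `breaks` holds hypothetical (programme frame at which a break is inserted,
    break length in frames) pairs; every break inserted at or before the
    frame delays it by the break's length.
    """
    delay = sum(length for at, length in breaks if at <= programme_frame)
    return programme_frame - start_frame + delay


# Illustrative numbers only: a programme starting at 10:00:00:00 with a
# three-minute ad break inserted at 10:02:00:00, and a caption on the
# video frame with timecode 10:05:00:00.
start = timecode_to_frames("10:00:00:00")
caption = timecode_to_frames("10:05:00:00")
ad_break = (timecode_to_frames("10:02:00:00"), 3 * 60 * FPS)

print(caption - start)                            # 7500: naive offset from start
print(playout_frame(caption, start, [ad_break]))  # 12000: where the frame actually airs
```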
 
> * S002
> Probably needs speech generation support, such as CSS audio properties
> or another transformation to SSML.

This is true if TTAF is used as a source for the generation of audio
output.

> * S003
> 
> Is this also intended to be usable to do "marquee" in HTML (embedded
> in an OBJECT or IMG element)?

My view is that TTAF is intended as a format for the authorial intent of
text over time. How TTAF is handled when it is embedded in other
protocols/formats is probably outside the scope of TTAF, but I guess if
it is considered a likely scenario it should have some bearing on how
TTAF is specified.
 
> * R100
> 
> What is meant by "authored using XSL"? Does that mean the TT AF can be
> the result of a transformation from some other XML format? In that
> case, why insist on XSL, why not Perl, e.g.?

I read this as meaning 'The TTAF specification is written in XML/XSL'.

> * R101 - R103
> 
> One hopes that the TT AF is simple enough to not need modules or
> optional parts...

Hear, hear!

> * R106
> 
> This seems to say that the TT AF should not contain functions that
> serve no purpose, but it says it in a rather verbose way. 
> Unless I misunderstand, this seems rather obvious...
> 
> * R110
> 
> What is an "idealized" streamable format?
> 
> * R112
> 
> The task of the TT WG is to define a TT AF (and probably a TT
> format), not to define the editor to write that format with. 
> (Unless you make a case why you need to do this, and probably 
> update the group's charter as well.)

Hopefully this means that TTAF will be specified such that certain
accessibility features are mandatory within a TTAF document rather than
optional (see R217/R218).

> * R204
> 
> This requirement only restricts the element and attribute names of the
> TT AF to ASCII, since R100 (use of XML) already ensured that all text
> content can be written in ASCII. So why not say explicitly 
> that this item is about element and attribute names?

I read this as meaning that any character can be represented using only
ASCII characters for that representation. E.g. Cyrillic characters may
be entered into TTAF by typing a Unicode code point in an ASCII form.
However, I don't read this as meaning that this method is the only form
of representation for characters outside the ASCII set!
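
A sketch of that reading using only the Python standard library: the helper below (a hypothetical name) writes non-ASCII characters as XML numeric character references, so the file contains only ASCII bytes, yet an XML parser recovers exactly the same text. The last line shows that the same text written directly in UTF-8 parses to the identical string, which is why the ASCII form need not be the only one.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ascii_xml_text(text: str) -> str:
    """Return XML character data that uses only ASCII characters.

    Markup characters (&, <, >) are escaped first; every remaining non-ASCII
    character then becomes a decimal numeric character reference (e.g. &#1046;).
    """
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


cyrillic = "Жизнь"
ascii_form = to_ascii_xml_text(cyrillic)
print(ascii_form)  # &#1046;&#1080;&#1079;&#1085;&#1100;

# A conforming XML parser turns the references back into the original characters...
assert ET.fromstring(f"<p>{ascii_form}</p>").text == cyrillic
# ...and the same text encoded directly as UTF-8 parses to the identical string.
assert ET.fromstring(f"<p>{cyrillic}</p>".encode("utf-8")).text == cyrillic
```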

> * R209
> 
> This makes sense, but some motivation would be good. How about
> headings and lists?

> * R217, R218
> 
> "Embedded" means "in the same file"? Such as a data URL? Or is it an
> external image intended to be displayed simultaneously, while
> "non-embedded" means "intended as hyperlink"?
> 
> If the former, is it also permitted to have the TT AF and the image
> together in a file of a third type, such as a "jar" file? If so, is it
> OK if that third format is a generic archive format, or should it have
> a MIME type that indicates that this is an archive used as TT AF
> (though structurally equal to a generic format)?
 
> * R219, R220
> 
> Not by inventing a new font format, I hope...
> 
> Any idea yet whether there will be one or more required font formats
> (TrueType, SVG) or is it OK when a UA supports at least one font
> format, even if it is the only UA to know that format?

I personally would like to see adoption of CSS font selection.
Ultimately the display font used for presentation will depend upon the
UA (since not all fonts may be present at the UA). Consequently, IMHO it
is more important to convey the author's intent with respect to the font
than the actual font used. That said, it may be important in certain
cases for TTAF instances to carry a font (or at least glyphs) as bitmaps
or vectors etc. for specific usages of TTAF, e.g. company logos (this
might use SVG?).
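
A rough Python sketch of CSS-style font fallback, to show how "author's intent" and "actual font" come apart: the author states an ordered preference ending in a generic family, and each UA substitutes what it actually has. `resolve_font`, the installed font sets and the UA's generic-family mapping are all made-up assumptions, and real CSS font matching is considerably more involved (for one thing, it works per character).

```python
# CSS generic font families; each UA maps these to fonts it actually has.
GENERIC_FAMILIES = ("serif", "sans-serif", "monospace", "cursive", "fantasy")


def resolve_font(font_family: str, installed: set[str],
                 generic_map: dict[str, str]) -> str:
    """Return the first usable family from a CSS-style font-family list.

    Simplified: CSS matches per character and handles quoted names containing
    commas; this sketch only splits on commas and matches whole families.
    """
    by_lower = {name.lower(): name for name in installed}  # family names are case-insensitive
    for raw in font_family.split(","):
        name = raw.strip().strip("\"'")
        if name.lower() in GENERIC_FAMILIES:
            return generic_map[name.lower()]  # a generic family always resolves
        if name.lower() in by_lower:
            return by_lower[name.lower()]
    return generic_map["sans-serif"]  # assumption: UA default when nothing matches


# Hypothetical UA configuration.
ua_generics = {"serif": "Times New Roman", "sans-serif": "Arial",
               "monospace": "Courier New", "cursive": "Comic Sans MS",
               "fantasy": "Impact"}
authored = '"Tiresias Screenfont", Verdana, sans-serif'
print(resolve_font(authored, {"Tiresias Screenfont", "Arial"}, ua_generics))  # Tiresias Screenfont
print(resolve_font(authored, {"Arial"}, ua_generics))                          # Arial
```

The same authored list thus yields different actual fonts on different UAs while still recording what the author wanted.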

> * R221
> 
> The sentence is hard to read or maybe even ambiguous. What does
> "appropriate domain of discourse" mean? Is it a modifier of "text
> content" or of "descriptive information"? Is the idea that
> you can embed a TEI file in the TT AF?

I interpreted this as 'an appropriate meta dictionary for describing
what the text is', e.g. stage direction or dialogue. EIA 708 also
contains such descriptive categories.

> * R222
> 
> This sounds rather ambitious. I thought TT was a mono-media component,
> to be used, e.g., inside SMIL, not a SMIL-replacement.

I agree. I would personally prefer to drop audio. Audio description as
source text for re-speaking (by human or machine) would still be TTAF.

> * R223
> 
> What does "non-embedded" mean? Does it mean that there is no link to
> the audio in the TT AF itself, but the link is somehow somewhere else
> (such as in a style sheet)? Or, which is maybe the same thing, that
> the TT AF only expresses that there is to be audio of a certain kind
> (e.g., via high-level keywords, such as "alert," "warning" and
> "error"), without pointing to actual sound files?

> * R292, R293
> 
> No objection to using XLink, XML Schemas or Relax NG, but why is it a
> *requirement* to use them? Why not just an intention? What breaks if
> you use something else?
 
> * R300
> 
> R301 seems to be a more precise statement of R300. It seems that R300
> can be removed.
 
> * R301
> 
> Why do you need attributes on elements for the TT AF? Attributes seem
> redundant, when you also have external styles and even physically
> embedded styles. There is nothing you can do with attributes that you
> cannot also do with style sheets, but style sheets can do more. 
> 
> The two reasons I can think of for allowing attributes are (1) ease of
> hand authoring for quick & dirty projects (a rather weak argument) and
> (2) ease of processing, since no memory is required to store style
> sheets (but that doesn't hold here, because style sheets have to be
> supported anyway).
> 
> Maybe this was intended as a requirement for the TT DF instead?

I think one aspect of TTAF may be that it is not primarily content for
direct display, as, for example, HTML is. Rather, TTAF is an XML
standard for conveying text information, together with the styling and
timing that apply to that text, between clients that will manipulate
that information. Consequently, there is no requirement that the
ordering of any text within the TTAF document matches the ordering
(temporal and/or physical) of the subsequent presentation of that text
content.

So in TTAF you might have the following:

<doc>
"This is displayed last (in time) at the bottom of the screen."
"This is displayed first (in time) at the middle of the screen."
"This is displayed between (in time) at the foot of the screen."
</doc>

If you only had style sheets, might this be more difficult to do?

I kind of think of a style sheet as something that is used to dress up
a document for display without radically altering the basic structure
of the document, whereas I see style expressed as attributes as being
bound more tightly to the context... YMMV.
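
A minimal Python sketch of the ordering example above, assuming hypothetical `begin` (seconds) and `region` attributes carried on each text element; these element and attribute names are illustrative only, not taken from any TTAF draft. The receiving client rebuilds the presentation order from the timing information, so the order in which the text is stored in the document does not matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <doc>, <p>, begin and region are illustrative names only.
SAMPLE = """<doc>
  <p begin="20" region="bottom">This is displayed last (in time) at the bottom of the screen.</p>
  <p begin="0" region="middle">This is displayed first (in time) at the middle of the screen.</p>
  <p begin="10" region="foot">This is displayed between (in time) at the foot of the screen.</p>
</doc>"""


def presentation_order(xml_text: str) -> list[tuple[float, str, str]]:
    """Return (begin, region, text) triples sorted by begin time, not document order."""
    root = ET.fromstring(xml_text)
    cues = [(float(p.get("begin")), p.get("region"), p.text) for p in root.iter("p")]
    return sorted(cues, key=lambda cue: cue[0])


for begin, region, text in presentation_order(SAMPLE):
    print(f"{begin:5.1f}s  {region:<6}  {text}")
```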
 
> * R305
> 
> It might be good to refer to SSML and the upcoming CSS speech module,
> since the aural properties of CSS2 will be deprecated (in CSS 2.1) and
> there will be a new set of properties in CSS3, compatible with SSML.
> They should be very similar to the old ones, but not exactly the same.

> * R307
> 
> Not sure if I interpret this correctly. Is this like scrolling text,
> like a "marquee"?

I was the proponent of the temporal styling concept. I could send you
the original examples and comments if you wish; they should be upstream
somewhere...
 
> * R390
> 
> See R301. It seems to me that hard-coded styles should be avoided
> where possible and only allowed in final-form formats, like a TT DF.
> (The principle of separation of structure and style is a relative
> principle, but it seems to me that it should hold for the TT AF.)
 
> * R391
> 
> It's a good principle to use existing names and definitions where
> possible, but don't deprive yourself of the possibility to use names
> that fit better with the particular model or syntax that you develop.

>   Bert Bos                                ( W 3 C ) http://www.w3.org/
>   http://www.w3.org/people/bos/                              W3C/ERCIM
>   bert@w3.org                             2004 Rt des Lucioles / BP 93
>   +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

regards

John Birch
Senior Software Engineer
Screen Subtitling Systems  
The Old Rectory, Church Lane
Claydon, Ipswich, Suffolk
IP6 0EQ
 
Tel: +44 1473 831700
Fax: +44 1473 830078
www.screen.subtitling.com 
World Class Subtitling Solutions
See us at Cabsat Dubai 8-10th February 2004 Stand No. S6-9

This message is intended only for the use of the person(s) ("the Intended
Recipient") to whom it is addressed. It may contain information which is
privileged and confidential within the meaning of the applicable law.
Accordingly any dissemination, distribution, copying or other use of this
message or any of its content by any person other than the Intended
Recipient may constitute a breach of civil or criminal law and is strictly
prohibited. If you are not the Intended Recipient please destroy this email
and contact the sender as soon as possible.

In messages of non-business nature, the views and opinions expressed are the
author's own and do not necessarily reflect the views and opinions of Screen
Subtitling Systems Limited.

Whilst all efforts are made to safeguard Inbound and Outbound emails, we
cannot guarantee that attachments are Virus-free or compatible with your
systems and do not accept any liability in respect of viruses or computer
problems experienced.
Received on Monday, 19 January 2004 09:59:00 UTC