RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming from Sean Hayes on 2005-03-29 (public-tt@w3.org from March 2005)

From: Sean Hayes <shayes@microsoft.com>
Date: Tue, 29 Mar 2005 18:14:51 +0100
To: <Johnb@screen.subtitling.com>
Cc: <gadams@xfsi.com>, <public-tt@w3.org>
Message-ID: <2E8E7EA6DA6DF24F853D296CAB3BB31B01F8F3CE@EUR-MSG-11.europe.corp.microsoft.com>
John, I think you exemplify exactly the kind of person I mean. You read
the spec, pretty carefully in fact, and you made some assumptions that
turned out incorrect - as many reasonable people are likely to do.

 

As you say, probably only a handful of people are ever going to read the
XSD in detail and two of them are on this thread. So if it is wrong,
then it needs to be labeled as such in big letters. If we provide
shortcuts to understanding, then in my opinion they need to be right.

 

I'm actually struggling to remember why meta had to come at the start in
any case, especially if there can be more than one. It seems like an
unnecessary restriction to me and just makes life complicated. 

 

I think there are cases where you want to be able to use meta attributes
and cases where you want meta elements, so we do need to support both.

 

________________________________

From: Johnb@screen.subtitling.com [mailto:Johnb@screen.subtitling.com] 
Sent: 29 March 2005 08:58
To: Sean Hayes
Cc: gadams@xfsi.com; public-tt@w3.org
Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

 

Sean,

 

It's not that I didn't read it.....I interpreted the spec incorrectly.
When I saw Meta.class in the XML representation I interpreted that as
meaning that the element could take attributes from the metadata
attribute vocabulary.

 

Having just re-read the spec, I am now even more unsure why you can
include any attributes from the TT:Metadata namespace within most
content elements and also include multiple meta elements. Would it
not be clearer to allow metadata only in meta elements, or only as
attributes within elements? NOT both?

 

Actually I'd suggest that the spec may be clear to the authors - but
perhaps not so clear to the rest of us mortals :-)

 

Sometimes you need to state things in 'real world' terms - and in the
right places.

 

Remember - most implementors will not be schema gurus - or even XML
lawyers.......

 

I have the benefit of having been involved in some of the discussions,
and of having an idea of some of the ambitions of the WG.

BUT most of the implementors will give the spec a cursory glance and
then implement based on any sample file using DFXP they can find....

You'll be lucky if they even look at the XSD IMO.

 

IMO, if some part of the XSD is qualified by normative text outside of
that XSD, there should **at least** be a comment within the XSD to that
effect.

 

John

 

 

 -----Original Message-----
From: Sean Hayes [mailto:shayes@microsoft.com]
Sent: 29 March 2005 17:03
To: Glenn A. Adams; Johnb@screen.subtitling.com
Cc: public-tt@w3.org
Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	Yes I understand that the spec is clear if you read it. My fear
is that in the real world, people aren't going to read the small print
(John has already demonstrated this :-) and really understand TT.  They
are going to load the XSD into XML Spy or some such tool and go
generating thousands of document instances. If they are an influential
player, and produce enough content then they become the de-facto
standard. And we end up with another HTML situation.

	 

	'A stitch in time' as my Mother would say...

	 

	
________________________________


	From: public-tt-request@w3.org [mailto:public-tt-request@w3.org]
On Behalf Of Glenn A. Adams
	Sent: 29 March 2005 07:54
	To: Sean Hayes; Johnb@screen.subtitling.com
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	I don't see any reason to make the XSD (or RNC) schemas
informative. Both are normative in the sense that we consider them
formally defined and formally part of the specification. However,
neither is normative in the sense of being the benchmark for
validation. I think we have clearly defined validity in section 3
independently of any particular schema, which was our intent.
is independent of a particular schema, which was our intent.

	 

	
________________________________


	From: Sean Hayes [mailto:shayes@microsoft.com] 
	Sent: Tuesday, March 29, 2005 10:51 AM
	To: Glenn A. Adams; Johnb@screen.subtitling.com
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	Yes I read this. This text is insufficient IMO. If we don't
remove the XSD, then we should at least

	 a) Make it informative

	 b) Put in 48 point red bold text that the XSD schema is known
not to adhere to the normative requirements of DFXP (but we included it
anyway) and that content exchange mechanisms are required to do
additional work over and above just XSD processing if they choose to use
schema for validation.

	 

	Sean.

	
________________________________


	From: Glenn A. Adams [mailto:gadams@xfsi.com] 
	Sent: 29 March 2005 07:43
	To: Sean Hayes; Johnb@screen.subtitling.com
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	I don't think it will be practical to remove the XSD schema from
DFXP. Rather, we simply need to qualify the differences regarding
validation. Note that the compliance clause of DFXP is not based upon
using any form of schema validation, so it does not affect compliance.
Note also that we have the following language under the header of Annex
C:

	 

	"In any case where a schema specified by this appendix differs
from the normative definitions of document type, element type, or
attribute type as defined by the body of this specification, then the
body of this specification takes precedence."

	 

	
________________________________


	From: Sean Hayes [mailto:shayes@microsoft.com] 
	Sent: Tuesday, March 29, 2005 10:39 AM
	To: Sean Hayes; Glenn A. Adams; Johnb@screen.subtitling.com
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	Furthermore, since the XSD in the draft contains an incorrect
model, it should be removed. 

	 

	I did think about putting warning language in, but it would get
ignored 'for convenience', and since XSD processing is more prevalent
than RNG processing right now, I'm sure we would end up with incorrect
content floating around.

	 

	
________________________________


	From: public-tt-request@w3.org [mailto:public-tt-request@w3.org]
On Behalf Of Sean Hayes
	Sent: 29 March 2005 07:28
	To: Glenn A. Adams; Johnb@screen.subtitling.com
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	OK, if it's legal XML, and we are going to keep it, and it can't
be expressed in W3C schema, then I propose we make a TTWG input to this
effect to the upcoming W3C Schema users group meeting in June.

	 

	
________________________________


	From: Glenn A. Adams [mailto:gadams@xfsi.com] 
	Sent: 29 March 2005 07:25
	To: Sean Hayes; Johnb@screen.subtitling.com
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	Yes it is legal in XML; however, neither DTD nor XML Schema
supports expression of this constraint. On the other hand, RNG does (and
other schema languages do). In our present case, the normative
definition of content models for compliance testing is based upon the
XML Representation specifications in the body of the specification, and
not upon any specific schema (or schema language).

	 

	
________________________________


	From: Sean Hayes [mailto:shayes@microsoft.com] 
	Sent: Tuesday, March 29, 2005 10:16 AM
	To: Johnb@screen.subtitling.com; Glenn A. Adams
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	Actually you can have multiple meta child elements, however this
does bring up an issue I have been meaning to raise. The content model
for <p> is Meta.class*, Animation.class*, (#PCDATA|span|br)*

	 

	However I'm not sure it is legal in XML to restrict PCDATA to
occur only after a certain list of elements. It is not possible to
express this in W3C schema in any case.
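For illustration, the ordering described above can be checked procedurally even though XSD cannot declare it. This is a minimal Python sketch, not part of any DFXP tooling: `p_content_ok` and the element sets are hypothetical, namespaces are ignored, and Animation.class is assumed to contain only `<set>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical checker for the <p> content model quoted above:
#   Meta.class*, Animation.class*, (#PCDATA|span|br)*
# Namespaces are stripped, and Animation.class is assumed to be just <set>.
META = {"meta"}
ANIMATION = {"set"}
INLINE = {"span", "br"}


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def p_content_ok(p):
    """Return True if the children and text of <p> follow the required order."""
    phase = 0  # 0 = meta allowed, 1 = animation allowed, 2 = inline content only
    if p.text and p.text.strip():
        phase = 2  # non-whitespace text before any child element
    for child in p:
        name = local_name(child.tag)
        if name in META:
            if phase > 0:
                return False
        elif name in ANIMATION:
            if phase > 1:
                return False
            phase = 1
        elif name in INLINE:
            phase = 2
        else:
            return False  # element not allowed in <p> at all
        if child.tail and child.tail.strip():
            phase = 2  # character data follows this child
    return True


print(p_content_ok(ET.fromstring("<p><meta/><set/>Hello <span>world</span></p>")))  # True
print(p_content_ok(ET.fromstring("<p>Hello <meta/></p>")))  # False
```

RELAX NG can state this ordering declaratively; with XSD alone it has to be enforced by a separate check along these lines.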

	 

	We might want to consider relaxing the <meta> comes first rule.

	 

	Sean

	
________________________________


	From: public-tt-request@w3.org [mailto:public-tt-request@w3.org]
On Behalf Of Johnb@screen.subtitling.com
	Sent: 29 March 2005 07:23
	To: gadams@xfsi.com; Johnb@screen.subtitling.com
	Cc: public-tt@w3.org
	Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

	 

	Glenn,

	 

	I hadn't caught that one :-) (Meta data at all levels). Does
this mean that you can put a meta child element under any other element?
Presumably restricted to a single child instance?

	 

	In which case, my remaining concern about creating multiple
language DFXP files is that there is insufficient headroom, given the
non-nesting of div, to cater for the structure I anticipate needing. Why does
div not nest?

	 

	regards John Birch.

		-----Original Message-----
		From: Glenn A. Adams [mailto:gadams@xfsi.com]
		Sent: 29 March 2005 15:53
		To: Johnb@screen.subtitling.com
		Cc: public-tt@w3.org
		Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

		Since we don't (and won't) define a DFXP UA, it is up to
whoever defines a UA to determine whether user-specified style sheets
or transforms may apply. In general, I don't see why they should not.

		 

		I'm not sure what you mean by "separate metadata for
each language". You can express whatever metadata you want at whatever
granularity you wish (since every content element can take meta children,
which can contain arbitrary metadata constructs).

		 

		
________________________________


		From: Johnb@screen.subtitling.com
[mailto:Johnb@screen.subtitling.com] 
		Sent: Tuesday, March 29, 2005 10:05 AM
		To: shayes@microsoft.com; Glenn A. Adams
		Cc: public-tt@w3.org
		Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

		 

		Sean,

		 

		Hi...

		 

		This approach is one I am considering for conditional
content .... 'watershed words'

		 

		I don't favour it for language selection because it
doesn't address the issue of having separate metadata for each language.
I view that as important since there may be rights issues (e.g.
copyright and distribution) that are on a **per language basis**.

		 

		For conditional content this works quite well, as it is
trivial (in concept) to modify the style definitions.... so taking your
example and twisting slightly gives.... (note: it's set for after 8:00pm
:-)

		 

		<styling>

		    <style id="before8pm" tts:display="none" />

		    <style id="after8pm" tts:display="auto" />

		</styling>

		...

		<div>

		    <p>So I told him to <span style="before8pm">"Go away!"</span><span style="after8pm">"Piss Off!"</span></p>

		</div>

		 

		Note - I anticipate in **most** cases conditional
content will be ... inline...

		 

		Of course - this solution works if we anticipate an
interpretation of the DFXP pre-delivery. It does not work for DFXP as a
delivery format UNLESS it is assumed that a UA can apply a user defined
stylesheet to a DFXP document or otherwise modify the DFXP document
prior to display (Comments Glenn?)
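A minimal Python sketch of that pre-delivery interpretation step: once the `before8pm`/`after8pm` styles have been set for the airing time, spans whose style resolves to `tts:display="none"` are removed before the document reaches a UA. Assumptions: the `tts:` namespace URI is a placeholder, each `style` attribute holds a single ID (DFXP allows a list), and `prune_hidden` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder URI for the tts: prefix (assumption, not the draft's namespace).
TTS = "urn:example:tts"
DISPLAY = f"{{{TTS}}}display"

DOC = f"""<tt xmlns:tts="{TTS}">
  <styling>
    <style id="before8pm" tts:display="auto"/>
    <style id="after8pm" tts:display="none"/>
  </styling>
  <div>
    <p>So I told him to <span style="before8pm">"Go away!"</span><span style="after8pm">"Piss Off!"</span></p>
  </div>
</tt>"""


def prune_hidden(root):
    """Remove elements whose style reference resolves to display="none",
    keeping any text that followed the removed element (its tail)."""
    display = {s.get("id"): s.get(DISPLAY) for s in root.iter("style")}
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "style":
                continue  # never prune the style definitions themselves
            if display.get(child.get("style")) == "none":
                siblings = list(parent)
                i = siblings.index(child)
                tail = child.tail or ""
                if i > 0:
                    siblings[i - 1].tail = (siblings[i - 1].tail or "") + tail
                else:
                    parent.text = (parent.text or "") + tail
                parent.remove(child)
    return root


root = prune_hidden(ET.fromstring(DOC))
p = next(root.iter("p"))
print("".join(p.itertext()))  # So I told him to "Go away!"
```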

		 

		regards 

		John Birch

		 

		 -----Original Message-----
		From: Sean Hayes [mailto:shayes@microsoft.com]
		Sent: 29 March 2005 15:25
		To: Johnb@screen.subtitling.com; gadams@xfsi.com
		Cc: public-tt@w3.org
		Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

			How about an approach like the following:

			 

			<styling>

			    <style id="lang1" tts:display="none" />

			    <style id="lang2" tts:display="auto" />

			    <style id="lang3" tts:display="none" />

			</styling>

			...

			<div>

			    <p style="lang1">Bonjour</p>

			    <p style="lang2">Ola</p>

			    <p style="lang3">Hello</p>

			</div>

			 

			Here, you only have to change the display
property in the selected language and you get the bits you need. The
same approach should work for watershed words, etc.
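Sketched in Python, switching languages under this approach only touches the `<styling>` block. The helper names and the `tts:` namespace URI below are placeholders, and `style` is treated as a single ID reference for brevity.

```python
import xml.etree.ElementTree as ET

TTS = "urn:example:tts"  # placeholder URI for the tts: prefix (assumption)
DISPLAY = f"{{{TTS}}}display"

DOC = f"""<tt xmlns:tts="{TTS}">
  <styling>
    <style id="lang1" tts:display="none"/>
    <style id="lang2" tts:display="auto"/>
    <style id="lang3" tts:display="none"/>
  </styling>
  <div>
    <p style="lang1">Bonjour</p>
    <p style="lang2">Ola</p>
    <p style="lang3">Hello</p>
  </div>
</tt>"""


def select_style(root, chosen, candidates):
    """Show the chosen style and hide the other candidates via tts:display."""
    for style in root.iter("style"):
        sid = style.get("id")
        if sid in candidates:
            style.set(DISPLAY, "auto" if sid == chosen else "none")


def visible_paragraphs(root):
    """Return the text of <p> elements whose style is not display="none"."""
    display = {s.get("id"): s.get(DISPLAY) for s in root.iter("style")}
    return [p.text for p in root.iter("p") if display.get(p.get("style")) != "none"]


root = ET.fromstring(DOC)
select_style(root, "lang3", {"lang1", "lang2", "lang3"})
print(visible_paragraphs(root))  # ['Hello']
```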

			 

			Sean.

			 

			
________________________________


			From: public-tt-request@w3.org
[mailto:public-tt-request@w3.org] On Behalf Of
Johnb@screen.subtitling.com
			Sent: 29 March 2005 06:32
			To: gadams@xfsi.com; Johnb@screen.subtitling.com
			Cc: public-tt@w3.org
			Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

			 

			Glenn,

			 

			The 'problem' with your suggested approach **for
me** is that all of the parallel languages would share a common head
section (which contains the layout and styling elements). This would
make combining languages into a composite multi-language document
difficult - imagine if the language to be appended contains style
references that match existing ones. Further - extraction also becomes
more complex, as it would be necessary/desirable to reduce the head
element to only those element instances that are referenced by a
specific language. Secondly - since the div element cannot be nested,
use of the div element to separate parallel languages, as would be
logical for my anticipated use, would effectively remove the ability to
use div for any other structural purpose (such as separating program
segments).
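One way to avoid the ID clash when appending a language is to prefix the incoming document's style IDs and rewrite its references before merging. A hypothetical Python sketch: only style IDs are handled (region IDs in `<layout>` would need the same treatment), namespaces are omitted, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def prefix_style_ids(doc, prefix):
    """Rename every <style id> in `doc` and rewrite references to it.

    The style attribute may list several space-separated IDs, so each
    reference is mapped individually; IDs not defined in `doc` are kept.
    """
    renamed = {}
    for style in doc.iter("style"):
        old = style.get("id")
        if old is not None:
            renamed[old] = prefix + old
            style.set("id", prefix + old)
    for elem in doc.iter():
        refs = elem.get("style")
        if refs:
            elem.set("style", " ".join(renamed.get(r, r) for r in refs.split()))
    return doc


def append_language(target, addition, prefix):
    """Append `addition`'s styles and divs to `target` after prefixing its style IDs."""
    prefix_style_ids(addition, prefix)
    target.find("head/styling").extend(list(addition.find("head/styling")))
    target.find("body").extend(list(addition.find("body")))
    return target


FR = '<tt><head><styling><style id="s1"/></styling></head><body><div><p style="s1">Bonjour</p></div></body></tt>'
EN = FR.replace("Bonjour", "Hello")
merged = append_language(ET.fromstring(FR), ET.fromstring(EN), "en-")
print([s.get("id") for s in merged.iter("style")])  # ['s1', 'en-s1']
print([p.get("style") for p in merged.iter("p")])  # ['s1', 'en-s1']
```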

			 

			Using annotations for filtering content strikes
me as a rather 'weak' approach to solving my requirement.... it also
conflicts with other potential uses for the ttm:role attribute - e.g.
identification of the 'type' of text it annotates (dialogue, lyrics,
description).... and any profile using a 'ttm:role' based styling
mechanism.

			 

			I think it more likely that it will be necessary
to generate a profile for DFXP, and probably a DFXP wrapper format to
handle the multi-language issue to satisfy my (and others') requirements.

			 

			regards

			John Birch

			 

			 

			-----Original Message-----
			From: Glenn A. Adams [mailto:gadams@xfsi.com]
			Sent: 29 March 2005 14:30
			To: Johnb@screen.subtitling.com
			Cc: public-tt@w3.org
			Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

				In point 1, I mean DFXP. You could,
e.g., place parallel languages in separate div, p, span elements, etc.,
although this is not a recommended usage for interchange. Then you could
use XSLT/XQuery, etc., between your archive and over-the-air inserter
(where presumably it would be transformed into some final distribution
format, e.g., DVB Subtitles). Also, you could do something similar for
annotating content to be filtered in the transform step, e.g.,
ttm:role="x-adult".
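The suggestion above uses XSLT/XQuery; the same filtering step is sketched here with Python's ElementTree instead. The `ttm:` namespace URI is a placeholder and `drop_role` is hypothetical: it removes any element whose `ttm:role` equals the given value, keeping the text that followed it.

```python
import xml.etree.ElementTree as ET

TTM = "urn:example:ttm"  # placeholder URI for the ttm: prefix (assumption)
ROLE = f"{{{TTM}}}role"

DOC = f"""<tt xmlns:ttm="{TTM}">
  <body>
    <div>
      <p>Line for all audiences</p>
      <p>That was <span ttm:role="x-adult">really </span>rude</p>
    </div>
  </body>
</tt>"""


def drop_role(root, role):
    """Remove elements whose ttm:role equals `role`, keeping their tail text."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get(ROLE) == role:
                siblings = list(parent)
                i = siblings.index(child)
                tail = child.tail or ""
                if i > 0:
                    siblings[i - 1].tail = (siblings[i - 1].tail or "") + tail
                else:
                    parent.text = (parent.text or "") + tail
                parent.remove(child)
    return root


root = drop_role(ET.fromstring(DOC), "x-adult")
print(["".join(p.itertext()) for p in root.iter("p")])
# ['Line for all audiences', 'That was rude']
```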

				 

				
________________________________


				From: Johnb@screen.subtitling.com
[mailto:Johnb@screen.subtitling.com] 
				Sent: Tuesday, March 29, 2005 8:41 AM
				To: Glenn A. Adams
				Cc: public-tt@w3.org
				Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

				 

				Hi Glenn,

				 

				I'm not sure I understand your response?

				 

				In point 1, do you mean AFXP? cf DFXP.
Alternatively, how would you suggest structuring a multi-language DFXP
document?

				 

				w.r.t. point 2, I have perhaps created
confusion by referring to a timedtext stream. I did not intend to imply
that the content of that element was intended for streaming in the
internet sense of the word..... rather I used the term stream as
analogous to 'thread'.

				 

				regards 

				John Birch. 

				-----Original Message-----
				From: Glenn A. Adams
[mailto:gadams@xfsi.com]
				Sent: 29 March 2005 14:13
				To: Johnb@screen.subtitling.com
				Cc: public-tt@w3.org
				Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

				1.	In the main archive, you could
have a single DFXP document that combines languages and usages
(adult/child), and then use an XSLT transform (or XQuery) to select the
portions required for a "send to air" document. 
				2.	While the TTWG does consider
streamability to be a necessary property of DFXP, it drew the line at
actually defining a streaming form, which was considered out of scope;
however, there is nothing to prevent a future specification (either in
or out of W3C) from defining such a form. 
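Point 1 suggests XSLT or XQuery for this selection; as a stand-in, here is a Python ElementTree sketch that keeps content tagged with the target `xml:lang` (plus untagged content) and drops the other languages. It compares language tags exactly ("en" will not match "en-GB"), discards the whitespace tails of removed block elements, and assumes a hypothetical archive layout with one `div` per language.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ARCHIVE = """<tt>
  <body>
    <div xml:lang="fr"><p>Bonjour</p></div>
    <div xml:lang="en"><p>Hello</p></div>
    <div><p>(untagged, shared by every language)</p></div>
  </body>
</tt>"""


def select_language(elem, target):
    """Drop children explicitly tagged with another language; recurse into the rest."""
    for child in list(elem):
        lang = child.get(XML_LANG)
        if lang is not None and lang != target:
            elem.remove(child)
        else:
            select_language(child, target)
    return elem


for lang in ("fr", "en"):
    air = select_language(ET.fromstring(ARCHIVE), lang)
    print(lang, [p.text for p in air.iter("p")])
# fr ['Bonjour', '(untagged, shared by every language)']
# en ['Hello', '(untagged, shared by every language)']
```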

				 

				G.

				 

				
________________________________


				From: Johnb@screen.subtitling.com
[mailto:Johnb@screen.subtitling.com] 
				Sent: Tuesday, March 29, 2005 7:22 AM
				To: Glenn A. Adams
				Cc: public-tt@w3.org
				Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

				 

				Glenn,

				 

				Current practice for subtitling in
broadcast TV is to hold an archive of all subtitle files for all
material that has been, will be, or may be broadcast.

				This can amount to many tens of
thousands of files. (David can probably give you a number for the BBC's
archive!)

				 

				Current practice (at least for us) is to
combine all individual language files into a single multi-language
package for a given program.

				 

				So, subtitle files are originated by
subtitlers in a single language - and transferred, QA'd and then
typically combined into a multi-language 'air' file.

				These 'air' files are then held in a
'subtitle archive' that can be accessed by the insertion systems when
station automation requests the playout of a particular piece of
material. Typically for a European operation there may be on average 4-6
languages present in each multi-language file (although we have systems
with many more languages per channel than this).

				 

				There are many models being discussed
within the ad-hoc committee; doubtless there will be a transition
interval where DFXP content is held externally to the media content.
Indeed it may be (for operational reasons) that the combined MXF/AAF
with subtitles incorporated internally is only used as a 'between
broadcaster' format - not as a near to air format.

				 

				So, a nominally single language DFXP
could result in a proliferation of files (probably by a factor of 4 - 8)
for broadcasters. Note - we are assuming that insertion equipment will
move across to using DFXP **directly** here.

				 

				By 'onerous', I mean that there are implementation
issues to consider. The increase in the number of files creates a subtle
problem: the files have to be referred to by the automation equipment.
Changing from a multi-lingual system to a single-language-per-file
concept means that either the automation system has to send multiple
demands to the insertion equipment (one for each language), changing the
whole concept of the automation interface, or the insertion equipment
has to determine which individual DFXP files constitute the fileset for
a given material reference. It is unlikely that many broadcasters will
wish to make changes to station automation... this is VERY much an area
of "If It Ain't Broke Don't Fix It" - by which I mean there is strong
resistance to messing with such a critical aspect of a broadcaster's
operation.

				 

				So we can fairly safely assume that the
insertion system will need to expand a single material reference into a
fileset. This in itself doesn't sound too difficult until you consider
that the system will need to be created and maintained by human
operators! At present there is one point of potential failure - the
appending of a new subtitle language 'stream' into the archive. With the
multiple-files approach dictated by DFXP's limitation to a single
language, more opportunities for problems can arise.

				 

				So - single language DFXP increases the
number of files to handle (by perhaps a factor of 4 - 8), and the
omission of a conditional content mechanism may multiply that again....

				 

				BTW 

				Is there any practical reason why DFXP
couldn't be multi-stream, or is it simply a philosophical issue? What
(apart from the schema) prevents a DFXP document having effectively more
than one instance of the tt element structure?

				 

				e.g. (introduction of element tts
"timedtext stream")

				 

				<tt xmlns="http://www.w3.org/2004/11/ttaf1">
				<tts xml:lang="fr-fr">
				  <head>
				    <meta/>
				    <styling/>
				    <layout/>
				  </head>
				  <body/>
				</tts>

				<tts xml:lang="en-uk">
				  <head>
				    <meta/>
				    <styling/>
				    <layout/>
				  </head>
				  <body/>
				</tts>

				<tts xml:lang="en-uk-caption">
				  <head>
				    <meta/>
				    <styling/>
				    <layout/>
				  </head>
				  <body/>
				</tts>

				</tt>
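If a container along these lines were adopted (it is a proposal, not DFXP), extracting a single-language document would be mechanical. A Python sketch, with `split_streams` hypothetical and the sample abbreviated to empty `head`/`body` elements:

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2004/11/ttaf1"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Abbreviated version of the container above; <tts> is the proposed element.
CONTAINER = f"""<tt xmlns="{NS}">
  <tts xml:lang="fr-fr"><head/><body/></tts>
  <tts xml:lang="en-uk"><head/><body/></tts>
</tt>"""


def split_streams(container):
    """Return {language: standalone <tt>} with one entry per <tts> child."""
    docs = {}
    for stream in container.findall(f"{{{NS}}}tts"):
        lang = stream.get(XML_LANG)
        tt = ET.Element(f"{{{NS}}}tt", {XML_LANG: lang})
        tt.extend(list(stream))  # carry head and body across unchanged
        docs[lang] = tt
    return docs


streams = split_streams(ET.fromstring(CONTAINER))
print(sorted(streams))  # ['en-uk', 'fr-fr']
```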

				 

				regards 

				John Birch.

				 

				 -----Original Message-----
				From: Glenn A. Adams
[mailto:gadams@xfsi.com]
				Sent: 29 March 2005 12:28
				To: Johnb@screen.subtitling.com
				Cc: public-tt@w3.org
				Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

				Could you describe what you mean by
"subtitle archive" and "onerous to require ..."?

				 

				
________________________________


				From: Johnb@screen.subtitling.com
[mailto:Johnb@screen.subtitling.com] 
				Sent: Tuesday, March 29, 2005 3:47 AM
				To: Glenn A. Adams;
russ.wood@softel.co.uk; public-tt@w3.org
				Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

				 

				Glenn,

				 

				An issue that was discussed recently at
the AAF/MXF EBU ad-hoc subtitle committee....

				 

				While the generation of multiple DFXP
'files' for individual languages is an acceptable solution, I feel there
may yet be a requirement for a 'lightweight' conditional content
mechanism. The specific example I have in mind is to support the concept
of viewing 'watersheds' - i.e. content unsuitable for minors.

				In this case the majority of a subtitle
file would be suitable for all viewers - but the odd word or phrase may
be 'sanitised' for pre-watershed (e.g. 8.00pm) airings of the programme.
It would be onerous to require a subtitle archive to retain multiple
copies of content to cater for just the alteration of one or two words
in a 1300-line subtitle file. Is there any possibility of introducing a
conditional content facility to DFXP that would support this kind of
minor use?

				 

				A second use of this mechanism, which
might be a stretch too far, is to support subtitle files that can be
used as captions (i.e. near verbatim + sound cues) or as subtitles. In
this case the conditional content may be the 'sound cues' and possibly
the replacement of some of the subtitle lines with less accurate (but
more concise!) translations.

				 

				best regards 

				John B.

				-----Original Message-----
				From: Glenn A. Adams
[mailto:gadams@xfsi.com]
				Sent: 26 March 2005 05:47
				To: Russ Wood; public-tt@w3.org
				Subject: RE: Timed Text Authoring Format
- Distribution Format Exchange Profile (DFXP) Streaming

				DFXP supports general use of the xml:lang
attribute in order to (1) specify a default language for the document
instance and (2) annotate the language of nested content. It is up to
the author to decide how to use this mechanism. For example, an author
could potentially specify different <div/> elements using different
languages, or different <p/> elements, etc. Nonetheless, the intention
is not to explicitly support in DFXP conditional content selection based
on preferred language. In contrast, conditional content selection will
be supported in AFXP. The intent with DFXP is to have already made all
conditional selections prior to transmitting/exchanging in DFXP format.
This means that if an AFXP document supports coarse-grained conditional
selection between parallel language representations, then one may
produce multiple DFXP document instances from a single AFXP document
instance by enumerating over the conditional parameter space (where
each permutation may produce a distinct DFXP document instance).
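The enumeration described here is a Cartesian product over the conditional parameters, so the number of DFXP instances is the product of the option counts. A Python sketch with hypothetical parameters (the names and values are illustrative, not from AFXP):

```python
import itertools

# Hypothetical conditional parameters of one AFXP source document.
PARAMETERS = {
    "language": ["fr", "en", "de"],
    "audience": ["pre-watershed", "post-watershed"],
}


def enumerate_instances(parameters):
    """Yield one settings dict per permutation; each would select one DFXP instance."""
    names = list(parameters)
    for values in itertools.product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))


instances = list(enumerate_instances(PARAMETERS))
print(len(instances))  # 6
print(instances[0])  # {'language': 'fr', 'audience': 'pre-watershed'}
```

Three languages and two watershed variants already give six instances, which is the kind of file multiplication discussed elsewhere in this thread.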

				 

				Regards,

				Glenn

				 

				
________________________________


				From: Russ Wood
[mailto:russ.wood@softel.co.uk] 
				Sent: Monday, March 21, 2005 5:36 AM
				To: public-tt@w3.org
				Subject: RE: Timed Text Authoring Format - Distribution Format Exchange Profile (DFXP) Streaming

				 

				3) I don't see a problem with allowing
different languages in the same document, but amalgamating different
language files at run time is not difficult.
Received on Tuesday, 29 March 2005 17:15:30 UTC