RE: [ttml2] embedded content, background image styling, other edits

Hmm…

Re: I don't wish to break the very common understanding that resources and files are 1:1. If you want to divide a file into multiple files that are related to one another, then you need to use something else to link them together.

I’m not seeking to *split* a file, which is what chunking is intended for (e.g. to facilitate streaming). Instead, I am seeking to accurately ‘model’ an existing real-world multi-file resource, i.e. to take multiple files and represent them as a single resource.
Nigel’s example below seems legal, and seems to model the requirement without contravening the proposed normative text…

Best regards,
John


John Birch | Strategic Partnerships Manager | Screen
Main Line : +44 1473 831700 | Ext : 2208 | Direct Dial : +44 1473 834532
Mobile : +44 7919 558380 | Fax : +44 1473 830078
John.Birch@screensystems.tv | www.screensystems.tv | https://twitter.com/screensystems

Visit us at
BVE, Excel London 24-26 February 2015 Stand No. N19

Before printing, think about the environment

From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
Sent: 26 November 2014 14:42
To: Glenn Adams; John Birch
Cc: TTWG; Andreas Tai
Subject: Re: [ttml2] embedded content, background image styling, other edits

From: Glenn Adams <glenn@skynav.com>
Date: Wednesday, 26 November 2014 14:23


On Wed, Nov 26, 2014 at 4:20 AM, John Birch <John.Birch@screensystems.tv> wrote:
Hi Glenn,

Apologies for ‘splitting the thread’ but I wanted to comment on two specifics in your earlier email…

Re: In the above example, charset is a parameter of the MIME type expression of the type attribute. No relationship with the 'encoding' attribute.

How easy it is to misread a ; as a “! ;-) My subsequent comments were based on that misinterpretation, but thanks for the confirmation that I perhaps understand the use of encoding correctly ☺.

Re: I guess you are thinking chunk might be used to contain multiple original files (separately named) that are concatenated to form a single data resource. This is counter to my intention that a chunk contain a fragment of a data resource, or, if one wished, an entire data resource (if a data element has a single chunk child like in the above example).

Yes, that is exactly what I was thinking… I’m assuming that you would suggest that my multiple files would each be represented by separate data elements?

Exactly.

The problem I have with that is that the separate ‘files’ in my example actually form a single resource from the perspective of EBU-TT…

I don't wish to break the very common understanding that resources and files are 1:1. If you want to divide a file into multiple files that are related to one another, then you need to use something else to link them together. Nigel has proposed such a mechanism in CP25 <https://www.w3.org/wiki/TTML/changeProposal025>. See also Issue 288 <http://www.w3.org/AudioVideo/TT/tracker/issues/288>.

I don't see how my proposal in CP25 applies here. That's about combining TTML documents together, whereas this use case is about referenced or embedded resources themselves being composed of identified and ordered sub-parts.

In the case of a multi-part STL file I think the currently proposed solution permits it without modification – for example:

<metadata>
<data format="EBU Tech 3264" length="12052">
<ttm:item name="fileName">disk1.stl</ttm:item>
<ttm:item name="diskNumber">1</ttm:item>
<chunk length="6020">[some chunk data]</chunk>
<chunk length="6032">[some more chunk data]</chunk>
</data>
<data format="EBU Tech 3264" length="5620">
<ttm:item name="fileName">disk2.stl</ttm:item>
<ttm:item name="diskNumber">2</ttm:item>
<chunk length="5620">[Only one chunk in this one]</chunk>
</data>
</metadata>
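
As a rough illustration of how a processor might read such a structure back, here is a minimal Python sketch. It is not taken from the TTML2 draft: the namespace URIs, the payload bytes, and the helper names (data_element, read_files) are assumptions made for the example, and base64 is assumed for every chunk. It rebuilds each file from its chunk children, checks the declared length, and orders the files by the diskNumber item.

```python
import base64
import xml.etree.ElementTree as ET

# Assumed namespace bindings; the snippet in the mail declares none.
NS = {
    "tt": "http://www.w3.org/ns/ttml",
    "ttm": "http://www.w3.org/ns/ttml#metadata",
}

# Hypothetical stand-ins for the STL file contents, already split into chunks.
DISK1 = (b"first part of disk1, ", b"second part of disk1")
DISK2 = (b"only part of disk2",)


def data_element(file_name, disk_number, parts):
    """Serialise one file as a data element with one chunk per part."""
    chunks = "".join(
        '<chunk length="%d">%s</chunk>'
        % (len(p), base64.b64encode(p).decode("ascii"))
        for p in parts
    )
    total = sum(len(p) for p in parts)  # data length = sum of chunk lengths
    return (
        '<data format="EBU Tech 3264" length="%d">'
        '<ttm:item name="fileName">%s</ttm:item>'
        '<ttm:item name="diskNumber">%d</ttm:item>'
        "%s</data>" % (total, file_name, disk_number, chunks)
    )


# The data elements are deliberately written out of order.
DOC = (
    '<metadata xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:ttm="http://www.w3.org/ns/ttml#metadata">'
    + data_element("disk2.stl", 2, DISK2)
    + data_element("disk1.stl", 1, DISK1)
    + "</metadata>"
)


def read_files(metadata_xml):
    """Return (diskNumber, fileName, payload) tuples sorted by disk number."""
    root = ET.fromstring(metadata_xml)
    files = []
    for data in root.findall("tt:data", NS):
        items = {i.get("name"): i.text for i in data.findall("ttm:item", NS)}
        # Each chunk is decoded on its own, then the bytes are concatenated.
        payload = b"".join(
            base64.b64decode(c.text) for c in data.findall("tt:chunk", NS)
        )
        if len(payload) != int(data.get("length")):
            raise ValueError("declared length does not match decoded payload")
        files.append((int(items["diskNumber"]), items["fileName"], payload))
    return sorted(files)


for number, name, payload in read_files(DOC):
    print(number, name, len(payload))
```

The point of the sketch is only that the link between the files lives in the ttm:item metadata of each data element, while chunk stays a plain fragment of one file.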



The chunk child of data is not intended to address this use case.

i.e. the STL standard defines the concept of a primary file and optional extension files that are conceptually part of the whole. I have attached the relevant specification FYI. This is admittedly a marginal use case, as I have never seen a ‘subtitle list’ that spans multiple ‘disks’!

However, I didn’t see anything in the definition of chunk that precludes it being used in the way I proposed. Did I miss something?

It is conceptually not intended to handle this case. Further, chunk allows only #PCDATA content, so no metadata children where you might use the new ttm:item element to link to some external identifier.

My conceptual model for chunks was the HTTP 1.1 chunked transfer encoding [1].

[1] http://tools.ietf.org/html/rfc2616#section-3.6.1
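
For readers unfamiliar with that model, a minimal Python sketch of decoding an HTTP/1.1 chunked body follows. It is an illustration only, not part of any TTML2 processing: the function name decode_chunked is made up here, and chunk extensions and trailers are ignored. Each chunk is a hexadecimal size line followed by that many bytes, and a zero-size chunk ends the body, so the chunks are simply fragments of one resource.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked transfer-coded body (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; drop any ";extension" suffix.
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last-chunk marker
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


wire = (
    b"13\r\nThe quick brown fox\r\n"
    b"19\r\n jumps over the lazy dog.\r\n"
    b"0\r\n\r\n"
)
print(decode_chunked(wire).decode("ascii"))
```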



Best regards,
John



From: Glenn Adams [mailto:glenn@skynav.com]
Sent: 25 November 2014 17:12
To: John Birch
Cc: TTWG; Andreas Tai; Nigel Megitt
Subject: Re: [ttml2] embedded content, background image styling, other edits

Thanks for your review and input.

On Tue, Nov 25, 2014 at 2:43 AM, John Birch <John.Birch@screensystems.tv> wrote:
Hi Glenn,

I’ve been looking at your proposal for embedded content (and I like it ☺), but I noticed the following:

...
<data type="text/plain; charset=us-ascii" length="44">
  <chunk length="19">
    VGhlIHF1aWNrIGJyb3duIGZveA==
  </chunk>
  <chunk length="25">
    IGp1bXBzIG92ZXIgdGhlIGxhenkgZG9nLg==
  </chunk>
</data>
...

I don’t see ‘charset’ as an attribute of data in the later definition…

In the above example, charset is a parameter of the MIME type expression of the type attribute. No relationship with the 'encoding' attribute.

I thought initially this was a typo and that it should be encoding, but then realised that ‘encoding’ is not permitted on the parent data element when using chunks?

Correct, 'encoding' is permitted only on the data element when the content of the data element uses simple data embedding, i.e., consists only of #PCDATA that encodes the data bytes. If a data element uses chunked data embedding, i.e., uses chunk element children, then all of the encoded data bytes are in child chunk elements, so the encoding attribute applies only to the chunk children and not the data parent.

In the above example, the encoding attribute's default value 'base64' applies on the chunk children, and is not relevant on the data parent.
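
To make the rule concrete, here is a minimal Python sketch of a decoder for the example above. It is not the draft's algorithm: the function name decode_data is made up, the elements are matched without a namespace because the snippet declares none, and falling back to base64 for simple data embedding is an assumption (the draft describes that default in prose not quoted in this thread). With chunk children, each chunk's own encoding (default base64) is used and an encoding attribute on the data parent is rejected.

```python
import base64
import xml.etree.ElementTree as ET

# Decoders for some of the encoding values named in the draft syntax.
# base32hex is left out because base64.b32hexdecode needs Python 3.10+.
DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": base64.b64decode,
    "base64url": base64.urlsafe_b64decode,
}

EXAMPLE = """
<data type="text/plain; charset=us-ascii" length="44">
  <chunk length="19">
    VGhlIHF1aWNrIGJyb3duIGZveA==
  </chunk>
  <chunk length="25">
    IGp1bXBzIG92ZXIgdGhlIGxhenkgZG9nLg==
  </chunk>
</data>
"""


def _decode(text, encoding):
    # Strip the indentation and line breaks around the encoded characters.
    return DECODERS[encoding]("".join((text or "").split()))


def _check_length(element, raw):
    declared = element.get("length")
    if declared is not None and int(declared) != len(raw):
        raise ValueError("declared length %s, decoded %d" % (declared, len(raw)))


def decode_data(data_el):
    """Return the decoded bytes of a data element (simple or chunked)."""
    chunks = data_el.findall("chunk")
    if chunks:
        if "encoding" in data_el.attrib:
            raise ValueError("encoding applies to chunk children, not data")
        parts = []
        for chunk in chunks:
            raw = _decode(chunk.text, chunk.get("encoding", "base64"))
            _check_length(chunk, raw)
            parts.append(raw)
        payload = b"".join(parts)
    else:
        # Assumption: base64 when no encoding attribute is given.
        payload = _decode(data_el.text, data_el.get("encoding", "base64"))
    _check_length(data_el, payload)
    return payload


print(decode_data(ET.fromstring(EXAMPLE)).decode("ascii"))
```

The length values in the example are byte counts after decoding, which is why 19 + 25 gives the 44 declared on the data parent.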


However, the binaryData element in EBU-TT includes an attribute that does not appear within your proposed scheme, which is the fileName attribute. This optional attribute may contain “A filename that may be used to identify the original filename of the tunnelled binary data”.

I'm not sure there is a useful use case for formalizing such a name/identifier. It wouldn't have any relevant processing semantics in TTML2, so it is definitely in the metadata category, which means using a ttm:* attribute or a metadata element, e.g., one could use the following convention with ttm:desc.

<data type='application/octet-stream'>
  <metadata>
    <ttm:desc>original_file_name=foo.bar</ttm:desc>
  </metadata>
  <chunk>
  ... encoded base64 data ...
  </chunk>
</data>

I wouldn't mind adding a new ttm:* element that generalizes the task of defining metadata parameters, e.g.,

<ttm:parameter name='fileName'>foo.bar</ttm:parameter>

which would allow rewriting the above example to

<data type='application/octet-stream'>
  <metadata>
    <ttm:parameter name='original_file_name'>foo.bar</ttm:parameter>
  </metadata>
  <chunk>
  ... encoded base64 data ...
  </chunk>
</data>

There is some discussion as to the usage of this attribute: it could be viewed as metadata of ‘historic interest’ (e.g. the name of the DOS file that is encapsulated by the binary data), or it could be seen as a directive (i.e. what to name a file should a processor re-create a file from the binary data). My personal leanings are towards the former use…

Regardless of the use of this attribute, I would like to propose that the TTWG consider adding a similar attribute to the data and chunk elements in TTML2. (As an aside, the chunk element you propose would be a better resolution to one of the original EBU-TT requirements, where multiple floppy-disk-based DOS files make up an STL document.) Clearly the name ‘fileName’ would be inappropriate in a more generic specification, but I tentatively suggest the following (below).

We could consider it. But it definitely would not be appropriate on chunk, whose sole purpose is to enable streaming of data fragment children of a data element. I guess you are thinking chunk might be used to contain multiple original files (separately named) that are concatenated to form a single data resource. This is counter to my intention that a chunk contain a fragment of a data resource, or, if one wished, an entire data resource (if a data element has a single chunk child like in the above example).


Best regards,
John


<chunk
  encoding = (base16|base32|base32hex|base64|base64url) : base64
  length = xsd:nonNegativeInteger
  xml:id = ID
  ident = xs:string
  {any attribute not in default or any TT namespace}>
  Content: #PCDATA
</chunk>


An ident attribute may be specified, and may contain an identifier for the contained data from an external context (e.g. the original filename).

And
<data
  encoding = (base16|base32|base32hex|base64|base64url) : see prose below
  format = <data-format>
  length = xsd:nonNegativeInteger
  src = <data>
  type = xsd:string : see prose below
  xml:id = ID
  ident = xs:string
  xml:lang = xsd:string
  xml:space = (default|preserve)
  {any attribute not in default or any TT namespace}>
  Content: #PCDATA | (Metadata.class*, chunk+) | (Metadata.class*, source+)
</data>


If simple data embedding is used, i.e., the content of the data element is one or more text nodes, then an ident attribute may be specified, and may contain an identifier for the contained data from an external context (e.g. the original filename). If chunked or sourced data embedding is used, i.e., the content of the data element contains any child chunk or source element, then an ident attribute must not be specified.



From: Glenn Adams [mailto:glenn@skynav.com]
Sent: 23 November 2014 22:50
To: TTWG
Subject: [ttml2] embedded content, background image styling, other edits

I've updated the TTML2 ED [1] a few times in the last few days to:

  *   add embedded content element types (new section 9)

     *   audio
     *   chunk
     *   data
     *   font
     *   image
     *   resources
     *   source

  *   add background image styling

     *   tts:backgroundImage
     *   tts:backgroundRepeat
     *   tts:backgroundPosition

  *   sub-divide former parameters section into profiles section and parameters section

For detailed diffs, see change sets [2][3][4][5][6].

[1] https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml2/spec/ttml2.html

[2] https://dvcs.w3.org/hg/ttml/rev/d16d284100b9

[3] https://dvcs.w3.org/hg/ttml/rev/67327764d375

[4] https://dvcs.w3.org/hg/ttml/rev/25035c814da5

[5] https://dvcs.w3.org/hg/ttml/rev/cc2b7aad7e7a

[6] https://dvcs.w3.org/hg/ttml/rev/e3fdbceb09cb



This message may contain confidential and/or privileged information. If you are not the intended recipient you must not use, copy, disclose or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation. Screen Subtitling Systems Ltd. Registered in England No. 2596832. Registered Office: The Old Rectory, Claydon Church Lane, Claydon, Ipswich, Suffolk, IP6 0EQ

Received on Wednesday, 26 November 2014 15:56:18 UTC