RE: First draft of a brief explainer from John Birch on 2018-08-01 (public-audio-description@w3.org from August 2018)

From: John Birch <John.Birch@screensystems.tv>
Date: Wed, 1 Aug 2018 11:53:57 +0000
To: Nigel Megitt <nigel.megitt@bbc.co.uk>, "public-audio-description@w3.org" <public-audio-description@w3.org>
Message-ID: <0981DC6F684DE44FBE8E2602C456E8AB0240295433@SS-IP-EXMB-01.screensystems.tv>
Hi Nigel,

Yep, I'm getting that this is possible, with limitations... It's more that the most obvious use cases seem, naïvely, to be the most syntactically obscure...

Say we have three spans, and you want to dip the audio more for the middle one... how do you do that?

Best,
John


John Birch | Strategy and Business Development Manager | Screen
Main Line : +44 1473 831700 | Ext: 2208 | Direct Dial: +44 1473 834532
Mobile: +44 7919 558380 | Fax : +44 1473 830078
John.Birch@screensystems.tv

Visit us at
IBC, RAI Amsterdam | Stand 1.C49 | 14-18 September

http://www.subtitling.com/ | https://www.linkedin.com/company/screen-subtitling-systems-ltd | https://www.youtube.com/channel/UCBp2nyUeNbIFz9cD66Ym3AQ | https://twitter.com/ScreenSystems


Before printing, think about the environment
From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
Sent: 01 August 2018 12:21
To: John Birch <John.Birch@screensystems.tv>; public-audio-description@w3.org
Subject: Re: First draft of a brief explainer

Hi John,

If you want to dip the main audio, for example to mute it completely, and then add in some sound from somewhere else, you need a way to say "(1) set the audio I receive from my parent to gain = 0, and then (2) mix in this new audio", assuming that the main programme audio is provided from above, i.e. by an ancestor element.

We can do the first part by creating a timed element that sets the gain to zero. The second part is achieved by creating a child of that element that has a child audio element. You could do it with nested spans, or we could say that the parent div provides the main programme audio and do something like:

<div>
  <audio src="main_programme_audio"/>

  <p begin="10s" end="15s" tta:gain="0"> <!-- (1) -->

    <span> <!-- (2) -->
      <audio src="other_audio"/>
    </span>
  </p>
</div>

Now the span receives muted audio from the p, and adds its own audio in. The span is the leaf element whose audio output is mixed into the final output.
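A minimal Python sketch of the routing just described, assuming each element's output is "its parent's output mixed with its own audio children, then scaled by its tta:gain", and treating audio as short lists of samples. The helper names (mix, scale, element_output) and the sample values are hypothetical, for illustration only.

```python
def mix(*signals):
    """Sample-wise sum of equal-length signals (the "mixed with" operation)."""
    return [sum(frame) for frame in zip(*signals)]

def scale(signal, gain):
    """Apply a gain to every sample (the "with a gain of" operation)."""
    return [gain * s for s in signal]

def element_output(parent_output, own_audio, gain=1.0):
    """Hypothetical model of one element: mix the parent's output with the
    element's own <audio> children first, then apply the element's tta:gain."""
    return scale(mix(parent_output, *own_audio), gain)

main_programme_audio = [0.5, -0.5, 0.25]
other_audio = [0.1, 0.2, 0.3]
silence = [0.0] * len(main_programme_audio)

# div: receives nothing from above, adds main_programme_audio, gain 1
div_out = element_output(silence, [main_programme_audio])
# p: tta:gain="0" mutes everything it receives; it has no audio children
p_out = element_output(div_out, [], gain=0.0)
# span: receives the muted audio from the p and mixes in other_audio
span_out = element_output(p_out, [other_audio])
print(span_out)  # [0.1, 0.2, 0.3]
```

Because the span is the leaf, its output (only other_audio) is what reaches the final mix while the p is active.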

Kind regards,

Nigel



From: John Birch <John.Birch@screensystems.tv>
Date: Wednesday, 1 August 2018 at 12:11
To: Nigel Megitt <nigel.megitt@bbc.co.uk>, "public-audio-description@w3.org" <public-audio-description@w3.org>
Subject: RE: First draft of a brief explainer

Hi Nigel,

It's an explanation... but I'm not sure it helps.

In general, for audio description we need to affect the gain of the main audio directly from within the span, I think.

This
(p_audio * 0.8 + span_audio * 1) * 0.5
requires some maths on the numbers if you want to dip the main audio more for a specific span, since the span gain term applies to both the main audio and the span audio?
For example, if you wanted to completely mute the main audio for a span, how would you do that?
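To make the arithmetic in the question concrete: taking the formula above at face value, the span's gain multiplies the whole post-mix signal, so to reach a chosen main-audio level and AD level each input gain has to be divided by the span gain, and a span gain of 0 cannot be compensated for at all. A minimal sketch; the function name input_gains_for is hypothetical.

```python
def input_gains_for(target_main, target_ad, span_gain):
    """Return the (p_audio, span_audio) gains needed so that
    (p_audio * g_p + span_audio * g_s) * span_gain
    yields the target levels for the main audio and the AD audio.

    Both targets are divided by span_gain because the span gain scales the
    entire post-mix signal; a span gain of 0 silences both inputs.
    """
    if span_gain == 0:
        raise ValueError("span gain 0 mutes the main audio and the AD audio alike")
    return target_main / span_gain, target_ad / span_gain

# Keep the AD at full level (1.0) while dipping the main audio to 0.2,
# with the span's own tta:gain fixed at 0.5:
print(input_gains_for(0.2, 1.0, 0.5))  # (0.4, 2.0)
```

Note that the AD input then needs a gain of 2.0, i.e. amplification, which illustrates why the nesting makes per-span dipping awkward to express.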

P.S. I may be being slow today!

Best,
John

From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
Sent: 01 August 2018 10:00
To: John Birch <John.Birch@screensystems.tv>; public-audio-description@w3.org
Subject: Re: First draft of a brief explainer

Hi John,

Thanks for that. Actually, your interpretation is not what is intended. Rather, the pan and gain properties always act on the post-mix audio generated by the element to which they apply.

Taking an example:

<p>
   <audio src="p_audio" tta:gain="0.8"/> <!-- this audio is an input to the p -->

   <span tta:gain="0.5"> <!-- receives p's audio output plus its own audio children as input -->
      <audio src="span_audio" tta:gain="1"/>
      Some optional span text
   </span>
</p>

The <p> includes an <audio> element that plays whatever "p_audio" refers to, setting p_audio's gain to 0.8.
The <span> therefore receives the <p>'s audio, and also its own "span_audio", whose gain is 1.
Let's use a shortcut of * to mean "with a gain of" and + to mean "mixed with" and look at the result:
After mixing p_audio * 0.8 + span_audio * 1, the span then applies a further gain of 0.5, so the end result is:

(p_audio * 0.8 + span_audio * 1) * 0.5
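Expanding that expression (a restatement of the same arithmetic, not a separate rule) shows the gain each source effectively receives at the span's output:

```latex
(\mathit{p\_audio} \times 0.8 + \mathit{span\_audio} \times 1) \times 0.5
  = 0.4\,\mathit{p\_audio} + 0.5\,\mathit{span\_audio}
```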

So the properties always act on the element on which they are specified, but we also have to take into account that the audio output of a parent element is routed to its children, which can apply further processing, down the body -> div -> p -> span hierarchy. Those elements can have additional audio inputs specified by their own child <audio> elements.

In the end, the audio output of all the temporally active audio-generating span leaf nodes is mixed together (in Web Audio terms, is sent to the same "bus") to generate the final output, subject to any master volume control.
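A minimal sketch of that last step, assuming the leaf outputs have already been computed as sample lists and are tagged with whether the element is temporally active; the function name final_mix and the sample values are hypothetical. In Web Audio terms this corresponds roughly to connecting every active leaf to one shared gain node acting as the master volume.

```python
def final_mix(leaf_outputs, length, master_volume=1.0):
    """Sum ("bus") the outputs of the temporally active leaf elements, then
    apply a master volume. leaf_outputs holds (is_active, samples) pairs."""
    bus = [0.0] * length
    for is_active, samples in leaf_outputs:
        if not is_active:
            continue  # inactive elements contribute nothing to the bus
        for i, sample in enumerate(samples[:length]):
            bus[i] += sample
    return [master_volume * s for s in bus]

leaves = [
    (True, [0.5, 0.25]),   # hypothetical active span output
    (True, [0.25, 0.5]),   # another active span output
    (False, [1.0, 1.0]),   # span outside its begin/end interval
]
print(final_mix(leaves, length=2, master_volume=0.5))  # [0.375, 0.375]
```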

Does that help?

Kind regards,

Nigel


From: John Birch <John.Birch@screensystems.tv>
Date: Wednesday, 1 August 2018 at 09:46
To: Nigel Megitt <nigel.megitt@bbc.co.uk>, "public-audio-description@w3.org" <public-audio-description@w3.org>
Subject: RE: First draft of a brief explainer

Hi Nigel,

Interesting proposal.

I think there is a potential source of confusion regarding the target of the tta:gain and tta:pan properties.
For some of those property instances in the example, the action is upon the parent audio (the main tracks), but for other instances, the property acts upon the child.

I think this may be confusing...although I understand how this works within an inheritance schema, experience suggests that those with less familiarity may not understand which property affects which 'track'.

I'm not sure this is easy to resolve...

Best regards,
John


From: Nigel Megitt [mailto:nigel.megitt@bbc.co.uk]
Sent: 27 July 2018 12:21
To: public-audio-description@w3.org
Subject: First draft of a brief explainer

Hi all,

I thought it would be worth writing a quick explainer of what I have in mind, and what I'm working towards, for AD in TTML2. Thoughts very welcome! If this is worthwhile, we should create a GitHub repository for the group and pop it in there so we can edit it and track those edits. Let me know what you think!

Kind regards,

Nigel


Audio Description Explainer
Introduction

The goal is to be able to deliver audio description script, pre-recorded audio and mixing data in a single file with an open standard format, that can also use text to speech for potential client-side audio rendering later.

There is work in TTML2<https://w3c.github.io/ttml2/index.html> to define better constructs for representing the main requirements - continuous animation, pan and gain for mixing, speech rate and pitch for text to speech. This is implementable in browsers using Web Audio<https://www.w3.org/TR/webaudio/> and Web Speech<https://w3c.github.io/speech-api/speechapi.html> respectively, though the latter needs some work. There is also an interested bunch of people in the W3C Audio Description Community Group<https://www.w3.org/community/audio-description/> who would support creation of an open standard format meeting the requirements<https://github.com/w3c/ttml2/wiki/Audio-Description-Requirements> agreed for TTML2.

In the end, we should be able to deliver an AD profile of a TTML2 file to clients which provide real time mixing of the AD, perhaps with some user customisation (like changing the relative volumes of the main programme audio and the AD audio), or even presentation of the AD script text on a completely different device, like a braille display. Those clients might be hosted server-side to create a "broadcaster mix" or genuinely on the client to create a "receiver mix". Obviously, presentation on a braille display and customisation do require the client-side player to be used.

File Format

The file format should look something like:

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:ttd="http://www.w3.org/ns/ttml#datatype" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" xmlns:tta="http://www.w3.org/ns/ttml#audio" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en">
  <body>
    <div>
      <audio src=";track=1" tta:pan="-1"/>
      <audio src=";track=2" tta:pan="1"/>

      <p xml:id="ad21b" begin="5.48s" end="19.44s" tta:gain="0.25">
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="13.84s" end="13.96s" tta:gain="0.39;1"/>
        <span begin="0.12s" end="13.84s" tta:gain="0.25">
          <audio src="DRAD182Y01.wav" clipBegin="11.6s" clipEnd="24.32s"/>
          BBC Eastenders written by Colin Wyatt starring June Brown as Dot, John Altman as Nick, Declan Bennett as Charlie and Samantha Womack as Ronnie.</span>
      </p>

      <p xml:id="ad31b" begin="30.56s" end="32.84s" tta:gain="0.25">
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="2.16s" end="2.28s" tta:gain="0.39;1"/>
        <span begin="0.12s" end="2.16s" tta:gain="0.25">
          <audio src="DRAD182Y01.wav" clipBegin="35.68s" clipEnd="37.72s"/>
          Nick takes a drag of his cigarette.</span>
      </p>

      <p xml:id="ad41b" begin="49.32s" end="51.16s">
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="1.72s" end="1.84s" tta:gain="0.39;1"/>
        <span begin="0.12s" end="1.72s">
          <audio src="DRAD182Y01.wav" clipBegin="54.44s" clipEnd="56.04s"/>
          Nick gets up.</span>
      </p>

      <p xml:id="ad51b" begin="54.92s" end="57.08s">
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="2.04s" end="2.16s" tta:gain="0.39;1"/>
        <span begin="0.12s" end="2.04s">
          <audio src="DRAD182Y01.wav" clipBegin="60.04s" clipEnd="61.96s"/>
          He grabs a knife.</span>
      </p>

      <p xml:id="ad61b" begin="62.24s" end="71.52s">
        <ttm:desc ttm:role="x-shotDescription">SHOT CHANGE: 62.36s Nick Cotton centre screen facing right.</ttm:desc>
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="9.16s" end="9.28s" tta:gain="0.39;1"/>
        <span begin="0.12s" end="9.16s">
          <audio src=";track=2" tta:gain="0.25"/>
          Ronnie looks worried but he grabs a swiss roll from a carrier bag and roughly cuts off two slices offering her one on the end of a knife.</span>
      </p>

      <p xml:id="ad71b" begin="79.2s" end="82.12s">
        <ttm:desc ttm:role="x-pronunciationNote">PRON: Koosh.</ttm:desc>
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="2.8s" end="2.92s" tta:gain="0.39;1"/>
        <span begin="0.12s" end="2.8s">
          <audio src="DRAD182Y01.wav" clipBegin="84.32s" clipEnd="87.0s"/>
          Sonia leaves the Vic followed by Kush </span>
      </p>

      <p xml:id="ad91b" begin="115.16s" end="117.12s">
        <animate begin="0.0s" end="0.12s" tta:gain="1;0.39"/>
        <animate begin="1.84s" end="1.96s" tta:gain="0.39;1"/>
        <span begin="0.12s" end="1.84s">
          <audio src="DRAD182Y01.wav" clipBegin="120.28s" clipEnd="122.0s"/>
          At Dot's...</span>
      </p>
    </div>
  </body>
</tt>



Let's look at this in a bit more detail:

We have a div element that wraps all the other content. Crucially it includes two audio element children, which do a few things:
- They make that parent element an Audio generating element<https://w3c.github.io/ttml2/index.html#terms-audio-generating-element>, which means that the player code needs to create a Web Audio graph node for it.
- They tell the player to add ;track=1 (whatever that means) in and pan it all the way to the left, i.e. tta:pan="-1".
- They tell the player to add ;track=2 (whatever that means) in and pan it all the way to the right, i.e. tta:pan="1".
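For intuition about what "all the way to the left" means numerically, here is a sketch of the equal-power law that the Web Audio StereoPannerNode applies to a mono input. Two assumptions: that each ;track=n source is mono, and that a player maps tta:pan straight onto a StereoPannerNode's pan value; the explainer does not state either. The function name is hypothetical.

```python
import math

def equal_power_pan(pan):
    """Left/right gains for a mono source under the equal-power law used by
    the Web Audio StereoPannerNode for mono input (pan in [-1, 1])."""
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1 and 1")
    x = (pan + 1) / 2  # map [-1, 1] onto [0, 1]
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)

for pan in (-1, 0, 1):
    left, right = equal_power_pan(pan)
    print(f"pan={pan:+d}: left={left:.3f} right={right:.3f}")
# pan=-1: left=1.000 right=0.000
# pan=+0: left=0.707 right=0.707
# pan=+1: left=0.000 right=1.000
```

The equal-power curve keeps perceived loudness roughly constant as a source moves across the stereo image, which is why the centre position gives about 0.707 per channel rather than 0.5.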

We need a convention to identify "tracks that are provided to us from somewhere else", and in this case we've defined ;track=n to do that.
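A sketch of how a player might recognise that convention in an src attribute. It assumes the whole value is exactly ";track=" followed by a positive integer and that anything else is an ordinary resource reference; the pattern and function name are hypothetical, since the convention itself is only defined informally here.

```python
import re

# Hypothetical pattern for the ";track=n" convention used in the example file.
TRACK_REF = re.compile(r";track=([1-9][0-9]*)")

def parse_track_reference(src):
    """Return the track number if src uses the ;track=n convention, else None."""
    match = TRACK_REF.fullmatch(src)
    return int(match.group(1)) if match else None

print(parse_track_reference(";track=2"))        # 2
print(parse_track_reference("DRAD182Y01.wav"))  # None
```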

Then there are a bunch of child p elements that each have a begin and end time. They each represent a snippet of audio description and the time during which some stuff happens. The text of the audio description is contained in a child span element, which itself has begin and end times. The span's begin and end times are relative to the parent p element's begin time.
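A sketch of resolving those relative span times to absolute document times with Python's standard ElementTree, using the ad31b snippet from the example file. It assumes only the "s" offset-time metric appears (as in the example) and ignores any timing on body or div, which the example does not set; the helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

TTML_NS = "{http://www.w3.org/ns/ttml}"

# One snippet from the example file, trimmed to the timing attributes.
FRAGMENT = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p xml:id="ad31b" begin="30.56s" end="32.84s">
        <span begin="0.12s" end="2.16s">Nick takes a drag of his cigarette.</span>
      </p>
    </div>
  </body>
</tt>"""

def seconds(value):
    """Parse an offset time in seconds such as "30.56s" (only the "s" metric)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unsupported time expression: {value!r}")
    return Decimal(match.group(1))  # Decimal avoids binary rounding noise

def absolute_span_times(tt_xml):
    """Yield (begin, end, text) for each span, converting times relative to
    the parent p into absolute document times."""
    root = ET.fromstring(tt_xml)
    for p in root.iter(TTML_NS + "p"):
        p_begin = seconds(p.get("begin"))
        for span in p.iter(TTML_NS + "span"):
            yield (p_begin + seconds(span.get("begin")),
                   p_begin + seconds(span.get("end")),
                   "".join(span.itertext()).strip())

for begin, end, text in absolute_span_times(FRAGMENT):
    print(begin, end, text)
# 30.68 32.72 Nick takes a drag of his cigarette.
```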

You might see that there's some metadata there too, which might be helpful during the authoring process, for example.

We need a few things to happen for each snippet of audio description:

  1.  Fade down the programme audio level.
  2.  Play the audio description audio chunk, stereo panned to the right place.
  3.  Fade the programme audio back up.

The fade down and fade up are both achieved by placing animate elements as children of the p element. They smoothly change ("continuously animate") the tta:gain value through the values given in a semicolon-separated list; the begin and end times of each animation are specified on the animate element and are relative to the parent p element's begin time. The audio that they modify is the audio that is available to that element, i.e. the programme audio that comes down to the p from the parent div (remember that the div specified some audio? This is where it goes).
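A sketch of the resulting gain envelope for one snippet, using the animate values from ad41b in the example file. It assumes linear interpolation between equally spaced values, and that the gain holds at the faded-down level between the two animations and at full level after the fade up, which is the intent described here. TTML2's actual fill and calcMode rules should be checked before relying on this. All names are hypothetical.

```python
def parse_values(values):
    """Split a semicolon-separated tta:gain animation list, e.g. "1;0.39"."""
    return [float(v) for v in values.split(";")]

def animated_value(values, begin, end, t):
    """Linearly interpolate across equally spaced values between begin and end."""
    if t <= begin:
        return values[0]
    if t >= end:
        return values[-1]
    position = (t - begin) / (end - begin) * (len(values) - 1)
    index = int(position)
    fraction = position - index
    return values[index] + (values[index + 1] - values[index]) * fraction

def programme_gain(t, fade_down, fade_up):
    """Gain applied to the programme audio at time t, in seconds relative to
    the p's begin. fade_down and fade_up are (begin, end, "from;to") tuples
    taken from the two <animate> children.

    Assumption: the gain holds at the last animated value between animations.
    """
    down_begin, down_end, down_values = fade_down
    up_begin, up_end, up_values = fade_up
    if t < up_begin:
        return animated_value(parse_values(down_values), down_begin, down_end, t)
    return animated_value(parse_values(up_values), up_begin, up_end, t)

# ad41b: fade down over 0.0s-0.12s, fade back up over 1.72s-1.84s
FADE_DOWN = (0.0, 0.12, "1;0.39")
FADE_UP = (1.72, 1.84, "0.39;1")
for t in (0.0, 0.06, 1.0, 1.78, 2.0):
    print(t, round(programme_gain(t, FADE_DOWN, FADE_UP), 3))
```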

Playing the audio description is done by adding a new audio child to the span. The playback begins in the presentation at the span's begin time, and the clipBegin and clipEnd mark the in and out points of the referenced audio resource to play, which is specified by the src attribute. If we wanted to specify a left/right pan value, we could do that by setting a tta:pan attribute on the audio element itself. Similarly we could vary the level of the audio by setting a tta:gain value.
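A sketch of the clipBegin/clipEnd mapping, assuming the clip starts at clipBegin exactly when the span begins, plays at normal speed, and stops at clipEnd or at the span's end, whichever comes first. The function name is hypothetical; the numbers are from the ad31b snippet in the example file.

```python
from decimal import Decimal

def resource_offset(t, span_begin, span_end, clip_begin, clip_end):
    """Offset into the src resource heard at absolute presentation time t,
    or None when the clip is not playing (outside the span or past clipEnd)."""
    if not span_begin <= t < span_end:
        return None
    offset = clip_begin + (t - span_begin)
    return offset if offset < clip_end else None

# ad31b: the span runs from 30.68s to 32.72s in absolute time
# (p begin 30.56s plus span begin 0.12s / end 2.16s),
# with clipBegin="35.68s" and clipEnd="37.72s".
args = (Decimal("30.68"), Decimal("32.72"), Decimal("35.68"), Decimal("37.72"))
print(resource_offset(Decimal("31.00"), *args))  # 36.00
print(resource_offset(Decimal("33.00"), *args))  # None
```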

This structure is implemented in the mixing code by constructing a Web Audio graph, in which the outputs of all the spans are, in the end, mixed together.



This message may contain confidential and/or privileged information. If you are not the intended recipient you must not use, copy, disclose or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation. Screen Subtitling Systems Ltd. Registered in England No. 2596832. Registered Office: The Old Rectory, Claydon Church Lane, Claydon, Ipswich, Suffolk, IP6 0EQ
  



----------------------------

http://www.bbc.co.uk
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.

Attachments

image/png attachment: image001.png
image/png attachment: image002.png
image/png attachment: image003.png
image/png attachment: image004.png
image/jpeg attachment: image005.jpg
image/png attachment: screenlogo_1cd3a137-5fcc-4d5e-9d6c-69fea2b1c03b.png
image/png attachment: linkedin_ce65fa59-68cb-4503-8e59-6e24e4aadbbf.png
image/png attachment: youtube_706c5e7d-cbfa-4d3e-9eab-57ded0df1770.png
image/png attachment: twitter_89da90c4-4419-4a20-bfe3-97f3a96135ec.png
image/jpeg attachment:
Received on Wednesday, 1 August 2018 12:01:55 UTC