Re: Review of MediaStreamTrack Content Hints

*The media hints feedback was discussed in the APA meeting. Below is a
version revised per the comments.*


*TITLE *

APA comments on the MediaStreamTrack Content Hints draft specification, at
https://www.w3.org/TR/mst-content-hint/.

*OVERVIEW *

*Content hint attributes defined in this specification will benefit
consumers who rely on assistive technology (AT) and personalization.* The
specification notes its focus on end-users' experience: "Adding a
media-content hint provides a way for a web application to help track
consumers make more informed decision[s]...." Content authors can author
contentHint with the experience of AT users in mind, or of UAs acting on
behalf of users. This specification's introduction would be a good place to
clarify this as a further benefit of content hints. In addition, we
encourage User Agents to make this hint available to downstream consumers
via API.
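
As a minimal sketch of what authoring a hint involves, the TypeScript
below sets contentHint on a track-like object. The value lists are those
given in Section 4 of the draft; the HintableTrack interface and the
applyContentHint helper are illustrative names of ours, standing in for a
real MediaStreamTrack so that the example runs outside a browser.

```typescript
// Hint values listed in the Content Hints draft ("" means no hint).
const AUDIO_HINTS: readonly string[] = ["", "speech", "speech-recognition", "music"];
const VIDEO_HINTS: readonly string[] = ["", "motion", "detail", "text"];

// Hypothetical stand-in for the parts of MediaStreamTrack used here.
interface HintableTrack {
  kind: "audio" | "video";
  contentHint: string;
}

// Sets the hint only if it is valid for the track's kind.
// Returns true when the hint was applied, false when it was ignored.
function applyContentHint(track: HintableTrack, hint: string): boolean {
  const allowed = track.kind === "audio" ? AUDIO_HINTS : VIDEO_HINTS;
  if (allowed.indexOf(hint) === -1) {
    return false;
  }
  track.contentHint = hint;
  return true;
}
```

In a browser the equivalent step would be an assignment to the
contentHint attribute of a track obtained from a MediaStream; the helper
above only mirrors the rule that a value not defined for the track's kind
is ignored.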

*The specification makes no mention of hints regarding support files* (captions,
audio descriptions) that often accompany media content, either linked to it
externally in HTML (using the <track> element) or furnished 'in-band',
e.g., contained within the .MP4 wrapper (HasCaptions: T/F,
HasAudioDescription: T/F). If either returns True, then it needs to be
exposed in the UI, essentially as an 'active' button in the Controls. Such
support files can be critical to the accessibility of a media track, for
example when an American Sign Language video is supplied separately but
linked. Did the WG consider whether hints could also usefully convey
whether the media content has such supporting files?
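
To make the HasCaptions / HasAudioDescription idea concrete, here is a
rough TypeScript sketch that derives the two flags from the kinds of the
text tracks attached to a media element. The kind names are those HTML
defines for <track>; the SupportFileSummary shape and the
summarizeSupportFiles function are hypothetical, and treating the
"descriptions" kind as the signal for audio description is an assumption
made for illustration only.

```typescript
// Text track kinds defined by HTML for the <track> element.
type TextTrackKindName =
  | "subtitles"
  | "captions"
  | "descriptions"
  | "chapters"
  | "metadata";

// Hypothetical summary; field names mirror the flags mentioned above.
interface SupportFileSummary {
  hasCaptions: boolean;
  hasAudioDescription: boolean;
}

// Reports which kinds of support files are present among the given tracks.
function summarizeSupportFiles(
  kinds: readonly TextTrackKindName[]
): SupportFileSummary {
  return {
    hasCaptions: kinds.indexOf("captions") !== -1,
    // Assumption: a "descriptions" track signals audio description.
    hasAudioDescription: kinds.indexOf("descriptions") !== -1,
  };
}
```

A UI could use such a summary to decide which buttons in the Controls to
show as 'active'.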

Regarding Section 4: *The specification's hints could address more directly
some common audio and video formats that are often encountered with
content that has been made accessible.* Such formats could be covered by
hints such as these (these are examples for clarity only; we leave it to
you to define such hints):

*For Audio, an additional hint to indicate the presence of
audio-description* (or some similar label as you find appropriate).
Audio-description is audio that resembles the content covered by the
"speech-recognition" hint, but it does not contain data intended for speech
recognition by a machine. It also resembles "speech", but it will likely
not be appropriate to apply noise suppression or boost intelligibility of
the incoming signal.

In the language of the specification (4.1): "A track with content hint
"audio-description" should be treated as if it contains audio data,
without background noise, describing in words the activity in the video."


*For Video, an additional hint to indicate the presence of transcription
embedded in the video*, e.g., motion-with-transcription (or some similar
label as you find appropriate). motion-with-transcription would refer to a
motion video that has, embedded, transcription data, either a
picture-in-picture showing a sign language interpreter, or text captions
embedded in the video.

In the language of the specification (4.2): A content hint of
motion-with-transcription should be treated such that one region of the
video frame has details that are extra important, and that in that region
significant sharp edges and areas of consistent color can occur frequently
(the area with sign language interpretation, or the area with onscreen
captioned text). Encoding of this screen region would optimize for detail in
the resulting individual frames rather than smooth playback. Artefacts from
quantization or downscaling should be avoided.


*Regarding section 5, the degradation preference does not address regions.*
Picture regions may be very significant for accessibility. Consider a video
with sign language interpretation embedded (e.g., in the upper right
corner), or a video with captions "burned-in" or embedded (e.g., in the
bottom of the picture area). (While APA does not advocate for such embedded
captions, they are common, particularly on social media where the default
user behavior is audio "off.") These regions would benefit from different
encoding decisions than the rest of the frame. Regions may be encoded and
decoded quite differently: for example in AVC, "it is also possible to
create truly lossless-coded regions within lossy-coded pictures." *We would
find it useful and supportive of accessible content to make this
information available as an RTCDegradationPreference.*
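
To illustrate the kind of information we have in mind, the TypeScript
sketch below pairs a frame-wide degradation preference with regions that
should keep their detail. The three preference values are those of
RTCDegradationPreference; everything else (the AccessibilityRegion shape,
its field names, and the preferenceAt function) is hypothetical and not
part of any specification. We offer it only as one possible shape for the
data, not as a proposed API.

```typescript
// Values of RTCDegradationPreference.
type DegradationPreference =
  | "maintain-framerate"
  | "maintain-resolution"
  | "balanced";

// Hypothetical: a rectangle (in pixels, origin at top-left) whose detail
// matters for accessibility.
interface AccessibilityRegion {
  x: number;
  y: number;
  width: number;
  height: number;
  purpose: "sign-language" | "burned-in-captions";
}

// Hypothetical: the preference an encoder might apply at pixel (px, py).
// Inside any accessibility region detail is preserved; elsewhere the
// frame-wide preference applies.
function preferenceAt(
  px: number,
  py: number,
  regions: readonly AccessibilityRegion[],
  framePreference: DegradationPreference
): DegradationPreference {
  const inside = regions.some(
    (r) =>
      px >= r.x && px < r.x + r.width && py >= r.y && py < r.y + r.height
  );
  return inside ? "maintain-resolution" : framePreference;
}
```

For a 1280x720 frame with a sign language interpreter in the upper right
corner, a region such as { x: 960, y: 0, width: 320, height: 180 } would
keep its resolution while the rest of the frame could favor frame rate.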

Lastly, how are these hints communicated? We note that MP4 files can
contain metadata as defined by the format standard, and in addition, can
contain Extensible Metadata Platform (XMP) metadata (source:
https://en.wikipedia.org/wiki/MPEG-4_Part_14).

*REQUESTS*

Correct these two typos:

   - Abstract: change "make more informed decision" to either "make a more
   informed decision" or "make more informed decisions"
   - Section 2: change "they appear" to "it appears"

Add to the introduction that content hint attributes defined in this
specification will benefit consumers who rely on assistive technology (AT)
and personalization.

We ask the WG to ensure that the specification covers use cases with support
files, and that hints can be provided for those files.

In section 4, ensure that hints support the use-cases mentioned above.

In section 5.2, ensure that the specification supports regions, particularly
when such regions are important for accessibility.

*Thanks for the good discussion - L*


On Wed, Jun 2, 2021 at 12:02 PM Lionel Wolberger <lionel@userway.org> wrote:

> Hi all,
>
> I was asked to review the MediaStreamTrack Content Hints draft
> specification, at https://www.w3.org/TR/mst-content-hint/.
>
> This is an API specification that was created with a focus on the
> end-users' experience: "Adding a media-content hint provides a way for a
> web application to help track consumers make more informed decision[s]...."
> The specification did not seem to be aware that these consumers may be
> relying on AT.
>
> Below is my initial response.
>
> *The content hint attributes defined in the specification will benefit
> consumers who rely on assistive technology (AT) and personalization. *Content
> authors may author content hints with AT in mind. In addition, User
> Agents make an accessibility tree available to assistive technology via
> API.  In addition, User Agents are personalized to the needs of the user.
> These personalizations will continue to be extended in future, for example
> to support COGA, Personalization and other WG requirements. We expect a
> proliferation of such personalizations in future. Such personalizations
> will find content hints very relevant.
>
> In sum, content authors can author contentHint
>  with the experience of AT users in mind, or UAs acting on behalf of
> users.  The specification's introduction would be a good place to
> clarify this as a further benefit of content hints.
>
> *The specification make no mention of hints regarding support files *(captions,
> audio descriptions) that often accompany media content, either linked to it
> in HTML externally (using the <track> element) or furnished 'in-band',
> e.g., contained within the .MP4 wrapper (HasCaptions: T/F,
> HasAudioDescription: T/F). If either return True, THEN they need to be
> exposed in the UI: essentially as 'active' buttons in the Controls. Did the
> WG consider whether hints could also usefully convey whether the media
> content has such supporting files?
>
> Now we turn to some detailed feedback:
>
> *Regarding Section 4. The specification does not address audio and video
> formats that are often encountered with content that has been made
> accessible. *For clarity, I propose hints accordingly:
>
> *For Audio, an additional hint to indicate the presence of
> audio-description *(or some similar label as you find appropriate).
> Audio-description is audio that resembles speech-recognition, but does not
> contain data for the purpose of speech recognition by a machine.
> Audio-description is audio that resembles "speech" but it will likely not
> be appropriate to apply noise suppression or boost intelligibility of the
> incoming signal.
>
> In the language of the specification (4.1) , "A track with content hint "audio-description"
> should be treated as if it contains audio data, without background noise,
> describing in words the activity in the video."
>
>
> *For Video, an additional hint to indicate the presence of transcription
> embedded in the video*, e.g., motion-with-transcription (or some similar
> label as you find appropriate). motion-with-transcription would refer to a
> motion video that has, embedded, transcription data, either a
> picture-in-picture showing a sign language interpreter, or text captions
> embedded in the video.
>
> In the language of the specification (4.2): A content hint of
> motion-with-transcription should be treated such that one region of the
> video frame has details that are extra important, and in that region that
> significant sharp edges and areas of consistent color can occur frequently
> (the area with sign language interpretation, or the area with onscreen
> captioned text). This screen region would optimize for detail in the
> resulting individual frames rather than smooth playback. Artefacts from
> quantization or downscaling should be avoided.
>
>
> Regarding section 5. In 5.2 "Degradation preference when encoding," the
> specification addresses the choices encoders make, but no mention is made
> of regions. For example in AVC, "it is also possible to create truly
> lossless-coded regions within lossy-coded pictures." Picture regions may be
> very significant for accessibility. Consider a video with sign language
> interpretation embedded (e.g., in the upper right corner), or a video with
> captions embedded (e.g., in the bottom of the picture area). These regions
> would benefit from different encoding decisions than the rest of the frame. *We
> would find it useful and supportive of accessible content to make this
> information available as an RTCDegradationPreference.*
>
> Lastly, considering the User Agent and the DOM: Should this specification
> include an entreaty to the browser manufacturers to make the hint available
> in the DOM and API so as to pass on this content-hint to the accessibility
> tree?  If so, I would recommend a section something like, "*5.5 Behavior
> of the User Agent: The user agent MUST make the contentHint available in
> the DOM and the accessibility tree, to assure the contentHint is available
> to assistive technologies."*
>
> P.S. they should fix these two typos:
>
>
>    - Abstract: change "make more informed decision" to either "make a
>    more informed decision" or "make more informed decisions"
>    - Section 2. change "they appear" to "it appears"
>
>
> *Thanks to John F for his help and support. Some of the wording regarding
> support files is his. *
>
> Looking forward to today's meeting,
>
> Lionel
>
>
>
>
>
>
> Lionel Wolberger
> COO, UserWay Inc.
> lionel@userway.org
> UserWay.org <http://userway.org/>
>
>

Received on Thursday, 17 June 2021 05:03:27 UTC