Re: [css-speech][css-content][mediaqueries] Making Generated Content Accessible from Reece Dunn on 2014-12-03 (www-style@w3.org from December 2014)

From: Reece Dunn <msclrhd@googlemail.com>
Date: Wed, 3 Dec 2014 15:18:19 +0000
To: Florian Rivoal <florian@rivoal.net>
Cc: Daniel Weck <daniel.weck@gmail.com>, James Craig <jcraig@apple.com>, fantasai <fantasai.lists@inkedblade.net>, Alan Stearns <stearns@adobe.com>, www-style list <www-style@w3.org>, "Tab Atkins Jr." <jackalmage@gmail.com>, fantasai <fantasai@inkedblade.net>
Message-ID: <CAGdtn27xD-jeA91WZr68EJupgb0wAW1AaDQ_ZKdgKhxQGZYtBw@mail.gmail.com>

On 3 December 2014 at 14:20, Florian Rivoal <florian@rivoal.net> wrote:
> On 03 Dec 2014, at 14:50, Daniel Weck <daniel.weck@gmail.com> wrote:
>>
>> On Wed, Dec 3, 2014 at 3:43 AM, James Craig <jcraig@apple.com> wrote:
>>>
>>>> This raises 2 (related) questions. Is the introduction of this media feature sufficient to deprecate the "speech" media type into never matching? If not, can and should the same privacy model be applied to it?
>>>
>>> My understanding is that the speech media type is *only* useful for linearized audio-only media not intended for the screen, since it is mutually exclusive with the screen media type. Most assistive technologies operate on some concept of a "screen" (including screen readers for the blind) so the speech media type should never apply to screen readers or ScreenMagnifier+Speech utilities, but it's possible there is some use case. For example, if you were to turn an EPUB into a generated TTS audiobook, the speech media type could apply. I don't know if any implementations support that, but you'd probably want to check with someone from DAISY before making it a No-Op.
>>
>>
>> Hello,
>
> Hi, Thanks for the feedback, I was hoping you'd pop in.
>
>> Yes, from a content design perspective, the 'speech' Media Type can be
>> used to define a "complete aural alternative to a visual presentation"
>> (full quote below), and as per the specification: such representation
>> would be mutually exclusive to other media types, when "rendered"
>> within a *given* canvas. The same applies to 'braille' (for example),
>> although the "tactile" Media Group also includes the 'embossed' Media
>> Type (conversely, 'speech' stands on its own).
>
> The exclusive nature of media types has turned out to be an issue in almost all cases, which is why we're generally trying to deprecate them, and replace them by media features which capture the key aspect that made the media types different.
>
> Unlike a type like handheld for example, which was so similar to screen that browsers never matched it due to compat concerns, speech may be sufficiently different from screen that an exclusive media type could work. At the same time, given the existence both of speech UAs which only read the content out loud in a linear fashion (E-pub reader) *and* of speech UAs which do speech as an assistive complement to a visual 2d rendering, I am not so sure that this is really exclusive.

I like the idea of using features, as that would give CSS authors
more control over expressing their intent.

> What would you think (naming aside) about a media feature like this:
> speech: none | linear | screen-based

Aren't these independent concepts?

In an ebook reader, you can have 3 modes of speech:
1.  any audio/narration from the book itself (in ePub this is done
using a SMIL document which is linked to the HTML document by id
names);
2.  using text-to-speech (TTS) for reading the text in the HTML document;
3.  using (1) if present or (2) if not, for the current section.

In the ebook reader case, the TTS reading is where these CSS rules are
most likely to be applied. These would include the following use
cases:

1.  Providing hints to the TTS engine on how the text should be
spoken, including things like controlling numbered lists. This can be
done with existing CSS or more powerfully with upcoming modules (e.g.
the Counter Styles module). These can share styles with other media.
This also includes the speak property (the speech equivalent of
display) and speak-as (a simplified version of the SSML
say-as/interpret-as, e.g. to say that a number should be spoken as
digits).

2.  Controlling the audio produced. These are the styles affected by
the linear model -- the pause, rest and cue styles from CSS speech, as
well as the voice-* properties for controlling the TTS engine.

The rules in (1) relate to how the text should be spoken. They are
applicable to both ebook readers and assistive technologies, and may
be rendered differently from screen or other media. The rules in (2)
are only really applicable to ebook readers.
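
As a rough sketch of the two groups above, using properties from the
CSS Speech module (where the SSML-like property is named speak-as).
The selectors and the cue file name are made up for illustration, and
support in user agents varies:

```css
/* Group (1): hints on how the text should be spoken. */
.decorative {
  speak: none;                   /* not rendered aurally */
}
.phone-number {
  speak-as: digits;              /* read numbers one digit at a time */
}

/* Group (2): shaping the audio of a linear reading. */
h2 {
  pause-before: strong;          /* silence before the heading */
  rest-after: medium;            /* silence after the spoken content */
  cue-before: url("chime.wav");  /* hypothetical audio file */
  voice-family: female;
  voice-rate: slow;
}
```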

As such, I propose two media features [1]:

speech = none | tts
presentation = screen | narration

|speech=tts| is used for rules relating to (1), controlling how a
text-to-speech engine (either via an ebook reader or screen reader)
should interpret the content. |presentation=narration| is used for
rules relating to (2), controlling how the audio should be spoken when
read in a linear, narrative style.
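
Written in media query syntax, the proposal might look like the
following. Neither feature exists in any specification or
implementation; the names and values are only the ones proposed above,
expressed in the usual |(feature: value)| form, and the selectors are
placeholders:

```css
/* Hypothetical feature: applies wherever a TTS engine interprets the
   content, i.e. both screen readers and ebook readers. */
@media (speech: tts) {
  .isbn     { speak-as: digits; }
  .ornament { speak: none; }
}

/* Hypothetical feature: applies only to linear, narrated output. */
@media (presentation: narration) {
  h1, h2     { pause-before: x-strong; rest-after: strong; }
  blockquote { voice-pitch: low; }
}
```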

Thus, you have:
1.  display media (screen, print, etc.) -- speech=none, presentation=screen
2.  screen reader -- speech=tts, presentation=screen
3.  ebook reader -- speech=tts, presentation=narration
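
Because the two features are independent, they could be combined to
single out any one of the three cases above. A sketch, again assuming
the hypothetical |(speech: ...)| and |(presentation: ...)| syntax and
a made-up class name:

```css
/* Case 1: display media, no speech at all. */
@media (speech: none) {
  .page-number { color: gray; }
}

/* Case 2: screen reader (TTS alongside a visual rendering). */
@media (speech: tts) and (presentation: screen) {
  .page-number::before { content: "Page "; }
}

/* Case 3: ebook reader narrating linearly. */
@media (speech: tts) and (presentation: narration) {
  .page-number { speak: none; }
}
```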

[1] I would also be happy with something like |hinting = none | tts|,
but we can save the bikeshedding issues until the overall concept and
design is agreed on.

Thanks,
- Reece H. Dunn (Cainteoir Technologies) [http://reecedunn.co.uk]

Received on Wednesday, 3 December 2014 15:18:47 UTC