- From: Reece Dunn <msclrhd@googlemail.com>
- Date: Sat, 27 Jun 2015 15:24:03 +0100
- To: fantasai <fantasai.lists@inkedblade.net>
- Cc: "www-style@w3.org" <www-style@w3.org>, Bo J Campbell <bcampbell@us.ibm.com>
On 27 June 2015 at 13:34, fantasai <fantasai.lists@inkedblade.net> wrote:
> While I think the CSS Speech module defines a really cool processing
> model for speech rendering of a document, we don't have much in the
> way of implementations. Also I suspect that a good speech stylesheet--
> one that enhanced, rather than interfered with, the speech user
> experience--would be hard to create without a better understanding of
> the "default UA stylesheet" and a fair amount of specialized training,
> so would be beyond the capabilities of most authors.
>
> However, I think the 'speak' and 'speak-as' properties would be very
> useful to have in the general authoring toolkit. The 'speak' property
> in particular allows speech rendering to have different hiding/showing
> of content than visual layout, without any weird hacks. So I'm thinking
> maybe we should split CSS Speech into two levels:
>
> Level 1: 'speak' and 'speak-as'
> Level 2: Everything currently in the spec.
>
> This might encourage implementation of 'speak' and 'speak-as' in
> browsers.
>
> Thoughts?

Hi,

I am an implementer of a Text-to-Speech program (https://github.com/rhdunn/cainteoir-engine) that makes limited use of CSS, currently to apply a basic content rendering model. As such, I am interested in this proposal. Here are my thoughts:

# thoughts on implementability

If the intention is to have a Web Browser (or narration software) control a Text-to-Speech engine (or to allow that as a valid implementation of the specification), the control will be limited to what the engine exposes. This will be:

1. Voice selection, which can be used to implement the `voice-family` property.
2. Voice parameters, which can be used to implement the `voice-rate`, `voice-pitch`, `voice-volume` and `voice-range` properties.
3. SSML markup, which can be used to implement the other features.
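As a concrete illustration of those three control points, here is a sketch of a style rule using the CSS Speech properties, with a comment on each declaration noting which engine control (or SSML construct) a UA could map it onto. The class name and cue URL are made up for illustration; the SSML renderings are indicative, not normative:

```css
/* Hypothetical rule; each declaration maps onto one of the engine
   control points listed above. */
.announcement {
  voice-family: female;      /* 1. voice selection (e.g. <voice gender="female">) */
  voice-rate: slow;          /* 2. voice parameter (e.g. <prosody rate="slow">) */
  voice-pitch: high;         /* 2. voice parameter (e.g. <prosody pitch="high">) */
  voice-volume: soft;        /* 2. voice parameter (e.g. <prosody volume="soft">) */
  speak-as: spell-out;       /* 3. SSML markup (e.g. <say-as interpret-as="characters">) */
  cue-before: url(ding.wav); /* audio-output control, outside the engine itself */
}
```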
In addition, control of the audio output can be used to implement the `voice-balance` property and the aural box model (pause, rest, cue).

The engine's functionality (e.g. its SSML support) would affect the ability to implement the different features, especially things like explicit voice pitches. For text-to-speech engines adding CSS Speech support, I can see complexities in mapping between the CSS Speech model and the SSML model.

# speak

I don't see why `speak=none` is broken out from `speak-as` (especially if `speak` is seen as analogous to `display`).

I am not sure about `speak=none` being overridable in descendants -- what are the use cases for this behaviour (especially compared to the `display=none` behaviour)?

I am not sure about the interaction with `display=none` -- I wonder if it is best to have `display=none` take precedence aurally (otherwise, you could have `speak=normal` on the head, script and style HTML elements).

Are there any use cases where `speak=none` is useful compared to what is displayed on the screen? Is this intended for things like navigation/menus? If so, how would a blind person know the menu is there?

# speak-as

`spell-out` and `digits` are effectively the same: the only difference is that one applies to words and the other to numbers. Note that the user can only specify one of these at a time, so if applied to "hello 123" you cannot get both behaviours at once.

Is spelling out the "rôle" example as "R O circumflex L E" also conforming?

`speak-as=literal-punctuation` implies that the punctuation is not used for pauses, but shouldn't it still be used for pauses as well (to avoid very long run-on utterances when applied over large amounts of text)?

# SSML say-as compatibility

Although the note on `speak-as` says that the CSS model is limited to a basic set of pronunciation rules compared to the SSML say-as property, it also adds more complexity.
Specifically:

`spell-out`, `digits` and `literal-punctuation` are all aspects of `<say-as interpret-as="characters" format="characters">...`.

`literal-punctuation` and `no-punctuation` specify the removal of pauses from punctuation. This is orthogonal to how the characters are pronounced.

Thus, an SSML-compatible model could be:

## speak = [default | none | characters]

speak=default -- Use the default aural rendering of the text by the Text-to-Speech engine.

speak=none -- Don't speak the text.

speak=characters -- Speak out letters, digits and punctuation individually (same as interpret-as=characters with format=characters).

NOTE: This can be extended in the future to support more modes (=glyphs, =date, =time, =telephone, =cardinal, =ordinal, etc.).

Bikeshedding: This is intended to be analogous to `display` (which is not `display-as`), but could be changed to `speak-as` if required.

## punctuation-pause = [default | none]

punctuation-pause=default -- Let the Text-to-Speech engine determine how long to pause after punctuation.

punctuation-pause=none -- Don't pause when encountering punctuation characters.

Bikeshedding: This could also be something like `punctuation-break`.

Thanks,
- Reece H. Dunn
Received on Saturday, 27 June 2015 14:24:30 UTC