Feedback on the TTS specification (minor nitpicks)

As mentioned in an earlier email, here is our list of minor nitpicks
for the TTS part of the specification. They are small enough that you
may want to incorporate them even now, before the report is finalized.

Best Regards,
Silvia.


Minor nitpicks:

1. "A TTS element represents a synthesized audio stream. A TTS element
is a media element whose media data is ostensibly synthesized audio
data."

(A,S): Why use the word "ostensibly" here when there is no other
alternative? Just remove that word.


2. "Content may be provided inside the TTS element. User agents should not show
[(A,S) add: …or speak…]
this content to the user; it is intended for older Web browsers which
do not support TTS."


3. "In particular, this content is not intended to address
accessibility concerns. To make TTS content accessible to those with
physical or cognitive disabilities, authors are expected to provide
alternative media streams and/or to embed accessibility aids (such as
transcriptions) into their media streams."

(S): What is a transcription of an SSML document? Why would content
that is already provided as text need special accessibility support?
(A cut-and-paste error from the HTML spec?) Just remove this paragraph.


4. "Implementations should support at least UTF-8 encoded text/plain
and application/ssml+xml (both SSML 1.0 and 1.1 should be supported)."

(A): Earlier the report says SSML 1.1 is mandatory, while here it is
only a "should": see section 6, requirement 13. The earlier text also
doesn't mention SSML 1.0.


5. Speech Enabled Email Client Example:

tts = new TTS();
function onMicClicked() {
    // stop TTS if there is any...
    if (tts.paused == false && tts.ended == false) {
        tts.controller.pause();

[(S): The controller is used only with mediagroups - I don't think you
want that in here. Just use tts.pause();]

    }
    sr.start();
}
function readout(message) {
    tts.src = "data:,message from " + message.sendername +
              ". Message content: " + message.body;

[(S): There is no mime type on this data URI. Should be data:text/plain,….]

    tts.play();
}
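
[(S): For illustration, here is what the example might look like with
both fixes applied (tts.pause() instead of tts.controller.pause(), and
an explicit MIME type on the data URI). The TTS class and sr object
below are hypothetical stand-ins, only there to make the sketch
self-contained and runnable outside a browser:]

```javascript
// Hypothetical stub of the proposal's TTS interface and a stand-in
// speech recognition object (sr) -- not part of the draft as quoted.
class TTS {
  constructor() { this.paused = true; this.ended = false; this.src = ""; }
  pause() { this.paused = true; }
  play()  { this.paused = false; }
}
var sr = { start: function () { sr.started = true; } };
var tts = new TTS();

function onMicClicked() {
  // stop TTS if there is any...
  if (tts.paused == false && tts.ended == false) {
    tts.pause();              // rather than tts.controller.pause()
  }
  sr.start();
}

function readout(message) {
  // Explicit MIME type on the data URI:
  tts.src = "data:text/plain,message from " + message.sendername +
            ". Message content: " + message.body;
  tts.play();
}

readout({ sendername: "alice@example.org", body: "lunch at noon?" });
onMicClicked();
console.log(tts.src.slice(0, 16)); // "data:text/plain,"
```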


6. Speech XG Translating Example:

function readResult(event) {
    document.getElementById("p").value = 100;
    var tts = new TTS();
    tts.serviceURI = "ws://example.org/speak";
    tts.lang = document.getElementById("outlang").value;
    tts.src = "data:text/plain," +
        translate(event.result.item(0).interpretation,
                  document.getElementById("inlang").value, tts.lang);
}

(A): Instead of tts.src, why not just set
tts.text = translate(event.result.item(0).interpretation,
document.getElementById("inlang").value, tts.lang);
?
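
[(A): Sketched out, the handler with a tts.text property could look
like this. The TTS constructor and translate() below are hypothetical
stand-ins, and the DOM lookups from the original are replaced by plain
parameters so the sketch runs outside a browser:]

```javascript
// Hypothetical stub of the proposed TTS interface with a text property,
// plus a placeholder translate() -- neither is part of the spec draft.
function TTS() { this.serviceURI = ""; this.lang = ""; this.text = ""; }
function translate(text, fromLang, toLang) {
  return "[" + fromLang + "->" + toLang + "] " + text;  // placeholder
}

function readResult(interpretation, inLang, outLang) {
  var tts = new TTS();
  tts.serviceURI = "ws://example.org/speak";
  tts.lang = outLang;
  // No data: URI needed -- assign the synthesized text directly.
  tts.text = translate(interpretation, inLang, tts.lang);
  return tts;
}

var tts = readResult("hello", "en", "de");
console.log(tts.text); // "[en->de] hello"
```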


7. "7.2.4.1 Generic Headers
…synthesisers will accept "application/ssml+xml"… "

(S): Earlier it says that synthesisers will also accept text/plain.


8. "Audio-Codec   = "Audio-Codec:" mime-media-type ; See [RFC3555]"

(S): The protocol references RTP MIME types for the audio codec.
Since the data is not transferred via RTP, however, it makes more
sense to reference ordinary file MIME types rather than streaming
(un-encapsulated) data types.


9. "A.2.2.3 Web Authoring Convenience Synthesis Requirements"

FPR13. It should be easy to assign recognition results to a single input field.
This is a part of the expansion of requirement 3.

FPR14. It should not be required to fill an input field every time
there is a recognition result.
This is a part of the expansion of requirement 3.

FPR15. It should be possible to use recognition results to multiple
input fields.

(S): How are these "synthesis" requirements? Why not move them to the
recognition section?


10. (A): Use case links in A.3 Initial Requirements don't work.
(A): A few of the requirements had no use cases.

Received on Sunday, 4 December 2011 00:21:12 UTC