RE: Critical missing feature in SSML specification from T. V. Raman on 2003-01-30 (www-voice@w3.org from January to March 2003)

From: T. V. Raman <tvraman@us.ibm.com>
Date: Wed, 29 Jan 2003 16:31:31 -0800
To: "Shires, Glen" <glen.shires@intel.com>
Cc: David Poehlman <poehlman1@comcast.net>, Richard Schwerdtfeger <schwer@us.ibm.com>, www-voice@w3.org, w3c-wai-pf@w3.org
Message-ID: <15928.29283.238223.801297@bubbles.almaden.ibm.com>
The mark tag is designed to return the index mark as things get
spoken.
As others have pointed out, stop itself does not belong in SSML;
rather, an app that is generating a stream of SSML should insert mark
commands at appropriate points so that, when it issues a stop command
to the engine to which it previously sent the SSML stream,
it can determine how far speech has progressed by tracking
the returned index marks.
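The approach described above (marks inserted by the application, stop issued to the engine rather than written into the markup) can be sketched as follows. This is a minimal illustration, not any engine's API: `build_ssml`, `MarkTracker`, `on_mark`, `point_of_regard` and `remaining` are hypothetical names, and it assumes the engine delivers a callback with the mark name each time it reaches a `<mark/>`, and accepts a separate, non-SSML stop/flush command. It does not address the buffering latency and race conditions raised later in this thread.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(chunks):
    """Hypothetical helper: emit an SSML document with a <mark/> before each
    text chunk, and return the mark names in document order."""
    names, parts = [], []
    for i, text in enumerate(chunks):
        name = f"m{i}"
        names.append(name)
        # quoteattr() wraps the value in quotes; escape() protects &, <, >.
        parts.append(f"<mark name={quoteattr(name)}/>{escape(text)}")
    doc = ('<?xml version="1.0"?>\n'
           f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
           + " ".join(parts) + "</speak>")
    return doc, names


class MarkTracker:
    """Hypothetical application-side record of index marks reported back
    by the synthesizer (the callback mechanism itself is assumed)."""

    def __init__(self, names):
        self._position = {name: i for i, name in enumerate(names)}
        self._last = -1  # no mark reported yet

    def on_mark(self, name):
        # Called when the engine reports reaching <mark name="...">.
        # Keep the furthest mark seen, in case events arrive out of order.
        i = self._position.get(name)
        if i is not None and i > self._last:
            self._last = i

    def point_of_regard(self):
        """Index of the last chunk whose mark was reported, or None."""
        return self._last if self._last >= 0 else None

    def remaining(self, chunks):
        # A reported mark only says its chunk was *started*, so the chunk
        # in progress at stop time is included in what is left to speak.
        return chunks[max(self._last, 0):]


if __name__ == "__main__":
    chunks = ["File menu.", "Edit menu.", "View menu."]
    doc, names = build_ssml(chunks)
    print(doc)
    tracker = MarkTracker(names)
    # Simulated engine callbacks received before the app sent its stop.
    tracker.on_mark("m0")
    tracker.on_mark("m1")
    print(tracker.point_of_regard())  # 1
    print(tracker.remaining(chunks))  # ['Edit menu.', 'View menu.']
```

The design keeps all stop handling outside the markup: the SSML only carries `<mark/>` elements, and the application combines the last reported mark with its own copy of the chunks to recover the point of regard.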
>>>>> "Shires," == Shires, Glen <glen.shires@intel.com> writes:


    Shires,> David, If I understand your view on this, the "voice
    Shires,> browser in this instance" would use something like DOM to
    Shires,> manipulate the SSML. If so, I would think it would be
    Shires,> difficult to know precisely where in the SSML document to
    Shires,> insert the <STOP> tag because one would need to know
    Shires,> exactly which point in the SSML document the renderer is
    Shires,> currently processing. While <MARK> can coarsely help with
    Shires,> this, I envision numerous complexities in terms of
    Shires,> pipeline-buffers, latency and race conditions. I would
    Shires,> think implementation would be vastly easier and more
    Shires,> robust if a "stop" command (e.g. from a scripted object)
    Shires,> was simply sent to the TTS-engine/renderer (as opposed to
    Shires,> attempting to dynamically insert a markup tag at the
    Shires,> proper position in the markup).

    Shires,> Thanks, Glen Shires Intel Corporation


    Shires,> -----Original Message----- From: David Poehlman
    Shires,> [mailto:poehlman1@comcast.net] Sent: Wednesday, January
    Shires,> 29, 2003 10:47 AM To: Shires, Glen; www-voice@w3.org Cc:
    Shires,> w3c-wai-pf@w3.org Subject: Re: Critical missing feature
    Shires,> in SSML specification


    Shires,> I view SSML markup in the same way that I view HTML or
    Shires,> XML markup. The user agent retrieves it, and from there
    Shires,> it is under user agent control. The voice browser in
    Shires,> this instance would have to be capable of
    Shires,> manipulating the markup in the same way as other agents
    Shires,> manipulate HTML or XML. While I understand a requirement
    Shires,> for a full stop, it must be in post-get, since it would
    Shires,> most likely be of no benefit in pre-get or in the data
    Shires,> set. In the case of streaming, it is still a function of
    Shires,> another layer which exercises control. I would
    Shires,> encourage that this idea be kept but enforced in a
    Shires,> context where it can have effect.

    Shires,> ----- Original Message ----- From: "Shires, Glen"
    Shires,> <glen.shires@intel.com> To: <www-voice@w3.org> Cc:
    Shires,> <w3c-wai-pf@w3.org> Sent: Wednesday, January 29, 2003
    Shires,> 1:25 PM Subject: RE: Critical missing feature in SSML
    Shires,> specification



    Shires,> Richard, I understand why the scenario you describe
    Shires,> requires a "stop" command. I do not understand how a
    Shires,> <STOP> markup tag would fulfill these requirements. It
    Shires,> seems to me that the SSML markup would be already
    Shires,> generated and in process of being spoken by the TTS
    Shires,> engine when an event that initiates the "stop" command
    Shires,> occurs. I can envision how a scripted object might
    Shires,> accomplish this, but not how a <STOP> markup tag would do
    Shires,> so.

    Shires,> Perhaps you could explain.

    Shires,> Thanks, Glen Shires Intel Corporation


    Shires,> -----Original Message----- From: Richard Schwerdtfeger
    Shires,> [mailto:schwer@us.ibm.com] Sent: Wednesday, January 29,
    Shires,> 2003 9:37 AM To: www-voice@w3.org Cc: w3c-wai-pf@w3.org
    Shires,> Subject: Critical missing feature in SSML specification
    Shires,> Importance: High


    Shires,> In reviewing the SSML specification we (PF Group)
    Shires,> overlooked an extremely critical missing feature in the
    Shires,> last call draft.

    Shires,> It is absolutely essential that SSML support a <STOP>
    Shires,> command.

    Shires,> Scenario:

    Shires,> Screen reader users will often hit the stop command to
    Shires,> tell the speech synthesizer to stop speaking. Screen
    Shires,> Readers would use the <MARK> annotation as a way to have
    Shires,> the speech engine tell the screen reader when speech has
    Shires,> been processed (marker processed). In the event that the
    Shires,> user tells the screen reader to stop speaking, the screen
    Shires,> reader should be able to send a stop command to the
    Shires,> speech engine, which would ultimately flush the speech
    Shires,> buffers. Markers not returned would help the screen
    Shires,> reader know where the user left off in the user interface
    Shires,> (maintain point of regard relative to what has been
    Shires,> spoken).

    Shires,> I apologize for not submitting this in our last call
    Shires,> review, but this is a hard requirement. Otherwise, SSML
    Shires,> cannot support screen readers.

    Shires,> Rich

    Shires,> Rich Schwerdtfeger STSM, Software Group Accessibility
    Shires,> Strategist Emerging Internet Technologies Chair, IBM
    Shires,> Accessibility Architecture Review Board
    Shires,> schwer@us.ibm.com, Phone: 512-838-4593,T/L: 678-4593

    Shires,> "Two roads diverged in a wood, and I - I took the one
    Shires,> less traveled by, and that has made all the difference.",
    Shires,> Frost

-- 
Best Regards,
--raman
------------------------------------------------------------
T. V. Raman:  PhD (Cornell University)
IBM Research: Human Language Technologies
Architect:    Conversational And Multimodal WWW Standards
Phone:        1 (408) 927 2608   T-Line 457-2608
Fax:        1 (408) 927 3012     Cell: 1 650 799 5724
Email:        tvraman@us.ibm.com
WWW:      http://www.cs.cornell.edu/home/raman
AIM:      TVRaman
PGP:          http://emacspeak.sf.net/raman.asc
Snail:        IBM Almaden Research Center,
              650 Harry Road
              San Jose 95120
Received on Wednesday, 29 January 2003 19:32:05 UTC