RE: ETSI standards and guides referred to in today's COGA TF call

Hi Debbie

I agree almost entirely with all your very good points. I also don't think that simply supporting the set of voice commands in ES 202 076 would be a best-effort voice interface for large user-base public access online systems. Supplementing this baseline set with additional words that occur in user testing would definitely result in a much more effective system.

One of the drivers for developing the ETSI voice command set was the needs of mobile phone providers in Europe. Here they have to launch products that have to work for customers who could speak any one of the 30 languages commonly used in Europe. It would be unrealistic to expect that phone providers would carry out user tests with appropriately large groups of users who speak each of these languages. These phone providers needed a set of effective voice commands that could be used to allow various phone functions to be voice activated. I know that some phone providers have implemented the ETSI voice commands, but I have no idea how widespread the adoption has been.

So, although the expectation was that the ETSI voice commands could see widespread use, it is in the context of developing voice interfaces for products that have to work in very many languages that such a standardised command set could be seen as vital.

Best regards

Mike

From: Deborah Dahl [mailto:dahl@conversational-technologies.com]
Sent: 26 April 2016 16:57
To: Michael Pluke <Mike.Pluke@castle-consult.com>; 'public-cognitive-a11y-tf' <public-cognitive-a11y-tf@w3.org>
Subject: RE: ETSI standards and guides referred to in today's COGA TF call

Hi Mike,
I didn't know about ETSI ETR 096. There are some very good and still relevant ideas there (always provide help, describe a function before the digit that invokes it, etc.), despite the fact that the document is from 1993. Thanks for finding this. We should refer to this document in the voice issue paper.
I agree that there's no harm in recommending that systems implement ETSI ES 202 076, or at least the basic and digit subsets. However, I don't think that's sufficient. In many cases, like the ones you point out, I'm sure users will be able to guess the correct command. I still maintain that no one will consciously set out to learn them, especially not all 74 of them. In real systems, it's much more user-friendly to implement the responses that show up in actual user testing. These could, of course, be added to the ETSI ones (which covered only 85% of the responses they received in their tests). So then, when someone says "that's correct" instead of "yes", "quit" instead of "stop" or "end call" instead of "exit", they will be understood.
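As a rough illustration of the idea of adding tested responses to a baseline set, here is a minimal Python sketch. It is not taken from ES 202 076: the baseline entries are only the handful of English commands mentioned in this thread, the supplementary entries are the three examples given above, and the names BASELINE, SUPPLEMENT, normalise and resolve are hypothetical. A real system would sit behind a speech recogniser and would need far more than exact string matching.

```python
from typing import Optional

# Assumption: a tiny baseline made up of English commands named in this thread
# ("yes"/"confirm", "no", "stop", "help", "goodbye"/"exit"). It is NOT the full
# ES 202 076 vocabulary.
BASELINE = {
    "yes": "yes",
    "confirm": "yes",
    "no": "no",
    "stop": "stop",
    "help": "help",
    "exit": "exit",
    "goodbye": "exit",
}

# Assumption: extra phrases of the kind that might show up in user testing,
# each mapped to the baseline command it stands for.
SUPPLEMENT = {
    "that's correct": "yes",
    "quit": "stop",
    "end call": "exit",
}


def normalise(utterance: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(utterance.lower().split())


def resolve(utterance: str, vocabulary: dict) -> Optional[str]:
    """Return the canonical command for an utterance, or None if unknown."""
    return vocabulary.get(normalise(utterance))


# The combined vocabulary: the baseline plus the phrases found in testing.
VOCABULARY = {**BASELINE, **SUPPLEMENT}
```

With only BASELINE, "quit" is not recognised; with VOCABULARY it resolves to "stop". An utterance that resolves to None is one the developers did not expect, which is where a re-prompt or other fallback would have to take over.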
I was unable to find any studies validating the ETSI command set in real applications, which would be very interesting to see. However, in my experience, in real, large-scale applications there will always be legitimate user inputs that the developers didn't expect, and they need to be accommodated in user-friendly systems.
So, basically, in my opinion, there's no harm in asking developers to implement at least the basic commands, but it is unrealistic to think that this alone would automatically result in usable systems, especially for users with cognitive disabilities. I think it's much more important to provide access to human backup.
I will also check with some professional voice user interface designers and see what their experience has been with this standard.
Best,
Debbie

From: Michael Pluke [mailto:Mike.Pluke@castle-consult.com]
Sent: Monday, April 25, 2016 7:39 PM
To: public-cognitive-a11y-tf
Subject: ETSI standards and guides referred to in today's COGA TF call

After much hunting through fading memory cells I managed to locate the relevant ETSI document that I referred to in today's COGA TF call. It is:

ETSI ETR 096 "Human Factors (HF); Phone Based Interfaces (PBI);
Human factors guidelines for the design of minimum phone based user interface to computer services": http://www.etsi.org/deliver/etsi_etr/001_099/096/01_60/etr_096e01p.pdf

It is only a Technical Report and not a standard, but it was developed with the involvement of the principal North American provider of such services in those (long ago) days. It does not in general associate digits with functions, but it does identify "0" as the preferred way to reach an operator and this is widely, but by no means universally, implemented.

The other document that lists potential voice commands in multiple languages, that I think Debbie is already familiar with, is:

ETSI ES 202 076 "Human Factors (HF); User Interfaces; Generic spoken command vocabulary for ICT devices and services": http://www.etsi.org/deliver/etsi_es/202000_202099/202076/02.01.01_60/es_202076v020101p.pdf

I've always been a little puzzled by Debbie's suggestion that ES 202 076 contains a mass of commands that users won't be able to learn. The commands were the words that large samples of people said they would use to elicit a particular function (for each of the 30 languages covered in the standard). The user requirements section said that "a spoken command vocabulary should be intuitive, easy to learn, memorable, natural, and unambiguous" and this set (which I was not involved in developing) seems to me to largely meet that goal.

So we have such difficult to remember commands as "yes" (or alternatively "confirm"), "no", the digits 0 to 9 (with, for example, the alternatives of "zero" or "oh" being acceptable English commands for 0), "record" to record something, "stop" to stop something, "start" to start something, "help" if you are after help, "goodbye" or "exit" to exit a service (hanging up the phone also works here :)), etc. These seem to be the things that most people would naturally say first and every time, but even if they perversely said something else they would probably say these commands on subsequent attempts.

There are some commands for telephony functions that I suspect might be problematic, but that is as much because most people have no idea how telephone networks work and therefore do not understand the underlying concepts (like diverting and forwarding functions) that are translated into commands. I feel that there are a few other commands that might be less intuitive - so these might have to be learnt, but I feel that these are only a small minority.

This standard is now seven years old and modern systems like Siri, Cortana and Google Now offer a much more robust understanding of user input. However, I'd be pretty certain that these systems work best when they hear clear and unambiguous commands like those in ES 202 076, and it would do no harm to require all systems to recognise and appropriately respond to these commands in the way described in ES 202 076!

Best regards

Mike
________________________________

Received on Tuesday, 26 April 2016 22:11:14 UTC