- From: Charles McCathieNevile <charles@sidar.org>
- Date: Mon, 12 Apr 2004 20:45:04 +1000
- To: inma <acbfabri@si.ehu.es>, Guido Gybels <Guido.Gybels@rnid.org.uk>, Mark Hoda <mark.hoda@rnid.org.uk>
- Cc: IG Group <w3c-wai-ig@w3.org>
I am sending this message on because it hasn't appeared in the archives and I never saw it come through the list. I'll respond to it separately, in pieces, but I would like to thank Guido for taking the time to write it.

cheers

Chaals

From: Guido.Gybels@rnid.org.uk
Subject: Beyond text captions Re: Deaf users,
Date: 1 April 2004 18:23:15 GMT+10:00
To: charles@sidar.org
Cc: mark.hoda@rnid.org.uk, acbfabri@si.ehu.es, w3c-wai-ig@w3.org

Charles,

Thanks for sharing your thoughts with us, and apologies for the lateness of this reply, but I have been extremely busy lately.

Before I address some of the specific issues you have raised, I need to point out that RNID already replied to WAI's draft WCAG 2 guidelines through the Office of the e-Envoy in September last year. You can see the text of our formal reply here:

http://www.ictrnid.org.uk/docs/wcag2.pdf

I hope this did not get lost in WAI, and I would appreciate it if you could give these issues some more attention where required. You will see that our response already makes reference to avatars, sign language video streaming and other issues beyond captioning.

<<For a community of deaf users who are not good readers, signing is their native language. Captioning is considered a nice idea, but not actually the preferred way, for many deaf people, of understanding what is happening.>>

Sign language users form a minority of the overall group of deaf and hard of hearing people, but they do indeed face quite specific challenges in dealing with web and other content, precisely because their first language is sign, not the spoken language of the land. In Britain, British Sign Language (BSL) is the first language of a group estimated at about 50,000 users (the numbers themselves are the cause of great controversy, by the way). That means that English is at best a second language for these people. The problem is further aggravated by the fact that BSL is not based on English: it is a quite different language, and there is no one-to-one relationship in terms of semantics, grammar etc. between English and BSL. Some sign languages, such as German Sign Language, are very closely related to the spoken language, making it easier to be fluent in both and to transcode between spoken and signed versions. For languages like BSL that is not the case, and this complicates the problem considerably.

So, yes, sign language users can benefit from having signed content available to them as opposed to written English (or the written language of whatever language domain is in question). Nevertheless, that does not mean that simply transcoding all multimedia content into sign language is in fact the best possible option. There are a number of problems:

a) For written, rather static content, it might make sense to provide a signed version through the use of video clips or signing avatars. However, not all documents are equally suitable for delivery in signed form. Train timetables or schedules, for example, are not very effective in signed form. In other cases, such as long legal documents, it can become very tedious to have to sit through a long-winded signed clip (setting aside practical problems like file sizes etc.), and it might actually be better to provide a short signed summary of the key points, with pointers to further relevant information where needed.

b) Multimedia content can be complicated too: providing open or closed signing for a news spot, for example, is very helpful to sign language users.
Doing the same for a high-action clip with little dialogue of an uncomplicated nature might be far less effective. In fact, for things like sports or action movies, many sign language users, even when given the choice, will actually prefer the subtitles, because they are less intrusive and distract less from the action than the signing does. So it is not all as black and white as some people purport it to be. We have already carried out quite a lot of research into this and are continuing to do so. One of our projects is a collaboration with the BBC and looks specifically at these issues of preference and how genre, timings etc. impact on them.

c) Since there is no simple way of automatically transcoding between written and signed content, providing signing, whether through avatars or via video clips, requires specific work to create the signed content separately, quite apart from the obvious challenges this poses in terms of content management and processes (keeping both versions synchronised, for example; a small sketch of that bookkeeping follows below). This is challenging enough for static content, becomes very problematic for dynamic content, and is extremely challenging for transactional systems. RNID has been involved in a number of projects addressing some of these issues, but it is quite clear that a great many problems remain unsolved and will require many years of hard work before we get even close to full availability and manageability of these forms of content in signed format. An interesting problem, too, is how to allow sign language users to feed input back into the system for transaction handling...

d) Sign language interpreters for the translation of written into signed content are a very scarce resource. In the UK, for example, there are only just over 200 registered interpreters available for the whole country and for all forms of interpreting (face-to-face, media, video interpreting, relay, etc.). This also means that the resource is very expensive. Producing signed content is far from a minor problem. That is precisely why RNID is involved in the development of avatars, which we see as a way to increase the provision of signed content:

http://www.bbcworld.com/content/clickonline_archive_35_2002.asp?pageid=666&co_pageid=3

What it all boils down to is that information providers should not make any assumptions about people's preferences simply on the basis of whether or not they are deaf. In the end, the choice of whether or not to use subtitles, for example, is up to the user. The provider should therefore offer the possibility to the user and leave it to them to decide what they want to use where. Offering signed content in addition, as an alternative to written content, is equally important, yet there are significant practical and organisational constraints on doing so. We will have to accept that, for the foreseeable future, it is simply not going to be possible to offer signing everywhere and for all the massive amounts of written content available, let alone for dynamic or even transactional content.

<<Work done in the Deaf Australia Online project and its successor looked at the actual technical requirements for making this workable>>

We know very well that some research has been done on the subject of using video technology for remote signing and/or lipreading (in fact we are carrying out such research continuously here at RNID), but significant questions remain open.
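Before going into those questions, a brief aside to make the synchronisation bookkeeping from point (c) above concrete: at its simplest, it means recording for every written page a fingerprint of the exact text its signed clip was produced from, so that stale clips can be flagged whenever the writing changes. Here is a minimal sketch in Python; the page name and content are invented for illustration, and this is not a description of any actual RNID system:

    from hashlib import sha256

    def text_hash(text):
        return sha256(text.encode("utf-8")).hexdigest()

    # For each written page, the hash of the text its signed clip was made from.
    registry = {"opening-hours.html": text_hash("Open 9-5, Monday to Friday.")}

    def needs_resigning(page, current_text):
        """True if the written content changed after the clip was signed."""
        return text_hash(current_text) != registry[page]

    # The written page has since been edited, so the signed clip is stale:
    print(needs_resigning("opening-hours.html", "Open 9-6, Monday to Friday."))  # True

This only covers the one-way case; the harder, open problem noted under (c) is carrying user input the other way, from sign back into a transactional system.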
One of the intrinsic problems in using video technology is that these systems are not designed specifically to be carriers for BSL; rather, they have been developed as generic tools for video encoding, compression and transport. That means the specific feature set required for BSL can be compromised by design decisions made in favour of general usage. For example, common compression techniques used to fit the video stream into the available bandwidth will favour reproduction of broad movement over fine detail in order to keep the bitrate within the bandwidth limits. That often results in increased blurring and/or pixelation in favour of transmitting the broad movement itself at the highest available framerate. This can cause serious degradation in details like facial expression or individual finger shapes, which are essential for BSL.

A lot of the work done focuses on very coarse measures like framerate or picture size to establish minimum requirements. However, in reality such measures do not address many of the important aspects of successful sign language and/or lipspeaking communication. Some of these are:

- Problems of temporal and spatial resolution: these impact on things like having sufficient resolution for detailed aspects of fingerspelling, lip movements or eye gaze. Even at high framerates and with large picture sizes, a resolution that conveys enough detail for these to be readable is problematic, and no actual scientific data exists to establish a minimum set. This is further complicated by the fact that compression and encoding impact on it directly: to achieve, for example, a perceived framerate of 20 fps, taking into account the realities of compression, round-trip delays and decompression, probably requires an effective framerate of at least 25 fps.

- In addition, compression schemes impact on effective resolution, a problem that is not yet well enough understood either. Pixelation and blurring impact on legibility, and they are not only difficult to measure quantitatively but also hugely variable.

- Timing and synchronisation: framerates are only part of the story in terms of timing. Round-trip delays can impact on the real-time character of the conversation, but even more important is synchronisation. For lipreading, for example, where most lipreaders actually combine audio and visual perception to understand what is being said, synchronisation has to be very tight. While this is less of an issue for profoundly deaf people who use lipreading to add to their understanding of the conversation, most lipreaders do use the audio as well, and the synchronisation required comes very close to the boundaries of what is technically possible even in optimal network situations, let alone in realistic environments like standard long-distance ISDN or even broadband connections.

- Dynamic resolution adjustment: shape encoding, as in MPEG-4 for instance, would allow some of the problems raised above to be overcome, yet here again we lack the relevant data to design services. What is specifically needed is a proper design for algorithms to provide this dynamic adjustment, based on a factual understanding of how to define priority areas and shape forms, and of the effect all of this will have on signal processing and coding. (A toy illustration of the priority-area idea follows below; the list of issues then continues.)
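As a toy illustration of the priority-area idea from the last point, the sketch below splits a per-frame bit budget across regions of the picture by weighted area, so that face and hands keep more quality than the background. Every number in it (region sizes, weights, budget) is invented purely for illustration; it is not a description of any real codec or of RNID work:

    # Toy sketch: weighted per-region bit allocation for sign language video.
    from dataclasses import dataclass

    @dataclass
    class Region:
        name: str
        blocks: int    # 16x16 macroblocks covered by the region
        weight: float  # relative quality priority

    def allocate_bits(regions, frame_budget_bits):
        """Split a per-frame bit budget across regions by weighted area."""
        total = sum(r.blocks * r.weight for r in regions)
        return {r.name: round(frame_budget_bits * r.blocks * r.weight / total)
                for r in regions}

    # Hypothetical QCIF-like frame (99 macroblocks): face and hands get most
    # of the budget so fingerspelling and lip patterns stay legible.
    regions = [Region("face", 6, 8.0),
               Region("hands", 10, 6.0),
               Region("background", 83, 1.0)]

    # e.g. 128 kbit/s at 25 fps is roughly 5120 bits per frame
    print(allocate_bits(regions, 5120))

The open research question flagged above is precisely how such regions and weights should be defined and tracked from frame to frame, and what that costs in signal processing and coding terms.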
- Lighting and viewing angles: while the importance of lighting is recognised by most, there is again very little quantitative information available, and the interaction between the various lighting features and the other picture aspects, like temporal and spatial resolution or encoding, is not well understood in terms of its impact on signing or lipspeaking.

- Safe areas in pictures, and the impact of near-border distortion, etc.

These are just some of the outstanding issues. In addition, IP networks, with their own topologies and varying bandwidth designs, pose their own set of problems and challenges, all of which we will need to understand properly. The bottom line is that the matter cannot be reduced to very crude requirements in terms of bandwidth, picture size or framerate alone. In fact, doing so can be counterproductive, as some providers might get the impression that providing a CIF image at 25 fps is all it takes to deliver successful sign language content. Deaf people would lose out significantly if such perceptions went unchallenged.

<<Another approach is the use of signing avatars - animated figures.>>

Yes, as I said, we are at the forefront of this work. Apart from the clip above, you might be interested in the following:

http://www.sign-lang.uni-hamburg.de/eSIGN/Demo.html
http://www.rnid.org.uk/html/information/technology/projects_synface.htm
http://www.rnid.org.uk/html/information/technology/projects_esign.htm

Signing avatars have been in development for quite a number of years now, and there are two broad approaches:

- Motion-capture-based systems, whereby a motion capture database of signs or parts of signs is used to concatenate them into larger signed sequences;

- Synthetic systems, where the signing is generated entirely through animation techniques and does not rely on motion capture databases.

The quality of synthetic signing, although improving all the time, is still far from usable in real systems. Also, the problems of sign language notation are still not solved. Motion capture is laborious and requires expensive and time-consuming setups. We have also experimented with mixed models, combining MoCap with synthetic signing. In addition, we are working with organisations like the BBC to make MoCap less laborious and thus easier to use, so that more content can be provided in less time and at lower cost.

This email is already getting too long, so I'll leave it at this for now. I hope the information provided is useful and helps in better understanding some of the issues we are dealing with. In the next 6-12 months RNID will be publishing a white paper on sign language provision using video technology, and of course the avatar work is progressing as well, which should result in avatars being used for transactional content by the end of the year.

Best wishes,

Guido

Guido Gybels
Director of New Technologies
RNID, 19-23 Featherstone Street
London EC1Y 8SL, UK
Tel +44(0)207 294 3713
Fax +44(0)207 296 8069

"The Royal National Institute for Deaf People (RNID) is the largest charity representing the 9 million deaf and hard of hearing people in the UK. As a membership charity, we aim to achieve a radically better quality of life for deaf and hard of hearing people. We do this by campaigning and lobbying vigorously, by raising awareness of deafness and hearing loss, by providing services and through social, medical and technical research."
************************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. Any views or opinions expressed are solely those of the author and do not necessarily represent RNID policy. If you are not the intended recipient you are advised that any use, dissemination, forwarding, printing or copying of this email is strictly prohibited. If you have received this email in error please notify the RNID Helpdesk by telephone on: +44 (0) 207 296 8282.

The Royal National Institute for Deaf People
Registered Office 19-23 Featherstone Street
London EC1Y 8SL
No. 454169 (England) Registered Charity No. 207720
************************************************************************
Received on Monday, 12 April 2004 06:48:29 UTC