- From: Charles McCathieNevile <charles@sidar.org>
- Date: Mon, 12 Apr 2004 20:45:04 +1000
- To: inma <acbfabri@si.ehu.es>, Guido Gybels <Guido.Gybels@rnid.org.uk>, Mark Hoda <mark.hoda@rnid.org.uk>
- Cc: IG Group <w3c-wai-ig@w3.org>
I am sending this message on because it hasn't appeared in the archives and I never saw it come through the list. I'll respond to it separately, in pieces, but I would like to thank Guido for taking the time to write it.

cheers

Chaals

From: Guido.Gybels@rnid.org.uk
Subject: Beyond text captions Re: Deaf users,
Date: 1 April 2004 18:23:15 GMT+10:00
To: charles@sidar.org
Cc: mark.hoda@rnid.org.uk, acbfabri@si.ehu.es, w3c-wai-ig@w3.org

Charles,

Thanks for sharing your thoughts with us, and apologies for the lateness of this reply, but I have been extremely busy lately.

Before I address some of the specific issues you have raised, I need to point out that RNID already replied to WAI's draft WCAG 2 guidelines through the Office of the e-Envoy in September last year. You can see the text of our formal reply here:

http://www.ictrnid.org.uk/docs/wcag2.pdf

I hope this did not get lost in WAI, and I would appreciate it if you could give these issues some more attention where required. You will see that our response already makes reference to avatars, sign language video streaming and other issues beyond captioning.

<<For a community of deaf users who are not good readers, signing is their native language. Captioning is considered a nice idea, but not actually the preferred way, for many deaf people, of understanding what is happening.>>

Sign language users form a minority of the overall group of deaf and hard of hearing people, but they do indeed face quite specific challenges in dealing with web and other content, precisely because their first language is sign, not the spoken language of the land. In Britain, British Sign Language (BSL) is the first language of a group estimated at about 50,000 users (the numbers themselves are the cause of great controversy, by the way). That means that English is at best a second language for these people. The problem is further aggravated by the fact that BSL is not based on English: it is a quite different language, and there is no one-to-one relationship in terms of semantics, grammar etc. between English and BSL. Some sign languages, such as German Sign Language, are very closely related to the spoken language, making it easier to be fluent in both and to transcode between spoken and signed versions. For languages like BSL that is not the case, and this complicates the problem considerably.

So, yes, sign language users can benefit from having signed content available to them as opposed to written English (or the written language of whatever language domain is in question). Nevertheless, that does not mean that simply transcoding all multimedia content into sign language is in fact the best possible option. There are a number of problems:

a) For written, rather static content, it might make sense to provide a signed version through the use of video clips or signing avatars. However, not all documents are equally suitable for delivery in signed form. Train timetables or schedules, for example, are not very effective in signed form. In other cases, such as long legal documents, it can become very tedious to have to sit through a long-winded signed clip (setting aside practical problems like file sizes etc.), and it might actually be better to provide a short signed summary of the key points, with pointers to further relevant information where needed.

b) Multimedia content can be complicated too: providing open or closed signing for a news spot, for example, is very helpful to sign language users.
Doing the same for a high-action clip with little dialogue of an uncomplicated nature might be far less effective. In fact, for things like sports or action movies, many sign language users, even when given the choice, will actually prefer the subtitles, because they are less intrusive and distract less from the action than the signing does. So it is not all as black and white as some people purport it to be. We have already carried out quite a lot of research into this and are continuing to do so. One of our projects is a collaboration with the BBC and looks specifically at these issues of preference and how genre, timings etc. impact on them.

c) Since there is no simple way of automatically transcoding between written and signed content, providing signing, whether through avatars or via video clips, requires specific work to create the signed content separately, quite apart from the obvious challenges this poses in terms of content management and processes (keeping both versions synchronised, for example; a small sketch of that bookkeeping follows below). This is challenging enough for static content, becomes very problematic for dynamic content, and is extremely challenging for transactional systems. RNID has been involved in a number of projects addressing some of these issues, but it is quite clear that a great many problems remain unsolved and will require many years of hard work before we get even close to full availability and manageability of these forms of content in signed format. An interesting problem, too, is how to allow sign language users to feed input back into the system for transaction handling...

d) Sign language interpreters for the translation of written into signed content are a very scarce resource. In the UK, for example, there are only just over 200 registered interpreters available for the whole country and for all forms of interpreting (face-to-face, media, video interpreting, relay, etc.). This also means that the resource is very expensive. Producing signed content is far from a minor problem. That is precisely why RNID is involved in the development of avatars, which we see as a way to increase the provision of signed content:

http://www.bbcworld.com/content/clickonline_archive_35_2002.asp?pageid=666&co_pageid=3

What it all boils down to is that information providers should not make any assumptions about people's preferences simply on the basis of whether or not they are deaf. In the end, the choice of whether or not to use subtitles, for example, is up to the user. The provider should therefore offer the possibility to the user and leave it to them to decide what they want to use where. Offering signed content in addition, as an alternative to written content, is equally important, yet there are significant practical and organisational constraints on doing so. We will have to accept that, for the foreseeable future, it is simply not going to be possible to offer signing everywhere and for all the massive amounts of written content available, let alone for dynamic or even transactional content.

<<Work done in the Deaf Australia Online project and its successor looked at the actual technical requirements for making this workable>>

We know very well that some research has been done on the subject of using video technology for remote signing and/or lipreading (in fact we are carrying out such research continuously here at RNID), but significant questions remain open.
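Before going into those questions, a brief aside to make the synchronisation bookkeeping from point (c) above concrete: at its simplest, it means recording for every written page a fingerprint of the exact text its signed clip was produced from, so that stale clips can be flagged whenever the writing changes. Here is a minimal sketch in Python; the page name and content are invented for illustration, and this is not a description of any actual RNID system:

    from hashlib import sha256

    def text_hash(text):
        return sha256(text.encode("utf-8")).hexdigest()

    # For each written page, the hash of the text its signed clip was made from.
    registry = {"opening-hours.html": text_hash("Open 9-5, Monday to Friday.")}

    def needs_resigning(page, current_text):
        """True if the written content changed after the clip was signed."""
        return text_hash(current_text) != registry[page]

    # The written page has since been edited, so the signed clip is stale:
    print(needs_resigning("opening-hours.html", "Open 9-6, Monday to Friday."))  # True

This only covers the one-way case; the harder, open problem noted under (c) is carrying user input the other way, from sign back into a transactional system.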
One of the intrinsic problems in using video technology is that these systems are not designed specifically to be carriers for BSL; rather, they have been developed as generic tools for video encoding, compression and transport. That means the specific feature set required for BSL can be compromised by design decisions made in favour of general usage. For example, common compression techniques used to fit the video stream into the available bandwidth will favour reproduction of broad movement over fine detail in order to keep the bitrate within the bandwidth limits. That often results in increased blurring and/or pixelation in favour of transmitting the broad movement itself at the highest available framerate. This can cause serious degradation in details like facial expression or individual finger shapes, which are essential for BSL.

A lot of the work done focuses on very coarse measures like framerate or picture size to establish minimum requirements. However, in reality such measures do not address many of the important aspects of successful sign language and/or lipspeaking communication. Some of these are:

- Problems of temporal and spatial resolution: these impact on things like having sufficient resolution for detailed aspects of fingerspelling, lip movements or eye gaze. Even at high framerates and with large picture sizes, a resolution that conveys enough detail for these to be readable is problematic, and no actual scientific data exists to establish a minimum set. This is further complicated by the fact that compression and encoding impact on it directly: to achieve, for example, a perceived framerate of 20 fps, taking into account the realities of compression, round-trip delays and decompression, probably requires an effective framerate of at least 25 fps.

- In addition, compression schemes impact on effective resolution, a problem that is not yet well enough understood either. Pixelation and blurring impact on legibility, and they are not only difficult to measure quantitatively but also hugely variable.

- Timing and synchronisation: framerates are only part of the story in terms of timing. Round-trip delays can impact on the real-time character of the conversation, but even more important is synchronisation. For lipreading, for example, where most lipreaders actually combine audio and visual perception to understand what is being said, synchronisation has to be very tight. While this is less of an issue for profoundly deaf people who use lipreading to add to their understanding of the conversation, most lipreaders do use the audio as well, and the synchronisation required comes very close to the boundaries of what is technically possible even in optimal network situations, let alone in realistic environments like standard long-distance ISDN or even broadband connections.

- Dynamic resolution adjustment: shape encoding, as in MPEG-4 for instance, would allow some of the problems raised above to be overcome, yet here again we lack the relevant data to design services. What is specifically needed is a proper design for algorithms to provide this dynamic adjustment, based on a factual understanding of how to define priority areas and shape forms, and of the effect all of this will have on signal processing and coding. (A toy illustration of the priority-area idea follows below; the list of issues then continues.)
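As a toy illustration of the priority-area idea from the last point, the sketch below splits a per-frame bit budget across regions of the picture by weighted area, so that face and hands keep more quality than the background. Every number in it (region sizes, weights, budget) is invented purely for illustration; it is not a description of any real codec or of RNID work:

    # Toy sketch: weighted per-region bit allocation for sign language video.
    from dataclasses import dataclass

    @dataclass
    class Region:
        name: str
        blocks: int    # 16x16 macroblocks covered by the region
        weight: float  # relative quality priority

    def allocate_bits(regions, frame_budget_bits):
        """Split a per-frame bit budget across regions by weighted area."""
        total = sum(r.blocks * r.weight for r in regions)
        return {r.name: round(frame_budget_bits * r.blocks * r.weight / total)
                for r in regions}

    # Hypothetical QCIF-like frame (99 macroblocks): face and hands get most
    # of the budget so fingerspelling and lip patterns stay legible.
    regions = [Region("face", 6, 8.0),
               Region("hands", 10, 6.0),
               Region("background", 83, 1.0)]

    # e.g. 128 kbit/s at 25 fps is roughly 5120 bits per frame
    print(allocate_bits(regions, 5120))

The open research question flagged above is precisely how such regions and weights should be defined and tracked from frame to frame, and what that costs in signal processing and coding terms.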
- Lighting and viewing angles: while the importance of lighting is recognised by most, there is again very little quantitative information available, and the interaction between the various lighting features and the other picture aspects, like temporal and spatial resolution or encoding, is not well understood in terms of its impact on signing or lipspeaking.

- Safe areas in pictures, and the impact of near-border distortion, etc.

These are just some of the outstanding issues. In addition, IP networks, with their own topologies and varying bandwidth designs, pose their own set of problems and challenges, all of which we will need to understand properly. The bottom line is that the matter cannot be reduced to very crude requirements in terms of bandwidth, picture size or framerate alone. In fact, doing so can be counterproductive, as some providers might get the impression that providing a CIF image at 25 fps is all it takes to deliver successful sign language content. Deaf people would lose out significantly if such perceptions went unchallenged.

<<Another approach is the use of signing avatars - animated figures.>>

Yes, as I said, we are at the forefront of this work. Apart from the clip above, you might be interested in the following:

http://www.sign-lang.uni-hamburg.de/eSIGN/Demo.html
http://www.rnid.org.uk/html/information/technology/projects_synface.htm
http://www.rnid.org.uk/html/information/technology/projects_esign.htm

Signing avatars have been in development for quite a number of years now, and there are two broad approaches:

- Motion-capture-based systems, whereby a motion capture database of signs or parts of signs is used to concatenate them into larger signed sequences;

- Synthetic systems, where the signing is generated entirely through animation techniques and does not rely on motion capture databases.

The quality of synthetic signing, although improving all the time, is still far from usable in real systems. Also, the problems of sign language notation are still not solved. Motion capture is laborious and requires expensive and time-consuming setups. We have also experimented with mixed models, combining MoCap with synthetic signing. In addition, we are working with organisations like the BBC to make MoCap less laborious and thus easier to use, so that more content can be provided in less time and at lower cost.

This email is already getting too long, so I'll leave it at this for now. I hope the information provided is useful and helps in better understanding some of the issues we are dealing with. In the next 6-12 months RNID will be publishing a white paper on sign language provision using video technology, and of course the avatar work is progressing as well, which should result in avatars being used for transactional content by the end of the year.

Best wishes,

Guido

Guido Gybels
Director of New Technologies
RNID, 19-23 Featherstone Street
London EC1Y 8SL, UK
Tel +44(0)207 294 3713
Fax +44(0)207 296 8069

"The Royal National Institute for Deaf People (RNID) is the largest charity representing the 9 million deaf and hard of hearing people in the UK. As a membership charity, we aim to achieve a radically better quality of life for deaf and hard of hearing people. We do this by campaigning and lobbying vigorously, by raising awareness of deafness and hearing loss, by providing services and through social, medical and technical research."
************************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. Any views or opinions expressed are solely those of the author and do not necessarily represent RNID policy. If you are not the intended recipient you are advised that any use, dissemination, forwarding, printing or copying of this email is strictly prohibited. If you have received this email in error please notify the RNID Helpdesk by telephone on: +44 (0) 207 296 8282.

The Royal National Institute for Deaf People
Registered Office 19-23 Featherstone Street
London EC1Y 8SL
No. 454169 (England) Registered Charity No. 207720
************************************************************************
Received on Monday, 12 April 2004 06:48:29 UTC