- From: <peter.b.l.meijer@philips.com>
- Date: Tue, 23 Nov 1999 12:50:36 +0100
- To: <w3c-wai-ig@w3.org>
I much appreciate the valid comments given by Rich, Steve and Bruce. Your constructive criticism and feedback are very useful, because they help me to better understand where I should focus my attention. I'll try to go into some of your comments, and we will see to what extent we can find further common ground. I am aware that technology is not even half the story here. We are rapidly approaching the point where, technically, we do have a powerful new accessibility option and tool, but we still need to find out and decide for ourselves what we can do or would want to do with it, if anything. Indeed, we need better and more convincing answers to Bruce's legitimate questioning of any real-world applications.

Rich wrote:

> Some may argue that given enough training, this can become a viable
> way of "seeing." I think the learning process would be very painful,
> slow, and frustrating. What do you-all think?

Yes, really learning to see with sound could be painful, it will be very slow, and it might be frustrating at times. Moreover, we do not yet know how good people can get at it in the end. On the other hand, the process of learning a foreign language is also often painful, boring, very slow, and frustrating. I have no experience learning to read Braille, or learning to use the cane for safe travel, but I imagine that these too are by no means great fun until you have mastered them to a certain degree. Dots of Braille do not make any sense to me, nor does the Spanish language. Somehow we work on mastering some of these things because we think it pays off one way or the other. Will learning to see with sound pay off for you? I don't know; I cannot promise that. Depending on one's personal background, attitude, expectations or interests, it could also be fun, especially for those who had no prior vision: an exciting "hands-on" exploration of vision using a cheap PC camera, experiencing the effects of visual perspective, occlusion, parallax, visual texture, and so on. But it definitely won't be easy if you want to fully master the interpretation of arbitrary soundscapes. Would it be possible to view and treat it more like playing a game?

Only during the last two years has the technology become affordable, through the use of standard PCs and PC cameras, and with worldwide availability of software and information through the Internet. We don't have any convincing success stories from users to tell yet. The technology is there all right: it provably preserves a lot of visual information in the soundscapes while meeting several key parameters known to limit human auditory perception; it is technically reliable through the use of mass-produced hardware components; it is affordable through cheap $50 cameras; and it provides unprecedented access to visual information. But all that information is certainly very dense, and it is presented in a way that no human being has ever had access to before in history. We don't know whether the human brain, your brain, my brain, can learn to cope with that, or rather to what extent it can learn to do so, and whether it is really worth all the trouble. Now how do we proceed - if we do? Ideas are welcome.
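For readers who wonder what the image-to-sound mapping actually does: every image is scanned from left to right in about one second, with vertical position mapped to pitch and brightness mapped to loudness. The fragment below is a deliberately simplified Python sketch of that idea, not the actual implementation; the frequency range, image resolution and amplitude scaling are placeholder choices, not the values used by The vOICe.

    import math

    def soundscape(image, duration=1.0, rate=22050, f_lo=500.0, f_hi=5000.0):
        # image[row][col] holds brightness in 0.0..1.0, with row 0 at the top.
        # Scan the columns from left to right over 'duration' seconds; every
        # row contributes a sine wave whose pitch rises with elevation and
        # whose loudness follows the brightness of the pixel.
        height, width = len(image), len(image[0])
        samples_per_col = int(duration * rate / width)
        freqs = [f_lo * (f_hi / f_lo) ** (1.0 - row / max(1, height - 1))
                 for row in range(height)]
        samples = []
        for col in range(width):
            for n in range(samples_per_col):
                t = (col * samples_per_col + n) / rate
                s = sum(image[row][col] * math.sin(2 * math.pi * freqs[row] * t)
                        for row in range(height))
                samples.append(s / height)  # crude scaling to roughly [-1, 1]
        return samples  # mono samples, ready for a sound card or WAV file

With such a mapping, a bright diagonal from lower left to upper right becomes a rising sweep, and a row of pillars becomes a rhythm of repeated bursts; that is what I mean by rhythms and sweeps below.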
Steve wrote:

> I think this is a very innovative idea, but I too could not make
> sense of much of anything beyond simple straight lines. I think
> that an SVG or XML approach still provides the best means of getting
> information such as what objects are on the screen and how they are
> connected.

You are quite right, Steve, from the perspective of accessing structured information on the screen. This technology, however, was developed and meant for accessing arbitrary (unprepared, untagged) visual information, going well beyond the screen, and specifically for gaining access to the visual information in our real-life local environment by using a camera. There is no XML describing my room, my house, my neighbourhood, or XML describing the architecture and art in the city of Rome. Now if (and indeed stressing the big "if") we can learn to interpret that arbitrary visual information through sound, we will as a bonus also be able to interpret whatever shows on the screen without additional tagging, just like sighted folks interpret images on web pages.

For the moment, I can only demonstrate that many things are "audible" within soundscapes, not that they are "understandable". How could I show that the Chinese language makes sense and can be learnt? Unless you already know Chinese, you probably take it for granted, because many people apparently speak that language; but without such a priori historical evidence available, how does one go about proving things? I hope that Kynn will have some nice photographs of buildings or architecture, so that I can discuss how they translate into characteristic rhythms and sweeps for rows of gates, pillars and the like, plus the effect of visual perspective on those patterns. Again, these soundscapes of complex scenes will currently not make sense to you without my explanation, but with an explanation the various visual items should at least appear audible, thus illustrating the principles while hopefully adding an element of plausibility to the whole soundscape approach.

For specific restricted environments such as the graphical user interface of the computer, dedicated solutions will always work better and more easily, just as OCR with synthetic speech for printed text is a lot faster and far less painful than trying to figure out printed text from the corresponding soundscapes of printed words. So in a sense my use of screen items to illustrate the soundscape technology may be a poor or perhaps even confusing choice: I am not proposing to use soundscapes for that, but merely wish to illustrate the generality of visual access offered by jumping into anything visual, even though better solutions do exist for a limited number of specific domains.

Bruce wrote:

> I think the technology may have promise for real time use (where
> the user is controlling the up/down left/right component), but
> products that work that way (for navigating the real world) are
> already available.

What products are you referring to here? GPS systems? Electronic compasses? Talking signs?

> By the time AI is sufficiently advanced to process these sounds
> intelligibly, we would already have better automated pattern/graphic
> recognition!

Even if this became feasible (and machine vision is far away from understanding anything but the simplest of scenes in very restricted environments), there would still remain the fundamental problem of either having machine censorship decide for you what is relevant or interesting to mention, or else allowing five minutes to hear all the items in a single scene listed. With soundscapes, you get the raw, uncensored visual information of a scene in one second, today, but the big burden of interpretation is indeed on you, the user.

After all this "heavy" stuff, it may be useful to note that there are some easy applications of the software as well.
For instance, it can act as a cheap color identifier: pressing function key F10 makes the software speak the color name of whatever is at the center of the view, be it a camera image, an imported image file, or an image from your TWAIN scanner. There is also a built-in accessible graphing calculator for function plotting under function key F8.
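Just to illustrate the color identifier idea (this is emphatically not the code inside The vOICe; the palette and the distance measure are arbitrary choices for illustration), one can map a pixel to the nearest entry in a small table of named colors:

    # Toy color identifier: report the palette name nearest, in RGB
    # space, to a given pixel. Palette and metric are illustrative only.
    PALETTE = {
        "black":  (0, 0, 0),
        "white":  (255, 255, 255),
        "red":    (255, 0, 0),
        "green":  (0, 128, 0),
        "blue":   (0, 0, 255),
        "yellow": (255, 255, 0),
        "gray":   (128, 128, 128),
    }

    def color_name(r, g, b):
        return min(PALETTE, key=lambda name: sum(
            (p - q) ** 2 for p, q in zip(PALETTE[name], (r, g, b))))

    print(color_name(250, 10, 10))   # -> red

A real identifier would of course need a much richer palette and a perceptually more uniform color space, but the principle is the same.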
Sorry for this long-winded reply. Not everything is hard...

Best wishes,

Peter Meijer

Soundscapes from The vOICe - Seeing with your Ears!
http://ourworld.compuserve.com/homepages/Peter_Meijer/winvoice.htm