
RE: Practice Describing Pictures, anyone game?

From: Rich Caloggero <rich@accessexpressed.net>
Date: Mon, 22 Nov 1999 13:15:02 -0500
Message-ID: <01BF34EB.96A1AC80.rich@accessexpressed.net>
To: "'wai list'" <w3c-wai-ig@w3.org>
Has anyone played with these "sound scapes" Peter talks about here? I've 
tried listening to a few from his web page, but having never seen (not even 
light), I have a hard time making any kind of sense out of anything but the 
simplest of "images" (one straight line). How useful is this to those of 
you with limited vision, or with no vision but prior visual experience? Some 
may argue that given enough training, this can become a viable way of 
"seeing." I think the learning process would be very painful, slow, and 
frustrating. What do you-all think?


> Well, I am sighted, and until you put these pictures on the web,
> we can take one of your existing photographs. I took the liberty
> of picking your nice Kynn-Cam image tongue2.jpg, available from
> your website at
>    http://www.kynn.com/quickcam/archives/tongue2.jpg
> and using The vOICe Learning Edition software I turned it
> into a slow-motion MP3 soundscape (32K MP3 audio file)
>    http://www.seeingwithsound.com/extra/tongue2slow.mp3
> Note: if your browser is not properly configured for MP3
> files, it may try to show the contents of an MP3 file
> as text inside the window, which gives binary nonsense
> and no sound at all. In that case, you may have to first
> save the MP3 file directly to your disk and "run" the file
> from there. Furthermore, it is recommended to set your
> MP3 player to autorepeat, such that you will have all the
> time to mentally focus on the various details in this
> complex soundscape.
> The image shows a frontal close-up of Kynn's face, with
> both shoulders showing in the lower part of the image to
> the left and right side of the face. Kynn's face is just
> about in the middle of the image and the upper part of
> the scalp touches the top edge of the image. Kynn is
> looking straight ahead towards the camera, mouth wide open
> and tongue sticking out (sorry Kynn, but I couldn't resist
> this one; after all, you did publish this nice photograph
> on the web, and your comfort here is that there is a
> similar very famous photograph of Albert Einstein doing
> the same tongue act, so I think you are in good company).
> Now you will hear a kind of low-pitched rhythm on the
> left and right side in this stereo sound. These are the
> vertical stripes of the shirt covering Kynn's left and
> right shoulder. The high-pitched tones in the middle
> of the soundscape are the reflection of the ceiling
> light on Kynn's hair and scalp. The smoother sounds on
> the far left and right are from the more or less uniform
> bright background parts. On the right side, from the
> viewpoint of the camera, Kynn is holding up his hand
> showing palm and fingers, but that is very difficult to
> pick out here unless you know exactly what to listen for.
> Now we can hear some more details of Kynn's photograph if
> we zoom in (pressing F4 in The vOICe Learning Edition),
> and the resulting MP3 sound can be downloaded from the URL
>    http://www.seeingwithsound.com/extra/tongue2zoomslow.mp3
> Sighted readers can compare this soundscape to the
> corresponding zoomed-in JPEG image
>    http://www.seeingwithsound.com/extra/tongue2zoom.jpg
> to judge for themselves to what extent the soundscape matches
> the image content. Readers who still lack an MP3 audio player
> can, instead of downloading the above MP3 audio file, download
> the equivalent but much larger WAV file (176K) from the URL
>    http://www.seeingwithsound.com/extra/tongue2zoomslow.wav
> A fairly brief tone with a clear pitch standing out in the
> left (that is, first) half of the soundscape is from Kynn's
> white teeth in the upper jaw. If you listen carefully, you
> can even hear some sort of irregularity within this sound,
> caused by the boundaries between the individual teeth. Also,
> if you concentrate, you can at the very same moment that you
> hear the teeth, also hear a soft higher-pitched whoosh, which
> happens to be Kynn's nose, which is of course above the teeth.
> On the lower right there is the low-pitched rhythm of the
> stripes of Kynn's shirt. Simultaneously, there is a rather
> loud higher-pitched noise from the bright background that
> shows between Kynn's face on the left and his hand on the
> right - again as seen from the camera viewpoint. I hope you
> had some fun from this description.
> Since Kynn took the snapshot using his QuickCam PC camera,
> the "Kynn-Cam", he should be able to listen to live images
> for himself using his camera and The vOICe Learning Edition
> software. Also, he could import his existing image files
> through the "Sonify image files" option in the File menu
> (or use the Control O keyboard shortcut to the file requester)
> and play with the various controls for zoom (F4 and arrow
> keys, and Shift F4 for still more zoom) and slow motion (F3
> or Control Alt F3 for very-slow motion) or inverse video (F5).
> For those who are unfamiliar with the rules of image to
> sound mapping: there are three simple rules in the general
> image to sound mapping of greyscale camera images, each
> rule dealing with one fundamental aspect of vision:
> rule 1 concerns left and right, rule 2 concerns up and
> down, and rule 3 concerns dark and light. The actual rules
> of the game are
> 1. Left and Right.
>    Video is sounded in a left to right scanning order, by
>    default at a rate of one image snapshot per second. You
>    will hear the stereo sound pan from left to right
>    correspondingly. Hearing some sound on your left or right
>    thus means having a corresponding visual pattern on your
>    left or right, respectively.
> 2. Up and Down.
>    During every scan, pitch means elevation: the higher
>    the pitch, the higher the position of the visual pattern.
>    Consequently, if the pitch goes up or down, you have a
>    rising or falling visual pattern, respectively.
> 3. Dark and Light.
>    Loudness means brightness: the louder the brighter.
>    Consequently, silence means black, and a loud sound means
>    white, and anything in between is a shade of grey.
> All of this means, for example, that a straight bright line on
> a dark background, running from the bottom left to the top right,
> sounds like a tone steadily increasing in pitch: ooiieep. Two bright
> lines give two tones. Three distinct bright dots sound as three
> short beeps, and so on. Although the rules are simple, real-life
> images like the photograph of Kynn often give very complex sounds,
> because there is so much to be seen.
> The direct download URL for the evaluation version of The vOICe
> Learning Edition executable voice.exe, available for personal
> use, is
>    http://ourworld.compuserve.com/homepages/Peter_Meijer/voice.exe
> while the on-line description of this software can be found
> at the URL link given below.
> Have fun playing!
> Peter Meijer
> Soundscapes from The vOICe - Seeing with your Ears!
> http://ourworld.compuserve.com/homepages/Peter_Meijer/winvoice.htm
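
The three mapping rules quoted above (left/right as scan time and stereo pan, up/down as pitch, dark/light as loudness) can be sketched in a few lines of Python. This is only an illustration of the rules as Peter states them, not the actual implementation in The vOICe: the function names, the 500 to 5000 Hz range, the exponential pitch spacing, the linear panning, and the 16 kHz sample rate are all assumptions made for this example.

```python
import math

def row_frequencies(n_rows, f_low=500.0, f_high=5000.0):
    """Rule 2: one frequency per image row, top row highest.

    The frequency range and the exponential spacing are assumptions.
    """
    if n_rows == 1:
        return [f_low]
    ratio = f_high / f_low
    # Row 0 is the top of the image, so it gets the highest frequency.
    return [f_low * ratio ** ((n_rows - 1 - r) / (n_rows - 1))
            for r in range(n_rows)]

def sonify(image, duration=1.0, sample_rate=16000):
    """Return (left, right) samples for one left-to-right scan.

    image: list of rows (row 0 = top), each a list of brightness
    values between 0.0 (black) and 1.0 (white).
    """
    n_rows, n_cols = len(image), len(image[0])
    freqs = row_frequencies(n_rows)
    n_samples = int(duration * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Rule 1: time within the scan selects the column, left to right.
        col = min(i * n_cols // n_samples, n_cols - 1)
        # Rule 3: each row's tone is weighted by brightness (black = silent).
        value = sum(image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(n_rows)) / n_rows
        # Rule 1 again: the stereo position pans along with the scan.
        pan = col / (n_cols - 1) if n_cols > 1 else 0.5
        samples.append((value * (1 - pan), value * pan))
    return samples

def dominant_row(image, col):
    """Index of the brightest row in a column (for inspecting the example)."""
    return max(range(len(image)), key=lambda r: image[r][col])

# A bright line from bottom left to top right on a dark background.
size = 8
diagonal = [[1.0 if r == size - 1 - c else 0.0 for c in range(size)]
            for r in range(size)]
samples = sonify(diagonal)
freqs = row_frequencies(size)
# Pitch heard at each step of the scan: it rises steadily, the "ooiieep".
pitches = [freqs[dominant_row(diagonal, c)] for c in range(size)]
```

For the diagonal line, `pitches` starts at the lowest frequency and ends at the highest, matching the rising tone described in the quoted message. The `samples` list could be scaled to 16-bit integers and written out with Python's standard `wave` module to listen to the result.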
Received on Monday, 22 November 1999 13:26:37 UTC
