
Re: Practice Describing Pictures, anyone game?

From: <peter.b.l.meijer@philips.com>
Date: Fri, 12 Nov 1999 11:06:33 +0100
To: <w3c-wai-ig@w3.org>
Message-ID: <0056890006773084000002L942*@MHS>

Kynn Bartlett wrote:

> I recently went on a trip to Rome to speak at the E-Commerce
> Summit (http://www.e-commerce-summit.com/) and the day before
> the summit started, I went on a commercial tour of Rome and took
> many pictures of what I was seeing.  I would like to make these
> available on the web, and, as a practice exercise, I'd like to see
> if anyone (who can see my pictures) would be interested in helping
> to describe these pictures or at least evaluating the descriptions
> that I or someone else has provided.
> Anyone game?

Well, I am sighted, and until you put these pictures on the web, 
we can take one of your existing photographs. I took the liberty 
of picking your nice Kynn-Cam image tongue2.jpg, available from 
your website at


and using The vOICe Learning Edition software I turned it 
into a slow-motion MP3 soundscape (32K MP3 audio file) 


Note: if your browser is not properly configured for MP3
files, it may try to show the contents of an MP3 file 
as text inside the window, which gives binary nonsense
and no sound at all. In that case, you may have to first 
save the MP3 file directly to your disk and "run" the file
from there. Furthermore, it is recommended to set your 
MP3 player to autorepeat, so that you have plenty of 
time to focus on the various details in this 
complex soundscape. 

The image shows a frontal close-up of Kynn's face, with 
both shoulders showing in the lower part of the image to 
the left and right side of the face. Kynn's face is just
about in the middle of the image and the upper part of
the scalp touches the top edge of the image. Kynn is
looking straight ahead towards the camera, mouth wide open
and tongue sticking out (sorry Kynn, but I couldn't resist
this one; after all, you did publish this nice photograph
on the web, and your comfort here is that there is a 
similar very famous photograph of Albert Einstein doing 
the same tongue act, so I think you are in good company).

Now you will hear a kind of low-pitched rhythm on the 
left and right side in this stereo sound. These are the
vertical stripes of the shirt covering Kynn's left and
right shoulder. The high-pitched tones in the middle
of the soundscape are the reflection of the ceiling
light on Kynn's hair and scalp. The smoother sounds on 
the far left and right are from the more or less uniform
bright background parts. On the right side, from the 
viewpoint of the camera, Kynn is holding up his hand 
showing palm and fingers, but that is very difficult
to pick out here unless you know exactly what to listen for.

Now we can hear some more details of Kynn's photograph if 
we zoom in (pressing F4 in The vOICe Learning Edition),
and the resulting MP3 sound can be downloaded from the URL


Sighted readers can compare this soundscape to the 
corresponding zoomed-in JPEG image


to judge for themselves to what extent the soundscape matches
the image content. Readers who still lack an MP3 audio player
can, instead of downloading the above MP3 audio file, download
the equivalent but much larger WAV file (176K) from the URL


A fairly brief tone with a clear pitch standing out in the
left (that is, first) half of the soundscape is from Kynn's 
white teeth in the upper jaw. If you listen carefully, you 
can even hear some sort of irregularity within this sound,
caused by the boundaries between the individual teeth. Also,
if you concentrate, you can hear, at the very same moment as
the teeth, a soft higher-pitched whoosh, which is Kynn's nose,
located of course above the teeth. 
On the lower right there is the low-pitched rhythm of the 
stripes of Kynn's shirt. Simultaneously, there is a rather 
loud higher-pitched noise from the bright background that 
shows between Kynn's face on the left and his hand on the 
right - again as seen from the camera viewpoint. I hope you 
had some fun with this description.

Since Kynn took the snapshot using his QuickCam PC camera,
the "Kynn-Cam", he should be able to listen to live images 
for himself using his camera and The vOICe Learning Edition
software. Also, he could import his existing image files 
through the "Sonify image files" option in the File menu 
(or use the Control O keyboard shortcut to the file requester)
and play with the various controls for zoom (F4 and arrow 
keys, and Shift F4 for still more zoom) and slow motion (F3 
or Control Alt F3 for very-slow motion) or inverse video (F5).

For those who are unfamiliar with the rules of image-to-sound
mapping: there are three simple rules in the general
image-to-sound mapping of greyscale camera images, each 
rule dealing with one fundamental aspect of vision: 
rule 1 concerns left and right, rule 2 concerns up and 
down, and rule 3 concerns dark and light. The actual rules
of the game are 

1. Left and Right.

   Video is sounded in a left to right scanning order, by 
   default at a rate of one image snapshot per second. You 
   will hear the stereo sound pan from left to right 
   correspondingly. Hearing some sound on your left or right 
   thus means having a corresponding visual pattern on your 
   left or right, respectively. 

2. Up and Down.

   During every scan, pitch means elevation: the higher 
   the pitch, the higher the position of the visual pattern. 
   Consequently, if the pitch goes up or down, you have a 
   rising or falling visual pattern, respectively. 

3. Dark and Light.

   Loudness means brightness: the louder the brighter. 
   Consequently, silence means black, and a loud sound means 
   white, and anything in between is a shade of grey. 
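
As a rough illustration of the three rules, here is a minimal Python
sketch. It is not the code of The vOICe. The function name sonify, the
frequency range (500 to 3000 Hz), the exponential spacing of pitches,
and the 8000 Hz sample rate are all assumptions made for this example;
only the three mapping rules come from the description above.

```python
import math

def sonify(image, duration=1.0, sample_rate=8000, f_low=500.0, f_high=3000.0):
    """Turn a greyscale image into stereo samples using the three rules.

    image: list of rows (top row first), each a list of brightness
    values between 0.0 (black) and 1.0 (white).
    Returns a list of (left, right) sample pairs.
    """
    rows = len(image)
    cols = len(image[0])

    # Rule 2: one tone per row; the top row gets the highest pitch.
    # Exponential spacing between f_low and f_high is an assumption.
    freqs = []
    for r in range(rows):
        height = (rows - 1 - r) / (rows - 1) if rows > 1 else 0.0
        freqs.append(f_low * (f_high / f_low) ** height)

    n = int(duration * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Rule 1: scan the columns from left to right during one scan,
        # and pan the stereo sound along with the scan position.
        c = min(cols - 1, i * cols // n)
        pan = c / (cols - 1) if cols > 1 else 0.5
        # Rule 3: brightness sets the loudness of each row's tone.
        s = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                for r in range(rows)) / rows
        samples.append(((1 - pan) * s, pan * s))
    return samples
```

With the default duration of one second, this matches the default rate
of one image snapshot per second. An all-black image yields silence,
and a single bright pixel in the top left corner yields a brief
high-pitched tone on the left at the start of the scan.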

All of this means, for example, that a straight bright line on 
a dark background, running from the bottom left to the top right,
sounds like a tone steadily increasing in pitch: ooiieep. Two bright
lines give two tones. Three distinct bright dots sound like three
short beeps, and so on. Although the rules are simple, real-life 
images like the photograph of Kynn often give very complex sounds,
because there is so much to be seen.
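
The line and dot examples can be checked with a small Python sketch
that lists, column by column, which pitches would sound. This is only
an illustration under stated assumptions: the function name
pitch_track, the 0.5 brightness threshold, and the 500 to 3000 Hz
exponential pitch range are hypothetical choices, not taken from
The vOICe.

```python
def pitch_track(image, f_low=500.0, f_high=3000.0):
    """For each column, left to right, list the pitches (in Hz, rounded)
    of the bright pixels. Rows are given top row first."""
    rows = len(image)
    track = []
    for c in range(len(image[0])):
        tones = []
        for r in range(rows):
            if image[r][c] > 0.5:  # assumed threshold for "bright"
                height = (rows - 1 - r) / (rows - 1) if rows > 1 else 0.0
                tones.append(round(f_low * (f_high / f_low) ** height))
        track.append(tones)
    return track

n = 5
# Bright diagonal from bottom left to top right on a dark background.
diagonal = [[1.0 if r == n - 1 - c else 0.0 for c in range(n)]
            for r in range(n)]
# Three isolated bright dots on the middle row.
dots = [[0.0] * n for _ in range(n)]
for c in (0, 2, 4):
    dots[2][c] = 1.0

print(pitch_track(diagonal))  # one tone per column, rising in pitch
print(pitch_track(dots))      # tone, silence, tone, silence, tone
```

The diagonal produces a single pitch per column that rises steadily
across the scan, while the dots produce three equal pitches separated
by silent columns, that is, three short beeps.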

The direct download URL for the evaluation version of The vOICe
Learning Edition executable voice.exe, available for personal
use, is


while the on-line description of this software can be found
at the URL link given below.

Have fun playing!

Peter Meijer

Soundscapes from The vOICe - Seeing with your Ears!
Received on Friday, 12 November 1999 05:06:47 UTC
