- From: Daniel Dardailler <danield@w3.org>
- Date: Fri, 23 Jan 1998 10:25:41 +0100
- To: w3c-wai-ig@w3.org
Since this thread is now discussing specifically tools that would help solving the problem of missing ALT, I'm changing the Subject line. For the same reason, I'd like to present some ideas for a project that would use some form of Web collaboration to fix the missing ALT problems. The document is at http://www.w3.org/WAI/altserv.htm Also attached in ascii below. ------------------------------------------------------------------------ The ALT-server ("An eye for an alt") A accessibility/collaboration project proposal Introduction For most people, visiting a web page is a one-to-one experience where one client program, the user's browser, gets and presents some resources coming from one information provider, usually the provider's web server. It's the client/server paradigm as we understand it. The client usually gets an "initial" HTML file from which it derives a complete presentation made of pieces found in the document itself (text, markup, style, alt text, etc) and additional pieces fetched by going back to the provider servers (images, audio, longdesc, etc). It doesn't have to be always that way. The web addressing and transport architecture is flexible enough so that the set of resources that makes one's web session can seamlessly integrate from independent providers, or chains of providers. This paper presents the application of this principle to Web Accessibility. The design of a system to retrieve and generate missing textual description of particular HTML elements (such as images) is examined, as well as the human collaboration foundation on which it is based. Web Accessibility Web Accessibility covers a very broad set of issues. There are of course different kind of disabilities to consider, such a visual or hearing impairment, which all relate to different types of access denial (e.g. a missing caption for an audio stream, or the inability to linearize the content of a table for speech output). For the purpose of this paper, we will focus on one important aspect of accessibility for non-visual based user-agent: the textual description, or rather the lack thereof, attached to graphical images on the web, i.e. the well known missing ALT text in HTML. However, we believe the system presented can be generalized to other kind of resources. The current situation is the following: when presented with a piece of HTML containing an image, a non-visual browser needs to "degrade gracefully" by presenting the user with a textual version of the image (that can be output as speech or braille). Most such systems currently only look for the textual information in the ALT attribute of the IMG element of the image. With progress happening in the browser area, other ways to find this textual/alternate text will soon be implemented, such as: look in the TITLE attribute, the HTTP stream, or the URL filename part. What characterizes these approaches is that the textual description can only come from the origin document or the provider server. Suppose... As we alluded to in the introduction, another way of getting this information is to ask a different server altogether. Suppose there was a web server somewhere on the Internet whose primary job was to serve textual description of other servers' images. A non-visual browser (such as lynx) would then just have to query it when image ALT and TITLE attributes are missing and use the result in its presentation. Let's consider an example expressed in pseudo HTTP sequences of queries and replies: * A non-visual browser makes a request for an initial HTML document GET www.merchand.com /order.html * It gets back some HTML containing an image and no ALT text for it Content-type: text/html <HTML> ... <IMG SRC="Images/card.png"> Order now! * Since there is no ALT, nor TITLE, the browser is configured to make a query to an alt text server (wai.w3.org), giving it the absolute URL of the image as a parameter GET wai.w3.org /ALT?url=www.merchand.com/Images/card.png * The alt server returns a piece of text corresponding to the textual description of this image Content-type text/ascii A credit card logo (if the alt text server hadn't returned anything, the browser could default to using "card.png" as the textual description of this image) * The non-visual browser uses that in place of regular ALT text. We'll look at the server issues later on, but for now, let's concentrate on the client side. The important line in the example above is the query to the alt server. GET wai.w3.org /ALT?url=www.merchand.com/Images/card.png It's just a regular HTTP request that provides the server with the URL of an image and expects the ALT text for this image back. The /SERVICE?name=value can of course be generalized to handle different type of resources and more information about one given resource. Consider the following examples: GET wai.w3.org /TABLE?url=www.foo.com/doc.html#n4 which asks the server for a linear/textual version of the fourth table found in www.foo.com/doc.html. GET wai.w3.org /AUDIOCAPTION?url=www.merchand.com/Sounds/hello.midi which asks for a caption of the audio track found at the given URL. GET wai.w3.org /ALT?url=www.merchand.com/Images/card.png;isA;line=2 which mentions to the server that the image is used in the context of an A tag (a link anchor), and appears on line 2 of the document, making it more important to treat. I think it's easily understandable what one can achieve there. Two things worth mentioning: the performance hit is nothing really to worry about: we merely add a web request in the overall building of a page, as graphical browsers do all the time. The minimal configuration on the browser side is also really small: declaring the alt server name to query from. So it's mostly transparent for the non visual browser user. The implementation of this new GET functionality in a given browser (like lynx of amaya) is trivial. Collaboration At the beginning of the previous section, we assumed that "there was a web server somewhere on the Internet whose primary job was to serve textual description of other servers' images". How do we go about implementing this alt server ? Basically, I envision two ways of generating alt text for images. The first is automatic extraction, the second human generation. I will not expand on the first as this is an area of advanced research (shape and pattern recognition). I'll just mention that for a entire category of images, those representing text using some big fonts and colors, there exist algorithms out there (e.g. OCR) that could be used to extract the characters out of the graphics. A centralized server is well suited to integrate the latest and greatest solution in one location while readily serving the entire community. The second way, human generation, is where I see the power of the web as a collaboration tool best applied. This is how it could work. The ALT server logically maintains a list of tuples (image-url, textual-description, state) where state is one of to-be-described, being-described, and described. Processing works as follow: * queries for textual descriptions of images come in * if the image, identified by its url, is in the list with a described state, the description is returned. * if not, a failure code is returned and the image url is added with a to-be-described flag * the server continuously prioritizes this list using generic information such as frequency of the query (how many clients asked for the same data) and specific information (the image appears in an anchor, at the beginning of the page, etc) * the server provides a form interface for sighted web users to enter the textual description of the images * once an image has gotten its textual description, it is moved in the described state The part with the form filling needs to be detailed. Each time a sighted user access the form (see annex), the next to-be-described image with the highest priority is presented while its state moves to being-described. Sighted user can then enter the description in an input field aside the image and submit the form to the server, which validates the text and either move the entry in the described state of just unlock it my moving it back to to-be-described state (invalid might be empty text for instance). The locking is necessary due to the asynchronous nature of web form filling: several users could access and fill the "same" form at the same time, and we only need one image description per image. So this is basic principle of the alt text server: use the eyes of sighted web volunteers to help those who cannot see. The reason why this system can work is based on two facts: * there are much more sighted users than the opposite, and a lot of them are willing to help * the same images are requested by a lot of different people In addition, the number of images with no description should go down as the awareness of content providers to accessibility is raised and authoring tools are improved. If the automatic extraction part is improved, this will also diminish the number of images actually needing some human collaboration. Regarding implementation on the server side, see the annex for pseudo code. I expect a first version handling the base service to take a week of programming. A more complete version (generating reports, ranking, and doing more automatization can take a couple to several months. Misc [ to be developed ] * One issue is whether such a system can be detrimental to native alt as content providers start counting on the system to solve their alt problem... * I think not because the system can be used as incentive for content provider to add alt text to their own images: anybody can query the alt server by site and see the "no alt worse of the week". In effect, the alt server can act a complaint repository. * storing the describer id can be used to provide a ranked output as incentive to get more describers * automatic recognition of simple decoration (one color line, repeated pattern, 1x1, 2xN images, etc) should be easy * send email to webmaster@host for each 10 http//host/image.png with no alt. * provide a way to check and do notification of mis-described image (empty, dirty words, etc) * provide a way to check and do notification of broken link image * provide n-at-a-time image/input-desc form * provide site and url targeted form filling (e.g. a visual impaired friend asks me to help him with a particular site or image and the prioritization scheme needs to be shortcut - note that this could also be accomplished by running a private alt text server). * provide a way to seed the database of the alt server using a robot that explores a given list of sites. * sort images by base filename (www.foo.com/img.png and www.bar.com/dir/img.png) and do a image compare to see if they are really the same. * do the same identical-image-check based on same size images pre-sort. * hard to identify images used as pieces (jigsaw) in a bigger image laid out in a table Annexes 1 - Form layout Example of form used to query the sighter user. ------------------------------------------------------------------------ ------------------------------------------------------------------------ Welcome to the ALT-server filling form You are about to describe: http://www.merchand.com/Images/card.png (entered Dec 25 1997 and was queried 12 times) Enter the description(*) of this image [No Alt by definition] Select Language: Optional: Email: Name: (*)the description should be short and to the point: e.g. "a american express credit card", "a dog", "a map with a magnifier". No need to add "an image of" or "this is a". Link to Advanced query form providing n-at-a-time, site or url targeted filling, and database dump ranked by images site name, describer id, base image file name, etc. ------------------------------------------------------------------------ ------------------------------------------------------------------------ 2 - Simple pseudo code for ALT-server script This script handles both the queries for alt desc and the insertion of alt desc by sighted users for an hypothetical server hosted at http://www.w3.org/WAI/altserv // // for now ignore lang, date entered, number queries, id of describer, // checking dup, validaty, security, and additional services like // ranking of bad site, good describers, etc) // // // INPUT: 3 cases // // 1 (asking for textual description of url) // http://www.w3.org/WAI/altserv?url=www.merchand.com/Images/card.png // // 2 (giving a textual description for url) // http://www.w3.org/WAI/altserv?url=www.merchand.com/Images/card.png // desc="A credit card logo" // // 3 (asking for form to fill in desc for a url) // http://www.w3.org/WAI/altserv // // // OUTPUT: see RETURN statement below // // // maintains a persistent list of [url, desc, state] // with state = d, bd, tbd (described, being described, to be described) if url if !desc // case 1 : asking for textual description of url if (url in list) if (list[url].state = d) RETURN list[url].desc else RETURN no desc else add url in list list[url].state = tbd RETURN no desc else // case 2 : giving a textual description for url // should check list[url].state = bd and desc valid list[url].state = d list[url].desc = desc RETURN ok else // case 3 : no param, asking for form to fill in desc for a url get top url with list[url].state = tbd list[url].state = bd // should check url valid with HEAD RETURN form HTML with embedded image url
Received on Friday, 23 January 1998 04:26:03 UTC