- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 9 Mar 2010 10:44:44 +0000 (UTC)
- To: public-html@w3.org
SUMMARY

The spec is very vague about what image analysis techniques could be applied to images. This change proposal suggests including more detail about possible techniques.

RATIONALE

Currently the <img> element section mentions that UAs "may also apply heuristics to help the user make use of the image when the user is unable to see it", but the only suggested heuristic is OCR. In practice, there are a host of other heuristics that could help a user make sense of an image, and they might be useful even to users who _can_ see the image.

We do all users a disservice by not being more explicit here. Being explicit could encourage significant competition amongst user agents, leading to a much better user experience for everyone.

Since these heuristics are in many cases already implemented and shipping, sometimes in multiple products from multiple vendors, and since recent advances in image recognition techniques have been fast and furious, it seems reasonable to mention these techniques as real possibilities.

DETAILS

Strike "when the user is unable to see it". Instead, start a new sentence before the "e.g", which says "This would be especially useful to users who cannot see the image", and add the following after the "e.g." clauses, in a separate clause: "but it could also be useful to users who _can_ see the image, but might not fully understand or recognise it".

Move "optical character recognition (OCR) of text found within the image" to be the first bullet of a bulleted list, and add the following additional points:

 * Facial recognition in photographs, especially facial recognition of notable individuals or of individuals in the user's social network.

 * Product or brand recognition in photographs or logos.

 * Barcode recognition of any embedded barcodes.

 * Bitmap to vector analysis for diagrams, allowing images to be further analysed in specialised tools.

 * Data extraction for graphs, allowing data to be reconstructed from bar charts, pie charts, and the like, or allowing regression lines to be fitted to x,y plots.

 * Landmark recognition for photographs.

 * 3D reconstruction of scenes based on multiple images, allowing a set of images to be taken together and explored in context.

IMPACT

POSITIVE EFFECTS

Adding such text could lead to a renewed level of competition in browsers as they find the best ways to expose such tools to users. Such competition would inevitably lead to improved accessibility across the board, as many of these analysis techniques could provide users with anything from a basic hint of the image's contents to fully-interactive reconstructions of the image in more accessible forms (especially in the case of text-in-image or graphs).

NEGATIVE EFFECTS

Makes the spec longer.

CONFORMANCE CLASS CHANGES

None.

RISKS

It is suggested that mentioning that user agents might be able to repair non-conforming pages could make authors less likely to write conforming pages, though it is not clear why this would apply here and not in the many other parts of the spec that mention repair techniques, especially the sections that explicitly mandate specific user agent repair techniques.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 9 March 2010 10:45:14 UTC