- From: Jason J.G. White <jason@jasonjgw.net>
- Date: Tue, 29 Oct 2024 12:47:59 -0400
- To: public-rqtf@w3.org
On 29/10/24 05:14, Scott Hollier wrote:
> Continuing the discussion on AI, here at the Centre we've been
> testing the Android 15 update that now embeds Google Gemini alt text
> assessment into the TalkBack screen reader. I think this may be the
> first time AI alt text has been directly built into a screen reader,
> so it's been of considerable interest for us.

Note that Apple has had image recognition in its screen readers for several years, but it's on-device machine learning, which doesn't yet run the most advanced models. Vispero announced image description capabilities earlier this year, using multiple large language models, which are included in their JAWS screen reader. For NVDA, I think the capability requires an add-on to be installed. Under Linux, if I recall correctly, someone implemented this as a separate tool - I wasn't paying attention to the details.

I think Vispero's approach is interesting in that it lets you access multiple descriptions created by different models. I don't know to what extent this is useful in detecting errors - it's presumably better, as long as the models don't all make the same mistake. A rough sketch of how such cross-checking might work appears at the end of this message.

I also expect on-device recognition to become more popular, both for privacy reasons and as local hardware is upgraded to run larger models. The models themselves may become more efficient over time as well, but that's just my speculation; I don't have any background in the mathematics of neural networks. Whatever we say about this will need to be generic and model-neutral.

On a related topic, there were claims attracting media attention recently to the effect that a speech recognition model can generate erroneous text, including completely fabricated sentences, so one should be careful in using such models for captions or transcripts (as we already well knew).
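To make the cross-checking idea concrete, here is a minimal, hypothetical sketch. The model names and descriptions are placeholders (Vispero has not published how JAWS compares descriptions), and the crude lexical similarity from Python's standard-library difflib stands in for whatever comparison a real system might use, such as sentence embeddings:

```python
# Hedged sketch: cross-checking image descriptions from several models.
# Model names and descriptions below are illustrative placeholders; in
# practice they would come from whichever vision-language models the
# screen reader queries.
from difflib import SequenceMatcher
from itertools import combinations

descriptions = {
    "model_a": "A golden retriever lying on a blue sofa.",
    "model_b": "A golden-colored dog resting on a blue couch.",
    "model_c": "A cat sitting on a windowsill.",
}

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity in [0, 1]; a real system would likely
    compare semantic embeddings rather than raw text."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Flag pairs of descriptions that disagree strongly. If every model
# independently tells a similar story, confidence is higher -- unless,
# as noted above, they all make the same mistake.
THRESHOLD = 0.5
for (name1, text1), (name2, text2) in combinations(descriptions.items(), 2):
    score = similarity(text1, text2)
    status = "agree" if score >= THRESHOLD else "DISAGREE"
    print(f"{name1} vs {name2}: {score:.2f} ({status})")
```

In this toy example the first two descriptions agree while the third diverges, which is exactly the kind of signal a user (or the screen reader itself) could use as a cue to treat the description with caution.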
Received on Tuesday, 29 October 2024 16:48:04 UTC