- From: Michele Diodati <michele.diodati@gmail.com>
- Date: Thu, 30 Dec 2004 18:04:45 +0100
- To: w3c-wai-gl@w3.org
Hi Gregg.

> 1) I agree there are pronunciation problems with AT. If we can find good
> practical methods for addressing this we will. It currently isn't required
> at level 2. If you have ideas how to do this in a practical way - we would
> love to hear them.

I think the main thing you at WAI should do is press the producers of assistive technologies hard, until they develop ever more intelligent applications. Unfortunately, web developers are in most cases only... web developers. They are simply unaware of the many lexical and syntactic ambiguities in the texts they publish on the Web every day. Despite their best intentions to respect the WCAG requirements about changes in natural language, they do not master the subject well enough to disambiguate all the instances that can occur, for example, in Italian technical or advertising documents. Moreover, they frequently do not have enough time to read and consider carefully, from a linguistic viewpoint, the content they are assembling for the Web.

Having said that, the requirement that "The meanings and pronunciations of all words in the content can be programmatically located" [1] appears truly inapplicable, at least as regards pronunciations. Italian web sites (I am sorry to refer all the time to the Italian situation, but that is where my experience lies) are literally replete with English words and phrases, scattered all over the textual content. You can find in them an _endless_ series of "home page" (instead of "prima pagina" or "pagina principale"), "download" (instead of "scarica"), "account", "login", "best viewed", "compliance", "effort", "password", "trial", "trailer", "bottom up", "top page", "signature", "conference call", "call center", "brainstorming", "briefing", "jogging", "fitness", "future", and hundreds and hundreds of other English words and phrases, ordinarily used in web pages whose main language is Italian.

Many of these English words are by now a "standard extension" of the Italian language: you can find them in the latest Italian dictionaries. Many more are not. Almost all of them are mispronounced when read by assistive technologies (for example, Jaws 4.50 reads "download" inside Italian web pages in a barely comprehensible manner). Do you really think that Italian web developers, while trying to comply with the WCAG 2.0 Level 2 success criteria for GL 3.1 (or with WCAG 1.0 checkpoint 4.1), will open _millions_ of existing web pages, searching for _billions_ of instances of English words and phrases, to make their pronunciation accessible? It is a dramatically unmanageable burden!

This is a cultural issue before it is an accessibility issue. The habit of Italian authors of filling texts mainly written in Italian with a bulk of English words and phrases is a sort of fashion, probably due to a certain cultural subjection to everything that comes from the US. The habit is very harmful from the accessibility viewpoint: this sometimes ridiculous soup of English and Italian words lowers the comprehensibility of content for people with low levels of schooling, and makes it needlessly difficult, for people using a speech synthesizer, to catch the right word spoken by the machine, especially when English words are inserted in the middle of Italian sentences, in places where a listener would rather expect to find Italian words. Nevertheless such a habit exists, and it is widespread. A realistic approach to the changes-of-language issue within WCAG has to take this reality into account.
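Just to make the burden concrete: identifying those changes of language with the markup HTML already offers means wrapping every single foreign word or phrase in an element carrying a "lang" attribute. A made-up Italian sentence of the kind I described above would have to become something like this:

   <p>Vai alla <span lang="en">home page</span> per fare il
   <span lang="en">login</span> o per il <span lang="en">download</span>
   del programma.</p>

Multiply that by every sentence of every page of an average Italian site, and the scale of the work becomes clear.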
In a context such as the Italian web community, it is not realistic to claim that web developers have enough time and skill to carry out, for each document published on the Web, the gruelling work of reading all the content meticulously and then recognizing and appropriately marking every foreign word and phrase used in the text. I think it would be much more useful if WCAG 2.0, instead of requiring that web developers mark up _each_ foreign word or passage inserted in the content, simply urged authors to cut down on foreign words and phrases whenever a satisfactory and valid alternative exists, if only they deigned to use the main language of the document properly. By the way, for all the English words I cited above, normally used in Italian web pages, there are plenty of valid Italian alternatives...

>> 2. The present separation in L2 SC3, between words included and not included
>> in dictionaries, does not give a valid solution for a lot of situations
>> arising from intrinsic ambiguity and complexity of natural languages.
>
> GV Not sure I follow. The rule in #1 is that a dictionary be attached for
> all words in the content. And definitions be created for custom words that
> are not in any dictionary.

Dictionaries can address the meaning of the words used in the content only when the user has already grasped the pronunciation of those words. They cannot, on the contrary, help the user recognize a single word, for example when it is mispronounced by an assistive technology. If a user cannot understand a word, whether because it is mispronounced or because he cannot catch its correct foreign pronunciation, there remains a hole, for that user, in the speech he is listening to and trying to understand. What good is a dictionary when I do not know what I am looking for? Sometimes you can listen to the same sentence over and over without being able to understand what on earth that single word in the middle of the sentence is, a word fundamental to its comprehension. The more foreign words are inserted into sentences written in the main language of the document, the greater the risk that the listener will not be able to understand all the words in the content, and probably the whole meaning of the text.

>> 4. The identification of the natural language of each block of text in a web
>> page should be delegated to user agents.
>
> GV For larger blocks of text I think this may be workable. Little phrases
> could be harder. I would like to see this handled by User agents. If they
> can (without markup) then the language of the words would be
> "programmatically determined" without needing markup.

I do not see a difference in principle between the ability to recognize a single word as English in the middle of a sentence written in French or Italian, and the ability to determine that a whole sentence is written in a given language, whatever it is. If an assistive technology has a set of built-in dictionaries and speech engines large enough to understand that "home page" is English and "prima pagina" is Italian, it has everything necessary to pronounce those words according to the phonetic rules of the respective languages, and it does not matter whether those words stand isolated within a text written in a different language or belong to a block of text written entirely in the same language.
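In markup terms, the two cases distinguished above look like this (a made-up fragment):

   <!-- a whole block in another language: the "workable" case -->
   <blockquote lang="en">
     <p>Best viewed with any browser.</p>
   </blockquote>

   <!-- a single foreign phrase in an Italian sentence: the "harder" case -->
   <p>Vai alla <span lang="en">home page</span> del sito.</p>

A user agent able to handle the second case without markup has, as far as I can see, already solved the first.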
For example, a UTF-8 compliant browser can show side by side, in the same web page, sentences written in many different languages, with different alphabets and different text directions, without any specific markup distinguishing one run of text from the others [2]. Working on the basis of the universality granted by Unicode, assistive technologies could likewise develop the ability to determine which language a run of text is written in, and pronounce it according to the phonetic rules of that language. In the case of homographs shared between more than one natural language, they could develop suitable algorithms to choose from the context the most likely of the possible pronunciations.

> Do you know of tools
> that can do this? Esp if they are publicly available.

At the moment I do not know of any tool capable of such an advanced level of automatic linguistic recognition. Anyway, in my opinion the present inadequacy of the technology is not a valid reason to place on web developers a burden totally disproportionate to the actual competence of the vast majority of them.

Best regards,
Michele Diodati
--
http://www.diodati.org

[1] <http://www.w3.org/TR/WCAG20/#meaning-prog-located>.
[2] Here is a working example: <http://www.columbia.edu/kermit/utf8.html>.
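P.S. To make the point about Unicode concrete: a made-up fragment like the following is displayed correctly by any UTF-8 capable browser, different alphabets and text directions included, without a single lang attribute:

   <p>Buongiorno a tutti.</p>
   <p>Good morning, everyone.</p>
   <p>Καλημέρα σε όλους.</p>
   <p>שלום לכולם.</p>

If a browser can already render all of this by looking only at the characters themselves, a speech synthesizer could, in principle, learn to pronounce it in the same way.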
Received on Thursday, 30 December 2004 17:05:17 UTC