[UBER 3.1] natural language can be identified?

Hi everyone,

I have a remark about the first lvl 1 success criteria for the UBER 3.1
proposal [1]:
"The natural language of the document as a whole can be identified by
automated tools, including assistive technology. " 

As a computer scientist in the field of natural language processing, I can
argue that this can always be done, if the tools are sophisticated enough.
For example, I can use machine learning (for example neural nets) to train a
program to recognize the language based on the words used. Although this
would be a nice feature for the UAAG group, I do not think this is the
intended meaning of this succes criterion. 

The key problem with the current wording to me is "can be identified by
automated tools". Instead we should say that the author should identify the
language in such a way that automated tools can use it. 

The rewording could be something like: "The natural language of the document
as a whole is identified and available for user agents, for example using
metadata. " The "and available for user agents" is necessary because we do
not want people placing a sentence "This page is in English" in the footer
of the document and be done with it, that wouldn't help speech browsers.

Yvette Hoitink
CEO Heritas, Enschede, The Netherlands
E-mail: y.p.hoitink@heritas.nl

[1] http://lists.w3.org/Archives/Public/w3c-wai-gl/2004JanMar/0308.html

Received on Thursday, 19 February 2004 18:36:41 UTC