HTML markup for language identification?

The following question was prompted by recent discussion on this list of
whether tags should be added to HTML so as to identify abbreviations and
acronyms. For purposes of both braille and speech output, it is necessary
to be able to ascertain the language in which a document is written. It
might well be argued that such functionality can be achieved within the
HTML user agent by means of dictionaries, perhaps combined with a
grammatical analysis of the text. However, in the case of multilingual
documents, it may be more difficult to determine which words are intended
to be written in which language, especially if there is some
correspondence in spelling. How reliable are software-based language
identification techniques? Do multilingual documents occur frequently
enough to warrant the inclusion of a specific HTML tag, or should the
question be decided as a matter of principle rather than on the basis of
perceived frequency?

Regards,

Jason White.

Received on Tuesday, 10 June 1997 20:00:09 UTC