- From: Dave Raggett <dsr@w3.org>
- Date: Wed, 11 Jun 1997 08:19:19 -0400 ()
- To: Jason White <jasonw@ariel.ucs.unimelb.EDU.AU>
- cc: w3c-wai-wg@w3.org
On Wed, 11 Jun 1997, Jason White wrote: > it is necessary to be able to ascertain the language in which a > document is written. It might well be argued that such > functionality can be achieved within the HTML user agent by means > of dictionaries, perhaps combined with a grammatical analysis of > the text. However, in the case of multilingual documents, it may > be more difficult to determine which words are intended to be > written in which language, especially if there is some > correspondence in spelling. How reliable are software-based > "language identification techniques? My understanding this that techniques for morphological analysis are really quite good, but are better suited to supporting authoring, where the authoring tool can take some of the grind out of entering the markup. HTML Cougar basically subsumes RFC 2070 including the LANG attribute which is intended for this purpose. The following is extracted from the spec: The lang attribute's value is a language code that identifies a natural language spoken, written, or otherwise conveyed by human beings for communication of information to other human beings. Computer languages are explicitly excluded from language codes. The syntax and registry of HTML language codes is the same as that defined by [RFC1766]. Language codes consist of a primary code and a possibly empty series of subcodes: language-code = primary-code *( "-" subcode ) primary-code = 1*8ALPHA subcode = 1*8ALPHA EXAMPLE: "en": English "en-US": the U.S. version of English. "en-cockney": the Cockney version of English. "i-cherokee": the Cherokee language spoken by some Native Americans. Two-letter primary codes are reserved for [ISO639] language abbreviations. Three-letter primary codes are reserved for "Ethnologue" abbreviations ([ETHNO]) (in in addition to the requirements of [RFC1766]). Any two-letter primary code is a [ISO3166] country code.. Regards, Dave Raggett <dsr@w3.org> tel +44 122 578 2521 url http://www.w3.org/People/Raggett World Wide Web Consortium (on assignment from HP Labs)
Received on Wednesday, 11 June 1997 08:18:32 UTC