- From: Alan J. Flavell <flavell@a5.ph.gla.ac.uk>
- Date: Wed, 18 Nov 1998 13:47:48 +0000 (GMT)
- To: Charles McCathieNevile <charlesn@srl.rmit.EDU.AU>
- cc: "'GL - WAI Guidelines WG'" <w3c-wai-gl@w3.org>
On Wed, 18 Nov 1998, Charles McCathieNevile wrote:

> Maybe the problem should be re-expressed into marking up the Character
> set.

HTML only has one document character set: Unicode.

> I can read a bit of Japanese and Greek, but if the character set is
> not marked, and the language is not marked, then I simply have to guess
> at all the character sets I can think of. Then if I make a lucky guess, I
> have to work out a font.

Excuse me, but this seems to be confusing HTML with some proprietary word-processing format. Well, alright, there are some misguided pseudo-HTML documents that are made that way (I'm thinking most particularly of documents that go FONT FACE="Symbol" and then expect their Roman letters to be displayed as Greek, but the same has been seen with other alphabets). But these are not well-formed WWW documents; surely the WAI does not have to devise ways of displaying them?

HTML documents use quite a number of different encodings (designated by that confusingly-named "charset" parameter on the content-type header), but every properly-transmitted document has this "charset" explicitly stated (except for pre-HTML4.0 documents in iso-8859-1, where the charset parameter is optional).

Now, I have to admit I am entirely unfamiliar with how a screen reader would deal with this, but I firmly feel that whatever it does, it has to be based on a proper recognition of the interworking protocols.

> With a hint from the origin of the document, it gets a little easier.

Technically, the language of the content and its character encoding are two unrelated issues. (Even if that seems unrealistic and impractical, I'd say that trying to take any other view leads to far too many anomalies.) And knowing that a document is in iso-8859-1 does not help to know how to pronounce the document if one does not know whether it is Icelandic, Gaelic, Portuguese... Nor would it be more than an unpleasant kludge to take a stab at the language based on the DNS name.
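
To make the distinction concrete, here is a rough sketch (in Python, purely for illustration) of reading the "charset" parameter from a content-type header value and of how little it says about language. The helper name charset_from_content_type, the iso-8859-1 fallback, and the sample sentences are assumptions made up for this example; they are not taken from any particular client agent.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper for illustration only. The default mirrors the
    historical iso-8859-1 assumption for documents sent without an
    explicit charset.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset, or None
    # when the header carries no charset parameter.
    return msg.get_content_charset() or default


# One encoding, several languages: the charset alone gives no clue
# about how the text should be pronounced.
SAMPLES = {
    "is": "Þetta er íslenska",
    "pt": "Isto é português",
    "ga": "Is Gaeilge é seo",
}

charset = charset_from_content_type("text/html; charset=ISO-8859-1")
encoded = {lang: text.encode(charset) for lang, text in SAMPLES.items()}
```

All three samples encode cleanly in the same charset, which is the point: the encoding tells the client how to turn bytes into characters, and nothing more.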

Anyway, a solution for a site which has been constructed without explicit content language specifications would seem straightforward: simply arrange for the server to send out an HTTP content-language header. It needs no editing of the web pages themselves (if documents are available in various languages, some action may be needed to identify them, e.g. by appropriate choice of filename; see Apache's MultiViews for ideas).

The meaning of the HTTP content-language header is subtly different from language specifications within an HTML document, it's true, but I'd argue that either solution would be serviceable for individual documents that are in a single language.

I suppose I'll get told that the existing client agents don't do anything with this HTTP header, so it doesn't help them with their rendering. That would be unfortunate, as this is a bona fide part of the protocol.

best regards
Received on Wednesday, 18 November 1998 08:47:57 UTC