- From: David Cary <d.cary@ieee.org>
- Date: Sun, 11 Jan 1998 00:56:42 -0500
- To: www-html-editor@w3.org
character encoding in HTML 4.0

Thanks for all the work you and the others have put into putting HTML 4.0 together, revising it, and putting it online.

There's just one little thing that doesn't quite look right to me. Perhaps I am just misinterpreting something?

In section "5.2.2 Specifying the character encoding",
http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
it appears the priorities listed are reversed. It seems to me they should be

1. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
2. An HTTP "charset" parameter in a "Content-Type" field.
3. The charset attribute set on an element that designates an external resource.

In other words, if a web server gives a header like

    Content-Type: text/html; charset=ISO-8859-1

but the text of the document itself says

    <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

it seems to me that the author of the document is more likely to know what the proper encoding is, and therefore the HTML user agent should render this document in Japanese. However, the current HTML 4.0 specification seems to indicate that the HTML user agent "must" render it in ISO-8859-1.

Rationale: if a particular document changes languages (not that this happens very often), the author of the document is the first to know about it (and is unable to change anything but (1)), the system administrator is the next to know about it (and is the only one capable of changing (2)), and other authors (elsewhere on the web, who refer to that document) are usually the last to know about it (and are the only ones capable of changing (3)).

By the way, I'd like to thank whoever pushed for hexadecimal character references. They make things much simpler for me.

--
+ David Cary "mailto:d.cary@ieee.org" "http://www.rdrop.com/~cary/"
| Future Tech, Unknowns, PCMCIA, digital hologram, <*> O-
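To make the difference between the two orderings concrete, here is a rough Python sketch of how a user agent might pick an encoding under each one. It is only an illustration: the source labels (`http_header`, `meta`, `link_attr`) and the function names are made up for this example and do not come from the specification, and the parsing is deliberately simplistic.

```python
import re

# Hypothetical source labels, invented for this illustration only.
# Ordering as section 5.2.2 currently reads (HTTP header first):
SPEC_ORDER = ("http_header", "meta", "link_attr")
# Ordering suggested in this message (META declaration first):
PROPOSED_ORDER = ("meta", "http_header", "link_attr")

# Matches the charset parameter of a Content-Type style value.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def charset_param(value):
    """Return the charset named in a Content-Type style value, or None."""
    if value is None:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1) if match else None


def pick_charset(order, http_header=None, meta=None, link_attr=None):
    """Return the first charset found when consulting sources in `order`.

    http_header: value of the HTTP Content-Type field, if any
    meta:        content attribute of the META http-equiv declaration, if any
    link_attr:   charset attribute on the element that linked to the document
    """
    candidates = {
        "http_header": charset_param(http_header),
        "meta": charset_param(meta),
        "link_attr": link_attr,
    }
    for source in order:
        if candidates[source]:
            return candidates[source]
    return None  # no source declared an encoding


header = "text/html; charset=ISO-8859-1"
meta = "text/html; charset=EUC-JP"
print(pick_charset(SPEC_ORDER, http_header=header, meta=meta))
print(pick_charset(PROPOSED_ORDER, http_header=header, meta=meta))
```

With the conflicting header and META declaration from the example above, the first call follows the HTTP header and the second follows the document's own declaration. When only one source names an encoding, both orderings agree.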
Received on Sunday, 11 January 1998 01:57:07 UTC