character encoding in HTML 4.0

Thanks for all the work you and the others have put into getting HTML 4.0
written, revised, and put online.

There's just one little thing that doesn't quite look right to me.
Perhaps I am just misinterpreting something?

In section "5.2.2 Specifying the character encoding",
  http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
it appears the priorities listed are in the wrong order.
It seems to me they should be:

1. A META declaration with "http-equiv" set to "Content-Type" and a value
set for "charset".
2. An HTTP "charset" parameter in a "Content-Type" field.
3. The charset attribute set on an element that designates an external
resource.

In other words, if a web server gives a header like
  Content-Type: text/html; charset=ISO-8859-1
but the text of the document itself says
  <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
it seems to me that the author of the document is more likely to know what
the proper encoding is, and therefore the HTML user agent should render this
document in Japanese.

However, the current HTML 4.0 specification seems to indicate that
the HTML user agent "must" render it in ISO-8859-1.
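
To make the difference concrete, here is a rough sketch in Python of how a
user agent might pick the encoding under each ordering. The names
(pick_charset, SPEC_ORDER, PROPOSED_ORDER, and so on) are mine, invented for
illustration; nothing here comes from the specification itself, and the
regular-expression parsing is deliberately simplistic.

```python
import re

# Hypothetical labels for the three sources listed in section 5.2.2.
HTTP, META, LINK = "http", "meta", "link"

SPEC_ORDER = (HTTP, META, LINK)      # order as I read the current text
PROPOSED_ORDER = (META, HTTP, LINK)  # order suggested in this message

_CHARSET = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)
_META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
_HTTP_EQUIV = re.compile(r"http-equiv\s*=\s*[\"']?content-type", re.IGNORECASE)


def charset_param(value):
    """Return the charset named in a Content-Type style string, or None."""
    if not value:
        return None
    match = _CHARSET.search(value)
    return match.group(1) if match else None


def meta_charset(html):
    """Return the charset from the first META http-equiv="Content-Type" tag."""
    for tag in _META_TAG.findall(html):
        if _HTTP_EQUIV.search(tag):
            return charset_param(tag)
    return None


def pick_charset(order, http_header=None, html="", link_charset=None):
    """Return the first charset found when consulting sources in `order`."""
    found = {
        HTTP: charset_param(http_header),
        META: meta_charset(html),
        LINK: link_charset,
    }
    for source in order:
        if found[source]:
            return found[source]
    return None


header = "text/html; charset=ISO-8859-1"
page = '<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">'

print(pick_charset(SPEC_ORDER, header, page))
print(pick_charset(PROPOSED_ORDER, header, page))
```

With the header and META line from the example above, the first call picks
the server's value and the second picks the author's value, which is the
behavior I am arguing for.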

Rationale: if a particular document changes languages
(not that this happens very often),
the author of the document is the first to know about it (and is unable to
change anything but (1)),
the system administrator is the next to know about it (and is the only one
capable of changing (2)),
and other authors (elsewhere on the web, who refer to that document)
are usually the last to know about it (and are the only ones capable of
changing (3)).

By the way, I'd like to thank whoever pushed for hexadecimal character
references.
They make things much simpler for me.

--
+ David Cary "mailto:d.cary@ieee.org" "http://www.rdrop.com/~cary/"
| Future Tech, Unknowns, PCMCIA, digital hologram, <*> O-

Received on Sunday, 11 January 1998 01:57:07 UTC