- From: Thanasis Kinias <tkinias@asu.edu>
- Date: Wed, 07 Mar 2001 13:50:05 -0700
- To: "'Liam Quinn'" <liam@htmlhelp.com>
- Cc: www-validator@w3.org, "'Bertilo Wennergren'" <bertilow@hem.passagen.se>
- Message-id: <A021872EC2BDD411AB3600902746A055016047B5@mainex4.asu.edu>
Liam Quinn wrote:

> On Wed, 7 Mar 2001, Thanasis Kinias wrote:

[snip]

> > The default
> > charset is UTF-8, which is identical to ISO Latin-1 (ISO 8859-1).

> There is no default charset for HTML, and UTF-8 is not identical to
> ISO-8859-1. UTF-8 and ISO-8859-1 are only identical for the 7-bit
> (US-ASCII) characters.

From the HTML 4.01 recommendation
(<http://www.w3.org/TR/html4/charset.html#h-5.2.2>):

> The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default
> character encoding when the "charset" parameter is absent from the
> "Content-Type" header field. In practice, this recommendation has proved
> useless because some servers don't allow a "charset" parameter to be sent, and
> others may not be configured to send the parameter. Therefore, user agents must
> not assume any default value for the "charset" parameter.

I guess the latter half of this means you really ought to specify the charset
for HTML. I've been working with XHTML so I forgot HTML was different.

I should have been more clear about saying UTF-8 = ISO Latin-1; I meant for the
lower-128. Of course, you are correct; they are not identical above U+007F.

> The charset declaration is required for HTML documents, regardless of
> whether you use entities.

If the server properly sends the charset parameter, the <meta> declaration of
charset is redundant. From HTML 4.01:

> To address server or configuration limitations, HTML documents _may_ include
> explicit information about the document's character encoding; the META
> element can be used to provide user agents with this information.

[emphasis added]

If one is only using ASCII characters and the server is sending a charset value
in the header Content-Type field (whether it's sending UTF-8, Latin-1, or
Windows 1252), all is OK vis-à-vis the standards - unless I'm really
misunderstanding "may" in the recommendation. At any rate, there isn't a
compelling reason _not_ to specify with a <meta>.
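
A minimal sketch of the point about ASCII-only text, assuming Python's codec names `utf-8`, `iso-8859-1`, and `cp1252` (the last standing in for Windows 1252). The helper names `encoded_forms` and `same_in_all` are hypothetical, chosen only for this illustration:

```python
# Codec names are Python's; "cp1252" is assumed to correspond to Windows 1252.
ENCODINGS = ("utf-8", "iso-8859-1", "cp1252")


def encoded_forms(text):
    """Map each codec name to the bytes it produces for `text`."""
    return {name: text.encode(name) for name in ENCODINGS}


def same_in_all(text):
    """Return True if every codec in ENCODINGS yields the same bytes."""
    return len(set(encoded_forms(text).values())) == 1


if __name__ == "__main__":
    ascii_text = "charset test #1"   # only characters up to U+007F
    latin_text = "vis-\u00e0-vis"    # contains U+00E0, above U+007F

    print("ASCII only:", same_in_all(ascii_text))
    print("with U+00E0:", same_in_all(latin_text))

    # Show the actual bytes each codec produces for the non-ASCII string.
    for name, data in encoded_forms(latin_text).items():
        print(name, data)
```

The ASCII-only string should come out byte-identical under all three encodings, while the string containing U+00E0 should not: UTF-8 encodes that character as two bytes, whereas ISO-8859-1 and Windows 1252 each use a single byte.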

And, of course, Bertilo is correct about ISO 8859-1 being preferable to a
proprietary standard.

Liam also wrote (in response to Bertilo):

> But it will cause links containing "#" to fail in IE4 for Windows. So
> ISO-8859-1 is still preferred when you don't need characters outside
> ISO-8859-1.

That's _bizarre_, but I guess not altogether surprising. That answers the
question I guess.

Is that also a problem with XHTML docs with implicit (default) UTF-8 encoding?
On this subject, must one then specify a charset with XHTML docs served as
text/html, even if it is the default UTF-8?

Thanasis Kinias
Information Dissemination Team, Information Technology
Arizona State University
Tempe, Ariz., U.S.A.

Qui nos rodunt confundantur et cum iustis non scribantur.
Received on Wednesday, 7 March 2001 16:01:50 UTC