Liam Quinn wrote:

> On Wed, 7 Mar 2001, Thanasis Kinias wrote:
[snip]
> > The default
> > charset is UTF-8, which is identical to ISO Latin-1 (ISO 8859-1).
>
> There is no default charset for HTML, and UTF-8 is not identical to
> ISO-8859-1. UTF-8 and ISO-8859-1 are only identical for the 7-bit
> (US-ASCII) characters.

From the HTML 4.01 recommendation
(<http://www.w3.org/TR/html4/charset.html#h-5.2.2>):

> The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default
> character encoding when the "charset" parameter is absent from the
> "Content-Type" header field. In practice, this recommendation has proved
> useless because some servers don't allow a "charset" parameter to be sent, and
> others may not be configured to send the parameter. Therefore, user agents must
> not assume any default value for the "charset" parameter.

I guess the latter half of this means you really ought to specify the
charset for HTML. I've been working with XHTML so I forgot HTML was
different.

I should have been more clear about saying UTF-8 = ISO Latin-1; I meant
for the lower-128. Of course, you are correct; they are not identical
above U+007F.

> The charset declaration is required for HTML documents, regardless of
> whether you use entities.

If the server properly sends the charset parameter, the <meta>
declaration of charset is redundant. From HTML 4.01:

> To address server or configuration limitations, HTML documents _may_ include
> explicit information about the document's character encoding; the META
> element can be used to provide user agents with this information.

[emphasis added]

If one is only using ASCII characters and the server is sending a
charset value in the header Content-Type field (whether it's sending
UTF-8, Latin-1, or Windows 1252), all is OK vis-à-vis the standards -
unless I'm really misunderstanding "may" in the recommendation. At any
rate, there isn't a compelling reason _not_ to specify with a <meta>.
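A minimal sketch of the two points above, assuming Python 3 and its standard library only. The function names `same_bytes` and `charset_from_content_type` are made up for illustration and do not come from the thread. The first shows that UTF-8 and ISO-8859-1 produce the same bytes only for US-ASCII text; the second reads the "charset" parameter from a Content-Type value and returns None when it is absent, rather than assuming a default.

```python
from email.message import Message


def same_bytes(text):
    """Return True if text encodes to identical bytes in UTF-8 and ISO-8859-1."""
    try:
        return text.encode("utf-8") == text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # The text contains a character that ISO-8859-1 cannot represent at all.
        return False


def charset_from_content_type(header_value):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present,
    # which mirrors "must not assume any default value".
    return msg.get_content_charset()


if __name__ == "__main__":
    for sample in ("charset", "vis-à-vis"):
        print(sample, same_bytes(sample))
    print(charset_from_content_type("text/html; charset=ISO-8859-1"))
    print(charset_from_content_type("text/html"))
```

Here "vis-à-vis" differs between the two encodings because "à" is the single byte 0xE0 in ISO-8859-1 but the two bytes 0xC3 0xA0 in UTF-8, while plain ASCII text such as "charset" is byte-for-byte the same in both.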
And, of course, Bertilo is correct about ISO 8859-1 being preferable to
a proprietary standard.

Liam also wrote (in response to Bertilo):

> But it will cause links containing "#" to fail in IE4 for Windows. So
> ISO-8859-1 is still preferred when you don't need characters outside
> ISO-8859-1.

That's _bizarre_, but I guess not altogether surprising. That answers
the question I guess. Is that also a problem with XHTML docs with
implicit (default) UTF-8 encoding?

On this subject, must one then specify a charset with XHTML docs served
as text/html, even if it is the default UTF-8?

Thanasis Kinias
Information Dissemination Team, Information Technology
Arizona State University
Tempe, Ariz., U.S.A.

Qui nos rodunt confundantur et cum iustis non scribantur.

Received on Wednesday, 7 March 2001 16:01:50 UTC