[whatwg] Internal character encoding declaration

Henri Sivonen wrote:
> If a meta element whose http-equiv attribute has the value 
> "Content-Type" (compare case-insensitively) and whose content attribute 
> has a value that begins with "text/html; charset=", the string in the 
> content attribute following the start "text/html; charset=" is taken, 
> white space removed from the sides and considered the tentative encoding 
> name.
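A minimal Python sketch of the rule exactly as quoted, for comparison with the mistakes listed below. The function name is hypothetical. The definition of "white space" as HTML's space characters is an assumption, and the content prefix is compared literally, so `text/html;charset=UTF-8` (no space) is rejected:

```python
# HTML's space characters (assumption: the quoted rule does not define "white space").
WS = " \t\n\f\r"
PREFIX = "text/html; charset="

def tentative_encoding_strict(http_equiv, content):
    """Hypothetical helper: tentative encoding name under the quoted rule, or None."""
    # http-equiv must be "Content-Type", compared case-insensitively.
    if http_equiv.lower() != "content-type":
        return None
    # content must begin with the exact prefix (compared literally in this sketch).
    if not content.startswith(PREFIX):
        return None
    # Take the remainder and strip white space from both sides.
    return content[len(PREFIX):].strip(WS)

print(tentative_encoding_strict("Content-Type", "text/html; charset= UTF-8 "))  # UTF-8
print(tentative_encoding_strict("content-type", "text/html;charset=UTF-8"))     # None
```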

This will need to handle common mistakes such as the following:

<meta ... content="application/xhtml+xml;charset=X">
<meta ... content="foo/bar;charset=X">
<meta ... content="foo/bar;charset='X'">
<meta ... content="charset=X">
<meta ... charset="X">

I'm not sure which browsers support each one; they'll all need to be tested.
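A more lenient extraction that accepts every variant above might look like the following Python sketch. It is not taken from any spec or browser: the regex approach, the choice to let a `charset` attribute take precedence, and the input shape (a dict of lowercased attribute names) are all assumptions, and which of these forms browsers actually honour still needs the testing mentioned above.

```python
import re

# HTML's space characters (assumption).
WS = " \t\n\f\r"

# "charset", optional white space, "=", optional white space, then a
# double-quoted, single-quoted, or bare value (bare ends at white space, ';' or a quote).
_CHARSET_RE = re.compile(
    r"charset[ \t\n\f\r]*=[ \t\n\f\r]*"
    r"(?:\"([^\"]*)\"|'([^']*)'|([^ \t\n\f\r;\"']+))",
    re.IGNORECASE,
)

def charset_from_content(content):
    """Hypothetical helper: find charset=X anywhere in a content value, ignoring the MIME type."""
    m = _CHARSET_RE.search(content)
    if m is None:
        return None
    # Exactly one of the three alternatives matched; return its value.
    return next(g for g in m.groups() if g is not None)

def tentative_encoding_lenient(attrs):
    """Hypothetical helper: attrs maps lowercased attribute names to values."""
    # <meta charset="X">: use the attribute directly (assumed to take precedence).
    if "charset" in attrs:
        return attrs["charset"].strip(WS) or None
    if attrs.get("http-equiv", "").strip(WS).lower() != "content-type":
        return None
    return charset_from_content(attrs.get("content", ""))

for value in ["application/xhtml+xml;charset=X", "foo/bar;charset=X",
              "foo/bar;charset='X'", "charset=X"]:
    print(tentative_encoding_lenient({"http-equiv": "Content-Type", "content": value}))
print(tentative_encoding_lenient({"charset": "X"}))
```

Each call above prints `X`. Anchoring only on `charset=` and ignoring the MIME type is what lets the `application/xhtml+xml` and `foo/bar` cases through.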

> Authors are advised not to use the UTF-32 encoding or legacy encodings. 
> (Note: I think UTF-32 on the Web is harmful and utterly pointless,

I agree about it being pointless, but why is it considered harmful?

>  I'd like to have some text in the spec that justifies whining
> about legacy encodings.

What are your reasons for whining about legacy encodings and what would 
you like the spec to say?

> Also, the spec should probably give guidance on what encodings need to 
> be supported. That set should include at least UTF-8, US-ASCII, 
> ISO-8859-1 and Windows-1252.

And probably UTF-16 as well.

-- 
Lachlan Hunt
http://lachy.id.au/

Received on Monday, 13 March 2006 06:12:21 UTC