- From: Sebastian Redl <sebastian.redl@getdesigned.at>
- Date: Sat, 03 Jun 2006 17:34:03 +0200
- To: www-html@w3.org
Philip TAYLOR wrote:
> This talks specifically about "ASCII-valued bytes", and says nothing at
> all about non-ASCII-valued bytes.

I went and read the section. It says:

> The META <http://www.w3.org/TR/html4/struct/global.html#edef-META>
> declaration must only be used when the character encoding is organized
> such that ASCII-valued bytes stand for ASCII characters (at least
> until the META <http://www.w3.org/TR/html4/struct/global.html#edef-META>
> element is parsed).

So let's see. All encodings that are completely incompatible with ASCII, such as all EBCDIC variants, are out. You can't use the meta element with them.

This raises an interesting question. UTF-16 is mostly organized such that ASCII-valued bytes stand for ASCII characters. ('A' is x0041, and the x41 byte is indeed the ASCII value of 'A'.) What about the x00 bytes, though? They don't stand for the ASCII NUL character; instead, they are part of a character other than NUL. My intuition tells me that this rules out UTF-16 and all similar encodings, which would make Ian's example simply invalid. I'm not sure about this, though.

But the real point of discussion was whether the character encoding can change. It cannot: there is only one encoding per document. Although the text does not state this explicitly, the wording builds on this assumption, for example in 5.2.2:

> How does a server determine which character encoding applies for a
> document it serves?

Note the use of the singular. The practical problem is: what would signal the change of encoding? The end of the meta element, or the end of the content attribute of the meta start tag? Since nothing of the sort is specified, we can safely assume that the character encoding cannot change during the document.

That does not mean the byte mapping cannot change. Non-ASCII bytes may appear in the stream before the meta element (I was wrong here; a sensible implementation would store them for later translation), but ASCII-valued bytes must have their ASCII meaning.
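To make the byte-level argument about UTF-16 and EBCDIC concrete, here is a small Python sketch. It encodes a sample META declaration and then reads the bytes back one byte per character, roughly as a parser must before it knows the encoding. The helper `ascii_view` and the sample markup are hypothetical illustrations; only the codec names (`utf-16-be`, `cp037` for EBCDIC code page 037, `latin-1`) are Python's standard codecs.

```python
# Hypothetical sample META declaration; "X" is a placeholder charset name.
META = '<meta http-equiv="Content-Type" content="text/html; charset=X">'


def ascii_view(encoding: str) -> str:
    """Encode META in `encoding`, then reinterpret every byte as the
    character with the same value (Latin-1 maps byte n to U+00nn), which is
    roughly how the bytes look to a parser that has not yet seen the META."""
    return META.encode(encoding).decode("latin-1")


# 'A' is U+0041: the single byte 0x41 in ASCII, a 0x00 byte plus 0x41 in
# UTF-16BE, and 0xC1 in EBCDIC code page 037.
print("A".encode("ascii"))      # b'A'
print("A".encode("utf-16-be"))  # b'\x00A'
print("A".encode("cp037"))      # b'\xc1'

for enc in ("ascii", "utf-8", "utf-16-be", "cp037"):
    view = ascii_view(enc)
    print(f"{enc:10} readable as ASCII: {view == META}; "
          f"after dropping 0x00 bytes: {view.replace(chr(0), '') == META}")
```

In UTF-16BE every ASCII-valued byte of the markup is present, but interleaved with 0x00 bytes that do not mean NUL; in code page 037 the markup is unrecognizable as ASCII.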
This is what the phrasing of the specification means. The parenthetical clause is about stateful (shift) encodings such as ISO-2022-JP, which in the initial shift state map ASCII-valued bytes to ASCII characters but, after a shift sequence, give them a different meaning. The clause permits such encodings, provided no shift occurs before the meta element.

Sebastian Redl
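The shift-state case can be illustrated with ISO-2022-JP, a stateful encoding in which the escape sequence ESC $ B switches from ASCII to two-byte JIS X 0208 mode, where ASCII-valued bytes no longer stand for ASCII characters. The Python sketch below relies only on the standard `iso2022_jp` codec; the function `shift_before_meta` is a hypothetical check, not part of any specification, that reports whether an escape byte (0x1B) precedes the first `<meta`. It is a rough illustration of the constraint, not a conforming parser.

```python
import re

ESC = 0x1B  # every ISO-2022 shift sequence begins with an escape byte


def shift_before_meta(raw: bytes) -> bool:
    """Hypothetical check: True if an escape byte occurs before the first
    '<meta' (matched case-insensitively), i.e. the byte mapping may already
    have changed by the time the META element is parsed."""
    match = re.search(rb"<meta", raw, re.IGNORECASE)
    end = match.start() if match else len(raw)
    return ESC in raw[:end]


# Two kanji: every byte is ASCII-valued, yet the bytes between the escape
# sequences do not stand for ASCII characters.
raw = "日本".encode("iso2022_jp")
print(raw)                         # ESC $ B, two 2-byte codes, ESC ( B
print(all(b < 0x80 for b in raw))  # True

meta = '<meta http-equiv="Content-Type" content="text/html; charset=iso-2022-jp">'
ok = f"<html><head>{meta}<title>日本</title>".encode("iso2022_jp")
bad = f"<html><head><title>日本</title>{meta}".encode("iso2022_jp")
print(shift_before_meta(ok), shift_before_meta(bad))  # False True
```

In the `bad` document the shift into JIS X 0208 happens before the meta element, which is exactly what the parenthetical clause rules out.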
Received on Saturday, 3 June 2006 15:33:56 UTC