- From: Jungshik Shin <jshin@i18nl10n.com>
- Date: Mon, 23 Aug 2004 12:40:18 +0900
- To: Tex Texin <tex@i18nguy.com>
- CC: www-international@w3.org
Tex Texin wrote:

Hi Tex,

> With respect to user agents reparsing documents from the beginning, can you say
> which ones do this?

Mozilla does, and apparently MS IE does, too. Otherwise, it wouldn't be able to handle some HTML documents I came across that have the 'meta' rather deep inside the document, with non-ASCII characters (in the CSS font specification and the title) before it.

> They are not obligated to and the wording of the standards implies that the
> encoding "switch" from the initial value to the value specified in the charset
> statement, occurs at the point the statement is parsed.

I have yet to check the spec on this. Even though they're not obligated to, in practice they have to, because I've seen quite a lot of documents with 'meta charset' buried deep inside and non-ASCII characters before it. Needless to say, I frown upon those documents, but I couldn't track down every one of them.

> On a separate point I wonder if you meant ASCII-compatible or simply ASCII.

I meant 'ASCII-compatible' (not pure ASCII). A couple of months ago, I submitted a patch to Nutch (an open source crawler/search engine) that parses the first 4(?) kB of an HTML document to find the 'meta charset' declaration, assuming the document is in Windows-1252 (nothing special about Windows-1252 other than that octets 0x80 through 0xaf are valid, as are 0xb0 through 0xff). Mozilla does something similar.

Jungshik
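The prescan-then-reparse approach described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Nutch patch or Mozilla's code: the names `sniff_meta_charset`, `decode_html` and `PRESCAN_BYTES`, the regular expression, and the exact 4096-byte limit are all assumptions made for the example. Undefined Windows-1252 octets are replaced rather than rejected, since the prescan only needs to find the ASCII-compatible 'meta charset' text.

```python
import re

# Hypothetical limit; the message only says "the first 4(?) kB".
PRESCAN_BYTES = 4096

# Loose pattern for charset=... inside a <meta ...> tag (an assumption,
# not a full HTML parser).
_META_CHARSET = re.compile(
    r"""<meta\b[^>]*?\bcharset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, limit: int = PRESCAN_BYTES):
    """Return the charset named in a meta tag within the first `limit` bytes, or None."""
    # Read the head as Windows-1252 so non-ASCII octets before the meta tag
    # do not abort the scan; unmapped octets become U+FFFD.
    head = raw[:limit].decode("cp1252", errors="replace")
    match = _META_CHARSET.search(head)
    return match.group(1).lower() if match else None


def decode_html(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode the whole document from the beginning with the sniffed charset."""
    charset = sniff_meta_charset(raw) or fallback
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back instead of failing.
        return raw.decode(fallback, errors="replace")
```

Because `decode_html` decodes the entire byte string again once the charset is known, non-ASCII text that appears before the 'meta' (a title, for example) is still interpreted in the declared encoding rather than in the initial guess.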
Received on Monday, 23 August 2004 03:44:02 UTC