- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 28 May 2007 17:28:06 +0300
For proper communication it is important that a document is decoded reliably. In addition, for security reasons, it is important that documents are decoded the same way by browsers and by gatekeeper tools. To this end, I think at least for conforming documents the algorithm for establishing the character encoding should be deterministic. I'd like to request two things:

1) When sniffing for meta charset, the current draft allows a user agent to give up sooner than after examining the first 512 bytes. To make meta charset sniffing reliable and deterministic so that it doesn't depend on flukes in buffering, I think UAs should (if there's no transfer protocol level charset label and no BOM) be required to consume bytes until they find a meta charset, reach the EOF or have examined 512 bytes. That is, I think UAs should not be allowed to give up earlier. (On the other hand, I think UAs should be allowed to start examining the byte stream before 512 bytes have been buffered without an IO error, since in general, byte stream buffer management should be up to the IO libraries and outside the scope of the HTML spec.)

2) Since the chardet step is optional and the spec doesn't make the Mozilla chardet behavior normative, I think the document should be considered non-conforming if the algorithm for establishing the character encoding proceeds to steps 6 (chardet) or 7 (last resort default).

Personally, I'd prefer formulating such a document conformance requirement as part of the algorithm, but given recent feedback, I am aware that most people wish to maintain a separation of processing models and document conformance requirements. (For me, mixing the two is what I have to do in my head anyway. ;-) It wouldn't hurt, though, to say in the section on writing documents that at least one of the following is required for document conformance:

* A transfer protocol-level character encoding declaration.
* A meta charset within the first 512 bytes.
* A BOM.
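
To illustrate the two requests above, here is a minimal Python sketch. It is not text from the draft. The names establish_encoding, SNIFF_LIMIT and META_RE are hypothetical, and the regular expression is only a simplified stand-in for the draft's meta prescan. For simplicity the sketch buffers up to 512 bytes (or to EOF) before examining anything, which is one permitted strategy rather than a required one. A result of (None, None) means the algorithm would fall through to chardet or the last resort default, so the document would be non-conforming under request 2.

```python
import re

SNIFF_LIMIT = 512  # bytes to examine, per the proposal

# Byte order marks checked before the meta prescan.
BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)

# Simplified stand-in for the meta prescan (an assumption, not the draft's
# algorithm). The lookahead demands a terminator after the value, so a label
# cut off at the 512-byte boundary is not accepted in truncated form.
META_RE = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9._:-]+)(?=["'\s/>;])""",
    re.IGNORECASE,
)


def establish_encoding(chunks, transport_charset=None):
    """Return (encoding, source); (None, None) means nothing was declared.

    `chunks` is any iterable of bytes objects, however the IO layer
    happens to split the stream.
    """
    if transport_charset:
        return transport_charset.lower(), "transport"

    # Never give up early: keep consuming until 512 bytes or EOF.
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) >= SNIFF_LIMIT:
            break
    head = bytes(buf[:SNIFF_LIMIT])

    for bom, name in BOMS:
        if head.startswith(bom):
            return name, "bom"

    match = META_RE.search(head)
    if match:
        return match.group(1).decode("ascii").lower(), "meta"

    # Would proceed to chardet / last resort default.
    return None, None
```

Because the decision is made only on the first 512 bytes as a whole, the answer is the same whether the stream arrives in one read or one byte at a time:

```python
doc = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
one_read = establish_encoding([doc])
byte_by_byte = establish_encoding(bytes([b]) for b in doc)
assert one_read == byte_by_byte
```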
-- Henri Sivonen hsivonen at iki.fi http://hsivonen.iki.fi/
Received on Monday, 28 May 2007 07:28:06 UTC