- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Tue, 6 Dec 2011 05:54:03 +0100
Boris Zbarsky Mon Dec 5 19:18:10 PST 2011:
> On 12/5/11 9:55 PM, Leif Halvard Silli wrote:
>> I said I agreed with him that Faruk's solution was not good. However, I
>> would not be against treating <DOCTYPE html> as a 'default to UTF-8'
>> declaration
>
> This might work, if there hasn't been too much cargo-culting yet. Data
> urgently needed!

Yeah, it would be a pity if it had already become a widespread cargo-cult to - all at once - use the HTML5 doctype without using UTF-8 *and* without using some encoding declaration *and* thus effectively rely on the default locale encoding ... Who has a data corpus? Henri, as Validator.nu developer?

This change would involve adding one more step to the HTML5 parser's encoding sniffing algorithm. [1] The question then is when, upon seeing the HTML5 doctype, the default to UTF-8 ought to happen in order to be useful. It seems it would have to happen after the processing of the explicit metadata (steps 1 to 5) but before the last three steps, steps 6, 7 and 8:

Step 6: 'if the user agent has information on the likely encoding'
Step 7: UA 'may attempt to autodetect the character encoding'
Step 8: 'implementation-defined or user-specified default'

The role of the HTML5 DOCTYPE, encoding-wise, would then be to ensure that steps 6 to 8 do not happen.

[1] http://dev.w3.org/html5/spec/parsing#encoding-sniffing-algorithm
-- 
Leif H Silli
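
To make the proposed ordering concrete, here is a rough Python sketch of where the extra step would sit. It is only an illustration, not spec text: the function name `sniff_encoding` and all of its parameters are hypothetical, steps 1 to 5 are collapsed into a single `explicit` argument, and the `windows-1252` fallback is just a stand-in for whatever the locale default happens to be.

```python
from typing import Callable, Optional


def sniff_encoding(
    explicit: Optional[str],
    has_html5_doctype: bool,
    likely: Optional[str] = None,
    autodetect: Optional[Callable[[], Optional[str]]] = None,
    locale_default: str = "windows-1252",  # placeholder for the locale default
) -> str:
    """Hypothetical sketch of the sniffing order with the proposed step added."""
    # Steps 1-5 (collapsed here): explicit metadata such as a user override,
    # a BOM, the HTTP charset parameter or a <meta> declaration wins.
    if explicit is not None:
        return explicit

    # Proposed new step: the HTML5 doctype acts as a 'default to UTF-8'
    # declaration, so steps 6 to 8 are never reached for such documents.
    if has_html5_doctype:
        return "utf-8"

    # Step 6: the user agent has information on the likely encoding.
    if likely is not None:
        return likely

    # Step 7: the user agent may attempt to autodetect the encoding.
    if autodetect is not None:
        guess = autodetect()
        if guess is not None:
            return guess

    # Step 8: implementation-defined or user-specified default.
    return locale_default
```

With this ordering, a page that has the HTML5 doctype but no encoding declaration gets UTF-8 regardless of locale, while a page with an explicit declaration is unaffected.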
Received on Monday, 5 December 2011 20:54:03 UTC