- From: <bugzilla@jessica.w3.org>
- Date: Wed, 17 Aug 2011 22:19:15 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12576

--- Comment #9 from Ian 'Hixie' Hickson <ian@hixie.ch> 2011-08-17 22:19:14 UTC ---
As a general rule, in the future, please file one issue per bug.

> 1) In "Before attribute name state" (section 8.2.4.34 right now), on
> encountering '<', a new attribute is started with '<' as first character.
> Shouldn't this not trigger a new element while reporting a parse error ?

> 5) Comment (1), if valid, affects pre-parser logic too (to find encoding).

It's an error; exactly what happens doesn't matter so much. I think the current
behaviour is more consistent with widespread legacy implementations and is
mildly more secure when it comes to XSS attacks.

> 2) In "Data state" (section 8.2.4.1 right now), on encountering 'U+0000', the
> current input character is emitted. Everywhere else, it is replaced with
> U+FFFD. Is this on purpose ? Or a typo ?

It's on purpose; the tree construction stage takes care of it for those cases.

> 3) In "Bogus comment state" (section 8.2.4.44 right now), it would be good if
> it could be reworded for clarity. As stated, it requires very careful reading
> to decipher its meaning.

Please file a separate bug for this with more detail about exactly what needs
clarifying. In general, very careful reading is to be encouraged. ;-)

> 4) In "Bogus comment state" (section 8.2.4.44 right now), if we encounter an
> EOF, is it not a parse error ? (it delegates to DATA state, where it is not a
> parse error iirc).

Once you hit the bogus comment state you've already hit a parse error, so it
doesn't matter.

> 6) In "Determining the character encoding" (section 8.2.2.1 right now), under
> step 5 (the algo to find encoding from html content) :
> Under sub-step 1, case '<meta', point 12 which currently says -
> "If mode is true but got pragma is false, then jump to the second step of the
> overall "two step" algorithm."
> Here, 'mode' is undefined from what I saw : I assume it is supposed to be 'need
> pragma' ?

Fixed; see comment 4.

> 6.1) In point 13 from same snippet from (6) above, we have :
> "If charset is a UTF-16 encoding, change the value of charset to UTF-8."
> What if it is explicitly set to utf-16LE or utf-16BE ? Should it be changed too
> ? Or only for 'utf-16' ?

UTF-16LE and UTF-16BE are both UTF-16 encodings.

> 7) In "get an attribute" (#concept-get-attributes-when-sniffing : section
> 8.2.2.1 algo in main step 5) : currently a value can end on a whitespace or
> '>'. What about '/' ? Currently, the '/' will get added to the value ... This
> is applicable in two places in that algo : step 10 and step 11.

Could you show a concrete example of a Web page that would be processed
differently based on this difference? I don't fully understand the implications
here.

I'm leaving this bug open for point 7. Please open separate bugs for the other
points if the above is not sufficient resolution.
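
For reference on points 6.1 and 7, here is a minimal Python sketch; it is not the spec's algorithm text. It assumes a simplified version of the unquoted-value steps of "get an attribute" as described in the report (a value ends only on whitespace or '>'), plus the UTF-16 override quoted from point 13. The names `scan_unquoted_value` and `override_utf16` are hypothetical, and the set of labels treated as UTF-16 encodings is an assumption limited to the three mentioned in this thread.

```python
# Bytes treated as whitespace in this sketch: tab, LF, FF, CR, space.
WHITESPACE = b"\t\n\x0c\r "


def scan_unquoted_value(data, position=0):
    """Collect an unquoted attribute value starting at `position`.

    Simplified assumption: the value ends only at whitespace or '>',
    so a '/' is appended to the value like any other byte.
    Returns (value, position of the terminating byte or end of input).
    """
    value = bytearray()
    while position < len(data):
        byte = data[position:position + 1]
        if byte in WHITESPACE or byte == b">":
            break
        value += byte.lower()  # lowercases ASCII A-Z only
        position += 1
    return value.decode("ascii", errors="replace"), position


def override_utf16(charset):
    """Map any UTF-16 encoding to UTF-8, as quoted in point 13.

    Assumption: only these three labels count as UTF-16 encodings here.
    """
    if charset.lower() in ("utf-16", "utf-16le", "utf-16be"):
        return "utf-8"
    return charset
```

Under these assumptions, the bytes `utf-8/>` yield the value `utf-8/`, which is the behaviour point 7 describes, while `utf-8 />` yields `utf-8`. Whether any real page is processed differently because of this remains the open question.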
Received on Wednesday, 17 August 2011 22:19:17 UTC