[Bug 9659] Initial U+0000 should not set frameset-ok to "not ok"

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659

--- Comment #11 from Adam Barth <w3c@adambarth.com>  2010-09-10 21:12:10 ---
WebKit does what the spec says:

http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTreeBuilder.cpp#L2491

If any replacement characters arrive at the tree builder, they don't set
framesetOk to false.  I don't understand the issues you're complaining about. 
It doesn't matter how the replacement characters were generated.  They just no
longer flip the framesetOk bit.
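
A minimal sketch of that behavior, written in Python rather than WebKit's
C++. The function and constant names are hypothetical. The whitespace rule is
an assumption taken from the HTML spec's "in body" handling of character
tokens, not from the WebKit source linked above.

```python
REPLACEMENT_CHARACTER = "\uFFFD"
HTML_WHITESPACE = "\t\n\f\r "  # tab, LF, FF, CR, space


def frameset_ok_after(text: str, frameset_ok: bool = True) -> bool:
    """Return the frameset-ok flag after the tree builder sees `text`.

    Hypothetical helper: it models only the flag, not node insertion.
    """
    for ch in text:
        if ch in HTML_WHITESPACE or ch == REPLACEMENT_CHARACTER:
            # Whitespace and U+FFFD leave the flag untouched, no matter
            # how the U+FFFD was produced.
            continue
        # Any other character sets frameset-ok to "not ok".
        return False
    return frameset_ok
```

With this model, a leading U+FFFD followed only by whitespace keeps the flag
set, while any ordinary character clears it.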

> It seemed to me that WebKit is discarding U+0000 in
> text content in states other than "in text" and "in foreign content" just like
> Firefox.

WebKit's null swallowing behavior is meant to work as follows:

If the tree builder is NOT in TextMode or InForeignContentMode and the
tokenizer IS in DataState, RCDATAState, RAWTEXTState, or PLAINTEXTState, then
null characters are ignored.

http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTreeBuilder.cpp#L473
http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTokenizer.h#L150
http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTokenizer.h#L204
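
The rule above can be sketched as a single predicate. This is an illustrative
Python model, not the WebKit code: the enum and function names are
hypothetical, and IN_BODY and SCRIPT_DATA are included only as stand-ins for
"any other mode" and "any other state".

```python
from enum import Enum, auto


class InsertionMode(Enum):
    TEXT = auto()                # TextMode
    IN_FOREIGN_CONTENT = auto()  # InForeignContentMode
    IN_BODY = auto()             # stand-in for every other mode


class TokenizerState(Enum):
    DATA = auto()
    RCDATA = auto()
    RAWTEXT = auto()
    PLAINTEXT = auto()
    SCRIPT_DATA = auto()         # stand-in for every other state


# Tree builder modes in which null characters are NOT swallowed.
NULL_PRESERVING_MODES = {InsertionMode.TEXT, InsertionMode.IN_FOREIGN_CONTENT}

# Tokenizer states in which null characters may be swallowed.
NULL_SKIPPING_STATES = {
    TokenizerState.DATA,
    TokenizerState.RCDATA,
    TokenizerState.RAWTEXT,
    TokenizerState.PLAINTEXT,
}


def should_skip_null(mode: InsertionMode, state: TokenizerState) -> bool:
    """True when a U+0000 in the input should be ignored."""
    return mode not in NULL_PRESERVING_MODES and state in NULL_SKIPPING_STATES
```

Both conditions have to hold: a null in DataState is kept if the tree builder
is in TextMode, and a null seen outside the four listed tokenizer states is
kept regardless of the tree builder's mode.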

> Firefox has a dedicated token for U+0000, so the communication is
> unidirectional when the special token is passed to the tree builder. The tree
> builder decides whether to emit U+FFFD or to discard the token.

That's fine.  We just discard the token in the InputStreamPreprocessor to avoid
extra calls to memcpy.

> Spec-wise, I'm suggesting not having "preprocessing the input stream" as a step
> before tokenization but letting the tokenizer see U+0000. The implementation in
> Gecko has always shown U+0000 and carriage return to the tokenizer.

We just let the InputStreamPreprocessor look at this bit of the parser's state.
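
As a rough illustration of that arrangement, here is a Python sketch in which
the parser sets a flag and the preprocessor consults it. Only the class name
comes from WebKit. The `skip_nulls` attribute and `process` method are
hypothetical, and what happens to nulls that are not skipped is outside the
scope of the sketch.

```python
class InputStreamPreprocessor:
    """Toy preprocessor that drops U+0000 when the parser asks it to."""

    def __init__(self) -> None:
        # Hypothetical flag: the parser would update this from the tree
        # builder's mode and the tokenizer's state before each chunk.
        self.skip_nulls = False

    def process(self, chunk: str) -> str:
        if self.skip_nulls:
            # Drop nulls here so that later stages never see them.
            return chunk.replace("\x00", "")
        # Otherwise pass the chunk through unchanged in this sketch.
        return chunk
```

The information flows one way, from the parser's state into the preprocessor,
so no dedicated U+0000 token has to reach the tree builder.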


Received on Friday, 10 September 2010 21:12:13 UTC