- From: Sam Ruby <rubys@intertwingly.net>
- Date: Wed, 25 Mar 2009 01:51:41 -0400
- To: Jonas Sicking <jonas@sicking.cc>
- CC: Maciej Stachowiak <mjs@apple.com>, Doug Schepers <schepers@w3.org>, Ian Hickson <ian@hixie.ch>, public-html@w3.org, www-svg <www-svg@w3.org>
Jonas Sicking wrote:
> On Wed, Mar 18, 2009 at 1:24 AM, Jonas Sicking <jonas@sicking.cc> wrote:
>> On Mon, Mar 9, 2009 at 5:15 PM, Sam Ruby <rubys@intertwingly.net> wrote:
>>> Jonas Sicking wrote:
>>>> Personally I would like to see something that is even more HTMLy than
>>>> Hixie's current proposal. I don't like at all that we have to use a
>>>> different tokenizer in "HTML mode" and in "foreign content mode". This
>>>> is both confusing to web developers and painful for end users (as
>>>> performance and code complexity suffer).
>>> Do you (or Henri) have a concrete proposal to offer?
>> The cases where I can see that the parser state is affecting the
>> tokenizer state are the following:
>>
>> CDATA handling. <![CDATA[]]> is currently only allowed in foreign
>> content. It would be great if we could allow <![CDATA[]]> consistently
>> throughout the markup. It sounds like Opera has done some
>> experimentation in this area.
>>
>> In HTML mode, there is a set of elements that change the tokenizer's
>> 'content model flag':
>> The following elements switch the tokenizer to CDATA state: noscript,
>> noframes, style, xmp, iframe, script
>> The following elements switch the tokenizer to RCDATA state: title, textarea
>> The following elements switch the tokenizer to PLAINTEXT state: plaintext
>>
>> It would be great if we could allow the same set of tags to affect the
>> parser the same way in both HTML mode and in foreign content mode. The
>> only two tags that seem troublesome here are <script> and <style>. It
>> sounds like there might possibly be agreement that it would be
>> possible to parse <script> as CDATA, which would leave <style> as the
>> only remaining controversial tag.
>>
>> If we made these changes I think there would be some optimizations
>> that we could do on the implementation side. However, more importantly,
>> I think the consistency would be much appreciated by authors.
>
> Just realized there was one more thing that I forgot about.
>
> This isn't a case where the tokenizer is directly dependent on the
> parser, however it's nonetheless a case that I think will be confusing
> for authors.
>
> Currently in foreign content mode, the 'empty XML element' syntax is
> supported. So you can write
>
> <circle x="42" y="4711" />
>
> This is IMHO a good thing. However this syntax does not work in HTML
> mode. So for example
>
> <div id="output" />
>
> does not create an empty div, but is rather treated as a start tag.
>
> This would be an easy problem to solve if it wasn't for web
> compatibility concerns. However I'd still like to explore what could
> be done in this area. For example, if there is a short list of tags for
> which we wouldn't support the empty element syntax, or if we could
> make empty-element syntax only work in standards mode (I'm not excited
> about either option).

I'm actually OK with this being the list of always empty tags, as is
reflected in the current spec. <br></br> is an extreme case of one you
couldn't fix if you wanted to, and <script src=""/> is an example of one
that would be really nice if it could be fixed, but alas it can't either.

> / Jonas

- Sam Ruby
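
Editorial illustration (not part of the original message): the behaviour discussed above can be sketched roughly as follows. This is a minimal model, not the spec's algorithm. The function name `trailing_slash_closes` and the `VOID_ELEMENTS` set are hypothetical, and the set is only an illustrative subset of the always-empty tags, not the full list from the spec.

```python
# Hypothetical, illustrative subset of the "always empty" HTML elements.
# The real list lives in the HTML spec and is longer.
VOID_ELEMENTS = frozenset({"br", "hr", "img", "input", "link", "meta"})


def trailing_slash_closes(tag_name: str, in_foreign_content: bool) -> bool:
    """Return True if `<tag ... />` produces an element with no content."""
    if in_foreign_content:
        # Foreign content (SVG, MathML): the empty-element syntax is honoured.
        return True
    # HTML mode: the trailing slash is ignored. Only always-empty elements
    # end up empty, and they would be empty without the slash as well.
    return tag_name.lower() in VOID_ELEMENTS


print(trailing_slash_closes("circle", in_foreign_content=True))   # <circle ... />
print(trailing_slash_closes("div", in_foreign_content=False))     # <div id="output" />
print(trailing_slash_closes("script", in_foreign_content=False))  # <script src=""/>
print(trailing_slash_closes("br", in_foreign_content=False))      # <br/>
```

Under this model, `<div id="output" />` and `<script src=""/>` in HTML mode are ordinary start tags, which matches the two cases named in the thread.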
Received on Wednesday, 25 March 2009 05:52:35 UTC