- From: Jonas Sicking <jonas@sicking.cc>
- Date: Tue, 31 Mar 2009 15:08:56 -0700
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: Simon Pieters <simonp@opera.com>, Doug Schepers <schepers@w3.org>, HTML WG <public-html@w3.org>, www-svg@w3.org
On Tue, Mar 31, 2009 at 2:30 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Mar 25, 2009, at 16:24, Simon Pieters wrote:
>
>> On Thu, 19 Mar 2009 18:52:25 +0100, Jonas Sicking <jonas@sicking.cc>
>> wrote:
>>
>>> My feelings on 1 vs. 2 is:
>>>
>>> Problems with 1:
>>> Parsing <![CDATA[]]> inside a CDATA element "feels" weird.
>
> I agree that it feels weird.
>
> I think the biggest problem with this entire issue is that the difference
> between HTML <script> and <script> in XML is surprising and unintuitive, so
> we will have a surprise boundary somewhere no matter what. It seems on the
> general level we have the following options:
>
> 1) Have the surprise boundary between text/html and XML. (The situation
> before SVG-in-text/html)
>
> 2) Have the surprise boundary between HTML <script> in text/html and
> everything else. (The situation with SVG-in-text/html as drafted)
>
> 3) Have graded surprises with two boundaries:
> a) Have a surprise boundary between HTML <script> and SVG-in-text/html
> <script> and another between SVG-in-text/html <script> and XML.
> b) Have a surprise boundary between pre-HTML5 <script> and HTML5
> text/html <script>s and another between text/html and XML.
>
> I'm worried about escaping surprises in general having seen the RSS <title>
> epic fail.

I'm a little unclear as to what the behaviors in 3 are. I.e., which
parsing/processing algorithms would lead to the two scenarios you
describe?

I'm also unclear as to what behavior you are proposing. How do you
feel about my proposal in
http://lists.w3.org/Archives/Public/public-html/2009Mar/0634.html ?

It would result in a graded surprise: some change in HTML <script>
parsing between HTML4 and HTML5, and some surprise at the boundary
between SVG-in-HTML and SVG-in-XML.

>>> Problems with 2:
>>> Just stripping a heading and trailing "<![CDATA[" / "]]>" would break
>>> markup like:
>>> <style>
>>> <![CDATA[
>>> rect { fill: yellow; }
>>> ]]>
>>> <![CDATA[
>>> circle { fill: blue; }
>>> ]]>
>>> </style>
>>>
>>> which probably happens occasionally due to copy-n-pasting.
>
> I don't like this, because it requires going back and modifying buffers that
> had been already built instead of just tweaking forward-only tokenizer state
> transitions, and it doesn't even work in the case where there are multiple
> CDATA sections as shown above. If we end up doing something other than
> what's currently in the draft, I'd much rather have what Simon proposes
> as #4.

The stripping doesn't happen at a tokenizer stage. It happens after
all parsing is done, when the inline data is taken from the DOM and
passed to the serializer. See the details in the link above.

/ Jonas

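For illustration only, here is a rough Python sketch of the "Problems with 2" case quoted above: stripping a single leading "<![CDATA[" and trailing "]]>" from text that has already been parsed. The function name strip_outer_cdata and the whitespace trimming are assumptions made for this example, not taken from any draft or from the linked proposal. It shows that the inner markers survive when the content holds two CDATA sections.

```python
# Hypothetical helper, not from any spec: strips only the outermost markers.
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def strip_outer_cdata(text: str) -> str:
    """Remove one leading CDATA_OPEN and one trailing CDATA_CLOSE.

    Surrounding whitespace is ignored when looking for the markers
    (an assumption of this sketch). Text without both markers is
    returned unchanged.
    """
    trimmed = text.strip()
    if trimmed.startswith(CDATA_OPEN) and trimmed.endswith(CDATA_CLOSE):
        return trimmed[len(CDATA_OPEN):-len(CDATA_CLOSE)]
    return text


# One CDATA section: stripping leaves plain stylesheet text.
single = "\n<![CDATA[\nrect { fill: yellow; }\n]]>\n"

# Two CDATA sections, as in the quoted <style> example.
double = (
    "\n<![CDATA[\nrect { fill: yellow; }\n]]>\n"
    "<![CDATA[\ncircle { fill: blue; }\n]]>\n"
)

stripped_single = strip_outer_cdata(single)
stripped_double = strip_outer_cdata(double)

print(repr(stripped_single))
print(repr(stripped_double))
```

In the two-section case the result still contains an inner "]]>" followed by "<![CDATA[", which a CSS or script processor would then see as literal text.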
Received on Tuesday, 31 March 2009 22:09:54 UTC