Re: SVG in HTML proposal from Robin Berjon on 2008-07-15 (www-svg@w3.org from July 2008)

From: Robin Berjon <robin@berjon.com>
Date: Tue, 15 Jul 2008 11:37:09 +0200
To: Andrew Sidwell <w3c@andrewsidwell.co.uk>
Cc: "public-html@w3.org WG" <public-html@w3.org>, www-svg <www-svg@w3.org>
Message-Id: <9410136C-F13E-4668-A6D6-FE9DF776F954@berjon.com>

On Jul 14, 2008, at 16:39 , Andrew Sidwell wrote:
> It's not that I believe the tokeniser should not be case-preserving;  
> I just think that if your motivation is just to make weirdly-cased  
> tags not trigger XML parsing, then that's not a useful route to  
> pursue.
(...)
> I understand you want to be compatible with existing SVG content,  
> but this is a place where you shouldn't be.  <svg xmlns="..."> is  
> quite enough.

Something is either compatible or it's not, and in this case it's
not. There may be a case to be made that certain aspects of
compatibility could be dropped, but if so we should be clear that
that's what we're discussing.

What is the downside of triggering XML parsing whenever an element
name matches 'svg' or '*:svg' case-insensitively? I don't think that
the Web would break, I don't think that going forward developers would
be badly surprised with any frequency (if ever), and I don't think
that the performance impact would be that bad (since it would only
trigger false positives on extremely rare unknown elements). If it
turns out afterwards that the casing or namespace were wrong, SVG can
handle that by not displaying anything.
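
To make the matching rule concrete, here is a minimal sketch in Python of
the check I have in mind. The function name triggers_xml_parsing is made
up for illustration, and treating '*:svg' as "exactly one non-empty
prefix, then a colon, then svg" is my assumption, not something any spec
says.

```python
def triggers_xml_parsing(tag_name: str) -> bool:
    """Return True if tag_name is 'svg' or 'prefix:svg', ignoring case.

    Hypothetical helper: the name and the exact prefix rule are
    assumptions made for illustration only.
    """
    parts = tag_name.split(":")
    if len(parts) == 1:
        # Unprefixed name such as 'svg', 'SVG' or 'Svg'.
        local = parts[0]
    elif len(parts) == 2 and parts[0]:
        # Prefixed name such as 'svg:svg' or 'foo:SVG'.
        local = parts[1]
    else:
        # Empty prefix or more than one colon: not treated as a match.
        return False
    return local.lower() == "svg"
```

Anything this accepts that later turns out to have the wrong casing or
namespace would simply not render, as described above.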

It's just a suggestion, I'm curious about its exact cost.

> I think that to implement what the SVG WG proposes to a decent level  
> of performance will require building a new XML parser into the HTML5  
> parser.  Feeding XML to an XML processor through an API one byte at a  
> time will slow things down a lot.

But would you really need to feed it one byte at a time? You can  
pretty much scan (naïvely) to the next '>' and feed that until either  
you get an error or the XML subdocument is known to be closed. That's  
already better than byte by byte and I'm sure better heuristics can be  
devised.
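
As a rough sketch of that naive scan, assuming Python's standard expat
binding stands in for whatever XML processor a browser would really use
(feed_xml_island and its return shape are hypothetical names of mine):

```python
import xml.parsers.expat
from typing import Tuple


def feed_xml_island(text: str, start: int) -> Tuple[int, bool]:
    """Feed text[start:] to expat in chunks that each end at a '>'.

    Returns (offset, ok). On success, offset is just past the tag that
    closes the root element. On failure, it is the start of the chunk
    that could not be fed. Hypothetical helper, for illustration only.
    """
    parser = xml.parsers.expat.ParserCreate()
    depth = 0
    closed = False

    def on_start(name, attrs):
        nonlocal depth
        depth += 1

    def on_end(name):
        nonlocal depth, closed
        depth -= 1
        if depth == 0:
            # The root element has been closed: the island is complete.
            closed = True

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end

    pos = start
    while not closed:
        gt = text.find(">", pos)  # naive scan to the next '>'
        if gt == -1:
            return pos, False  # input ended before the root was closed
        try:
            # Not final: expat buffers an incomplete token (for example a
            # '>' inside an attribute value) until a later chunk finishes it.
            parser.Parse(text[pos:gt + 1], False)
        except xml.parsers.expat.ExpatError:
            return pos, False  # well-formedness error in this chunk
        pos = gt + 1
    return pos, True
```

The caller would hand control back to the HTML parser at the returned
offset. This hands the XML processor one tag-sized chunk at a time rather
than one byte at a time, and it is only a starting point for better
heuristics.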

> I much prefer the HTML5 model over having to incorporate an XML  
> parser as the SVG WG suggests, since XML fragments in text/html are  
> underspecified, and will be until XML parsing is specified to the  
> level of HTML5 somewhere.

I'm not sure what you mean by that.

> That said, I don't think the HTML5 model is perfect.  I tend towards  
> believing that the tokeniser should have a case-preserving flag,  
> which is flipped when entering "in foreign content", since that  
> saves the headache of going over all element and attribute names and  
> case-correcting them.  I understand the SVG WG's concerns, but I  
> don't think that using an XML parser is the answer.  I think it  
> would be much more productive if, when HTML5 parsing starts to be  
> implemented in browsers, people make sure that those browsers allow  
> export of well-formed XML versions of any foreign content included  
> in them.

How do you get that well-formed export without an XML parser of some
form?

-- 
Robin Berjon - http://berjon.com/

Received on Tuesday, 15 July 2008 09:37:46 UTC