Re: SVG in HTML proposal from Doug Schepers on 2008-07-17 (www-svg@w3.org from July 2008)

From: Doug Schepers <schepers@w3.org>
Date: Thu, 17 Jul 2008 14:25:29 -0400
To: Andrew Sidwell <w3c@andrewsidwell.co.uk>
Cc: Erik Dahlström <ed@opera.com>, public-html@w3.org, "www-svg@w3.org" <www-svg@w3.org>
Message-ID: <487F8E99.3060301@w3.org>

Hi, Andrew-

Thanks for your detailed comments.  We're glad to see that you are 
implementing SVG in HTML.

As Erik noted, we are updating the proposal continually based on 
feedback.  We would appreciate it if you would make sure that we've 
incorporated your feedback correctly.

http://dev.w3.org/cvsweb/SVG/proposals/svg-html/svg-html-proposal.html.diff?r1=text&tr1=1.1&r2=text&tr2=1.14&f=h

Thanks-
-Doug

Andrew Sidwell wrote (on 7/14/08 10:39 AM):
> 
> Hello,
> 
> Erik Dahlström wrote:
>> Hello HTML WG,
>>
>> The SVG WG is happy to announce the first draft proposal for how to
>> handle SVG in HTML (see attachment).
> 
> I've recently been writing an HTML5 parsing library in C (hubbub)[1] and 
> have implemented MathML and SVG as written in that spec (with SVG 
> handling as it is in commented-out portions of that spec).  Having read 
> this proposal in full, I have a number of technical comments:
> 
> 
> 1. Making the tokeniser case-preserving doesn't help.
> 
> You go to quite a bit of effort to allow the tokeniser to preserve case
> and to have the treebuilder lowercase HTML elements when inserted, I
> assume so that authors can't write '<SVG xmlNS="...">' and have it work.
> However, given that the tokeniser still accepts "<svg
> xmlns=http://...>" and the like, it seems like a fairly pointless
> change.  If everything from the first angle bracket gets passed to an
> XML processor, then '<SVG xmlNS="">'/"<svg xmlns=http://..." won't work
> anyway, since the XML processor will either misnamespace or choke.
> 
> It's not that I believe the tokeniser should not be case-preserving; I 
> just think that if your motivation is just to make weirdly-cased tags 
> not trigger XML parsing, then that's not a useful route to pursue.
> 
> 
> 2. Requiring "A start tag whose case-sensitive tag name is "*:svg" that 
> has a case-sensitive attribute "xmlns:*" with the value 
> "http://www.w3.org/2000/svg", where '*' can be any string as long as 
> it's the same in both the tagname and the xmlns attributename:" is bad; 
> it adds too much complexity for little gain.
> 
> Neither Hubbub nor the Java parser behind Validator.nu does string
> comparisons when dealing with lists of elements.  Instead, they hash the
> element name and then just compare hashes from then on.  (This is 
> obviously a massive performance gain.)  The requirement above hurts this 
> by forcing a string comparison on the name, and then in certain cases 
> forcing one to look through all the attributes of an element and perform 
> string comparisons on their names and values too.
> 
> The spec to date has gone to some effort to avoid making implementations
> search through attributes, because it is slow.  As far as I can 
> remember, there is one place that attributes are checked in the 
> treebuilder, and that is <input type="hidden"> in the "in table" phase.
> 
> I understand you want to be compatible with existing SVG content, but 
> this is a place where you shouldn't be.  <svg xmlns="..."> is quite enough.
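To make the cost in point 2 concrete, here is a minimal sketch of the check that the quoted rule implies. It is written in Python rather than hubbub's C, and the function name and the list-of-pairs attribute representation are assumptions for illustration, not taken from either parser.

```python
SVG_NS = "http://www.w3.org/2000/svg"


def matches_prefixed_svg(tag_name, attributes):
    """Return True for a start tag like <p:svg xmlns:p="http://www.w3.org/2000/svg">.

    tag_name   -- case-preserved tag name as seen by the tokeniser
    attributes -- list of (name, value) pairs in source order (assumed shape)
    """
    prefix, colon, local = tag_name.partition(":")
    if not colon or not prefix or local != "svg":
        return False
    wanted = "xmlns:" + prefix
    # Every attribute may have to be inspected: this is the linear scan
    # with string comparisons on names and values that point 2 objects to.
    for name, value in attributes:
        if name == wanted and value == SVG_NS:
            return True
    return False
```

A check on a pre-hashed element name alone needs neither the string split nor the attribute loop, which seems to be the contrast being drawn.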
> 
> 
> 3. There are various problems with the text of the algorithm for parsing 
> XML fragments.
> 
> The lines:
> "Save the tokeniser content model flag to old-state."
> "Reset the tokeniser content model flag to the old-state."
> 
> are superfluous.  At no point in the course of parsing XML fragments is 
> the tokeniser content model flag changed, so this text serves no purpose.
> 
> I was under the impression that an off-the-shelf XML processor should be 
> able to be used to parse SVG-in-text/html.  If this is the case, the 
> requirement "For each element that is successfully parsed, the XML 
> parser must insert a foreign element." should probably be changed to 
> "For each element the XML parser parses, insert a foreign element with 
> the namespace, name, and attributes of that element", or the like, to 
> avoid mandating that the XML parser must have behaviour that is not 
> specified in the XML spec.  In general, I think the algorithm should 
> specify what to do with things that the XML parser parses and not that 
> e.g. the XML parser must do something.
> 
> Handling is not specified for what happens if an XML parser parses 
> characters or processing instructions, and nothing is said about empty 
> tags (basically that they should insert a new element and then pop that 
> element off the stack).
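The suggestion above, that the algorithm should say what to do with each thing an off-the-shelf XML parser reports, could look roughly like the following sketch. It uses Python's standard expat binding; the action names, the `parse_fragment` function, and the treatment of characters and processing instructions are hypothetical, since the message notes that the proposal does not specify them.

```python
import xml.parsers.expat


def parse_fragment(xml_text):
    """Parse an XML fragment and record tree-builder actions as tuples."""
    actions = []
    stack = []  # stands in for the HTML parser's stack of open elements

    def split(qualified):
        # With namespace_separator=" ", expat reports "namespace local";
        # names with no namespace are reported as just "local".
        ns, _, local = qualified.rpartition(" ")
        return ns, local

    def start(name, attrs):
        ns, local = split(name)
        stack.append(local)
        actions.append(("insert-foreign-element", ns, local, dict(attrs)))

    def end(name):
        # An empty tag such as <circle/> produces start then end, so the
        # element is inserted and immediately popped off the stack.
        actions.append(("pop", stack.pop()))

    def chars(data):
        actions.append(("insert-characters", data))

    def pi(target, data):
        actions.append(("processing-instruction", target, data))

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ProcessingInstructionHandler = pi
    parser.Parse(xml_text, True)
    return actions
```

Note that expat raises an error on the first well-formedness violation, which is the draconian behaviour discussed later in the message.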
> 
> The sentence "Feed the XML parser the string corresponding to the start 
> tag of the element along with all its attributes." is unclear.  I 
> believe the intention is closer to: "Feed the XML parser the string 
> starting with the character that triggered entry into the 'tag open' 
> state and ending with the character that triggered emission of the
> start tag token."
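As a small illustration of that rewording, the string handed to the XML parser would be the verbatim source slice between two tokeniser offsets. The function name and the idea that the tokeniser records both offsets are assumptions made for this sketch.

```python
def start_tag_source(source, tag_open_index, emit_index):
    """Return the exact source text of a start tag, both ends inclusive.

    tag_open_index -- offset of the "<" that triggered the 'tag open' state
    emit_index     -- offset of the ">" that triggered emission of the token
    """
    if source[tag_open_index] != "<" or source[emit_index] != ">":
        raise ValueError("offsets do not delimit a start tag")
    return source[tag_open_index:emit_index + 1]
```

Because the slice is verbatim, markup such as an unquoted attribute value reaches the XML parser unchanged, which is the situation described in point 1.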
> 
> 
> 
> 
> My non-technical comments:
> 
> I think that to implement what the SVG WG proposes to a decent level of 
> performance will require building a new XML parser into the HTML5 
> parser.  Feeding XML to an XML processor through an API one byte at a
> time will slow things down a lot.
> 
> I much prefer the HTML5 model over having to incorporate an XML parser 
> as the SVG WG suggests, since XML fragments in text/html are 
> underspecified, and will be until XML parsing is specified to the level 
> of HTML5 somewhere.  Even when it is, I don't think there is a place for 
> draconian error handling in text/html; it goes against the very grain of 
> the language.
> 
> That said, I don't think the HTML5 model is perfect.  I tend towards 
> believing that the tokeniser should have a case-preserving flag, which 
> is flipped when entering "in foreign content", since that saves the 
> headache of going over all element and attribute names and 
> case-correcting them.  I understand the SVG WG's concerns, but I don't 
> think that using an XML parser is the answer.  I think it would be much 
> more productive if, when HTML5 parsing starts to be implemented in 
> browsers, people make sure that those browsers allow export of 
> well-formed XML versions of any foreign content included in them.
> 
> 
> Cheers,
> a.
> 
> [1] http://www.netsurf-browser.org/projects/hubbub/
> 
> 
>
Received on Thursday, 17 July 2008 18:26:05 UTC