Re: SVG in HTML proposal from Andrew Sidwell on 2008-07-18 (www-svg@w3.org from July 2008)

From: Andrew Sidwell <w3c@andrewsidwell.co.uk>
Date: Fri, 18 Jul 2008 02:09:51 +0100
To: Robin Berjon <robin@berjon.com>
CC: "www-svg@w3.org" <www-svg@w3.org>, public-html@w3.org
Message-ID: <487FED5F.20106@andrewsidwell.co.uk>

(I sent this a few days ago, but only to Robin, and I've only just
noticed.  Sorry!)

Robin Berjon wrote:
> 
> On Jul 14, 2008, at 16:39 , Andrew Sidwell wrote:
>> It's not that I believe the tokeniser should not be case-preserving; I 
>> just think that if your motivation is just to make weirdly-cased tags 
>> not trigger XML parsing, then that's not a useful route to pursue.
> (...)
>> I understand you want to be compatible with existing SVG content, but 
>> this is a place where you shouldn't be.  <svg xmlns="..."> is quite 
>> enough.
> 
> Something is either compatible, or it's not, and in this case that's 
> not. There may be a case to be made that certain aspects of 
> compatibility could be dropped, but if so we should be clear that that's 
> what we're discussing.
> 
> What is the downside of triggering XML parsing whenever an element name
> matches 'svg' or '*:svg' case-insensitively? I don't think that the Web
> would break, I don't think that going forward developers would be
> severely surprised with any regularity (if ever), and I don't think
> that the performance impact would be that bad (since it would only
> trigger false positives on extremely rare unknown elements). If it turns
> out afterwards that the casing or namespace was wrong, SVG can handle
> that by not displaying anything.

Indeed.  Note also that matching "*:svg" is bad, too; I can't think of
any reason why you would want to use <svg:svg xmlns:svg="..."> over <svg
xmlns="..."> as a way to switch into SVG in text/html.

>> I think that to implement what the SVG WG proposes to a decent level 
>> of performance will require building a new XML parser into the HTML5 
>> parser.  Feeding XML to an XML processor through an API one byte at a 
>> time will slow things down a lot.
> 
> But would you really need to feed it one byte at a time? You can pretty 
> much scan (naïvely) to the next '>' and feed that until either you get 
> an error or the XML subdocument is known to be closed. That's already 
> better than byte by byte and I'm sure better heuristics can be devised.

OK, but there /will/ still be a speed hit, and I suspect a significant
one.  (I don't know quite how large, but I intend to have a go at
implementing the proposal to see how well it works in practice.)
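For concreteness, here is a minimal sketch of Robin's heuristic (scan to
the next '>' and feed that chunk until either an error occurs or the XML
subdocument is known to be closed).  It assumes Python's
xml.etree.ElementTree.XMLPullParser as a stand-in for the incremental XML
processor; parse_svg_fragment is a hypothetical name, not part of any
proposal.

```python
import xml.etree.ElementTree as ET


def parse_svg_fragment(html: str, start: int):
    """Hypothetical helper: parse the XML island that begins at html[start].

    Returns (root_element, resume_offset), where resume_offset is the index
    just past the island's closing tag, i.e. where HTML tokenising could
    resume. Raises ValueError on an XML error, or if the input runs out
    before the root element closes.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    pos = start
    while pos < len(html):
        # Naive heuristic: feed everything up to and including the next '>'.
        # A '>' in text or inside an attribute value only adds an extra
        # chunk boundary, which an incremental parser copes with.
        gt = html.find(">", pos)
        end = len(html) if gt == -1 else gt + 1
        try:
            # A ParseError may surface from feed() or from read_events().
            parser.feed(html[pos:end])
            pos = end
            for event, elem in parser.read_events():
                if event == "start":
                    depth += 1
                else:
                    depth -= 1
                    if depth == 0:
                        # Root element closed: the subdocument is complete.
                        return elem, pos
        except ET.ParseError as exc:
            raise ValueError(f"XML error in fragment: {exc}") from exc
    raise ValueError("input ended before the XML fragment was closed")
```

Each chunk ends at a '>', so the root's end tag always finishes a chunk
and the returned offset lands exactly after it.  This still pays for one
call into the XML parser per tag, which is the kind of overhead I expect
to show up as a speed hit.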

>> I much prefer the HTML5 model over having to incorporate an XML parser 
>> as the SVG WG suggests, since XML fragments in text/html are 
>> underspecified, and will be until XML parsing is specified to the 
>> level of HTML5 somewhere.
> 
> I'm not sure what you mean by that?

HTML5 specifies at a byte-by-byte level how to tokenise an HTML document
and build a tree from it.  It defines what it means to e.g. "emit a start
tag token", or "insert an element".  XML defines none of this; instead it
defines what a well-formed XML document looks like and, additionally, how
to use DTDs to determine whether such documents are valid.  As a result,
the SVG WG's proposal is underspecified compared to the rest of the
parsing algorithm.  (I suspect the best way of fixing this would be to
include an "XML fragment" parsing section in HTML5, which deals with a
subset of XML that doesn't include entities, processing instructions
(PIs), and the like.)
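To make the "XML fragment" idea a little more concrete, here is a rough
sketch of a tokeniser for such a subset, emitting start, end and
character tokens in the spirit of HTML5's tokeniser.  The exact subset is
my assumption: it accepts the five predefined entity references and
numeric character references, and rejects anything starting with "<!" or
"<?" (DOCTYPEs, comments, CDATA sections, PIs).  Token, tokenize and
decode are hypothetical names.

```python
import re
from dataclasses import dataclass, field

NAME = r"[A-Za-z_][\w.:-]*"
# Start tag, end tag or empty-element tag; attribute values must be quoted.
TAG_RE = re.compile(
    rf"<(/?)({NAME})((?:\s+{NAME}\s*=\s*(?:\"[^\"<]*\"|'[^'<]*'))*)\s*(/?)>"
)
ATTR_RE = re.compile(rf"({NAME})\s*=\s*(?:\"([^\"<]*)\"|'([^'<]*)')")
# The only references this subset allows: predefined entities and numeric refs.
REF_RE = re.compile(r"&(?:(amp|lt|gt|quot|apos)|#(\d+)|#x([0-9A-Fa-f]+));")
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


@dataclass
class Token:
    kind: str  # "start", "end" or "chars"
    name: str = ""
    attrs: dict = field(default_factory=dict)
    self_closing: bool = False
    data: str = ""


def decode(text: str) -> str:
    """Expand allowed references; any other '&' is an error (no custom entities)."""
    if "&" in REF_RE.sub("", text):
        raise ValueError("only predefined and numeric character references allowed")

    def expand(m):
        if m.group(1):
            return PREDEFINED[m.group(1)]
        return chr(int(m.group(2)) if m.group(2) else int(m.group(3), 16))

    return REF_RE.sub(expand, text)


def tokenize(src: str):
    """Yield Tokens for the XML-fragment subset, preserving name case."""
    pos = 0
    while pos < len(src):
        lt = src.find("<", pos)
        if lt == -1:
            lt = len(src)
        if lt > pos:  # character data up to the next '<'
            yield Token("chars", data=decode(src[pos:lt]))
            pos = lt
            continue
        if src.startswith(("<!", "<?"), pos):
            raise ValueError(f"DOCTYPE/comment/CDATA/PI not allowed at offset {pos}")
        m = TAG_RE.match(src, pos)
        if m is None:
            raise ValueError(f"malformed tag at offset {pos}")
        is_end, name, attr_text, slash = m.groups()
        if is_end and (attr_text or slash):
            raise ValueError(f"malformed end tag at offset {pos}")
        attrs = {}
        for a in ATTR_RE.finditer(attr_text):
            if a.group(1) in attrs:
                raise ValueError(f"duplicate attribute {a.group(1)!r}")
            value = a.group(2) if a.group(2) is not None else a.group(3)
            attrs[a.group(1)] = decode(value)
        yield Token("end" if is_end else "start", name, attrs, bool(slash))
        pos = m.end()
```

The point is less the code than its shape: every input either produces a
defined token or a defined error, which is the level of detail the rest
of the HTML5 parsing algorithm already has.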

>> That said, I don't think the HTML5 model is perfect.  I tend towards 
>> believing that the tokeniser should have a case-preserving flag, which 
>> is flipped when entering "in foreign content", since that saves the 
>> headache of going over all element and attribute names and 
>> case-correcting them.  I understand the SVG WG's concerns, but I don't 
>> think that using an XML parser is the answer.  I think it would be 
>> much more productive if, when HTML5 parsing starts to be implemented 
>> in browsers, people make sure that those browsers allow export of 
>> well-formed XML versions of any foreign content included in them.
> 
> How do you get that WF export without an XML parser of some form?

The SVG, along with the HTML, is built into a DOM by the browser/UA,
which is then trivially serialisable to XML.
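A small sketch of that round trip, using Python's xml.etree.ElementTree
as a stand-in for the browser's DOM.  It contrasts the two ways of
getting correctly cased names into that DOM: lowercasing and then fixing
up known SVG names (SVG_TAG_FIXUPS is only an excerpt of the HTML
standard's "adjust SVG tag name" table), versus the case-preserving
tokeniser flag suggested in the quoted text.  name_via_fixup,
name_via_flag, build_and_serialise and the element name "myWidget" are
hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Excerpt of the HTML standard's "adjust SVG tag name" table (lowercase -> correct case).
SVG_TAG_FIXUPS = {
    "clippath": "clipPath",
    "foreignobject": "foreignObject",
    "lineargradient": "linearGradient",
    "radialgradient": "radialGradient",
    "textpath": "textPath",
}


def name_via_fixup(raw: str) -> str:
    """Lowercase, then repair names found in the fix-up table."""
    lowered = raw.lower()
    return SVG_TAG_FIXUPS.get(lowered, lowered)


def name_via_flag(raw: str, in_foreign_content: bool) -> str:
    """Hypothetical case-preserving tokeniser: lowercase only outside foreign content."""
    return raw if in_foreign_content else raw.lower()


def build_and_serialise(child_names, name_fn) -> str:
    """Build an <svg> element with the given children, then serialise it as XML."""
    ET.register_namespace("", SVG_NS)  # serialise SVG as the default namespace
    root = ET.Element(f"{{{SVG_NS}}}svg")
    for raw in child_names:
        ET.SubElement(root, f"{{{SVG_NS}}}{name_fn(raw)}")
    return ET.tostring(root, encoding="unicode")
```

Either way, once the tree exists, well-formed XML comes out of a single
serialiser call.  The difference is that the fix-up table only repairs
names it knows about (clipPath survives, myWidget becomes mywidget),
while a case-preserving flag keeps whatever the author wrote.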

Additionally, effort would be well-spent on getting SVG editors to
include HTML5 parsers which can parse and extract SVG fragments
usefully.  After that, it doesn't really matter in what form the SVG is
serialised in text/html, as long as everyone agrees on the rules for
parsing it.

Cheers,
a.
Received on Friday, 18 July 2008 01:12:10 UTC