Re: SVGWG SVG-in-HTML proposal (ISSUE-41, ISSUE-37) from Henri Sivonen on 2008-07-30 (public-html@w3.org from July 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 30 Jul 2008 13:11:04 +0300
To: Charles McCathieNevile <chaals@opera.com>
Cc: Erik Dahlström <ed@opera.com>, HTML WG <public-html@w3.org>
Message-Id: <E9BD90CB-7D25-4D89-AE4B-1E333A269FEB@iki.fi>
On Jul 30, 2008, at 03:11, Charles McCathieNevile wrote:

> On Tue, 29 Jul 2008 12:18:19 +0200, Ian Hickson <ian@hixie.ch> wrote:
>> Only a radical shift in the way the
>> Web works in the intervening five years would affect this conclusion.
>
> Or a radical shift in what we do with HTML - such as trying to  
> include an existing body of work that doesn't have the same kind of  
> legacy.

What the body of work is being included in does have a legacy to take  
into account, though.

> Whatever one thinks of "Draconian Error Handling" (and however you  
> choose to define it) or namespaces, it seems clear today that they  
> basically do not work in generic HTML content (in which  
> generalisation I actually include what people think of as XHTML 1.X  
> content, including things like XHTML-MP), and that they basically do  
> work in pretty much any other XML language. So if we try to bring  
> these things together, we are effectively walking into the radical  
> shift that Ian asks for.

It is possible, though, that SVG would be deployable by a broader  
group of people if it were available without the traits that make  
XHTML very hard for people to deploy.

> The earlier HTML proposal would require a massive amount of re- 
> engineering from the SVG community, since it would produce a huge  
> inconsistency with more or less all existing tools.

I disagree with this characterization. The earlier proposal, by  
design, works with output (apart from <font>) from popular tools  
(Illustrator and Inkscape). As for input, the earlier proposal is  
contained within the parsing layer, so an HTML5 parser can be added  
side-by-side with an existing XML parser without touching code above  
the parser layer.

Note that SVG consumers that currently don't ingest SVG wrapped in  
HTML would need a new parser also under the SVG WG proposal, unless  
the user manually extracts the SVG fragment first.

> Opera believes that incorporating SVG (and also MathML) handling in  
> such a way that they are basically maintained as namespace-compliant  
> XML vocabularies with common XML parsing, while HTML is treated with  
> the handling of sloppy legacy that it requires, is in fact simpler,  
> easier to implement and more likely to have a successful outcome  
> than undertaking the effort required to completely rebuild the tool  
> chains for those languages, from how-to books to authoring tools to  
> browsers.

I disagree that it is simpler. Just by comparing the two proposals,  
it's pretty clear that the commented out proposal is simpler, since it  
adds well-contained processing on start tag token and the CDATA  
section tokenizer states while the SVG WG proposal adds a whole XML  
parser (so far with underdefined integration rules).
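To illustrate what "well-contained processing on start tag token" could look like, here is a rough sketch. It is not the algorithm from either proposal: the token shape, the function name assign_namespaces and the rule "everything under <svg> is SVG" are simplifying assumptions made for illustration only (integration points such as foreignObject and the CDATA section states are left out).

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def assign_namespaces(tokens):
    """Yield (kind, namespace, name) for a stream of tag tokens.

    tokens is an iterable of (kind, name) pairs where kind is
    "start", "end" or "selfclose". Hypothetical token shape; a real
    HTML5 tree builder tracks far more state than this.
    """
    open_namespaces = []  # namespace of each currently open element
    for kind, name in tokens:
        if kind == "end":
            # Assume well-nested input: the end tag closes the innermost element.
            ns = open_namespaces.pop() if open_namespaces else HTML_NS
            yield kind, ns, name
            continue
        parent_ns = open_namespaces[-1] if open_namespaces else HTML_NS
        # The only new decision happens here, on the start tag token.
        ns = SVG_NS if (name == "svg" or parent_ns == SVG_NS) else HTML_NS
        if kind == "start":
            open_namespaces.append(ns)
        yield kind, ns, name
```

The point of the sketch is only that the extra logic sits in one place in the tree builder and that nothing above the parser needs to know about it.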

I disagree that it is easier to implement (at least if the baseline  
one starts with is the HTML5 parsing algorithm without foreign content  
support). Again, the commented out proposal (without optimizations)  
requires a fairly small chunk of new code. The SVG WG proposal  
(assuming an off-the-shelf XML parser) requires much more bookkeeping  
and code for driving the XML parser. However, it is clear that feeding  
an XML parser a character (or a byte? the proposal isn't clear) at a  
time isn't a reasonable implementation performance-wise. A serious  
implementation of the SVG WG proposal would have to merge an XML  
tokenizer into the HTML tokenizer. There's no way that could be  
considered easier than the fairly contained changes required by the  
commented out proposal.
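As a small illustration of what driving an off-the-shelf XML parser incrementally involves, the sketch below feeds the same SVG fragment to Python's expat binding once as a whole and once a character at a time. The fragment and the helper name parse_events are made up for this example, and the sketch only counts calls into the parser; it does not measure time and does not model the HTML side at all.

```python
import xml.parsers.expat


def parse_events(chunks):
    """Feed chunks to a fresh expat parser; return (events, parse_calls)."""
    events = []
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    calls = 0
    for chunk in chunks:
        parser.Parse(chunk, False)  # more input may follow
        calls += 1
    parser.Parse("", True)  # signal end of input
    return events, calls + 1


fragment = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>'
whole_events, whole_calls = parse_events([fragment])
char_events, char_calls = parse_events(list(fragment))
```

Both runs report the same element events, but the character-at-a-time run crosses into the XML parser once per character (plus the final call), which is the kind of overhead the paragraph above is concerned with.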

Also, I think the forgiving nature of HTML and absence of complexity  
like Namespaces have been important for the success of HTML. I don't  
see why leveraging these traits wouldn't be good for the success of  
SVG-the-DOM-language.

> We have multiple parsers working on the same content already, for  
> mixtures of HTML, CSS, Javascript, and XML, and at least three  
> browsers have an actual SVG implementation based on the XML version  
> of SVG, with MS having something similar (VML) although not based on  
> a standard. Passing things from one parser to another is something  
> we have been doing for more than a decade. We don't think it is a  
> big issue to do this. Authoring tools, likewise, have been mixing  
> these languages with the various requirements, or specialising in  
> one area or another. A substantial amount of work has been done on  
> how these things work together, and on teaching people the  
> differences between authoring HTML and authoring (any XML language).

Parsing <style> and <script> element content as CSS and JS  
respectively is fundamentally different from what the SVG WG is  
proposing here.

The HTML layer of tokenizing the contents of a <style> or a <script>  
element is relatively simple and performant. Most of the time, you  
just check if the current character is '<' (or if in a pseudo-comment,  
'-') and spin in one loop. Performance-wise, this is different from  
running two layers of more complex tokenization on the same input text.
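A minimal sketch of that kind of loop, assuming a simplified end-tag rule and ignoring the pseudo-comment case entirely (the function name scan_raw_text is hypothetical and this is not the HTML5 tokenizer as specified):

```python
def scan_raw_text(src, start, tag):
    """Scan the body of a <script> or <style> element.

    Returns (raw_text, index_after_end_tag). Simplified: no handling
    of pseudo-comments, and the end tag is matched case-insensitively
    when followed by whitespace, "/" or ">".
    """
    end_tag = "</" + tag
    n = len(src)
    i = start
    while i < n:
        # Hot path: nearly every character is not "<", so just advance.
        if src[i] != "<":
            i += 1
            continue
        after = i + len(end_tag)
        if (src[i:after].lower() == end_tag
                and after < n and src[after] in " \t\n\f\r/>"):
            close = src.find(">", after)
            return src[start:i], (close + 1 if close != -1 else n)
        i += 1
    # End of input reached without an end tag.
    return src[start:], n
```

For example, scanning "<script>if (a<b) x();</script><p>" from index 8 with tag "script" yields the text "if (a<b) x();" and leaves the position just before "<p>". The HTML layer never looks inside the script text; the JS parser gets it as one opaque string afterwards.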

But more importantly, the CSS parsing and the JS parsing happen  
atomically from the point of view of the HTML parser and scripts that  
are executing, and the CSS and JS do not contribute their abstract  
syntax tree nodes into the DOM in a way that would affect the HTML  
layer tree builder. In the SVG case, both HTML and SVG share the DOM  
tree, and both can contribute scripts that have to be run in a well- 
defined way relative to parsing and that can cause a change to the  
parser state.

As for VML, the detailed workings of Trident in the case of non-HTML  
markup in text/html have not been recounted on this mailing list. Does  
Trident really reparse the source text range occupied by VML as XML?

> We do not deny that there is more work to be done, but we feel that  
> it is valuable to proceed in a way that preserves the existing SVG  
> and MathML ecosystems while enabling richer content to be  
> incorporated into HTML documents. We see it as a backward step to  
> try and make HTML itself a super-format that effectively defines  
> everything, and redefines existing things like SVG and MathML.

Enabling SVG in text/html necessarily disrupts the SVG ecosystem in  
the sense that existing viewers that don't accept text/html or HTML  
markup before the SVG fragment won't be able to consume SVG-in-text/ 
html content. (Existing versions of SVG-enabled full-stack browser  
engines such as Gecko, WebKit and Opera Core 2 may be able to be  
compatible with an HTML5 parser injected as a JS library.) This  
disruption happens regardless of whether the SVG subtree is parsed  
according to the commented out proposal or the SVG WG's proposal.

Of course, the justification for the disruption is seeking to make  
SVG-the-DOM-language more successful than it would be if confined to XML.

> Sure. But the first proposal for how to do this was, in Opera's  
> view, unacceptably damaging to the SVG ecosystem. Something along  
> the lines of the current SVG proposal is, in Opera's view, easier to  
> implement, maintains the integrity of the SVG toolchain and  
> ecosystem as well as that of existing HTML, and thus provides the  
> right path where innovation doesn't break backwards compatibility in  
> any significant area, and allows for clean development moving forward.

What's a significant area where backwards compatibility isn't broken  
equally by both proposals? If you serve content authored for the SVG  
WG's proposal to Firefox 3, Opera 9.5 or Safari 3.1 as text/html  
(without injecting a new parser as a JS library), will it render as  
vector graphics?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 30 July 2008 10:11:47 UTC