- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 23 Jun 2008 12:25:35 +0300
- To: anthony.grasso@cisra.canon.com.au
- Cc: www-svg <www-svg@w3.org>, "public-html@w3.org WG" <public-html@w3.org>
(Replying to a message from public-svg-wg:
http://lists.w3.org/Archives/Public/public-svg-wg/2008AprJun/att-0142/SVG-in-HTML-proposal.html
I gather that I'm not supposed to post there, so I'm posting to www-svg instead. CCed to public-html for HTML parsing relevance.)

> This document addresses some of the issues that need to be taken
> into consideration when creating a hybrid design. It also proposes a
> potential, high-level solutions for some of those problems.
>
> HTML/SVG Markup Integration Considerations
> • Interleaved Parsing of HTML/SVG
> • JavaScript context sharing
> • JavaScript node access
> • CSS integration
> • SVG Widgets
> • Focus Navigation
> • Unified API (IDL)

I think this list omits an important point: DOM tree sharing. In the top 3 browser engines that implement SVG (Gecko, WebKit and Presto), the layout engine contains an SVG renderer and a CSS formatter, and the SVG and CSS rendering contexts can nest using a single shared DOM tree as the source of the data to be formatted.

> As the reader is probably already aware, HTML and SVG are both
> markup languages. SVG follows the strict XML rules and HTML allows
> for a looser syntax, like tags not been closed, etc.

Considering the dominant design in browsers that support SVG natively, I think focusing on syntax differences isn't the crux of the matter but focusing on them both being DOM languages is.

> To accommodate the complexity of integrating both, SGML & XML type
> languages, SVG WG proposes a use of a cascaded parser. Cascading
> parser allows recursive nested invocations between SGML and XML
> Content Handler.

I'd prefer references to HTML directly instead of references to SGML, since implementations don't implement HTML as an SGML language.

> Since XML already defines how XML languages from different
> namespaces interact with each-other, it is valid to have one XML
> language embedded in another.
>

More to the point, XML defines unambiguous parsing in that case, but doesn't specify interaction further than that.

> When embedding HTML inside SVG, the HTML markup must be well formed.

I think requiring HTML to be well-formed inside or outside SVG in text/html is a bad requirement, since it would defeat a part of the value proposition that motivates SVG in text/html in the first place: that you add SVG without having to turn your HTML into well-formed XHTML. Turning existing HTML into well-formed XHTML would be too expensive compared to the benefit of getting some added SVG.

> Cascading parser is combination of parsers, cascaded together to
> process content of a different type – HTML and XML for example. The
> individual parsers have their own rules, specific to the way the
> markup language is structured. All of the individual parsers need to
> also follow common rules of interactions with the other parsers.

The figure shows JavaScript and CSS parsers as examples of cascading parsers. I think SVG-in-HTML shouldn't be likened to CSS-in-HTML. In the case of SVG-in-XHTML, there's one parser that produces one parse tree that contains both SVG and XHTML nodes. CSS-in-XHTML, on the other hand, goes through a second level of parsing in a CSS parser. I think it would be natural to have the same kind of parsing layering in text/html: to handle SVG-in-HTML in one parser with shared DOM output and to have a second level of parsing for CSS.

> A Content Handler is a logical component, which understands and
> processes particular content type.
>
> Historically, the Content Handler are called Content plug-ins. The
> interface is built for most plug-ins to only allow for content
> inclusion by reference, i.e. no real content integration. For
> example, displaying SVG from a plug-in in a WEB browser window, does
> not make it integrated SVG Content Handler of HTML.
> Only if elements
> of SVG can be interleaved with HTML content, then it can be
> considered integrated. For example, it should be possible for an SVG
> image and path elements to be displayed as elements of HTML table.
> Another example is SVG text on path can be overplayed on top of HTML
> Image object. From user visual perspective the content behaves as a
> single markup language.

Considering that the dominant design for SVG support in browsers that support SVG out of the box is not based on a plug-in model, and those browsers are three of the top four browsers, I think the design for SVG-in-HTML shouldn't assume a plug-in architecture but instead should assume the software architecture of Gecko, WebKit and Presto, where both the CSS formatter and the SVG renderer are built-in parts of the layout engine.

> Additionally, if the Content Handler supports progressive parsing,
> it may process the content in a different way to that of a
> non-progressive parser.

This is a rather unspecific statement. I think it is desirable for the mapping from a byte stream to a tree not to depend on progressive rendering. So the input should get the same processing with or without incremental rendering, and the only exception should be that with incremental rendering intermediate tree states can be rendered.

> HTML and XML have a very similar way of tokenizing. However, the
> tokeniser must maintain information about token casing.

Does this imply that HTML and SVG parts share the tokenizer? Earlier it seemed that the main objection to Hixie's SVG-in-HTML proposal was that it used the same tokenizer for both.

The parsing algorithm gets at least two nice properties when the tokenizer is not case-preserving as in Hixie's proposal:

1) Being in an SVG subtree doesn't affect the matching of the start tag and end tag pair, so with bogus cargo cult input, the resulting tree shape remains closer to what an SVG-unaware HTML parser would produce.
2) In the case of a well-known tag name, a tokenizer buffer range containing an element name can be mapped to a statically allocated pre-interned descriptor object in the tokenizer without regard to tree builder state and without intermediate string object creation. (Each camelCase name is known in advance and the descriptor objects also hold the camelCase name.)

> Element Identification
> Element identification is a very important step in the cascading
> parser schema.
>
> Cascading parser maintains tables of:
>
> • Element tags – element constructor routines pairs (regular
> parsers also have such table);
> • Reference to Content Handler, which maintains "Element tag -
> Content Handler" pairs table;
> • Namespace mapping table (for XML based parsers);
> The correspondence between a tagname and a handler is not
> necessarily one-to-one. For example: "font", "video", "a". The
> context in which the tag is encountered is also responsible for
> deciding which handler to use.

This means the mapping from nsURI,local pairs onto classes that implement the DOM nodes, right? (i.e. the table needed for createElementNS.)

> When a HTML parser encounters an unrecognized element during the
> element identification step, it does the following:
>
> • Terminates the current element or attribute tag – For example it
> terminates P, LI, TR, TH, etc. elements.
>
> This may not be entirely accurate. There are many quirks to how
> broken content is repaired. This is suggesting that svg elements are
> recognized and parsed anywhere inside HTML. It should be pointed out
> that XML namespaces apply too. However, this does not solve the
> element context problem. The DOM and ecma script contexts must be
> the same, otherwise I see no gain compared to inclusion-by-reference
> (i.e. <object data="some.svg">).

This part of the proposal seems to need more work.
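
(To make the two tokenizer properties listed above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not code from any actual parser: ElementDescriptor, lookup and tags_match are hypothetical names, and the table holds only a handful of real SVG camelCase element names. Python string slicing allocates, so the sketch shows the mapping itself, not the allocation-free buffer handling a real tokenizer would use.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementDescriptor:
    """Hypothetical pre-interned descriptor for a well-known tag name."""
    lower_name: str  # name as matched by the case-insensitive tokenizer
    camel_name: str  # name the tree builder would use inside an SVG subtree


# Illustrative subset only; a real table would list every known name.
_DESCRIPTORS = {
    d.lower_name: d
    for d in (
        ElementDescriptor("title", "title"),
        ElementDescriptor("foreignobject", "foreignObject"),
        ElementDescriptor("clippath", "clipPath"),
        ElementDescriptor("lineargradient", "linearGradient"),
    )
}


def lookup(buffer: str, start: int, end: int) -> ElementDescriptor:
    """Map a tokenizer buffer range to a descriptor, ignoring case.

    The lookup never consults tree builder state (property 2).
    """
    key = buffer[start:end].lower()
    known = _DESCRIPTORS.get(key)
    # Unknown names fall back to a fresh descriptor with no case fix.
    return known if known is not None else ElementDescriptor(key, key)


def tags_match(start_name: str, end_name: str) -> bool:
    """Start/end tag matching that is the same in and out of SVG (property 1)."""
    a = lookup(start_name, 0, len(start_name))
    b = lookup(end_name, 0, len(end_name))
    return a.lower_name == b.lower_name
```

With such a table, the tree builder alone decides whether to use lower_name or camel_name when it creates the node, so the tokenizer stays identical for HTML and SVG content.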
> • If a Content Handler that can handle the element is found, the HTML
> parser passes the control of the file or token stream to the Content
> Handler. Based on the implementation. The HTML parser may also need
> to pass JS context and DOM branch reference to the identified
> Content Handler.

Why is the control of the token stream transferred here? Transferring the control of the token stream seems awfully complex compared to a solution where there is one tree builder and the representation of the token stream is a private implementation detail of the one tokenizer and the one tree builder.

"Based on implementation" seems dodgy.

Passing a reference to a DOM branch as opposed to the stack of the HTML parser is dangerous when scripts execute during parsing and can go changing branches of the DOM.

What happens if a script invokes document.write() on the JS script context when the SVG Content Handler is consuming tokens?

> • The newly identified Content Provider,

Content Provider?

> starts the processing of (SVG) elements until it encounters
> unrecognized element error or a namespace, which it can not handle.
> The SVG Content Handler invokes the initial Content Handler in an
> attempt to identify the unrecognized element with another Content
> Handler.
>
> What is "an unrecognized element"? This may not be clear enough.
> Consider for example <fie:foo xmlns:fie="http://www.foo.com/fooml"/>.

Are well-known HTML elements that the SVG Content Handler itself can't handle and unknown elements which might be from a future version of SVG handled differently from each other?

> • If the embedded Content Handler returns with error and the content
> state is not recognized, the HTML parser will thread the stream from
> this point on as malformed element and should try to identify the
> first valid HTML element or a valid token and recover the normal
> parsing flow.

How?
> In cascaded parsing, when one Content Handler passes the JavaScript
> context to next level below, it is up to the new Content Handler to
> continue using the existing parent's JavaScript context or create a
> new one.
>
> Additionally, the parent Content Handler may decide not to pass a
> valid JavaScript context to the embedded Content Handler. In this
> case the embedded Content Handler is forced to create a new
> JavaScript context, if needed.

When would the parent not pass the context? When would the child handler create its own context even though it got a context from the parent? Does a new script context mean having a window object that isn't the same window object as the one seen from HTML script elements? In the case of SVG-in-XHTML, all the scripts see the same window, right?

> If an unified DOM3 interface is exposed from the embedded Content
> Handler, it would be possible for the parents JavaScript context to
> traverse the child nodes of the embedded content. If this is not
> supported, the parent's JavaScript content should be allowed to at
> least see the top-level content node (SVG element for example and
> its attributes in the SVG content provider case).

Surely a spec can't leave something as fundamental as access to the full DOM tree at the level of "if this is not supported", especially when in the SVG-in-XHTML case access to the full DOM already works.

> A particular implementation may decide to pass the CSS processing
> rules to the child Content Handler. However, it is better to pass
> this information during re-flow and rendering phase of the content
> processing.

I'd expect CSS to propagate in the document tree in the same way it does with the SVG-in-XHTML case.
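
(As an illustration of the SVG-in-XHTML baseline referred to above, here is a minimal sketch that uses Python's xml.etree.ElementTree as a stand-in for a browser DOM. That substitution, the sample document and the helper name namespaces_in_tree are assumptions made for the example. It only shows that one XML parser yields one tree holding both XHTML and SVG nodes, and that the SVG nodes are reachable from the XHTML root like any other descendants.)

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A small, made-up SVG-in-XHTML document.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Text before the graphic.</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
      <circle fill="red" cx="100" cy="100" r="100"/>
    </svg>
  </body>
</html>"""

# One parser, one tree.
root = ET.fromstring(DOC)


def namespaces_in_tree(node):
    """Collect the namespace URI of every element reachable from node."""
    # ElementTree stores qualified names as "{namespace-uri}local-name".
    return {el.tag[1:].split("}", 1)[0] for el in node.iter()}


# Starting from the XHTML root, the SVG circle is an ordinary descendant.
circle = root.find(f".//{{{SVG_NS}}}circle")
```

Here namespaces_in_tree(root) contains both namespace URIs, and circle is found by walking down from the XHTML root, with no separate content handler or second tree involved.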
> <form>
> <table border="1" align="center">
> <caption><em>A test table with SVG elements</em></caption>
> <tr>
> <svg id="svgRoot" version="1.2" baseProfile="tiny"
>   viewBox="-160 -120 480 640" width="480" height="640"
>   stroke-miterlimit="2" zoomAndPan="enable" xml:space="preserve"
>   xmlns="http://www.w3.org/2000/svg"
>   xmlns:xlink="http://www.w3.org/1999/xlink"
>   xmlns:xe="http://www.w3.org/2001/xml-events">
> <g fill-opacity="0.7" stroke="black" stroke-width="0.2">
> <circle fill="red" cx="100" cy="100" r="100"/>
> </g>
> </svg>
> </tr>
> <tr>
> <svg id="svgRoot" version="1.2" baseProfile="tiny"
>   viewBox="-160 -120 480 640" width="480" height="640"
>   stroke-miterlimit="2" zoomAndPan="enable" xml:space="preserve"
>   xmlns="http://www.w3.org/2000/svg"
>   xmlns:xlink="http://www.w3.org/1999/xlink"
>   xmlns:xe="http://www.w3.org/2001/xml-events">
> <g fill-opacity="0.7" stroke="black" stroke-width="0.2">
> <circle fill="green" cx="100" cy="100" r="100"/>
> </g>
> </svg>
> </tr>
> <tr>
> <svg id="svgRoot" version="1.2" baseProfile="tiny"
>   viewBox="-160 -120 480 640" width="480" height="640"
>   stroke-miterlimit="2" zoomAndPan="enable" xml:space="preserve"
>   xmlns="http://www.w3.org/2000/svg"
>   xmlns:xlink="http://www.w3.org/1999/xlink"
>   xmlns:xe="http://www.w3.org/2001/xml-events">
> <g fill-opacity="0.7" stroke="black" stroke-width="0.2">
> <circle fill="blue" cx="100" cy="100" r="100"/>
> </g>
> </svg>
> </tr>
> </table>
> </form>

I think it doesn't make sense to put <svg> as a child of <tr> without a <td>.

> Example 4: SVG Widget Structure - svgDoc

Isn't this a new concept that goes beyond achieving SVG embeddability parity with XHTML?

> Remove hard coded table of case fixes for svg/mathml elements and
> attributes (if still in the spec).

The tables make the algorithm more robust against bogus input, as </TiTlE> closes <title> regardless of parser SVG-awareness.
The only disadvantage of the case fixing tables is that supporting future SVG elements that have camelCase names requires amendments to the table. Amending the table is a much, much smaller operation than implementing rendering support for a new feature. Implementation-wise, avoiding future amendments to the table would be the wrong optimization, because mitigating the undesirable performance impact of deferring case folding is a more complex implementation task than adding an entry for a new camelCase name. (Alternatively, of course, the SVG WG could simply decide to only introduce lower-case names from now on.)

> Create a new XML parser.

Putting an XML parser in there is pretty serious complexity. I am very much against introducing this kind of complexity. The level of HTML parser complexity dictated by existing content is bad enough as it is. We *really* don't need new complexity where it can be avoided. Here it can be avoided by using the HTML tokenizer and making small amendments to the HTML tree builder.

> Set the encoding to the character encoding used by the HTML parser.

Making the HTML parser and the XML parser decode the byte stream independently would pretty much poison any performant buffering scheme.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 23 June 2008 09:26:21 UTC