SVG (Scalable Vector Graphics) is a powerful XML-based language for describing graphical elements and animations. SVG was originally designed as a presentation-only framework (like scalable documents) and is slowly becoming more and more UI-centric.
Features such as user interactivity and animation make SVG a good candidate for integration into user interface frameworks and web Content Handlers such as HTML in browsers, where its flexibility and scalability can be put to use.
In many cases SVG is used as stand-alone content, but very often another markup language, such as HTML or XUL, acts as the host of SVG. To integrate correctly with (or be embedded in) other markup languages, several things need to be considered.
This document addresses some of the issues that need to be taken into consideration when creating a hybrid design. It also proposes potential high-level solutions for some of those problems.
As the reader is probably already aware, HTML and SVG are both markup languages. SVG follows strict XML rules, whereas HTML allows a looser syntax, such as tags that are never closed.
To accommodate the complexity of integrating both SGML- and XML-type languages, the SVG WG proposes the use of a cascaded parser. A cascading parser allows recursive, nested invocations between SGML and XML Content Handlers.
Since XML already defines how XML languages from different namespaces interact with each other, it is valid to embed one XML language in another.
When embedding HTML inside SVG, the HTML markup must be well-formed.
Consider adding detail about what happens when HTML is encountered inside SVG. Does the parser switch? Does it switch back, and at what element does it resume? SVG may need to specify other rules for embedding HTML in SVG.
Example 1: SVG embedded in HTML
<head>
<title>HTML Body of SVG</title>
</head>
<body>
<form>
<svg width="480" height="640" id="svg1"
xmlns="http://www.w3.org/2000/svg" xml:space="preserve" version="1.2" baseProfile="tiny" >
<g fill-opacity="0.7" stroke="black" stroke-width="0.2" >
<circle fill="red" cx="100" cy="100" r="100"/>
</g>
</svg>
</form>
</body>
Example 2: HTML embedded in SVG
<form>
<svg width="480" height="640" id="svg3" viewBox="-160 -120 480 640" stroke-miterlimit="2"
xmlns="http://www.w3.org/2000/svg" xml:space="preserve" version="1.2" baseProfile="tiny">
<g fill-opacity="0.7" stroke="black" stroke-width="0.2">
<table border="1">
<caption><em>A test table</em></caption>
<tr>
<th rowspan="2"></th><th colspan="2">Average</th>
<th rowspan="2">Red<br/>eyes</th>
</tr>
<tr>
<th>height</th><th>weight</th>
</tr>
<tr>
<th>Males</th><td>1.9</td><td>0.003</td><td>40%</td>
</tr>
<tr>
<th>Females</th><td>1.7</td><td>0.002</td><td>43%</td>
</tr>
</table>
<circle fill="green" cx="100" cy="100" r="100"/>
</g>
</svg>
</form>
A cascading parser is a combination of parsers, cascaded together to process content of different types (HTML and XML, for example). The individual parsers have their own rules, specific to the way each markup language is structured. All of the individual parsers also need to follow common rules for interacting with the other parsers.
A Content Handler is a logical component that understands and processes a particular content type.
Historically, Content Handlers have been called content plug-ins. For most plug-ins the interface only allows content inclusion by reference, i.e. there is no real content integration. For example, displaying SVG from a plug-in in a web browser window does not make it an integrated SVG Content Handler of HTML. Only if SVG elements can be interleaved with HTML content can it be considered integrated. For example, it should be possible for SVG image and path elements to be displayed as elements of an HTML table. Another example is SVG text on a path overlaid on top of an HTML image object. From the user's visual perspective, the content behaves as a single markup language.
There are multiple steps a content processor needs to go through before a markup stream becomes a visual representation of the input content. The processing steps depend heavily on the design of the Content Handler. A legacy Content Handler normally goes through more steps than contemporary SAX-based parsers. For example, many legacy Content Handlers tokenize the stream first and then, in a separate step, identify and create the elements representing the tags.
Additionally, if the Content Handler supports progressive parsing, it may process the content differently from a non-progressive parser.
Regardless of the type of parser the Content Handler uses, elements need to be identified in the markup stream, created and displayed.
These are the major content processing steps:
Depending on the implementation and the type of Content Handler, some of the steps listed above can be combined into a single operation and some can be broken down into more steps.
Note that Element Identification is an important process in the cascading parsing scheme.
The purpose of the Tokenizing step is to break the stream down into tokens that can be interpreted as elements and element attributes at the element identification and construction stage.
Consider rewording the above sentence to say: "... interpreted as elements, element attributes, tags and properties ..."
HTML and XML are tokenized in very similar ways. However, the tokeniser must preserve the case of each token, because XML names are case-sensitive while HTML names are not.
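As a minimal sketch of case-preserving tokenization, the Python below keeps each tag name exactly as written while also exposing an ASCII-lowercased form for HTML-style matching. `TagToken`, the regular expressions and `tokenize_tags` are hypothetical illustrations; they handle only start and end tags with double-quoted attribute values and skip text, comments and every other construct.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class TagToken:
    """Hypothetical tag token that keeps the source casing of its name."""
    raw_name: str                 # exactly as written, e.g. "foreignObject"
    is_end: bool                  # True for </...> tags
    attrs: List[Tuple[str, str]] = field(default_factory=list)  # (raw name, value)

    @property
    def html_name(self) -> str:
        # HTML lookups ignore case; names are ASCII-only because of TAG_RE.
        return self.raw_name.lower()

# Simplified patterns: ASCII tag names, double-quoted (or bare) attributes only.
TAG_RE = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9:_-]*)((?:\s+[^\s=/>]+(?:\s*=\s*"[^"]*")?)*)\s*/?>')
ATTR_RE = re.compile(r'([^\s=/>]+)(?:\s*=\s*"([^"]*)")?')

def tokenize_tags(markup: str) -> Iterator[TagToken]:
    """Yield a TagToken for every start or end tag found in markup."""
    for m in TAG_RE.finditer(markup):
        attrs = [(a.group(1), a.group(2) or "") for a in ATTR_RE.finditer(m.group(3))]
        yield TagToken(raw_name=m.group(2), is_end=m.group(1) == "/", attrs=attrs)
```

An SVG Content Handler would match on `raw_name`, while an HTML Content Handler would match on `html_name`, so the same token stream can serve both.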
Element identification is a very important step in the cascading parser scheme.
The cascading parser maintains tables of:
The correspondence between a tag name and a handler is not necessarily one-to-one. For example, "font", "video" and "a" are element names in both HTML and SVG. The context in which the tag is encountered therefore also determines which handler to use.
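One way to picture context-dependent lookup is a table keyed on the namespace of the current context together with the tag name, so that the same name can lead to different handlers. The sketch below is a hypothetical illustration; the handler names and the tiny table contents are assumptions, not part of this proposal.

```python
from typing import Dict, Optional, Tuple

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical registry keyed on (context namespace, tag name).
HANDLERS: Dict[Tuple[str, str], str] = {
    (HTML_NS, "a"): "html",
    (HTML_NS, "font"): "html",
    (HTML_NS, "video"): "html",
    (HTML_NS, "svg"): "svg",      # entry point from HTML into SVG
    (SVG_NS, "a"): "svg",
    (SVG_NS, "font"): "svg",
    (SVG_NS, "video"): "svg",
    (SVG_NS, "circle"): "svg",
}

def find_handler(context_ns: str, tag: str) -> Optional[str]:
    """Return the name of the handler for tag in this context, or None if unknown."""
    return HANDLERS.get((context_ns, tag))
```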
When an HTML parser encounters an unrecognized element during the element identification step, it does the following:
Terminates the current element or attribute tag; for example, it terminates P, LI, TR, TH, etc. elements.
This may not be entirely accurate. There are many quirks in how broken content is repaired. This suggests that SVG elements are recognized and parsed anywhere inside HTML. It should be pointed out that XML namespaces apply too. However, this does not solve the element context problem. The DOM and ECMAScript contexts must be the same, otherwise I see no gain compared to inclusion-by-reference (i.e. <object data="some.svg">).
Queries the Content Handlers with the unknown element tag to find a Content Handler that can handle the unrecognized element.
If a suitable Content Handler is not found, it obeys the HTML rules for undefined element(s).
If a Content Handler that can handle the element is found, the HTML parser passes control of the file or token stream to that Content Handler. Depending on the implementation, the HTML parser may also need to pass the JS context and a DOM branch reference to the identified Content Handler.
The newly identified Content Handler processes (SVG) elements until it encounters an unrecognized element or a namespace that it cannot handle. The SVG Content Handler then invokes the initial Content Handler in an attempt to identify the unrecognized element with another Content Handler.
What is "an unrecognized element"? This may not be clear enough. Consider for example <fie:foo xmlns:fie="http://www.foo.com/fooml"/>.
If there is a Content Handler that can handle the unrecognized element, the file or token stream is passed to the new Content Handler.
If no Content Handler is found, control of the stream is returned to the initial Content Handler of the stream, positioned at the beginning of the unrecognized element token.
The HTML parser then continues processing the rest of the HTML elements until it encounters another unrecognized element.
If the embedded Content Handler returns with an error and the content state is not recognized, the HTML parser treats the stream from that point on as a malformed element and should try to identify the first valid HTML element or valid token to recover the normal parsing flow.
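The hand-off rules above can be traced with a small simulation. In this sketch (the names `KNOWN` and `cascade` and the tag sets are hypothetical), each handler recognizes a fixed set of tag names, an unrecognized start tag is offered to the other handlers, a found handler takes over until the element it was entered on is closed, and when no handler is found control returns to the initial handler. Passing JavaScript and DOM references, and the error recovery of the last step, are not modelled.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # ("start" or "end", tag name)

# Hypothetical tables of the tags each Content Handler recognizes.
KNOWN: Dict[str, Set[str]] = {
    "html": {"html", "body", "form", "table", "tr", "td", "p"},
    "svg": {"svg", "g", "circle", "path", "text"},
}

def cascade(tokens: List[Token], initial: str = "html") -> List[Tuple[str, Token]]:
    """Return (handler, token) pairs showing which handler processed each token."""
    out: List[Tuple[str, Token]] = []
    stack = [[initial, 0]]  # frames of [handler name, open-element depth]
    for kind, tag in tokens:
        frame = stack[-1]
        if kind == "start" and tag not in KNOWN[frame[0]]:
            # Query the other Content Handlers for the unrecognized element.
            other = next((h for h in KNOWN if h != frame[0] and tag in KNOWN[h]), None)
            if other is not None:
                stack.append([other, 0])      # pass control of the stream down
            elif len(stack) > 1:
                del stack[1:]                 # no handler: return to the initial handler
            # else: the initial handler applies its rules for undefined elements
            frame = stack[-1]
        out.append((frame[0], (kind, tag)))
        frame[1] += 1 if kind == "start" else -1
        if frame[1] == 0 and len(stack) > 1:
            stack.pop()                       # embedded root closed: resume the parent
    return out
```

With the tags of Example 1, the body and form tokens go to "html", everything from `<svg>` to `</svg>` goes to "svg", and control returns to "html" for the closing form and body tags. Returning control when the embedded element closes is a design choice that follows the review comments rather than the text above.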
In Example 1, the HTML parser finds an element called SVG (or SVG:SVG) and queries for a Content Handler that can handle the SVG element. If there is such a handler, control of the file or token stream is passed to the SVG Content Handler along with optional JS and DOM branch references.
How are malformed errors in SVG handled? As per the SVG spec OR using other rules passed to SVG content provider?
Why should parsing continue outside of the element where XML parsing started? When that element is closed control should be given back to the parent parser. Note that allowed characters in tag names and in content may differ between HTML and XML.
Does this assume that elements up until the point of error were inserted? What about the element that had the error? What if it was a mis-nesting error? Should the currently open elements on the stack, up to the point where XML parsing began, be closed when this happens? This section needs to cover more.
In cascaded parsing, when one Content Handler passes the JavaScript context to the next level down, it is up to the new Content Handler to continue using the parent's existing JavaScript context or to create a new one.
Additionally, the parent Content Handler may decide not to pass a valid JavaScript context to the embedded Content Handler. In this case the embedded Content Handler is forced to create a new JavaScript context, if needed.
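The two rules above (the parent may or may not pass its context, and the child may reuse it or create its own) can be summarized in a few lines. `ScriptContext` and `context_for_child` are hypothetical names used only for illustration; a real implementation would wrap an engine-specific JavaScript context object.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ScriptContext:
    """Hypothetical stand-in for a JavaScript global context."""
    owner: str
    globals: dict = field(default_factory=dict)

def context_for_child(child: str, parent_ctx: Optional[ScriptContext],
                      reuse_parent: bool) -> ScriptContext:
    """Pick the JavaScript context an embedded Content Handler will use."""
    if parent_ctx is not None and reuse_parent:
        return parent_ctx                 # share the parent's context
    return ScriptContext(owner=child)     # parent withheld it, or the child chose its own
```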
If a unified DOM3 interface is exposed by the embedded Content Handler, the parent's JavaScript context can traverse the child nodes of the embedded content. If this is not supported, the parent's JavaScript context should at least be allowed to see the top-level content node (the SVG element and its attributes, in the SVG Content Handler case). Please note that attributes exposed in the last case can vary from the standard SVG attributes. These attributes would be relative to the parent (in SVG, also referred to as User Agent SVG control attributes).
The scripting context must be the same when using a "content handler" in the same document. The parser should always put the nodes in the same document. If you want separate contexts and separate documents then it's already possible with inclusion-by-reference (CDR in the CDF specs).
"Please note that attributes exposed in the last case can vary from the standard SVG attributes." - Not clear what this means. This section should be fully rewritten or removed.
Usually a separate parser than the host one is used for CSS rules.
"Usually a separate parser than the host one is used for CSS rules." - Not clear to me what that is supposed to mean.
A particular implementation may decide to pass the CSS processing rules to the child Content Handler. However, it is better to pass this information during the re-flow and rendering phase of content processing.
CSS must apply to all nodes in the same document as defined in the CSS and SVG specs. A "may" is not good enough. Rewrite suggestion for the last part of the section: "Style information that is applied to a particular node in the document is usually passed to that node during layout and rendering of the document." Not sure it's relevant to include it though.
This needs to describe exactly how this is better: is it more efficient performance-wise?
Note that a CSS parser is a non-HTML parser inside HTML, so it is somewhat similar to introducing an XML parser into HTML.
For an SVG parser to process the content correctly, all the content information, including the namespace definition, must be included in the document. This is acceptable when dealing with SVG content that stands alone as part of an HTML frame or HTML object (as in Example 1).
If the SVG content is embedded in HTML elements such as lists or tables, then the content becomes very convoluted with the SVG namespace information repeated over and over again (see example 3).
Example 3
<form>
<table border="1" align="center">
<caption><em>A test table with SVG elements</em></caption>
<tr>
<svg id="svgRoot1" version="1.2" baseProfile="tiny" viewBox="-160 -120 480 640" width="480" height="640" stroke-miterlimit="2" zoomAndPan="enable" xml:space="preserve"
xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xe="http://www.w3.org/2001/xml-events">
<g fill-opacity="0.7" stroke="black" stroke-width="0.2">
<circle fill="red" cx="100" cy="100" r="100"/>
</g>
</svg>
</tr>
<tr>
<svg id="svgRoot2" version="1.2" baseProfile="tiny" viewBox="-160 -120 480 640" width="480" height="640" stroke-miterlimit="2" zoomAndPan="enable" xml:space="preserve"
xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xe="http://www.w3.org/2001/xml-events">
<g fill-opacity="0.7" stroke="black" stroke-width="0.2">
<circle fill="green" cx="100" cy="100" r="100"/>
</g>
</svg>
</tr>
<tr>
<svg id="svgRoot3" version="1.2" baseProfile="tiny" viewBox="-160 -120 480 640" width="480" height="640" stroke-miterlimit="2" zoomAndPan="enable" xml:space="preserve"
xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xe="http://www.w3.org/2001/xml-events">
<g fill-opacity="0.7" stroke="black" stroke-width="0.2">
<circle fill="blue" cx="100" cy="100" r="100"/>
</g>
</svg>
</tr>
</table>
</form>
Example 3 makes it obvious that the same document definition information is repeated for every SVG element in the HTML table. If the table had individual cells, the content would look even more complex and bulky.
The SVG widget concept addresses the issue of repeated SVG document information by packing common document information in one place and leaving in the embedded content only the information related to the SVG elements or fragment (group of elements).
Example 4: SVG Widget Structure - svgDoc
<form>
<svgDoc id="myDoc1" version="1.2" baseProfile="tiny" viewBox="-160 -120 480 640" width="480" height="640" stroke-miterlimit="2" zoomAndPan="enable" xml:space="preserve"
xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xe="http://www.w3.org/2001/xml-events">
<defs>
<style type="text/css">
<![CDATA[
circle { fill:red; stroke: blue; stroke-width:3 }
]]>
</style>
</defs>
</svgDoc>
...
</form>
Example 5: Reference SVG Widget (svgDoc) using svgWidget
<table border="1" align="center">
<caption><em>A test table with SVG elements</em></caption>
<tr>
<svgWidget docId="myDoc1">
<g fill-opacity="0.7" stroke="black" stroke-width="0.2">
<circle fill="red" cx="100" cy="100" r="100"/>
</g>
</svgWidget>
</tr>
<tr>
<svgWidget docId="myDoc1">
<g fill-opacity="0.7" stroke="black" stroke-width="0.2">
<circle fill="green" cx="100" cy="100" r="100"/>
</g>
</svgWidget>
</tr>
<tr>
<svgWidget docId="myDoc1">
<g fill-opacity="0.7" stroke="black" stroke-width="0.2">
<circle fill="blue" cx="100" cy="100" r="100"/>
</g>
</svgWidget>
</tr>
</table>
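To make the svgDoc/svgWidget relationship concrete, the sketch below expands one svgWidget from Example 5 into a standalone svg element by combining it with the svgDoc it references through docId. The expansion rules are assumptions about how the widget concept could work, not defined behavior: every svgDoc attribute except id is copied, the svgDoc's children (such as defs) are copied before the widget's own content, and un-namespaced widget content is placed in the SVG namespace. `expand_widget` is a hypothetical name; only Python's standard `xml.etree.ElementTree` is used.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict

SVG_NS = "http://www.w3.org/2000/svg"

def expand_widget(widget: ET.Element, docs: Dict[str, ET.Element]) -> ET.Element:
    """Build a standalone <svg> element for one svgWidget (hypothetical semantics)."""
    doc = docs[widget.get("docId")]                  # the referenced svgDoc
    attrs = {k: v for k, v in doc.attrib.items() if k != "id"}
    svg = ET.Element(f"{{{SVG_NS}}}svg", attrs)
    for child in list(doc) + list(widget):           # shared defs first, then widget content
        svg.append(copy.deepcopy(child))
    # Widget content carries no xmlns declaration: move un-namespaced elements into SVG.
    for el in svg.iter():
        if not el.tag.startswith("{"):
            el.tag = f"{{{SVG_NS}}}{el.tag}"
    return svg
```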
CSS does not have to be nested inside SVG for its rules to apply to SVG nodes in the same document. This seems like a way of doing a hard-coded (or very limited) document transformation, which is already possible with XBL or XSLT. However, allowing (or requiring) XML namespaces in HTML would solve the same problem, if coupled with a statement that any <svg> element is recognized as an <svg> element in the SVG namespace even if it doesn't specify xmlns="http://www.w3.org/2000/svg". While it's nice not to have to do a lot of extra typing, it would break the ability to copy-paste SVG source into existing SVG editors.
SVG fragments have a common place to control document attributes, namespace definition, CSS
"SVG fragments have a common place to control document attributes, namespace definition, CSS" - CSS can be stored in many different places, but it is not clear why SVG would need anything special for that. If the document is HTML, then a style element in the head section is a good place. External stylesheets is another place.
SVG widgets may have common or separate JavaScript contexts
See previous comment about this.
Common JavaScript routines can be defined at the document level, accessible from all the widgets belonging to that particular document
Widget size and position can be controlled from the HTML layout container (tables, lists, etc.)
This can already be controlled in existing browsers, you can set the width and height properties from CSS on the <svg> element.
Remove hard coded table of case fixes for svg/mathml elements and attributes (if still in the spec).
Merge the "U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z" and "U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z" cases, using the definition for the latter, in:
Drop the "U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z" case in:
To:
When the steps below require the UA to insert an HTML element for a token, the token and all attribute tokens it contains are first normalized to lowercase [mapping A..Z to a..z]. If there are attribute tokens with the same name, it is a parse error: discard all attribute tokens that are duplicates and the value associated with each such token (if any), keeping the first occurrence of each duplicated attribute token. Then the UA must create an element for the normalized token in the HTML namespace, append this node to the current node, and push it onto the stack of open elements so that it is the new current node.
Cases in the "in foreign content" insertion mode need to be case-sensitive
Change the definition of "adjust foreign attributes"
Add the parse mode to "The in body insertion mode"
The behavior of this state depends on the content model flag.
Consume the next input character. If it is a U+002F SOLIDUS (/) character, switch to the close tag open state. Otherwise, emit a U+003C LESS-THAN SIGN character token and reconsume the current input character in the data state.
Consume the next input character:
If the content model flag is set to the RCDATA or CDATA states but no start tag token has ever been emitted by this instance of the tokeniser (fragment case), or, if the content model flag is set to the RCDATA or CDATA states and the next few characters do not match the tag name of the last start tag token emitted (case insensitively), or if they do but they are not immediately followed by one of the following characters:
...then emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token, and switch to the data state to process the next input character.
Otherwise, if the content model flag is set to the PCDATA state, or if the next few characters do match that tag name, consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
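The close-tag check applied when the content model flag is RCDATA or CDATA can be sketched as below. The list of characters allowed to follow the tag name is not reproduced in this section, so `TERMINATORS` (tab, line feed, form feed, space, "/" and ">", plus end of input) is an assumption; `closes_raw_text` is a hypothetical helper, and Python's `str.lower()` stands in for ASCII case-insensitive matching.

```python
from typing import Optional

# Assumed set of characters that may follow the tag name in a matching end tag.
TERMINATORS = {"\t", "\n", "\x0c", " ", "/", ">"}

def closes_raw_text(stream: str, pos: int, last_start_tag: Optional[str]) -> bool:
    """pos is the index just after '</'. True if this end tag closes the raw text."""
    if last_start_tag is None:
        return False                      # fragment case: no start tag emitted yet
    end = pos + len(last_start_tag)
    if stream[pos:end].lower() != last_start_tag.lower():
        return False                      # caller emits '<' and '/' as characters
    following = stream[end:end + 1]
    return following == "" or following in TERMINATORS   # "" means end of input
```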
When the steps below require the UA to create an element for a token in a particular namespace, the UA must create a node implementing the interface appropriate for the element type corresponding to the tag name of the token in the given namespace (as given in the specification that defines that element, e.g. for an a element in the HTML namespace, this specification defines it to be the HTMLAnchorElement interface), with the tag name being the name of that element, with the node being in the given namespace, and with the attributes on the node being those given in the given token.
The interface appropriate for an element in the HTML namespace that is not defined in this specification is HTMLElement. The interface appropriate for an element in another namespace that is not defined by that namespace's specification is Element.
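The interface selection rule, including both fallbacks, can be sketched as a lookup with two defaults. The `INTERFACES` table is a hypothetical excerpt (a real UA would list every element its supported specifications define), and `interface_for` is an illustrative name.

```python
from typing import Dict

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical excerpt of per-namespace interface tables.
INTERFACES: Dict[str, Dict[str, str]] = {
    HTML_NS: {"a": "HTMLAnchorElement", "table": "HTMLTableElement"},
    SVG_NS: {"circle": "SVGCircleElement", "g": "SVGGElement"},
}

def interface_for(namespace: str, tag_name: str) -> str:
    """Name of the DOM interface a node created for this token should implement."""
    defined = INTERFACES.get(namespace, {})
    if tag_name in defined:
        return defined[tag_name]
    # Fallbacks: HTMLElement in the HTML namespace, Element in any other namespace.
    return "HTMLElement" if namespace == HTML_NS else "Element"
```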
When the steps below require the UA to insert an HTML element for a token, the token and all attribute tokens it contains are first normalized to lowercase [mapping A..Z to a..z].
If there are attribute tokens with the same name, it is a parse error: discard all attribute tokens that are duplicates and the value associated with each such token (if any), keeping the first occurrence of each duplicated attribute token. The UA must then create an element for the normalised token in the HTML namespace. The newly created node must be appended to the current node and pushed onto the stack of open elements so that it is the new current node.
The steps below may also require that the UA insert an HTML element in a particular place, in which case the UA must follow the same steps except that it must insert or append the new node in the location specified instead of appending it to the current node. (This happens in particular during the parsing of tables with invalid content.)
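The normalization and duplicate-attribute rules for inserting an HTML element can be sketched as follows. This assumes that only tag and attribute names are lowercased (attribute values are left untouched) and that only A..Z are mapped, as the bracketed note says; `normalize_html_token` is a hypothetical helper that counts parse errors instead of reporting them.

```python
import string
from typing import List, Tuple

# Maps only ASCII A..Z to a..z, unlike str.lower(), which lowercases all of Unicode.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def normalize_html_token(name: str, attrs: List[Tuple[str, str]]
                         ) -> Tuple[str, List[Tuple[str, str]], int]:
    """Return (normalized name, kept attributes, number of duplicate-attribute parse errors)."""
    seen = set()
    kept: List[Tuple[str, str]] = []
    errors = 0
    for attr_name, value in attrs:
        key = attr_name.translate(_ASCII_LOWER)
        if key in seen:
            errors += 1                   # duplicate: parse error, drop it and its value
            continue
        seen.add(key)                     # the first occurrence wins
        kept.append((key, value))
    return name.translate(_ASCII_LOWER), kept, errors
```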
When the steps below require the UA to insert a foreign element for a token, the UA must first create an element for the token in the given namespace, and then append this node to the current node, and push it onto the stack of open elements so that it is the new current node. If the newly created element has an xmlns attribute in the XMLNS namespace whose value is not exactly the same as the element's namespace, that is a parse error.
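A minimal sketch of foreign element insertion, assuming a toy node type: create the node in the given namespace, append it to the current node, push it onto the stack of open elements, then check a declared xmlns attribute against the element's namespace. `Node`, `insert_foreign_element` and the error-list convention are hypothetical; attributes are keyed by (namespace, local name).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

XMLNS_NS = "http://www.w3.org/2000/xmlns/"

AttrKey = Tuple[Optional[str], str]   # (attribute namespace, local name)

@dataclass
class Node:
    """Hypothetical minimal DOM node."""
    namespace: str
    local_name: str
    attrs: Dict[AttrKey, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

def insert_foreign_element(stack: List[Node], namespace: str, name: str,
                           attrs: Dict[AttrKey, str], errors: List[str]) -> Node:
    node = Node(namespace, name, dict(attrs))   # create the element in the given namespace
    stack[-1].children.append(node)             # append it to the current node
    stack.append(node)                          # it becomes the new current node
    declared = node.attrs.get((XMLNS_NS, "xmlns"))
    if declared is not None and declared != namespace:
        errors.append(f"parse error: xmlns={declared!r} on <{name}>")
    return node
```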
When the steps below require the user agent to adjust foreign attributes for a token, then for each attribute on the token the following steps must be applied:
If the attribute name starts with xml: and has one or more characters following, let the attribute be a namespaced attribute with the prefix xml, local name being the characters after the colon, and the namespace being the XML namespace, and abort these steps.
If the attribute name starts with xmlns: and has one or more characters following, let the attribute be a namespaced attribute with the prefix xmlns, local name being the characters after the colon, and the namespace being the XMLNS namespace, and abort these steps.
If the attribute name is xmlns, let the attribute be a namespaced attribute with no prefix, the local name xmlns, and the namespace being the XMLNS namespace.
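The three attribute-adjustment rules map naturally onto a small function returning (prefix, local name, namespace). The final branch, which leaves every other attribute in no namespace, is an assumption, since the steps above do not say what happens to other attributes; `adjust_foreign_attribute` is a hypothetical name.

```python
from typing import Optional, Tuple

XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

def adjust_foreign_attribute(name: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Return (prefix, local name, namespace) for one attribute name."""
    if name.startswith("xml:") and len(name) > len("xml:"):
        return ("xml", name[len("xml:"):], XML_NS)
    if name.startswith("xmlns:") and len(name) > len("xmlns:"):
        return ("xmlns", name[len("xmlns:"):], XMLNS_NS)
    if name == "xmlns":
        return (None, "xmlns", XMLNS_NS)
    return (None, name, None)   # assumed: all other attributes stay in no namespace
```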
When the insertion mode is "in body", tokens must be handled as follows:
...Save the tokeniser content model flag to old-state.
Switch the tokeniser's content model flag to the CDATA state.
The CDATA state isn't good enough since it transforms characters (and this mode can probably be exited too easily, which is not desirable either). A new CDATA-like state that passes data through unmodified may be required.
Create a new XML parser. Set the encoding to the character encoding used by the HTML parser.
Feed the XML parser the string corresponding to the start tag of the element along with all its attributes.
Let the XML parser parse and insert the foreign element.
Added a link to foreign element insertion. Is this ok or should it be removed?
Then continue to feed character tokens to the XML parser until it:
This is loosely based on the cascading parser ideas, but needs more work.
If the XML parser returns a fatal error:
If the XML parser returns with success, then destroy the XML parser. Then reset the tokeniser content model flag to the old-state and reset the insertion mode appropriately.
If the XML parser encounters a tagname which is unknown:
Note that broken tags will get closed after an XML error, so that something will be visible (but will look broken). This still "suffers from reparsing issues" and "is too draconian".
When the insertion mode is "in foreign content", tokens must be handled as follows:
Insert the token's character into the current node.
Append a Comment node to the current node with the data attribute set to the data given in the comment token.
Parse error. Ignore the token.
mi element in the MathML namespace.
mo element in the MathML namespace.
mn element in the MathML namespace.
ms element in the MathML namespace.
mtext element in the MathML namespace.
annotation-xml element in the MathML namespace.
Process the token using the rules for the secondary insertion mode.
The secondary insertion mode must be case-sensitive for these cases.
If, after doing so, the insertion mode is still "in foreign content", but there is no element in scope that has a namespace other than the HTML namespace, switch the insertion mode to the secondary insertion mode.
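The two decisions in this mode can be sketched as small predicates. The condition that leads to the listed MathML elements is not reproduced above, so `use_secondary_mode` assumes it means "the current node is one of these elements", matched case-sensitively; `mode_after_secondary` treats every element on the stack of open elements as in scope, which is a simplification. Both function names are hypothetical.

```python
from typing import Iterable

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# The MathML elements listed above; compared case-sensitively.
MATHML_SECONDARY = {"mi", "mo", "mn", "ms", "mtext", "annotation-xml"}

def use_secondary_mode(current_ns: str, current_name: str) -> bool:
    """Assumed trigger: the current node is one of the listed MathML elements."""
    return current_ns == MATHML_NS and current_name in MATHML_SECONDARY

def mode_after_secondary(open_element_namespaces: Iterable[str], secondary_mode: str) -> str:
    """Leave "in foreign content" once no non-HTML element remains in scope."""
    if any(ns != HTML_NS for ns in open_element_namespaces):
        return "in foreign content"
    return secondary_mode
```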