SVG Feedback on HTML5 SVG Proposal

Hi, HTML WG-

These are some of the opinions of the SVG WG on the topic of 
SVG-in-text/html, for consideration by the HTML WG.  The opinions of 
individuals within the SVG WG differ; some favor a pure-XML approach, 
and some are more predisposed to a looser syntax, but in general, this 
is the state of our group consensus.  We are happy to discuss specific 
details.

The requirements are based on consensus reached at the SVG WG Sydney F2F 
2009, and in part from TPAC 2008. For reference and contrast, please see 
the old SVG in HTML proposal from the SVG WG. [1]


Requirements

1. HTML5 and SVG should make every effort to minimize the learning 
curve, pitfalls and other undesirable issues that content authors may 
encounter due to differences between SVG served as image/svg+xml and SVG 
in text/html, and when it comes to moving SVG between these two types of 
document. In so far as is possible, content authors should be able to 
take a valid SVG document, paste its markup into an HTML document, and 
have it render as expected and have the SVG fragment's DOM be identical 
to the DOM of the standalone SVG document when served as image/svg+xml. 
Content authors should not be burdened with unnecessary debugging, 
tweaking or cleanup steps in the common case when it comes to this 
simple process.

2. HTML5 should not place unnecessary barriers in the way of, or 
unnecessarily restrict, the future evolution of the SVG language. (Both 
working groups should coordinate to maximize compatibility between the 
two specifications and avoid standing on each other's toes, of course.)

In line with making the Open Web Platform as easy and pain free to use 
as possible, the WG believes that, in general, when HTML5 parsers 
encounter SVG that would not be valid XML SVG, the SVG should be 
non-conforming, even though it would render. The rationale is that 
validators and error consoles would flag and raise awareness of any 
issues in someone's HTML SVG that would stop them from copying the markup 
out to an XML file, and thereby be another weapon in reducing author 
pain when working with open formats like SVG and HTML.


Feedback

The following is feedback on the "foreign content" text that is 
currently commented out of the HTML5 draft by <!--XXXSVG ... --> 
comments. (These comments can be seen by loading pages from the parsing 
section of the HTML5 draft [2], running this Show XXXSVG comments 
bookmarklet [3], and then searching the pages for "XXXSVG".)

* The SVG WG is of the opinion that the contents of the SVG 'title' 
element should be RCDATA, and therefore would prefer that the HTML5 
parsing algorithm not require conforming parsers to break out of foreign 
content mode and parse the element's content as HTML.

* The SVG WG feels that, on balance, it would be useful for the contents 
of SVG's 'desc' and 'foreignObject' elements to be parsed as HTML by 
default, and therefore do not object to the HTML5 draft requiring 
conforming parsers to break out of foreign content mode to parse the 
content of these elements. However, the SVG WG does have some concerns 
regarding adverse effects on extensibility.  We also do not support the 
use of 'desc' as a container for fallback content, as has been 
suggested, though we do agree that a fallback mechanism for both SVG and 
HTML is a useful idea.

* The SVG WG recognizes that entities pose a particular challenge: 
undefined entity/character references won't work if SVG fragments are 
copied out of HTML, and DOCTYPE-defined entities (as is common for some 
SVG authoring tools) could only work if those entity definitions are 
included in the file and are somehow recognized. The same problem could 
also occur in XHTML+SVG documents. In general, the SVG WG agrees that 
special-casing some entity handling is acceptable, and is happy to have 
a further dialog with implementers about this.

* For the 'font' element: if the HTML WG believes that the special 
handling of the <font> element is worth the extra implementation 
complexity, so that a minor fraction of existing HTML content does not 
change its rendering, then that is acceptable to us. (The SVG WG thinks 
it's good that the <font> element won't break out of foreign content 
mode for SVG for the most part.)

* There's a comment in the HTML5 spec [[<!--XXXSVG need to define 
processing for </script> to match HTML5's </script> processing -->]] 
Could the HTML WG please clarify what is required with regard to that?

* In XML, CDATA sections are distinct from text, but in HTML it's all the 
same. This means scripts that look at the structure of documents may not 
work. However, this is a minor issue that the SVG WG is willing to live 
with.
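
As a rough illustration of the CDATA point, here is a minimal sketch in 
Python. It uses the standard library's xml.dom.minidom as a stand-in for 
an XML DOM (our assumption for illustration; no browser DOM is involved), 
and the SVG fragment is made up. It shows that an XML parser exposes a 
CDATA section as its own node type, which is what a structure-inspecting 
script could come to depend on.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical SVG fragment with a script wrapped in a CDATA section.
SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<script><![CDATA[if (a < b) { run(); }]]></script>'
    '</svg>'
)

doc = parseString(SVG)
script = doc.documentElement.firstChild
child = script.firstChild

# In this XML DOM the CDATA section is a distinct node type,
# not a plain text node.
is_cdata = child.nodeType == Node.CDATA_SECTION_NODE
is_text = child.nodeType == Node.TEXT_NODE
print(is_cdata, is_text)
```

A script that tests for the CDATA node type would take a different path 
if the same content arrived as an ordinary text node.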

* The SVG WG is happy to see that XML and DOCTYPE declarations are 
ignored if found under the root element of the document. In that case 
they should have no effect (though it may be useful to discuss this in 
terms of the effect on entities declared in the DOCTYPE).

* The HTML5 draft defines a set of tag names for which the parser 
should break out of foreign content mode. The SVG WG would like to know 
the rationale for doing so for each of these tags.

* The SVG WG suggests, unless it is proven to break lots of content, 
adding character encoding detection for SVG files served as "text/html" 
based on <?xml encoding="..."?>. There would still be an issue with 
UTF-8 SVG documents lacking an XML declaration; perhaps the fact that 
the first open tag encountered in the document is an <svg> tag could 
make the encoding guesser choose UTF-8 in this case?
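
The encoding suggestion could be sketched roughly as follows. This is 
only an illustration in Python under our own assumptions: the function 
name guess_encoding, the two regular expressions, and the "windows-1252" 
fallback are all hypothetical and are not taken from the HTML5 draft.

```python
import re

# Matches an XML declaration at the start of the input and captures its
# encoding pseudo-attribute, if present.
XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
# Finds the first thing that looks like a start tag. Crude: it skips
# "<?" and "<!" constructs but does not look inside comments.
FIRST_TAG = re.compile(rb'<([A-Za-z][A-Za-z0-9:_.-]*)')


def guess_encoding(head: bytes, fallback: str = "windows-1252") -> str:
    """Hypothetical encoding guess for a text/html resource that may be SVG."""
    decl = XML_DECL.match(head)
    if decl:
        # An explicit <?xml ... encoding="..."?> wins.
        return decl.group(1).decode("ascii").lower()
    first = FIRST_TAG.search(head)
    if first and first.group(1) == b"svg":
        # No declaration, but the first open tag is <svg>: guess UTF-8.
        return "utf-8"
    return fallback


print(guess_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><svg/>'))
print(guess_encoding(b'<svg xmlns="http://www.w3.org/2000/svg"/>'))
print(guess_encoding(b'<!DOCTYPE html><html></html>'))
```

Whether such a guess would break existing content is exactly the open 
question raised above; the sketch does not address it.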

* Ideally, the SVG WG would like the HTML tokenizer to be 
case-preserving for attribute and element names.

* The SVG WG requests that the SVG case-fixup table be removed from the 
draft. We believe that HTML5 should defer to the appropriate (SVG) 
specification(s), and that this is not something that HTML5 should 
define. If the tokenizer is required to be case-preserving, the table is 
no longer necessary.
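
To make the trade-off concrete, here is a small Python sketch contrasting 
a lowercasing tokenizer that relies on a fix-up table with a 
case-preserving one. The table is an illustrative subset of real SVG 
names that we picked ourselves, not the table from the draft, and both 
function names are hypothetical.

```python
# Illustrative subset only: a few real mixed-case SVG names,
# keyed by their lowercased form.
CASE_FIXUP = {
    "foreignobject": "foreignObject",
    "lineargradient": "linearGradient",
    "clippath": "clipPath",
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
}


def lowercase_then_fix(name: str) -> str:
    """Model a tokenizer that lowercases names, then consults the table."""
    lowered = name.lower()
    return CASE_FIXUP.get(lowered, lowered)


def preserve_case(name: str) -> str:
    """Model a case-preserving tokenizer: the name passes through as written."""
    return name


# A name in the table survives either way.
print(lowercase_then_fix("viewBox"), preserve_case("viewBox"))
# A name the table does not know about (here a made-up future element)
# loses its casing unless the tokenizer preserves case.
print(lowercase_then_fix("feSomethingOrOther"), preserve_case("feSomethingOrOther"))
```

The second case is the concern: any name added to SVG after the table is 
frozen would need a matching update to HTML5.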

* Going forward, the SVG WG recognizes that choosing all lowercase 
attribute names would be helpful for both integration in HTML and if 
certain attributes are to become CSS properties. Choosing all lowercase 
element names would also be preferred, although in some cases 
consistency would dictate that we would introduce some new mixed case 
element names. For example, if we introduced a new filter primitive 
element that didn't adhere to the "feSomethingOrOther" style, it would 
be confusing for authors.

* For the case where an SVG file is inadvertently served as 'text/html', 
the SVG WG proposes that if the parser encounters an 'svg' element in 
the "before html" parse mode that no 'html' and 'body' element be 
inserted above the 'svg' element. Rather, we would prefer that the 
parser be required to simply insert the 'svg' element and switch to 
foreign content mode. (HTML5 could specify that documents with 'svg' as 
the root element are non-conforming so validators would flag this case.) 
There are at least two reasons for making this change. First, if 
parented by an implicit 'body' element, most SVG (specifically SVG that 
depends on the default value of 100% for the 'height' attribute on the 
'svg' element) would then get a used height of 150px (the CSS 2.1 
replaced element fallback height). This would result in SVG mistakenly 
or deliberately served as text/html rendering differently to the same 
SVG viewed locally or served as image/svg+xml. Secondly, accessing the 
'document.documentElement' object is common in JavaScript in SVG, and 
such script assumes that this will be the 'svg' element and will not be 
prepared to encounter inserted parent 'html' and 'body' elements. This 
script would need to be changed if pasted in the middle of an HTML 
document, but 
we would be able to prevent breakage if the SVG were pasted as the whole 
document. Such documents should be in standards mode, regardless of 
whether they include the SVG DOCTYPE. We do have one unresolved issue 
with our request, however. If the parser encounters an HTML start tag 
that breaks out of foreign content mode, where would it "break out" to? 
(There's no <body> element to pop back to.)
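
The 'document.documentElement' concern can be modelled with a short 
Python sketch. It is an approximation under our own assumptions: both 
trees are built with an XML parser (xml.dom.minidom), and the implicit 
'html' and 'body' wrappers are written out by hand to simulate what an 
HTML parser would insert. The helper name root_name is hypothetical.

```python
from xml.dom.minidom import parseString

SVG = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="10"/></svg>'
# Simulates the implicit wrappers an HTML parser would insert
# (written out by hand here, not produced by an HTML parser).
WRAPPED = "<html><body>" + SVG + "</body></html>"


def root_name(markup: str) -> str:
    """Return the tag name of the document element of the parsed markup."""
    return parseString(markup).documentElement.tagName


# Standalone SVG: the document element is 'svg', as SVG scripts expect.
print(root_name(SVG))
# With inserted wrappers: the document element is 'html' instead.
print(root_name(WRAPPED))
```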

* When SVG fragments in HTML are encountered, any invalid element or 
attribute casing should generate parse errors.

* The SVG WG is happy to see that unknown elements that are inside SVG 
fragments are inserted as SVG elements, but we'd like to see the casing 
of attributes and element names preserved.

* The SVG WG agrees that foreign content should not be allowed to imply 
start or end tags.

* The SVG WG requests that minimized and unquoted attribute values raise 
parse errors when found on SVG elements. Rationale:
  1. It is consistent with making incorrect xmlns attributes generate 
parse errors.
  2. It minimizes the number of documents that are conforming HTML but 
whose SVG fragments are not well-formed when copied to "image/svg+xml".
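
The second rationale can be checked with any XML parser. The sketch below 
uses Python's xml.etree.ElementTree; the helper name is_well_formed and 
the sample fragments are our own, and 'focusable' is used only as an 
example attribute name for the minimized case.

```python
import xml.etree.ElementTree as ET

NS = 'xmlns="http://www.w3.org/2000/svg"'


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


QUOTED = f'<svg {NS}><rect width="100" height="50"/></svg>'
UNQUOTED = f'<svg {NS}><rect width=100 height=50/></svg>'
# Minimized attribute: name given with no value at all.
MINIMIZED = f'<svg {NS}><rect focusable/></svg>'

for label, markup in [("quoted", QUOTED),
                      ("unquoted", UNQUOTED),
                      ("minimized", MINIMIZED)]:
    print(label, is_well_formed(markup))
```

Only the quoted form survives being copied into an "image/svg+xml" file, 
which is why we would like the other two flagged in text/html.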

* The SVG WG agrees that it may be useful to forgo namespace 
declarations for the SVG and XLink namespaces (as well as certain 
others, such as MathML).  However, we believe that rather than 
hardcoding the namespace prefixes, each of those prefixes should default 
to its namespace.  We are not suggesting at this time that namespace 
declarations should be able to override that default in HTML5, but some 
future revision of the language may specify that behavior, and 
hardcoding limits the potential for future extensibility solutions.
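
The distinction between a hardcoded prefix and a defaulted one, from the 
namespace item above, can be sketched as follows. This Python fragment is 
purely illustrative: the function resolve_prefix, its allow_override 
flag, and the choice of "svg" and "math" as prefix names are our 
assumptions, not proposed spec text. With allow_override left off, the 
default always applies, which is the behavior we are suggesting for now; 
turning it on models a possible future revision.

```python
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Prefixes that resolve without any xmlns:* declaration.
# The prefix names "svg" and "math" are assumptions for illustration.
DEFAULT_PREFIXES = {"svg": SVG_NS, "xlink": XLINK_NS, "math": MATHML_NS}


def resolve_prefix(prefix, declared=None, allow_override=False):
    """Return the namespace URI for a prefix, or None if it is unknown.

    declared: hypothetical mapping of in-scope xmlns:* declarations.
    allow_override: False models declarations being ignored in favor of
    the defaults; True models a future revision where they may override.
    """
    if allow_override and declared and prefix in declared:
        return declared[prefix]
    return DEFAULT_PREFIXES.get(prefix)


custom = {"xlink": "urn:example:other"}
print(resolve_prefix("xlink"))
print(resolve_prefix("xlink", custom))
print(resolve_prefix("xlink", custom, allow_override=True))
```

A hardcoded mapping would leave no room for the third call; a default 
keeps that door open without committing to it today.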

There are other issues on which we do not yet have consensus, and other 
considerations we believe are germane.  These issues are always 
available for public feedback on the SVG WG wiki. [4]

We are eager to discuss our feedback, and hope for a timely resolution.

[1] http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html
[2] http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
[3] http://bookmarklet.link/
[4] http://www.w3.org/Graphics/SVG/WG/wiki/SVG_in_text-html_2009

Regards-
-Doug Schepers, on behalf of the SVG WG

Received on Tuesday, 10 March 2009 22:57:50 UTC