Re: SVG Feedback on HTML5 SVG Proposal from Ian Hickson on 2009-03-10 (public-html@w3.org from March 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 10 Mar 2009 23:48:59 +0000 (UTC)
To: Doug Schepers <schepers@w3.org>
Cc: public-html@w3.org, www-svg <www-svg@w3.org>
Message-ID: <Pine.LNX.4.62.0903102315340.2690@hixie.dreamhostps.com>

On Tue, 10 Mar 2009, Doug Schepers wrote:
> 
> * The SVG WG is of the opinion that the contents of the SVG 'title' 
> element should be RCDATA, and therefore would prefer that the HTML5 
> parsing algorithm not require conforming parsers to break out of foreign 
> content mode and parse the element's content as HTML.

My thinking when I made <title> switch to the HTML mode was that this was 
necessary for supporting <ruby>, which I am told is necessary for a good 
internationalisation story. Also, it's unclear which SVG elements one 
should use within <title> to annotate languages, which I am told is 
necessary for both internationalisation and accessibility (in HTML, the 
<span lang=""> element would seem the obvious choice).

I don't have a strong opinion on this issue; can the SVG WG confirm that 
<ruby> support within <title> is not desired and that there is some 
SVG-specific way of doing language annotation, or that language annotation 
is not needed for <title>? If so, adopting this proposal seems like a good 
idea.


> * There's a comment in the HTML5 spec [[<!--XXXSVG need to define 
> processing for </script> to match HTML5's </script> processing -->]] 
> Could the HTML WG please clarify what is required with regards to that?

This would basically consist of defining how features like 
document.write() interact with SVG's <script> element. For comparison, see 
the spec text on handling the "script" start and end tags, which is what 
applies to the HTML <script> element today.


> * The HTML5 draft defines a set of tags names for which the parser 
> should break out of foreign content mode. The SVG WG would like to know 
> the rationale for doing so for each of these tags.

The tags listed were found to be tags that currently exist in Web content 
that has <svg> start tags. (It turns out there are quite a few pages out 
there that have <svg> tags in them, for no apparent reason. The goal with 
this list is to avoid breaking these pages too much.)


> * The SVG WG requests that the SVG case-fixup table be removed from the 
> draft. We believe that HTML5 should defer to the appropriate (SVG) 
> specification(s), and that this is not something that HTML5 should 
> define.

It seems dangerous to split the definition of how to parse text/html into 
multiple specifications. However, updating the list should be easy and 
quick, and in practice shouldn't affect implementations (which would just 
update their lists regardless of what the specs say). What is the concern?
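
For readers unfamiliar with the fix-up approach, here is a minimal sketch 
in Python of how a lowercasing tokenizer plus a lookup table can restore 
SVG's mixed-case names. The table below is only an illustrative subset, 
and the names SVG_TAG_FIXUPS, tokenize_tag_name() and adjust_svg_tag_name() 
are hypothetical, not taken from the draft.

```python
import string

# Illustrative subset only; the table in the draft is longer.
SVG_TAG_FIXUPS = {
    "clippath": "clipPath",
    "foreignobject": "foreignObject",
    "lineargradient": "linearGradient",
    "radialgradient": "radialGradient",
    "textpath": "textPath",
}

# ASCII-only lowercasing, as a stand-in for what the tokenizer does.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def tokenize_tag_name(raw_name: str) -> str:
    """Hypothetical stand-in for the tokenizer: lowercase the tag name."""
    return raw_name.translate(_ASCII_LOWER)


def adjust_svg_tag_name(lowercased_name: str) -> str:
    """Map a lowercased tag name back to its canonical SVG casing."""
    return SVG_TAG_FIXUPS.get(lowercased_name, lowercased_name)


if __name__ == "__main__":
    for raw in ("linearGradient", "LINEARGRADIENT", "circle"):
        print(raw, "->", adjust_svg_tag_name(tokenize_tag_name(raw)))
```

Updating the list for a new SVG element is then a one-line change to the 
table, whichever specification it lives in.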


> * For the case where an SVG file is inadvertently served as 'text/html', 
> the SVG WG proposes that if the parser encounters an 'svg' element in 
> the "before html" parse mode that no 'html' and 'body' element be 
> inserted above the 'svg' element. Rather, we would prefer that the 
> parser be required to simply insert the 'svg' element and switch to 
> foreign content mode. (HTML5 could specify that documents with 'svg' as 
> the root element are non-conforming so validators would flag this case.) 
> There are at least two reasons for making this change. First, if 
> parented by an implicit 'body' element, most SVG (specifically SVG that 
> depends on the default value of 100% for the 'height' attribute on the 
> 'svg' element) would then get a used height of the 150px (the CSS 2.1 
> replaced element fallback height). This would result in SVG mistakenly 
> or deliberately served as text/html rendering differently to the same 
> SVG viewed locally or served as image/svg+xml. Secondly, accessing the 
> 'document.documentElement' object is common in JavaScript in SVG, and 
> SVG assumes that this will be the 'svg' element and will not be prepared 
> to encounter inserted parent 'html' and 'body' elements. This script 
> would need to be changed if pasted in the middle of an HTML document, but 
> we would be able to prevent breakage if the SVG were pasted as the whole 
> document. Such documents should be in standards mode, regardless of 
> whether they include the SVG DOCTYPE. We do have one unresolved issue 
> with our request, however. If the parser encounters an HTML start tag 
> that breaks out of foreign content mode, where would it "break out" to 
> (there's no <body> element to pop back to)?

Why would we want to support SVG files sent as text/html? Surely this is 
an error and should not be supported.

There are a number of practical reasons why I think this change would be 
unwise:

 * Implementing this would require a significant change to the parsing 
   algorithm, reaching across multiple insertion modes and affecting very 
   sensitive things like quirks mode detection.

 * It would introduce inordinate complexity in the case of an <svg> tag at 
   the start of an HTML file, since the DOM would have to be fixed up to 
   have an <html> root node, which would involve serious surgery to the 
   resulting DOM.

 * The HTML parser is written with the assumption of an <html> root node 
   throughout, and changing that assumption would be a lot of work both 
   for the spec and all the existing implementations.

I am also concerned that this would lead to very strange behavior for 
authors once they started relying on it. Consider for instance the 
difference between this:

   [BOM]<svg>...

...and:

   [BOM][BOM]<svg>...

...where "[BOM]" represents a Unicode byte-order mark. These two files, 
indistinguishable in most text editors, would have different root 
elements. The de facto HTML parser today goes to some lengths to avoid 
problems with things like this (e.g. propagating attributes from stray 
<html> and <body> tags).
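
A minimal sketch in Python of how the two files could diverge under the 
proposed rule. It assumes that the decoder consumes exactly one leading 
BOM and that a second U+FEFF is then seen as an ordinary non-whitespace 
character ahead of the <svg> tag. The function root_under_proposal() is 
hypothetical, and it ignores DOCTYPEs and comments for brevity.

```python
import re

BOM = "\ufeff"
HTML_WHITESPACE = " \t\n\f\r"


def root_under_proposal(decoded_text: str) -> str:
    """Guess the root element if a leading <svg> tag were made the root."""
    # Assumption: exactly one leading BOM is dropped before parsing.
    if decoded_text.startswith(BOM):
        decoded_text = decoded_text[len(BOM):]
    # Leading whitespace is skipped; U+FEFF is not whitespace here.
    rest = decoded_text.lstrip(HTML_WHITESPACE)
    if re.match(r"<svg[\s/>]", rest, re.IGNORECASE):
        return "svg"
    # Anything else first (including a stray U+FEFF) implies <html>.
    return "html"


if __name__ == "__main__":
    print(root_under_proposal(BOM + "<svg>..."))
    print(root_under_proposal(BOM + BOM + "<svg>..."))
```

Under these assumptions the first file gets an <svg> root and the second 
gets an <html> root, even though the two look identical in an editor.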


> * Ideally, the SVG WG would like the HTML tokenizer to be 
> case-preserving for attribute and element names.

My understanding is that doing this would introduce an unacceptable 
performance penalty for implementations.


> * When SVG fragments in HTML are encountered, any invalid element or 
> attribute casing should be generating parse errors.

This would require a case-preserving tokeniser.
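
To illustrate why: a casing check has to compare each name as written 
against its canonical form, which is only possible if the original casing 
survives tokenisation. A minimal sketch in Python, using a hypothetical 
two-entry CANONICAL table and a hypothetical casing_errors() helper:

```python
# Hypothetical, deliberately tiny table of canonical SVG names.
CANONICAL = {
    "lineargradient": "linearGradient",
    "clippath": "clipPath",
}


def casing_errors(raw_names):
    """Return (as_written, canonical) pairs for names with wrong casing."""
    errors = []
    for raw in raw_names:
        lowered = raw.lower()
        canonical = CANONICAL.get(lowered, lowered)
        if raw != canonical:
            errors.append((raw, canonical))
    return errors


if __name__ == "__main__":
    written = ["linearGradient", "LinearGradient", "circle", "Circle"]
    # With the original casing available, only the bad names are reported.
    print(casing_errors(written))
    # After lowercasing, correct and incorrect spellings are identical,
    # so there is nothing left to distinguish them by.
    print({name.lower() for name in written})
```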


> * The SVG WG is happy to see that unknown elements that are inside SVG 
> fragments are inserted as SVG elements, but we'd like to see the casing 
> of attributes and element names preserved.

I agree that this would be ideal, but my understanding is that doing this 
would introduce an unacceptable performance penalty.


> * The SVG WG requests that minimized and unquoted attribute values raise parse
> errors when found on SVG elements. Rationale:
>  1. Consistent with making incorrect xmlns attributes generate parse errors.
>  2. Minimizes the number of documents that are conforming HTML but whose SVG
> fragments, when copied to "image/svg+xml", are not well-formed.

This seems reasonable; what do other people think about this? (There have 
been requests that we make SVG-in-HTML support HTML-like attribute syntax.)
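
The well-formedness gap behind the second rationale can be seen with 
Python's standard library. This is only a sketch: html.parser is a lenient 
tokenizer, not an implementation of the HTML5 parsing algorithm, and the 
helpers is_well_formed_xml() and html_attrs() are hypothetical names 
introduced for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record each start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def is_well_formed_xml(fragment: str) -> bool:
    """True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def html_attrs(fragment: str):
    """Start tags and attributes as a lenient HTML tokenizer sees them."""
    parser = AttrCollector()
    parser.feed(fragment)
    parser.close()
    return parser.seen


if __name__ == "__main__":
    html_style = "<svg width=100 hidden></svg>"
    xml_style = '<svg width="100"></svg>'
    print(html_attrs(html_style))          # accepted by the HTML tokenizer
    print(is_well_formed_xml(html_style))  # rejected by the XML parser
    print(is_well_formed_xml(xml_style))
```

A fragment using unquoted or minimized attributes is accepted on the HTML 
side but fails as XML, which is the copy-and-paste hazard the SVG WG 
describes.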


> * The SVG WG suggests, unless it is proven to break lots of content, 
> adding character-encoding detection for SVG files served as "text/html" 
> based on <?xml encoding="..."?>. There would still be an issue with 
> UTF-8 SVG documents lacking an XML declaration; perhaps the fact that 
> the first open tag encountered in the document is an <svg> tag could 
> make the encoding guesser choose UTF-8 in this case?

I don't understand this proposal. Could you elaborate?


> * The SVG WG agrees that it may be useful to forego namespace 
> declarations for the SVG and XLink namespaces (as well as certain 
> others, such as MathML). However, we believe that rather than hardcoding 
> the namespace prefixes, those prefixes should default to that namespace.  
> We are not suggesting at this time that namespace declarations should be 
> able to override that default in HTML5, but some future revision of the 
> language may specify that behavior, and hardcoding limits the potential 
> for future extensibility solutions.

I don't understand this proposal. Could you elaborate?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 10 March 2009 23:49:41 UTC