Re: SVG Feedback on HTML5 SVG Proposal from Henri Sivonen on 2009-03-11 (public-html@w3.org from March 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 11 Mar 2009 10:20:34 +0200
To: Ian Hickson <ian@hixie.ch>
Cc: Doug Schepers <schepers@w3.org>, public-html@w3.org, www-svg <www-svg@w3.org>
Message-Id: <D7568D3A-B3BE-473C-AE5E-0BF5076CF3E4@iki.fi>
On Mar 11, 2009, at 01:48, Ian Hickson wrote:

> On Tue, 10 Mar 2009, Doug Schepers wrote:
>> * For the case where an SVG file is inadvertently served as
>> 'text/html', the SVG WG proposes that if the parser encounters an
>> 'svg' element in the "before html" parse mode that no 'html' and
>> 'body' element be inserted above the 'svg' element.
[...]
> Why would we want to support SVG files sent as text/html? Surely this
> is an error and should not be supported.

See http://lists.w3.org/Archives/Public/www-archive/2009Mar/0036.html  
for a use case. (Note that the use case makes sense if <svg> as root  
is conforming. I think it isn't worthwhile to add all the complexity  
of <svg> root handling as mere error handling.)

> * I am concerned also that actually implementing this would consist of
>   a significant change to the parsing algorithm, reaching across
>   multiple insertion modes, affecting very sensitive things like quirks
>   mode detection.

Indeed, I think we should stick to <!DOCTYPE html> to avoid  
complicating the mode detection and to avoid making it more stateful  
across tokens. However, I can see how <!DOCTYPE html><svg>... might be  
seen as aesthetically unpleasing (even though such a construct would  
even be allowed in well-formed XML!).

> I am also concerned that this would lead to very strange behavior for
> authors once they started relying on it. Consider for instance the
> difference between this:
>
>   [BOM]<svg>...
>
> ...and:
>
>   [BOM][BOM]<svg>...

I think issues of this nature are the strongest reason against
supporting <svg> as the root.

>> * Ideally, the SVG WG would like the HTML tokenizer to be
>> case-preserving for attribute and element names.
>
> My understanding is that doing this would introduce an unacceptable
> performance penalty for implementations.

Indeed.

The case information is lost in the tokenizer early on:
http://hg.mozilla.org/users/mrbkap_mozilla.com/html5parsing/file/ed748ec71a6d/content/html/parser/src/nsHtml5Tokenizer.cpp#l655

Then the token interning function can work without caring about case:
http://hg.mozilla.org/users/mrbkap_mozilla.com/html5parsing/file/ed748ec71a6d/content/html/parser/src/nsHtml5ElementName.cpp#l58

The camelCase SVG tokens are shared between all parser instances:
http://hg.mozilla.org/users/mrbkap_mozilla.com/html5parsing/file/ed748ec71a6d/content/html/parser/src/nsHtml5ElementName.cpp#l507

Then much later SVG camelCase names are fixed based on the
pre-interned well-known tokens:
http://hg.mozilla.org/users/mrbkap_mozilla.com/html5parsing/file/ed748ec71a6d/content/html/parser/src/nsHtml5TreeBuilder.cpp#l3416

By the time the tree builder knows that the token creates an SVG  
element, the case information is long lost, and the object the tree  
builder works with is a pre-interned read-only object shared by all  
parser instances in the process, and such shared objects can't have  
any per-parser-instance data such as the original case on them.
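
To make the pipeline above concrete, here is a minimal Python sketch of
the same idea. It is not the Gecko code linked above: the names
ElementName, _INTERNED, tokenize_tag_name and tree_builder_local_name
are hypothetical, and the table holds only a handful of names for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementName:
    """Read-only element name; hypothetical stand-in for an interned token."""
    name: str        # lowercased name, as the tokenizer sees it
    camel_case: str  # spelling to use if the element ends up being SVG


# Process-wide table shared by every parser instance (illustrative subset).
_INTERNED = {
    n.lower(): ElementName(n.lower(), n)
    for n in ("svg", "foreignObject", "linearGradient", "clipPath", "div")
}


def tokenize_tag_name(raw: str) -> ElementName:
    """Tokenizer step: the original case is discarded right here."""
    lowered = raw.lower()
    known = _INTERNED.get(lowered)
    if known is not None:
        return known  # the same shared object for every caller
    return ElementName(lowered, lowered)


def tree_builder_local_name(token: ElementName, in_svg: bool) -> str:
    """Tree builder step: restore camelCase only for SVG elements."""
    return token.camel_case if in_svg else token.name
```

Whatever the author typed, "FOREIGNOBJECT" and "foreignObject" resolve
to the same shared object, so there is nowhere to record the original
spelling without giving up the sharing.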

I wouldn't want to undo this token interning mechanism just to be able  
to send errors to the error console in Firefox. Also, I wouldn't want  
to maintain significantly different code paths for different classes  
of products (error-detecting and not error-detecting).

>> * The SVG WG requests that minimized and unquoted attribute values
>> raise parse errors when found on SVG elements. Rationale:
>> 1. Consistent with making incorrect xmlns attributes generate parse
>> error.
>> 2. Minimizing the number of documents which are conforming HTML whose
>> SVG fragments when copied to "image/svg+xml" are non-wellformed.
>
> This seems reasonable; what do other people think about this? (There
> have been requests that we make SVG-in-HTML support HTML-like
> attribute syntax.)

I think it makes sense to make it a conformance error if an SVG
element has an attribute xmlns whose value is not the SVG namespace
URI. However, I see no point in making the absence of the xmlns
attribute an error, when things can be made to work just fine without it.
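
As a rough illustration of that rule, a conformance check could look
like the following Python sketch. The function name xmlns_errors and
the dict-of-attributes representation are assumptions made for the
example, not part of any existing checker.

```python
SVG_NS = "http://www.w3.org/2000/svg"


def xmlns_errors(attributes: dict) -> list:
    """Return conformance errors for the xmlns attribute of an SVG element.

    Hypothetical helper: a missing xmlns is fine, a wrong value is an error.
    """
    value = attributes.get("xmlns")
    if value is None:
        return []  # absence is not an error
    if value != SVG_NS:
        return [f"xmlns on an SVG element must be {SVG_NS!r}, got {value!r}"]
    return []
```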

I don't like the idea of making attributes have different errors for
SVG elements:
  1) It would make text/html self-inconsistent where
self-inconsistency is easily avoidable.
  2) It would complicate an error-reporting tokenizer. Furthermore,
doing the check only for tokens that will later result in SVG
elements would be complicated. The most reasonable implementation
would only query the 'in foreign' state, which would mean that the
errors would apply to MathML as well and to HTML elements that end up
breaking out of foreign content.
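
A much simplified Python sketch of point 2 follows. StartTag and
attribute_syntax_errors are hypothetical names, and the single
in_foreign flag stands in for whatever state a real tokenizer would
query. It is only meant to show why such a check cannot single out SVG.

```python
from dataclasses import dataclass


@dataclass
class StartTag:
    """Hypothetical start tag token; the namespace is not known yet."""
    name: str
    unquoted_attrs: tuple = ()  # names of attributes written without quotes


def attribute_syntax_errors(tag: StartTag, in_foreign: bool) -> list:
    """Report unquoted attribute values, based only on the 'in foreign' flag."""
    if not in_foreign:
        return []
    return [
        f"unquoted value for {attr!r} on <{tag.name}>"
        for attr in tag.unquoted_attrs
    ]
```

With in_foreign set, a MathML <mi mathvariant=bold> gets the error just
like an SVG element would, and so does an HTML <p class=x> that the
tree builder will later treat as breaking out of foreign content. The
tokenizer cannot tell these cases apart at the point of the check.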

I think the issue of moving content from text/html to image/svg+xml by  
copying and pasting raw source is a lost cause anyway. I think a  
browser context menu item "Save as SVG Image..." or "View SVG source"  
would work much better. For a solution that doesn't require a browser,  
I put forward the HTML2XML command line tool that comes with the  
Validator.nu HTML Parser.

Concretely, I see no practical reason why this demo should be
non-conforming:
http://hsivonen.iki.fi/test/moz/html5-parsing.html

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 11 March 2009 08:21:35 UTC