Re: SVG in text/html (was: @role in SVG) from Simon Pieters on 2007-10-13 (public-html@w3.org from October 2007)

From: Simon Pieters <simonp@opera.com>
Date: Sat, 13 Oct 2007 18:22:55 +0200
To: "Henri Sivonen" <hsivonen@iki.fi>, "Doug Schepers" <schepers@w3.org>
Cc: www-svg <www-svg@w3.org>, public-cdf@w3.org, "public-html@w3.org" <public-html@w3.org>
Message-ID: <op.tz484hu6idj3kv@hp-a0a83fcd39d2.belkin>

On Sat, 13 Oct 2007 16:43:59 +0200, Henri Sivonen <hsivonen@iki.fi> wrote:

> Do you mean you'd like to bring in the complication of arbitrary  
> namespace prefixes? I'd like to make the following deviations from  
> SVG-as-XML syntax:

I think it should be possible for the same SVG markup to work both  
when parsed as XML and when parsed with the HTML parser (to the same  
extent as you can have HTML markup that works both when parsed as XML and  
when parsed with the HTML parser), and moreover, it should be possible to  
write scripts that fix up the DOM afterwards for legacy UAs.


>   1) I'd like to minimize the need of tokenizer parametrization to  
> toggling case folding behavior and, if we must, CDATA sections.

CDATA sections and the content model flag are interesting.

In legacy UAs, <script> and <style> will be parsed as CDATA elements, and  
<title> and <textArea> as RCDATA. Doing the same in new UAs is nice  
because that makes sure that content will degrade reasonably in legacy  
UAs, and makes it easier to write scripts that fix the DOM for legacy  
UAs. For <script>, <style> and <title> this would not be a problem, since  
they normally only contain text, but <textArea> is more problematic since  
it can contain elements. (<textArea> is new in SVG 1.2 and apparently  
there isn't much content using it yet. Renaming that element would make  
this issue go away.)
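
To make the content model point concrete, here is a minimal Python sketch of the lookup a legacy text/html tokenizer effectively performs. The table and function names are illustrative assumptions, not spec text; the only facts used are the ones stated above (script and style are CDATA, title and textarea are RCDATA) plus the fact that HTML tag names match case-insensitively.

```python
# Hypothetical lookup table: how a legacy text/html tokenizer treats the
# content of these elements, whether or not they appear inside <svg>.
LEGACY_CONTENT_MODEL = {
    "script": "CDATA",     # content is raw text: no markup, no character references
    "style": "CDATA",
    "title": "RCDATA",     # character references allowed, but no child elements
    "textarea": "RCDATA",
}


def legacy_content_model(tag_name: str) -> str:
    """Return the content model a legacy UA would apply to tag_name."""
    # Legacy HTML parsers match tag names case-insensitively, so SVG's
    # camelCase <textArea> is treated like HTML's <textarea>.
    return LEGACY_CONTENT_MODEL.get(tag_name.lower(), "PCDATA")
```

With this table, legacy_content_model("textArea") is "RCDATA", which is why child elements of an SVG <textArea> would be swallowed as text in a legacy UA.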

Also, authors are already used to doing "<script>//<![CDATA[" when working  
with markup that needs to work as both HTML and XHTML, so having the same  
rules for SVG in HTML is likely what authors would expect.
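
The "//<![CDATA[" idiom can be checked with a small Python sketch. It uses the standard library's html.parser as a stand-in for a parser that treats <script> as a CDATA element, and xml.etree.ElementTree as the XML parser; neither is a browser, so this only illustrates the idea. The sample script text and helper names are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Sample markup using the dual-mode idiom; the script body is made up.
MARKUP = "<script>//<![CDATA[\nif (a && b) run();\n//]]></script>"


class TextCollector(HTMLParser):
    """Collects all character data reported by the HTML parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def script_text_as_html(markup: str) -> str:
    # html.parser treats <script> content as raw text up to </script>.
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def script_text_as_xml(markup: str) -> str:
    # The XML parser strips the CDATA section delimiters and keeps the text.
    return ET.fromstring(markup).text
```

In the HTML-style parse the CDATA delimiters stay in the script text but sit behind "//", so they are script comments; in the XML parse the delimiters vanish and only the bare "//" lines remain. Either way the "&&" reaches the script engine unescaped.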

Having all SVG elements be PCDATA (as in XML) would probably mean that we  
also have to introduce CDATA sections (since authors don't want to write  
"&amp;&amp;" in their scripts, and it would be harder to make things work  
in legacy UAs).


> Specifically, I think attribute tokenization should run the same code as  
> attribute tokenization for the HTML parts of text/html.
>   2) I'd like to avoid supporting arbitrary namespace prefixes both in  
> order to sidestep issues in shipped IE versions and in order to relieve  
> authors of namespace syntax. (xlink: should probably be considered  
> non-arbitrary and hard-wired.)
>
> More concretely, I've been thinking something like this might work:
>   * Case folding in the tokenizer should be made conditional so that  
> potentially camelCap names in <svg> subtrees would not be case-folded.
>     - Issue: Should case folding be toggled on and off (in which case  
> tokenizing "<svg " would happen in the case-folding state allowing "<SvG  
> ") or should names be collected unfolded and then whole names  
> conditionally case-folded (in which case we could require "<svg " to be  
> in lower case)?
>     - Issue 2: If the latter, to avoid expensively case-folding whole  
> start tag tokens *including* attributes later on, the tokenizer should  
> probably have to know about tag names that turn on the case-preserving  
> mode before looking for attributes but the tree builder should be the  
> part of the parser telling the tokenizer to switch back to the case  
> folding mode. This would be ugly but probably necessary.

I don't think it's necessary to require the svg start tag to be lowercase  
if doing so would be a performance problem, but I don't feel strongly  
about it. It is however necessary to get the case of the attributes of the  
svg start tag right because of (at least) the viewBox="" attribute.
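
A minimal Python sketch of one way the conditional folding could behave, assuming the tokenizer recognizes "svg" case-insensitively and then stops folding for that tag's attributes. The function name and return shape are hypothetical; this is not a proposal for the actual tokenizer states.

```python
def tokenize_names(tag_name, attr_names, in_svg_scope):
    """Return (tag_name, attr_names) after conditional case folding.

    in_svg_scope says whether the tree builder has already told the
    tokenizer that it is inside an <svg> subtree.
    """
    opens_svg = tag_name.lower() == "svg"
    if in_svg_scope or opens_svg:
        # Case-preserving mode: camelCase names such as viewBox survive.
        # In this sketch "<SvG" is still accepted and normalized to "svg".
        name = "svg" if opens_svg else tag_name
        return name, list(attr_names)
    # Ordinary HTML: fold everything to lowercase.
    return tag_name.lower(), [a.lower() for a in attr_names]
```

For example, tokenize_names("SvG", ["viewBox"], False) keeps the attribute as "viewBox", while tokenize_names("DIV", ["ID"], False) folds both names.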


>   * Start tag tokens should have a flag about the /> presence. The tree  
> builder would ignore this for HTML elements but would pop immediately  
> for SVG elements.

Doing so for "script", "style", "title" and "textArea" would mess up  
legacy UAs badly.
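
The problem is that a legacy UA ignores the "/>" on something like <script/> and treats everything after it as script text until it finds </script>. One possible way to account for that, sketched in Python below, is to exempt those four elements from the immediate pop. The exemption is my reading of the concern, not something Henri proposed, and the names are hypothetical.

```python
# Elements that legacy text/html parsers treat as CDATA or RCDATA.
# Names are lowercased because legacy parsers match case-insensitively.
LEGACY_TEXT_ELEMENTS = {"script", "style", "title", "textarea"}


def pops_immediately(tag_name: str, self_closing: bool, in_svg_scope: bool) -> bool:
    """Decide whether the tree builder pops the element right after pushing it."""
    if not self_closing:
        return False
    if not in_svg_scope:
        # For HTML elements the "/>" flag is ignored.
        return False
    # Assumption: honoring "/>" on these would diverge from legacy UAs,
    # so the sketch requires an explicit end tag for them.
    return tag_name.lower() not in LEGACY_TEXT_ELEMENTS
```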


>   * The <svg> element would establish "an SVG scope" in the tree  
> builder. The <svg> start tag token would itself be handled in the HTML  
> state of the tree builder so that the <svg> element would be subject to  
> foster parenting.
>   * When in an SVG scope, the tree builder would ignore the HTML tree  
> building rules. This means that stray tags looking like HTML tags could  
> not cause the tree builder to pop out of the SVG scope. While in the SVG  
> scope, the tree builder would assign the SVG namespace URI to the  
> element nodes it creates.
>     - Issue: What to do if there is a prefixed element?

Do the same as what you do with a prefixed element outside SVG scope  
(i.e., include the prefix and the colon in the local name).
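
As a minimal Python sketch of that rule: no prefix resolution happens, and the whole tag name, colon included, becomes the local name. The function name is hypothetical, and the namespace for elements outside the SVG scope is deliberately left as None because this message does not say what it should be.

```python
SVG_NS = "http://www.w3.org/2000/svg"


def element_name(tag_name: str, in_svg_scope: bool):
    """Return (namespace, local_name) for a start tag token."""
    # The prefix is never split off: "foo:bar" stays "foo:bar".
    namespace = SVG_NS if in_svg_scope else None  # None: unspecified in this sketch
    return namespace, tag_name
```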


>   * When in the SVG scope, a start tag token would unconditionally  
> result in the corresponding element node to be appended to the current  
> node. (And if the /> flag is set on the token, the node would be popped  
> immediately.)
>   * When in the SVG scope, an end tag token would cause a corresponding  
> element to be searched starting with the current node towards the start  
> of the SVG scope (and no further). If an element were found in scope,  
> the stack would be popped until that element got popped. If there were  
> no such element in scope, the end tag would be ignored. Any outcome but  
> a single pop would be a parse error.
>   * When the current node is a foreignObject element in an SVG scope,  
> the start tag token <html> would establish a "nested HTML scope".  
> </html>, <body> and </body> would act like "normal" tokens in a nested  
> HTML scope. Specifically, any token other than </html> encountered in a  
> nested HTML scope would be unable to break out of the nested HTML scope.
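
The end tag rule quoted above can be sketched in Python roughly as follows. The stack is modeled as a plain list of element names and scope_start is the index of the <svg> element that opened the scope; both representations are assumptions made for the sketch.

```python
def handle_svg_end_tag(stack, scope_start, name):
    """Process an end tag while in an SVG scope.

    stack: list of open element names, current node last.
    scope_start: index in stack of the <svg> element that opened the scope.
    Returns (new_stack, parse_error).
    """
    # Search from the current node towards the start of the scope, no further.
    for i in range(len(stack) - 1, scope_start - 1, -1):
        if stack[i] == name:
            popped = len(stack) - i
            # Anything other than a single pop is a parse error.
            return stack[:i], popped != 1
    # No matching element in scope: ignore the end tag (also a parse error).
    return list(stack), True
```

With the stack ["html", "body", "svg", "g", "path"] and scope_start 2, an end tag "path" is a clean single pop, "g" pops two elements with a parse error, and "body" is ignored because the search never leaves the SVG scope.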

I think it makes more sense to make <foreignObject> itself switch back to  
normal "in body". The common case seems to be to just have a <div> as  
child when you use XHTML in a <foreignObject>.
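
A minimal Python sketch of that alternative: the tree builder picks its rules by looking at the nearest scope-setting ancestor on the stack of open elements. The mode strings and the function name are illustrative, and the real tree builder has more insertion modes than the single "in body" used here.

```python
def tree_builder_mode(open_elements):
    """Pick the tree building rules from the stack of open element names."""
    # Walk from the current node towards the root.
    for name in reversed(open_elements):
        if name == "foreignObject":
            # <foreignObject> itself switches back to normal HTML rules.
            return "in body"
        if name == "svg":
            return "in svg"
    # Simplification: everything outside SVG is treated as "in body".
    return "in body"
```

So a <div> placed directly inside <foreignObject> is parsed with the ordinary HTML rules, with no <html> or <body> wrapper needed, and an <svg> nested inside that <div> opens a new SVG scope.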


>   * Attributes with the name "xlink:href" on the tokenization level  
> would be reported by the tokenizer as local name "href" in the XLink  
> namespace.
>   * xmlns or xmlns:* attributes would have no meaning and would be  
> non-conforming except xmlns="http://www.w3.org/2000/svg" and  
> xmlns:xlink="http://www.w3.org/1999/xlink" would be allowed as  
> "talismans" on the <svg> start tag.

I'd allow the xmlns="http://www.w3.org/1999/xhtml" talisman on the child of  
foreignObject, too (perhaps only for <div>?).
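
The attribute rules from this part of the thread can be sketched in Python as below. The function names are hypothetical, and restricting the XHTML talisman to a <div> child of foreignObject follows the "perhaps" above rather than any settled rule.

```python
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def attribute_name(name: str):
    """Return (namespace, local_name) as reported by the tokenizer."""
    # xlink: is hard-wired; no other prefix is resolved.
    if name == "xlink:href":
        return XLINK_NS, "href"
    return None, name


def xmlns_is_conforming(element: str, parent, attr: str, value: str) -> bool:
    """Check whether an xmlns or xmlns:* attribute is an allowed talisman."""
    if element == "svg":
        return (attr, value) in {("xmlns", SVG_NS), ("xmlns:xlink", XLINK_NS)}
    if element == "div" and parent == "foreignObject":
        # Assumption: the XHTML talisman is allowed only on this child.
        return (attr, value) == ("xmlns", XHTML_NS)
    return False
```

Note that the talismans carry no meaning in this model: the check only decides conformance, while namespaces are assigned by the tree builder regardless.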


> [...]

-- 
Simon Pieters
Opera Software
Received on Saturday, 13 October 2007 16:24:01 UTC