Re: SVG in text/html from Henri Sivonen on 2007-10-16 (www-svg@w3.org from October 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 16 Oct 2007 13:37:15 +0300
To: Doug Schepers <schepers@w3.org>
Cc: www-svg <www-svg@w3.org>, public-cdf@w3.org, "public-html@w3.org" <public-html@w3.org>
Message-Id: <0BBB98FE-1E14-485C-AFFB-62DEBE0612E4@iki.fi>
On Oct 13, 2007, at 23:20, Doug Schepers wrote:

> Henri Sivonen wrote (on 10/13/2007 10:43 AM):
>> Do you mean you'd like to bring in the complication of arbitrary  
>> namespace prefixes?
>
> Not necessarily.  I'm fine with imposing certain limitations on SVG  
> content, assuming that it's a set of limitations that can be easily  
> obeyed by authoring tools (and which, preferably, existing  
> authoring tools abide by anyway).

It seems to me that using colonless element names is an easy  
limitation for authoring tools to follow.

> The most important thing for me is that SVG fragments from an
> HTML+SVG (SVG-in-HTML) compound document could be extracted as
> standalone SVG documents; the second most important thing is that
> the most likely content from standalone SVG documents should work
> as an SVG fragment in HTML (this is second because I think it is
> likely that this will be the case, given existing SVG
> content-creation tools).

Do you mean the extraction from HTML should work on the source
copy-paste level as opposed to using a tool that incorporates an HTML
parser and an XML serializer? Even if the conforming case were
carefully specced to allow such copy-paste, content out there would
inevitably start to contain constructs that wouldn't be safe for
pasting into XML (like content that tries to be XHTML
1.0-as-text/html is now unsafe for pasting into XML on the source
level), so doing the extraction using a parser followed by a
serializer would be the safe way to go.
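
As a rough illustration of the parser-then-serializer route, here is a minimal Python sketch. It is only a stand-in: Python's html.parser is not an HTML5 parser, the names SvgExtractor and extract_svg are hypothetical, and the sketch assumes well-nested input in which colonless element names inside <svg> belong to the SVG namespace.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


class SvgExtractor(HTMLParser):
    """Builds an ElementTree for the first <svg> subtree in a text/html source."""

    def __init__(self):
        super().__init__()
        self.root = None   # the extracted <svg> element, once seen
        self.stack = []    # currently open SVG elements, innermost last

    def handle_starttag(self, tag, attrs):
        if not self.stack and (tag != "svg" or self.root is not None):
            return  # not inside the <svg> subtree being extracted
        # Assumption: colonless names in the subtree go to the SVG namespace.
        elem = ET.Element("{%s}%s" % (SVG_NS, tag),
                          {name: value or "" for name, value in attrs})
        if self.stack:
            self.stack[-1].append(elem)
        else:
            self.root = elem
        self.stack.append(elem)

    def handle_endtag(self, tag):
        # Assumption: the input is well nested, so any end tag closes the
        # innermost open SVG element.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):  # text after a child element is that child's tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def extract_svg(html_source):
    """Return the first <svg> subtree serialized as standalone XML text."""
    parser = SvgExtractor()
    parser.feed(html_source)
    parser.close()
    if parser.root is None:
        raise ValueError("no <svg> element found")
    ET.register_namespace("", SVG_NS)  # serialize with xmlns="..." and no prefix
    return ET.tostring(parser.root, encoding="unicode")


# Unquoted attributes and a bare "&" are unsafe for source-level pasting
# into XML, but survive the parse-and-serialize round trip.
page = ('<p>Figure:</p><svg width=100 height=100>'
        '<circle r=40 cx=50 cy=50 aria-label="cats & dogs"></circle>'
        '</svg><p>after</p>')
standalone = extract_svg(page)
print(standalone)
```

The serializer quotes the attribute values and escapes the ampersand, so the result can be fed to an XML parser. One limitation of this stand-in: html.parser lowercases element and attribute names, so camelCase SVG names such as viewBox would need an extra fix-up step that is not shown here.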

>> I'd like to make the following deviations from
>> SVG-as-XML syntax:
>>  1) I'd like to minimize the need of tokenizer parametrization to  
>> toggling case folding behavior and, if we must, CDATA sections.
>
> Strictly speaking, CDATA sections are not required in SVG, but as  
> you know, script will break in an XML parser if it doesn't escape
> its "<" and "&" characters.  The majority of SVG authoring tools, I  
> suspect, are not script-aware: they are just drawing apps that  
> export to SVG; people savvy enough to be scripting can be expected  
> to take precautions and read FAQs to resolve their problems there.
>
> Even drawing tools, though, are likely to use CSS, and may  
> automatically enclose it in a CDATA section "just to be safe".  It  
> would be worthwhile to look at the survey of tools and see if they  
> do this, and if so, if they can be encouraged to change this practice.
>
> I would prefer that CDATA be allowed, but it's not a deal-breaker.   
> I confess I don't know why it's a problem in the HTML parser,  
> though, if you care to explain.

Introducing CDATA sections wholesale into text/html (also into the
HTML parts of the document) would be a problem because new
CDATA-aware parsers and old CDATA-unaware parsers would give
incompatible parse trees and the incompatibility wouldn't even add
any expressiveness to the language.

As for introducing CDATA sections for <svg> subtrees only,
there's the issue of whether to be consistent with the surrounding
HTML syntax or with XML syntax. Copy-pasteability suggests supporting
XMLisms like CDATA sections and /> in <svg> subtrees. Consistency
with the surrounding HTML would suggest not supporting CDATA sections.

The general problem with SVG <title>, <script>, <style> and  
<textArea> is ensuring that they don't produce ungraceful results  
when an SVG-in-text/html document is loaded in a legacy text/html  
browser. It seems to me that authors who want to avoid <textArea>  
rendering as HTML <textarea> in legacy browsers just have to avoid  
<textArea> in SVG-in-text/html. <title> seems harmless enough when  
the surrounding HTML already has a <title> of its own.

In the case of <style> and <script>, legacy browsers would try to  
treat them as HTML <style> and <script>. Parsing them the same way as  
HTML <style> and <script> in the case of SVG-in-text/html would at  
least ensure that both old and new parsers agree on when the elements  
end even when the script/style content touches edge cases. On the  
other hand, having CDATA sections and not having element-specific  
tokenization content models would be good for copying and pasting  
from XML files.

I can't say off-hand which approach is the best.

> Most tools do include XML prologs and DOCTYPES in their SVG  
> output... what effect will this have on a whole-file copy-paste
> into HTML, in terms of parsing?

You can't paste an XML declaration or a DOCTYPE in the middle of an  
XHTML+SVG document, so from the conformance point of view I don't  
think it is necessary to allow them to be pasted in the middle of  
text/html. As for what should happen if you paste them in  
nonetheless, I think the current behavior of the HTML5 parsing  
algorithm is reasonable: the XML declaration turns into a comment  
node and the doctype gets dropped.

>> Specifically, I think attribute tokenization should run the same  
>> code as attribute tokenization for the HTML parts of text/html.
>
> Could you elaborate on that?  What are the implications?

Unquoted attributes would be treated as in text/html in general. XML
attribute value normalization wouldn't be performed. (That is,
authors shouldn't rely on the parser discarding white space around
the value. Authors simply shouldn't put extra spaces in there. This is
already good advice with XML when the author doesn't know the
configuration of the receiving XML parser.) White space between the
close quote of a previous attribute and the name of the next
attribute wouldn't be required.
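
To make these implications concrete, the following Python sketch compares an HTML-style tokenizer with an XML parser on the same start tags. It uses Python's html.parser purely as an approximation of text/html attribute tokenization (it is not the HTML5 algorithm), and the helper names html_attrs and xml_attrs are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class AttrCollector(HTMLParser):
    """Records the attributes of the first start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


def html_attrs(source):
    """Attributes as seen by the HTML-style tokenizer."""
    parser = AttrCollector()
    parser.feed(source)
    parser.close()
    return parser.attrs


def xml_attrs(source):
    """Attributes as seen by an XML parser, or None if not well-formed."""
    try:
        return dict(ET.fromstring(source).attrib)
    except ET.ParseError:
        return None


# An unquoted value, and no white space between a close quote and the
# next attribute name: fine for the HTML-style tokenizer, an error in XML.
loose = '<rect width=10 height="5"fill="red"></rect>'
print(html_attrs(loose))
print(xml_attrs(loose))

# A literal line break inside a value: XML attribute value normalization
# replaces it with a space, the HTML-style tokenizer leaves it alone.
multiline = '<path d="M0 0\nL1 1"></path>'
print(repr(html_attrs(multiline)["d"]))
print(repr(xml_attrs(multiline)["d"]))
```

The second case is the one that matters for authors: the same source text yields different attribute values depending on which parser reads it, so the robust choice is to avoid stray white space in values altogether.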

>>  2) I'd like to avoid supporting arbitrary namespace prefixes both  
>> in order to sidestep issues in shipped IE versions and in order to  
>> relieve authors of namespace syntax. (xlink: should probably be  
>> considered non-arbitrary and hard-wired.)
>
> I think it's reasonable both to limit arbitrary namespace prefixes
> in HTML+SVG, and to hard-wire the XLink namespace.  That
> SVG-fragment content will still work as expected in a standalone SVG
> UA, and most people trying to do clever things in namespaces will
> probably be using XHTML+SVG anyway.

OK.
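
For illustration only, the name handling agreed on above could look roughly like the Python sketch below. The function names and the HARD_WIRED_PREFIXES table are hypothetical, and the treatment of names with unknown prefixes (left as plain, uninterpreted names) is an assumption, since the proposal does not spell it out.

```python
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical table: the only prefixes the parser would recognize.
# No xmlns declarations in the source are consulted.
HARD_WIRED_PREFIXES = {"xlink": XLINK_NS}


def element_name(tag):
    """Return (namespace, local name) for an element inside an <svg> subtree.

    Colonless names go to the SVG namespace. Assumption: a colon, if
    present, is not interpreted as a prefix separator.
    """
    return (SVG_NS, tag)


def attribute_name(name):
    """Return (namespace, local name) for an attribute; None means no namespace."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in HARD_WIRED_PREFIXES:
        return (HARD_WIRED_PREFIXES[prefix], local)
    # Assumption: anything else, including unknown prefixes, stays a plain name.
    return (None, name)


print(element_name("circle"))
print(attribute_name("xlink:href"))
print(attribute_name("width"))
print(attribute_name("foo:bar"))
```

With a fixed table like this, the resulting DOM is namespace-aware while authors never have to write a namespace declaration.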

>> The above trial balloon proposal is designed to optimize SVG  
>> integration in text/html in *future* browsers in a way that would  
>> create a namespace-aware DOM that current DOM-based SVG  
>> implementations would grok immediately but would at the same time  
>> remove namespace declaration syntax from the sight of authors. The  
>> proposal specifically isn't designed to clone the colon-based  
>> namespaces-in-text/html mechanism of IE. OTOH, it shouldn't  
>> interfere with it, either, except perhaps for xlink:href, which  
>> could be worked around by introducing href.
>
> I'm still on the fence about 'null:href'.  Can you explain in  
> detail why this is so problematic in HTML5 (especially given that  
> SVG isn't natively supported in IE anyway)?

Perhaps special-casing xlink:href *only* isn't that bad, but  
specifying new processing for names with colons *in general* carries  
the risk of specifying something that's incompatible with what  
happens when the syntax is fed to current IE.

I have the impression that Microsoft doesn't want to change what
they do with names that contain colons, but I guess it is best if  
they comment on that. (I don't currently have access to IE, so I  
can't test what exactly happens with xlink:href.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 16 October 2007 10:37:47 UTC