Re: SVG Semantics Re: SVG and MathML in text/html from Maciej Stachowiak on 2008-09-28 (public-html@w3.org from September 2008)

From: Maciej Stachowiak <mjs@apple.com>
Date: Sun, 28 Sep 2008 15:01:27 -0700
To: Charles McCathieNevile <chaals@opera.com>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-id: <6D0E48E2-7C84-4AF5-BDB7-F80232793F58@apple.com>
On Sep 27, 2008, at 10:28 PM, Charles McCathieNevile wrote:

> On Mon, 17 Mar 2008 10:35:32 +1100, Maciej Stachowiak  
> <mjs@apple.com> wrote:
>
>> On Mar 16, 2008, at 1:40 PM, Charles McCathieNevile wrote:
>
>>> On Sat, 15 Mar 2008 22:12:34 -0700, Maciej Stachowiak  
>>> <mjs@apple.com> wrote:
>>> ...
>>>> HTML has the feature of two serializations: a classic  
>>>> serialization that is error-tolerant, and an XML-based  
>>>> serialization that has draconian error handling. These have  
>>>> different costs and benefits; ultimately, it is a benefit to HTML
>>>> authors that they have a choice. I think SVG deserves to have
>>>> this feature as well, there's no reason it should fall short of  
>>>> HTML in this regard. Supporting SVG inline in text/html seems  
>>>> like a good opportunity to add this feature to SVG.
>>>
>>> Perhaps. The cans of worms are different though. HTML elements are  
>>> basically content - in principle, the text tree is reasonably  
>>> useful (unless you have images). SVG is about images - having  
>>> parts of an image not render can drastically alter the semantics  
>>> of the image.
>>>
>>> SVG has a mechanism for handling broken subtrees, which involves  
>>> showing that it is broken.
>>
>> SVG has different rules for handling different kinds of syntax errors:
>>
>> - Surface-level syntax errors (missing quotes on an attribute, text  
>> encoding error, missing close tag): total failure to render the  
>> document. This is not so much an SVG rule but an XML rule inherited  
>> by SVG as part of the serialization.
>>
>> - Semantic syntax errors (bad attribute value, unknown attribute,  
>> unknown element, dangling reference to another part of the  
>> document): ignore only the erroneous construct and render the  
>> remainder of the document with best effort. SVG used to be
>> draconian about some errors of this form as well: not only at
>> parse time, but even if such a state were entered into via DOM
>> manipulation, all rendering would fail. That rule was abandoned.
>>
>> So <circle fill=red /> would result in the whole document not  
>> rendering (YSoD), while <circle fill="redd" /> would result in the  
>> circle not being filled (but if it has a correctly specified  
>> stroke, that would render, and indeed the rest of the document  
>> would render).
>>
>> It's true that an unknown element name outside foreignObject  
>> results in a subtree of the document that does not render, but this  
>> is not the most common form of error handling in SVG.
>>
>> In conclusion, I think considerations of image semantics do not  
>> make the case that only draconian syntax makes sense for SVG. I  
>> think both choices, draconian and tolerant, should be available to  
>> authors and tools.
>
> This conclusion seems a little simplistic given the real world cases  
> you outline. SVG has different kinds of error-handling for different  
> kinds of error, and there is no binary difference between
> tolerance and following Draco; the terms identify tendencies
> on a continuum.

I'm not claiming that SVG's error handling is strictly draconian. I  
would claim that it is a hybrid. But you said that tolerant error  
handling was not a good idea for SVG because "SVG is about images -  
having parts of an image not render can drastically alter the  
semantics of the image." However, some aspects of SVG's error handling
are tolerant already, and in fact have the effect of parts of the
image not rendering. So clearly this is not a showstopper. Indeed,
over time, SVG's error handling has become more tolerant.

Thus, the evolution of the SVG spec itself strongly implies that  
tolerant error handling is desirable for images in general, and SVG in  
particular. The only draconian error handling left in SVG is that  
inherited from XML, which defines the baseline serialization. I  
believe it is dubious to claim that the severity of XML error handling  
is intrinsic to SVG, when SVG itself abandoned such strictness at the  
SVG (rather than XML) level. Or at least, your argument for it is  
undermined by SVG itself.
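To make the two layers of error handling discussed above concrete, here is a minimal Python sketch (an editorial illustration, not something proposed by either party in this thread) built around the `<circle fill=red />` and `<circle fill="redd" />` examples quoted earlier. Assumptions: `xml.etree.ElementTree` stands in for draconian XML-level parsing, the standard-library `html.parser` stands in for a tolerant surface syntax (it is not an HTML5 parser and knows nothing about SVG), and `KNOWN_COLOURS` is a hypothetical, heavily truncated stand-in for SVG's colour keyword table.

```python
"""Toy contrast between XML-level (draconian) and tolerant parsing of the
<circle> examples in this thread. Not an SVG or HTML5 implementation."""
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical stand-in for SVG's colour keyword table (the real one is far longer).
KNOWN_COLOURS = {"red", "green", "blue", "black", "white"}


def parses_as_xml(markup):
    """Draconian surface syntax: any well-formedness error rejects the whole input."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ElementCollector(HTMLParser):
    """Tolerant surface syntax: records each start tag and its attributes,
    accepting unquoted attribute values such as fill=red."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of (tag name, attribute dict) pairs

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<circle ... />) also arrive here, because the
        # default handle_startendtag calls handle_starttag.
        self.elements.append((tag, dict(attrs)))


def tolerant_elements(markup):
    parser = ElementCollector()
    parser.feed(markup)
    parser.close()
    return parser.elements


def effective_fill(attrs):
    """Semantic-level tolerance: an unrecognised fill value is ignored (None)
    instead of aborting rendering of the whole document."""
    value = attrs.get("fill")
    return value if value in KNOWN_COLOURS else None


if __name__ == "__main__":
    for markup in ('<circle fill=red />', '<circle fill="redd" />'):
        xml_ok = parses_as_xml(markup)
        fill = effective_fill(tolerant_elements(markup)[0][1])
        print(f"{markup!r}: XML well-formed={xml_ok}, tolerant fill={fill}")
```

Under these assumptions, the first input is rejected outright by the XML parser (the whole-document failure described above) but yields `fill=red` from the tolerant parser; the second is well-formed XML, yet its `fill` value is simply dropped at the semantic layer while the element itself survives.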


>
>>> Somewhere there might be a sweet spot that we can find with the  
>>> SVG group. But it's not just a case of "they should do as we do".
>>> (And I agree that the luxury of having the choice of a strict  
>>> syntax is nice, and would hate to see that baby tossed out with  
>>> any bathwater we may find in the requirement to be intolerant).
>>
>> The choice of strict syntax for SVG is already available, in the  
>> XML serialization. What I'm proposing is that we add the choice of  
>> a tolerant syntax as well, and use that in the text/html  
>> serialization. Perhaps you are arguing that we should offer the  
>> option of intermixing the tolerant serialization of HTML and the  
>> draconian serialization of SVG.
>
> Yes, very roughly speaking, that is what I am suggesting (modulo the  
> idea that there are many levels of tolerance).

I don't believe you have made a good argument for why draconian error  
handling at the serialization / surface syntax level is essential for  
SVG, or why it is appropriate for the text/html serialization.

>
>
>> Maybe that is a useful option, but it seems somewhat redundant if  
>> all-tolerant and all-draconian forms of HTML+SVG are available. In  
>> theory the following four combinations are possible:
>>
>> 1) HTML: Draconian   SVG: Draconian
>> 2) HTML: Tolerant   SVG: Tolerant
>> 3) HTML: Tolerant   SVG: Draconian
>> 4) HTML: Draconian   SVG: Tolerant
>>
>> The first one is already available as XHTML+SVG. To add a tolerant  
>> syntax option for SVG, I propose that we specify a form of #2. At  
>> that point, I think #3 and #4 are too obscure to be worth adding.
>
> Except that in the real world, there is no apparent demand for a lot  
> of tolerance in SVG markup,

Evidence?

> and there is an ecosystem built on the idea that the extreme  
> tolerance available for HTML is neither necessary nor desirable.

And there is an ecosystem built on the idea that the error tolerance  
of HTML is essential to the success of the Web, and an ecosystem much  
larger than either of those based on not caring much one way or the  
other but benefitting from error tolerance anyway. I would say the  
ecosystem you have mentioned is the least popular and successful of  
the three. Nonetheless, HTML5 will cater to both.

> Indeed, the major failure errors in Wikipedia examples, as  
> identified by Henri, are less common than the cataclysmic failure of  
> the image to appear at all.
>
> We believe that as well as being easier to implement (in browsers  
> and authoring tools)

As a browser engine implementor, and one who has directly dealt with  
both the HTML and XML parsers in our engine, I strongly disagree that  
the SVG WG proposal is easier for browsers to implement. Using a  
single parser for HTML would be much easier than trying to switch  
between the HTML and XML parsers midstream. Is there any browser  
implementor who thinks otherwise?

I also disagree that it is any easier for authoring tools. If SVG  
authoring tools wish to directly import SVG graphics from text/html  
documents, they have to implement an HTML5 parser anyway, as described  
by Henri. I suspect that for them, too, it would be easier to stick  
with one parser for HTML instead of trying to mode-switch partway.

> the existing SVG language rather than some version of it that adds a  
> whole new set of parsing requirements, the real-world problem of  
> enabling people to hand-code rubbish isn't a serious issue in the  
> SVG world.

The phrase "enabling people to hand-code rubbish" expresses a  
judgmental point of view regarding authoring errors that I strongly  
disagree with.

> Given the relative scarcity of hand-authoring in SVG, tool coders  
> become the most important authors of code, in terms of understanding  
> the "priority of audiences" guideline that is sometimes tossed into  
> this discussion.

If tool authors would like to start round-tripping HTML that contains
SVG, they will need an HTML5 parser and serializer, and I believe that  
for them just as for browser implementors a monolithic one will be  
easier to work with than a mode-switching one.

> A substantial proportion of SVG already seems to be moved from one  
> tool to another. Allowing a new syntax would mean breaking  
> compatibility with the existing toolset

Embedding in HTML at all will break compatibility, except in the "cut  
and paste" case, in which case existing SVG syntax will work fine for  
pasting into HTML.

> - something that doesn't seem to have any justifiable motivation  
> beyond an assertion that people will suddenly start badly
> hand-coding complex SVG graphics en masse, despite the evidence. (Yes,
> people do hand-code it. I hand code the majority of my own SVG, and  
> I generally get it right. Where I haven't, losing the error  
> correction of relatively strict interpretation is more than enough  
> reason for me to prefer the strict handling to be maintained. I  
> appreciate the portability of code between existing tools and  
> devices far more than I can justify any desire to be allowed to do
> sloppier work).

One could likewise argue that the only justification given for strict  
XML-level error handling of SVG in HTML is that it will be a very  
common use case for content authors to copy chunks of SVG in text form  
and paste them into an SVG authoring tool. I would instead expect SVG  
authoring tools to adapt and process HTML directly to extract and  
round-trip the SVG content, in which case I think a monolithic HTML  
parsing algorithm will help them.

Regards,
Maciej
Received on Sunday, 28 September 2008 22:02:15 UTC