Re: SVG Semantics Re: SVG and MathML in text/html from Maciej Stachowiak on 2008-09-30 (public-html@w3.org from September 2008)

From: Maciej Stachowiak <mjs@apple.com>
Date: Mon, 29 Sep 2008 17:30:17 -0700
To: Charles McCathieNevile <chaals@opera.com>
Cc: "public-html@w3.org" <public-html@w3.org>
Message-id: <C8ED267B-BC88-4AFA-AE32-73AAA7D7FA56@apple.com>
On Sep 29, 2008, at 4:27 PM, Charles McCathieNevile wrote:

> On Mon, 29 Sep 2008 08:01:27 +1000, Maciej Stachowiak  
> <mjs@apple.com> wrote:
>
>> But you said that tolerant error handling was not a good idea for  
>> SVG because "SVG is about images - having parts of an image not  
>> render can drastically alter the semantics of the image."
>
> Did I? My position is that introducing divergent error handling to  
> SVG is a problem because the semantics of the image changing can  
> have serious effects.

Has the divergent error handling between different versions of SVG  
caused the kinds of problems you are worried about?

>> Indeed, over time, SVG's error handling
>
> I assume you meant to say something like "...has got looser". That  
> would be simplifying the case to the point of distortion, since  
> while the way it handles certain errors in SVG code has become  
> looser, some XML-level errors that were permitted in ASV have not  
> been permitted in Firefox, and Opera has made its handling of those  
> errors progressively stricter.

I'm talking about the spec, not implementations. The SVG spec has  
changed from stricter to looser, and I cannot think of any instance  
where it got more strict. Thus it is hard to take seriously an  
argument that any additional spec-level looseness would create serious  
problems.

> The severity of error handling at XML level has been valuable in  
> simplifying the parsing of SVG, and I assert in ensuring  
> interoperability - unlike HTML there is relatively little invalid  
> content, and specific errors are slowly weeded out of the corpus by  
> progressive tightening of the requirements. This has meant that  
> authors learn to write correct code, which means there has been no  
> ongoing need or desire for everyone to further complicate their SVG  
> tools.

Correct XML parsing is not simpler than correct HTML parsing (at least  
comparing the amounts of code required to do each correctly).

>
>>>> Perhaps you are arguing that we should offer the option of  
>>>> intermixing the tolerant serialization of HTML and the draconian  
>>>> serialization of SVG.
>
>>> Yes, very roughly speaking, that is what I am suggesting (modulo  
>>> the idea that there are many levels of tolerance).
>>
>> I don't believe you have made a good argument for why draconian  
>> error handling at the serialization / surface syntax level is  
>> essential for SVG, or why it is appropriate for the text/html  
>> serialization.
>
> And I don't believe that I have seen a convincing argument for  
> changing the existing form of SVG to require the more permissive and  
> complex HTML-style parsing.

At least the following arguments have been made:

1) Having both tolerant and draconian surface syntax forms available  
has been beneficial to HTML. It would also be a benefit to SVG to  
provide the same feature.
2) Introducing islands of draconian parsing into the otherwise  
tolerant text/html is confusing.
3) Attempting draconian parsing of embedded SVG may in fact break some  
existing content (see relevant studies from Hixie).
4) Using a single parser for HTML+SVG instead of mode-switching is  
arguably simpler to implement (feedback from at least some  
implementors).
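To make the draconian/tolerant contrast in points 2 and 3 concrete, here is a minimal sketch using Python's standard-library XML parser as a stand-in for an XML-mode SVG island. The sample fragments are hypothetical, chosen to mimic mistakes common when SVG is pasted into HTML (an HTML entity such as &nbsp;, an unquoted attribute). Under draconian handling a single such error means the whole image is dropped, not just the faulty part.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def parse_draconian(fragment: str):
    """Return the parsed root element, or None if the fragment is not well-formed XML."""
    try:
        return ET.fromstring(fragment)
    except ET.ParseError:
        # Draconian handling: any well-formedness error discards the whole island.
        return None


# Hypothetical fragments; only the first is well-formed XML.
samples = {
    "well-formed": f'<svg xmlns="{SVG_NS}"><rect width="10" height="10"/></svg>',
    "html entity": f'<svg xmlns="{SVG_NS}"><text>A&nbsp;B</text></svg>',
    "unquoted attribute": f'<svg xmlns="{SVG_NS}"><rect width=10 height="10"/></svg>',
}

if __name__ == "__main__":
    for name, fragment in samples.items():
        root = parse_draconian(fragment)
        print(f"{name}: {'rendered' if root is not None else 'nothing rendered'}")
```

An HTML5 parser would recover from both errors and still build the rest of the image; the XML parser gives up on the entire fragment.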

You may not find them convincing but they are at least prima facie  
rational. I do not see how it can be argued that XML-level strictness  
is critical to the SVG ecosystem when SVG-level strictness proved not  
to be, and was abandoned. At the very least I would expect someone to  
explain how the two cases are different.

>
>>> Except that in the real world, there is no apparent demand for a  
>>> lot of tolerance in SVG markup,
>>
>> Evidence?
>
> Evidence that there is not a lot of demand? The fact that our  
> customers who insist on SVG don't even mention it (and only ever  
> mentioned it for a few very specific errors they were sucked into by  
> building for specific tools).

How many of these customers are insisting on SVG in text/html?

> The fact that it is a tiny topic in SVG community lists, and even  
> where people who come from HTML or similar make a mistake and need  
> to learn "the SVG way" it doesn't give rise to campaigns for looser  
> handling. The fact that tools that have been more tolerant than the  
> spec have become less so in significant ways.

But the spec has become more tolerant.

> The only kind of evidence I have seen that there is a demand for  
> seriously changing the SVG syntax is a proposal and discussion in  
> this context, and it appears that the community of people who  
> produce and use SVG are not that interested in the extremely  
> tolerant HTML model.
>
>>> and there is an ecosystem built on the idea that the extreme  
>>> tolerance available for HTML is neither necessary nor desirable.
>>
>> And there is an ecosystem built on the idea that the error  
>> tolerance of HTML is essential to the success of the Web,
>
> That would be the ecosystem of HTML, and while I agree that  
> tolerance in HTML is essential to the ongoing success of the web, I  
> am less convinced that it was tolerance itself that led to that  
> success.
>
>> and an ecosystem much larger than either of those based on not  
>> caring much one way or the other but benefitting from error  
>> tolerance anyway.
>
> I'm not sure where the evidence is that this ecosystem benefits from  
> error tolerance, and I believe that there are costs to those  
> benefits, which should be weighed.

The availability of both XHTML and HTML lets authors make the choice  
for themselves (subject to the limits of implementation support). But  
you are arguing that for SVG, they should not be given the choice.

>
>
>> I would say the ecosystem you have mentioned is the least popular  
>> and successful of the three. Nonetheless, HTML5 will cater to both.
>>
>>> Indeed, the major failure errors in Wikipedia examples, as  
>>> identified by Henri, are less common than the cataclysmic failure  
>>> of the image to appear at all.
>>>
>>> We believe that as well as being easier to implement (in browsers  
>>> and authoring tools)
>>
>> As a browser engine implementor, and one who has directly dealt  
>> with both the HTML and XML parsers in our engine, I strongly  
>> disagree that the SVG WG proposal is easier for browsers to  
>> implement. Using a single parser for HTML would be much easier than  
>> trying to switch between the HTML and XML parsers midstream. Is  
>> there any browser implementor who thinks otherwise?
>
> Yes. I am not making this up for myself, I am reporting the opinion  
> of the people who build the Core of Opera and in particular those  
> responsible for dealing with parsing HTML, XML and SVG and making  
> them actually work. A simple part of the argument is that multiple  
> parsers that pass stuff around are already part of a browser (at  
> least HTML, XML, CSS, and JavaScript are common to more or less all  
> browsers),

Embedded JavaScript and CSS are very different, in that the boundaries  
can be found by the HTML or XML parser without having to invoke the  
parser for the embedded language incrementally. The SVG WG's proposal  
does not have this property.
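A simplified sketch of the boundary-finding difference, under stated assumptions: `find_script_end` approximates the HTML tokenizer's script data handling (it ignores the extra escape states for `<!--` inside scripts), and `find_svg_end` is a hypothetical, deliberately incomplete tokenizer (it ignores processing instructions and attribute values containing `>`). The point it illustrates: the end of a script can be found with a plain text scan and no JavaScript parser, while the end of an XML-parsed SVG island depends on nesting, comments, and CDATA, so the embedded markup has to be tokenized as you go.

```python
import re

# "</script" followed by whitespace, "/" or ">" ends script data (simplified).
SCRIPT_END = re.compile(r"</script(?=[\s/>])", re.IGNORECASE)

# Comments, CDATA sections, and <svg>/</svg>/<svg/> tags (simplified token set).
SVG_TOKEN = re.compile(
    r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<(/?)svg\b[^>]*?(/?)>", re.DOTALL
)


def find_script_end(html: str, start: int) -> int:
    """Index of the closing </script tag; no knowledge of JavaScript needed."""
    m = SCRIPT_END.search(html, start)
    return m.start() if m else len(html)


def find_svg_end(markup: str, start: int) -> int:
    """Index just past the </svg> matching the <svg> at `start`.

    Requires tracking element depth and skipping comments/CDATA, i.e.
    running (a crude version of) the embedded language's tokenizer.
    """
    depth = 0
    for m in SVG_TOKEN.finditer(markup, start):
        if m.group(0).startswith("<!"):
            continue  # a "</svg>" inside a comment or CDATA does not close anything
        closing, self_closing = m.group(1), m.group(2)
        if closing:
            depth -= 1
            if depth == 0:
                return m.end()
        elif not self_closing:
            depth += 1
    return len(markup)
```

For example, in `<svg><svg></svg><!-- </svg> --><circle r="1"/></svg><p>after</p>`, a plain search for `</svg>` stops at the inner element, whereas `find_svg_end(doc, 0)` returns the offset where `<p>after</p>` begins.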

>> I also disagree that it is any easier for authoring tools. If SVG  
>> authoring tools wish to directly import SVG graphics from text/html  
>> documents, they have to implement an HTML5 parser anyway, as  
>> described by Henri. I suspect that for them, too, it would be  
>> easier to stick with one parser for HTML instead of trying to mode- 
>> switch partway.
>
> No they don't. Under either proposal, unless they also want to  
> handle HTML, they can do a very simple text extraction.

Sure, if they don't care about correctness. But ultimately that is  
about as sound as using regexps to parse XML. Which is to say, it  
seems to mostly work but is totally wrong.
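A minimal sketch of the kind of "very simple text extraction" under discussion, assuming it is done with a lazy regular expression (the pattern and inputs are hypothetical, not taken from any actual tool). It gets the easy case right and silently gets three other cases wrong, each of which an HTML5 parser handles differently: commented-out markup renders nothing, nested `<svg>` elements end at the outer close tag, and the HTML tokenizer lowercases tag names, so `<SVG>` is still an svg element.

```python
import re

# Hypothetical naive extractor: everything from "<svg" to the first "</svg>".
NAIVE_SVG = re.compile(r"<svg.*?</svg>", re.DOTALL)


def naive_extract(html: str) -> list[str]:
    return NAIVE_SVG.findall(html)


simple = '<p>Logo:</p><svg><rect width="10" height="10"/></svg>'
commented = '<!-- <svg>old</svg> --><p>no image here</p>'  # browser renders no image
nested = '<svg><svg></svg><circle r="1"/></svg>'  # regex truncates at the inner close
uppercase = '<SVG><rect/></SVG>'  # HTML parsing still yields an svg element

if __name__ == "__main__":
    for label, html in [("simple", simple), ("commented", commented),
                        ("nested", nested), ("uppercase", uppercase)]:
        print(label, naive_extract(html))
```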

>
>>> Given the relative scarcity of hand-authoring in SVG, tool coders  
>>> become the most important authors of code, in terms of  
>>> understanding the "priority of audiences" guideline that is  
>>> sometimes tossed into this discussion.
>>
>> If tools authors would like to start round-tripping HTML that  
>> contains SVG, they will need an HTML5 parser and serializer, and I  
>> believe that for them just as for browser implementors a monolithic  
>> one will be easier to work with than a mode-switching one.
>
> This relies on the assumption that the tools need to handle both  
> types of content themselves. I don't see how that is a valid  
> assumption.

I am assuming tools vendors wish to provide a high-quality experience  
for content authors, and if SVG-in-HTML becomes a popular use case,  
they will want to serve that use case well. I could be wrong, but that  
is usually how tools vendors respond to popular new technologies. If  
the proposal is based on the premise that they will not do so in this  
case, then that strikes me as a weak argument.

Regards,
Maciej
Received on Tuesday, 30 September 2008 00:30:59 UTC