Re: SVG in text/html from Jonas Sicking on 2009-04-01 (www-svg@w3.org from April 2009)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 31 Mar 2009 20:55:28 -0700
To: Doug Schepers <schepers@w3.org>
Cc: public-html@w3.org, Sam Ruby <rubys@intertwingly.net>, www-svg <www-svg@w3.org>
Message-ID: <63df84f0903312055l584e6617s9a4fda3016c47d56@mail.gmail.com>
On Mon, Mar 30, 2009 at 4:02 PM, Doug Schepers <schepers@w3.org> wrote:
>
> Sam Ruby wrote:
>>
>> Doug Schepers wrote:
>>>
>>> Sam Ruby wrote (on 3/25/09 3:04 PM):
>>>>
>>>> From what I can see, there is agreement that the desired behavior for
>>>> user agents (in particular browsers) which encounter inline SVG in
>>>> content served as text/html is to treat the following as identical:
>>>>
>>>> <svg xmlns='http://www.w3.org/2000/svg'><circle r='20'/></svg>
>>>> <svg><circle r=20></svg>
>>>>
>>>> I don't sense that there is any remaining disagreement on this point. If
>>>> I'm wrong, please correct me as that is more fundamentally important
>>>> than the point I explore in the remainder of this email.
>>>
>>> Actually, I strongly disagree, and while I see the SVG WG doesn't seem
>>> to have much choice in the matter, I think this is a huge mistake.
>>
>> I'd like to understand what you strongly disagree with.
>
> What I disagree with is the notion that we have community consensus on how
> differences are resolved between SVG as XML and in text/html.  I think the
> responses to my email are pretty clear evidence of this, and reflect the
> views of many others who've chimed in on the list previously.  Let's not
> gloss these views over.

Sorry, I didn't mean to gloss these opinions over. I certainly know
that there are people of different opinions. As I've said before,
there's probably never going to be a decision that I'm 100% happy
about, much less a decision that everyone is happy about. But at some
point we have to make a decision and move on. I don't think we've
quite made that decision yet, but it does seem like we're close to
being able to.

Note that Hixie writing a draft is not the same as a decision having
been made.

>> It seems to me
>> that you strongly disagree with something that you explain below, but
>> not what I said above. What do you think the two DOMs mentioned above
>> should look like?
>
> I don't see an alternative to the DOMs being the same in that example. I
> attempted, clumsily, to expand on your simple example, and detail how this
> played out with a more complex (but still very simple) SVG snippet.
>
>
>> And you *do* have a input in the matter. I'll go further and suggest
>> that once I understand what you want, I'll try to help explain your
>> position.
>
> It's not my position, it's the position of a large percentage of the people
> who have responded on this matter.  I think the position is pretty clear
> already: the idea of well-formedness is being undervalued by the
> error-correction mechanisms in the HTML5 spec.
>
> The SVG WG, including me, doesn't have a real problem with this error
> correction (or at least, we don't have a better solution); what I have a
> problem with is the implicit notion that because something *can* be
> error-corrected, that the non-well-formed code is of equal value.

Ah, ok, that makes it clearer. I thought your previous email was
indicating that you were opposed to this error correction.

I don't think anyone is excited about error correction, or thinks that
it is a good solution. And I would definitely say that well-formed
code is better than non-well-formed code. Whenever there is an error
the consumer is stuck between a rock and a hard place. On one hand
rendering nothing and just aborting is unlikely to be what the
consumer wants. And it's definitely not what the producer intended since then
he could have just sent an empty page. But doing any error correction
means basically guessing as to what the producer meant, and it's a
guess we're bound to get wrong a lot of the time.

The error correction for misnested tags proposed in the HTML spec is
IMHO the lesser of two evils. (the two being, do error correction or
don't do error correction). But it's definitely still an evil.

I do however not think it brings as many problems as others have
expressed concern about. For example, I don't think it puts us at a
disadvantage compared to technologies such as Flash and Silverlight;
more on that below. But it's certainly not free of problems. I
apologize if I gave the impression that I thought that erroneous
content is not a problem that should be avoided.


> The unclosed element in that example is treated as a parsing error by the
> HTML5 parsing algorithm, as it should be.  But other things that are equally
> problematic from the standpoint of SVG's original format, XML, are not
> treated in the same way, e.g. unquoted attributes.
>
> Many people are uncomfortable with error-correction happening at all. I'm
> not asking that no error-correction take place; I'm simply asking that it be
> treated as what it is: error-correction.  The mechanism to do that would be
> to report parsing errors, when they are corrected.

What goal, or goals, are you attempting to achieve by making unquoted
attributes a parse error?

Assuming that we can't make quotes required even on HTML elements it
seems to me that the advantage with requiring quoted attributes is
that people that validate their contents and ensure that it is fully
conforming will have an easier time copying SVG contents out from the
network stream directly into software that consumes XML-SVG.

The disadvantage is that people that validate their contents and
ensure that it is fully conforming will have to deal with a pretty big
inconsistency between HTML element markup and SVG element markup. I'm
really concerned that if we make the requirements around authoring SVG
stricter than the ones around authoring HTML, then SVG will become a
second class citizen in the HTML world, used by only a minority of
developers.

I think we can get most of the advantage of quoted attributes
(copyability into XML-SVG) in two ways. First off, the validator can
give recommendations for changes needed in order for svg fragments to
be copyable into XML-SVG. This could include checks to make sure that
there aren't stylesheets outside the SVG fragment that affect the SVG
and that contents in <script>s use <![CDATA[]]> etc. This way people
that want to ensure that their markup can be copied directly can get
help doing so.
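
To make the validator idea concrete, here is a minimal Python sketch of
such a check. It is only an illustration: the function name
xml_copyability_hint is hypothetical, and it assumes the fragment has
already been cut out of the HTML document as a string. It simply asks an
XML parser whether the fragment is well-formed and, if not, reports
where the first problem is.

```python
import xml.etree.ElementTree as ET


def xml_copyability_hint(fragment):
    """Return None if the fragment is well-formed XML, else a short hint.

    Hypothetical helper: a real validator would also look at stylesheets
    outside the fragment, script contents, and so on.
    """
    try:
        ET.fromstring(fragment)
    except ET.ParseError as error:
        line, column = error.position
        return f"not copyable as XML-SVG: problem at line {line}, column {column}"
    return None


strict = "<svg xmlns='http://www.w3.org/2000/svg'><circle r='20'/></svg>"
lax = "<svg><circle r=20></svg>"

print(xml_copyability_hint(strict))
print(xml_copyability_hint(lax))
```

The point of the sketch is that this is advice, not a conformance
requirement: the lax fragment is still accepted as HTML, and the author
who cares about copyability gets told what to change.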

Second, we can encourage tools that have a "view-source" feature to
also allow serializing any SVG fragments in a pretty-printed fashion
that is valid XML-SVG. And encourage UAs that support "save image" to
write a file that is a valid XML-SVG file. This could even do things
like insert stylesheets into the SVG fragment that produce the same
styling as stylesheets that are otherwise outside the SVG fragment.
This last thing might not be done initially in UAs, since it can be
fairly complex. But I'd expect it to happen eventually.

I still don't think that we should *require* this in order for a UA to
be a conforming HTML5 client. But I don't see a reason why UAs that
support "view-source" and "save image" wouldn't do this.
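
As a rough sketch of what such a "save image" or pretty-printing feature
could do, the following Python reads lax SVG-in-HTML and writes the
first SVG fragment back out as well-formed XML. Everything here is an
assumption made for illustration: the names SvgExtractor and to_xml_svg
are hypothetical, and Python's html.parser is only a stand-in for a real
HTML5 parser. It lowercases all names, so SVG's mixed-case names such as
viewBox would need a fix-up table, and it does not inline external
stylesheets.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


class SvgExtractor(HTMLParser):
    """Collect the first <svg> subtree of an HTML stream as an XML tree."""

    def __init__(self):
        super().__init__()
        self.root = None   # the <svg> element, once seen
        self.stack = []    # elements currently open inside the fragment

    def handle_starttag(self, tag, attrs):
        if not self.stack and (tag != "svg" or self.root is not None):
            return  # outside the fragment, or a later fragment: ignore
        # Valueless attributes arrive as None; XML needs a string value.
        # The namespace is added on output, so drop any literal xmlns.
        xml_attrs = {name: (value or "") for name, value in attrs if name != "xmlns"}
        qualified = f"{{{SVG_NS}}}{tag}"
        if self.stack:
            element = ET.SubElement(self.stack[-1], qualified, xml_attrs)
        else:
            element = ET.Element(qualified, xml_attrs)
            self.root = element
        self.stack.append(element)

    def handle_endtag(self, tag):
        qualified = f"{{{SVG_NS}}}{tag}"
        # Close the nearest matching element and anything left open in it.
        for depth in range(len(self.stack) - 1, -1, -1):
            if self.stack[depth].tag == qualified:
                del self.stack[depth:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def to_xml_svg(markup):
    """Return the first SVG fragment in markup as an XML string, or None."""
    extractor = SvgExtractor()
    extractor.feed(markup)
    extractor.close()
    if extractor.root is None:
        return None
    ET.register_namespace("", SVG_NS)
    return ET.tostring(extractor.root, encoding="unicode")


print(to_xml_svg("<p>hi</p><svg><circle r=20></svg>"))
```

With this sketch, the two spellings from Sam's original example
serialize to the same XML, which is the property the extraction path
relies on.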

I think these things would be much more efficient at solving the
copy-SVG-in-HTML-into-XML-SVG problem than requiring quotes will.
Mostly because requiring quotes will only affect people that run their
source through a validator, which based on the state of the web seems
to be a minority. I think it's more efficient to put the power of
ensuring that a valid XML-SVG fragment can be extracted in the hands
of the people that want to extract the SVG, rather than in the hands
of the people that produce the fragment. The former have a much greater
incentive to make the extraction work.

>> From my understanding the main remaining disagreements centered around
>> what should be considered parse errors in a validating implementation,
>> and what isn't. For example if casing different from what SVG-in-XML
>> uses, or unquoted attribute values, is a parse error or not.
>
> That's part of it.  The bone of contention is how to come back from those
> parse errors.

As Cameron McCormack pointed out, I think the bone of contention is
whether these things should be parse errors or not.

There seems to be agreement that misnested tags should be a parse
error and actively discouraged. My reason for believing that it
should be an error is that generally there is very little reason to
misnest tags, and when it happens it's an indication that someone made
a mistake. We can do a bunch of guessing as to what the author wants,
but it is just that, guessing, so we're bound to get it wrong
sometimes (or even often).

I don't feel like the same applies to unquoted attributes. For example
<script defer> is easier to read than <script defer="">. And <input
type=file> seems as easy to parse without guessing as <input
type="file">.
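
A quick way to see this is to feed both spellings to an existing lenient
parser. The sketch below uses Python's standard html.parser, which is
not an HTML5-conforming parser and is used here only as an illustration;
the names StartTagCollector and start_tags are hypothetical.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record every start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def start_tags(markup):
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


unquoted = start_tags("<input type=file>")
quoted = start_tags('<input type="file">')
print(unquoted == quoted)
```

Both calls yield the same tag name and the same attribute list, with no
guessing involved: the unquoted value simply ends at the first
whitespace or ">".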

The same thing goes for case insensitive tag names. It seems strange
to me to argue that tag names should be case sensitive, while at the
same time we would never think of introducing two elements with names
differing only in what casing they use.
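
The same kind of check works for casing. Again this is only a sketch
using Python's html.parser as a stand-in (the helper name tag_events is
hypothetical); it folds every tag and attribute name to lower case,
whereas a real SVG-in-HTML parser would also have to map names back to
SVG's mixed-case spellings where those exist.

```python
from html.parser import HTMLParser


def tag_events(markup):
    """Return the sequence of start/end tag events seen in markup."""
    events = []

    class Recorder(HTMLParser):
        def handle_starttag(self, tag, attrs):
            events.append(("start", tag, attrs))

        def handle_endtag(self, tag):
            events.append(("end", tag))

    recorder = Recorder()
    recorder.feed(markup)
    recorder.close()
    return events


mixed_case = tag_events("<SVG><Circle R=20></SVG>")
lower_case = tag_events("<svg><circle r=20></svg>")
print(mixed_case == lower_case)
```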

But most importantly, it seems very confusing to me to have a markup
language where some tags are parsed very differently from other
tags. So unless we can make HTML require quotes, I think it would be
very inconsistent to require that from SVG-in-HTML.

> Implementers seems to have rejected out of hand (violently in
> some cases) the notion of a UI feature to allow access to the
> error-corrected resulting code (though maybe an API might be acceptable?).

I didn't see anyone objecting to having a UI feature that allowed
access to the error-corrected code. Though I might have missed it of
course.

What I, and I think others, objected to was having the spec *require*
these features. Just as I would object to having the spec require
access to the off-the-wire code. In my experience, when W3C or IETF
specs have required UI in the past, the results have been bad.

I'm all for putting informative language in the spec that encourages
access to the error-corrected code in UAs. But I don't want to spend
resources on debating and researching what the optimal UI should be.
That I think is better left up to time and implementors.

>  Indication of error is another option, and if it's consistent, maybe the
> error console is enough... but right now, it's not consistent.

Note that I don't think any UA is considering reporting any parse
errors. Not even misnested tags. Henri has better data, but checking
for just the parse errors that are in the spec now requires
significantly more resources than just recovering from them.

>  While
> unclosed elements are tagged as parse errors, unquoted attributes are not;
> someone made the argument that since browsers *can* recover from unquoted
> attributes, it shouldn't be a parse error, but that same argument can be
> made for unclosed attributes, or understood <html:tbody> elements, or any
> other markup convention that is error-corrected when it's put into the DOM.
>  At what point does "parse error" mean anything at all?

I'm not sure I agree with you here. Unclosed attributes we can't and
in fact don't recover from. For example

...<input value="should end here> label here <table><tr><td>and here
is a table</td></tr></table>

In this case, everything from the "should end here" until the end of
the file (assuming it ends after the table) will be put inside the
@value attribute on the input element.
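
The reason is that inside a double-quoted value, ">" and "<" are
ordinary characters; only the closing quote ends the value. The Python
sketch below mimics just that one rule. It is not the HTML5 tokenizer,
and the function name read_double_quoted_value is hypothetical.

```python
def read_double_quoted_value(markup, open_quote_index):
    """Return (value, closed) for the value starting at an opening quote.

    Only a closing double quote ends the value; '>' and '<' are kept as
    plain characters. If no closing quote exists, the value swallows the
    rest of the input and closed is False.
    """
    close_quote_index = markup.find('"', open_quote_index + 1)
    if close_quote_index == -1:
        return markup[open_quote_index + 1:], False
    return markup[open_quote_index + 1:close_quote_index], True


broken = ('<input value="should end here> label here '
          '<table><tr><td>and here is a table</td></tr></table>')
value, closed = read_double_quoted_value(broken, broken.index('"'))
print(closed)
print(value)
```

Nothing in the input tells the scanner where the author meant the value
to stop, which is why this case, unlike an unquoted value, cannot be
recovered without guessing.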

As for understood tbody elements. I'm definitely not a fan of the
implicit tags in HTML. But it's something that HTML4 says is valid and
so it's hard to make HTML5 claim it is invalid. So it's not a parse
error and so there is technically no error-correction happening.

I really do think that unquoted attributes are pretty easy to
"recover" from as the syntax isn't ambiguous. It has after all been
supported in HTML4 for a long time.


>>>  I'm not trying to be negative.  I genuinely hope that this helps spread
>>> SVG.
>>>   But I fear that it will have exactly the opposite effect. Languages
>>> which
>>>  are more tightly structured and tightly controlled (such as Adobe Air
>>> and MS
>>>  Silverlight), but which have supporting infrastructures of authoring
>>> tools,
>>>  commercial promotion, and aggressive distribution networks will have the
>>>  advantage that they are dependable and intuitively predictable, while
>>> this
>>>  makes SVG less intuitively predictable, and failing that infrastructure,
>>>  makes it less attractive.
>>
>> This particular downside of not using more fatal error handling I am
>> not worried about at all.
>>
>> The work flow that Flash/Air/Silverlight offers is something like this:
>>
>> * Use a tool to create content. This tool is often proprietory.
>> * Use a tool to open other peoples content to see how they did it.
>>
>> This exact behavior works just as well with HTML and tag soup. As long
>> as you use tools, it does not matter how un-well-formed your content
>> is, as long as all tools parse it into the same DOM. In fact, as long
>> as you are using tooling, as a user you don't even need to notice that
>> the content you opened in your tool is riddled with parse errors.
>
> I'm neither supporting nor rejecting the idea that "tools will save us".
>  But the fact of the matter is that we do have to take existing tools into
> account.
>
> Right now, there are authoring tools and SVG viewers that could read SVG
> "excerpts" (that is, inline SVG fragments that are removed from their HTML
> markup) if the content is in the canonical XML format.  Maybe in the future,
> Inkscape, Illustrator, CorelDraw, and BitFlash, Ikivo, and Renesis viewers
> could read the more lax syntax proposed.  In the meantime, we could rely on
> tools that extract and error-correct the SVG code.  But where are the rules
> for those conversion tools to know how to do the error-correction?  Isn't
> that a class of UA that also needs to be taken into account for the HTML5
> spec, if we're talking about something that is meant to fit into a large
> ecosystem?
>
> If so, it seems pretty clear to me that HTML5 should make all non-XML
> conventions used for XML-original code as parse errors, for at least some
> class of UA.

I think we have the same goal, but we're taking two different paths to
reaching that goal.

The goal is to allow for SVG fragments to be extractable from HTML
documents and edited in existing tools.

The path you are arguing for, if I understand it correctly, is to
encourage people to write SVG-in-HTML that also is valid XML-SVG.

The path that I am arguing for, is to allow the receiver of
SVG-in-HTML to easily extract valid XML-SVG.

The problem I see with your suggested path is twofold. First of all I
don't think declaring things parse errors and invalid HTML is going to
have much of an effect. Data indicates that very few people bother to
validate their markup. I forget the exact number, but the percentage
of pages on the web that is valid HTML is very low. I see no reason
this would be different for SVG-in-HTML. Especially if we have
stricter rules for SVG-in-HTML than we do for HTML.

Second, it comes at a cost for publishers of content since it means
that within a single HTML document you'll have two vastly different
rules for what's valid and what's not. This especially punishes the
people that actually validate their contents. For example it would
mean that we couldn't use the same rules for parsing SVG scripts as
for HTML scripts.

There's also the fact that many sites don't *want* users to be able to
extract contents from their site. There are lots of sites that use
tricks that disable the context menu in order to prevent people from
using the "view-source" options in the context menu. This is of
course fairly ineffective since the same menu item exists in the
normal menus in most browsers, but that doesn't prevent sites from
doing all they can still. Such a site has every incentive to purposely
make their SVG-in-HTML fragments invalid as XML-SVG.


By using the path I propose of making it the UAs responsibility to
allow its users to extract valid XML-SVG out of SVG-in-HTML fragments
you work around these issues. I agree you lose the ability to
copy-n-paste out of the normal view-source that exists today, but I
think the cost there is worth the benefit of having more consistent
syntax throughout HTML. I also think browsers will eventually create
much better tools than the raw off-the-network view-source in order to
better support SVG editing. It's a competitive advantage for a browser
to allow its users to extract valid XML-SVG, so I have no doubt that
they will do this.

> Or is HTML5 just a desktop browser spec?

Can we *please* get out of this argument? At this point this argument
actually feels quite inflammatory. Do you really think anyone is
arguing this?

First of all, with every single major browser engine out there now
running on mobile platforms, optimizing for just desktop browsers
would make no sense at all.

Second, tooling is something that I think would help every actor that
cares about HTML. The fact that Flash has awesome tools is possibly
the most significant reason it is used as much as it is. In order for
HTML to compete we need better tools.

>> And of course there are downsides of requiring well formed content,
>> such as making using a text editor or server side scripts to create
>> content less likely to produce pages that don't render as intended.
>
> I'm not sure what you mean by this.

I just mean that there are downsides to having strict conformance
criteria such as quoted attributes or correct casing: they make it
harder for people that use string technology to produce the SVG
content, i.e. people that use a text editor (i.e. editing a long
string) or that use string concatenation to dynamically generate
content, such as with PHP, ASP.NET, etc.

/ Jonas
Received on Wednesday, 1 April 2009 03:56:23 UTC