- From: Jonas Sicking <jonas@sicking.cc>
- Date: Tue, 31 Mar 2009 20:55:28 -0700
- To: Doug Schepers <schepers@w3.org>
- Cc: public-html@w3.org, Sam Ruby <rubys@intertwingly.net>, www-svg <www-svg@w3.org>
On Mon, Mar 30, 2009 at 4:02 PM, Doug Schepers <schepers@w3.org> wrote:
>
> Sam Ruby wrote:
>>
>> Doug Schepers wrote:
>>>
>>> Sam Ruby wrote (on 3/25/09 3:04 PM):
>>>>
>>>> From what I can see, there is agreement that the desired behavior for
>>>> user agents (in particular browsers) which encounter inline SVG in
>>>> content served as text/html is to treat the following as identical:
>>>>
>>>> <svg xmlns='http://www.w3.org/2000/svg'><circle r='20'/></svg>
>>>> <svg><circle r=20></svg>
>>>>
>>>> I don't sense that there is any remaining disagreement on this point. If
>>>> I'm wrong, please correct me as that is more fundamentally important
>>>> than the point I explore in the remainder of this email.
>>>
>>> Actually, I strongly disagree, and while I see the SVG WG doesn't seem
>>> to have much choice in the matter, I think this is a huge mistake.
>>
>> I'd like to understand what you strongly disagree with.
>
> What I disagree with is the notion that we have community consensus on how
> differences are resolved between SVG as XML and in text/html. I think the
> responses to my email are pretty clear evidence of this, and reflect the
> views of many others who've chimed in on the list previously. Let's not
> gloss these views over.

Sorry, I didn't mean to gloss these opinions over. I certainly know
that there are people of different opinions. As I've said before,
there's probably never going to be a decision that I'm 100% happy
about, much less a decision that everyone is happy about. But at some
point we have to make a decision and move on.

I don't think we've quite made that decision yet, but it does seem
like we're close to being able to. Note that Hixie writing a draft is
not the same as a decision has been made.

>> It seems to me
>> that you strongly disagree with something that you explain below, but
>> not what I said above. What do you think the two DOMs mentioned above
>> should look like?
>
> I don't see an alternative to the DOMs being the same in that example. I
> attempted, clumsily, to expand on your simple example, and detail how this
> played out with a more complex (but still very simple) SVG snippet.
>
>
>> And you *do* have a input in the matter. I'll go further and suggest
>> that once I understand what you want, I'll try to help explain your
>> position.
>
> It's not my position, it's the position of a large percentage of the people
> who have responded on this matter. I think the position is pretty clear
> already: the idea of well-formedness is being undervalued by the
> error-correction mechanisms in the HTML5 spec.
>
> The SVG WG, including me, doesn't have a real problem with this error
> correction (or at least, we don't have a better solution); what I have a
> problem with is the implicit notion that because something *can* be
> error-corrected, that the non-well-formed code is of equal value.

Ah, ok, that makes it clearer. I thought your previous email was
indicating that you were opposed to this error correction.

I don't think anyone is excited about error correction, or thinks that
it is a good solution. And I would definitely say that well-formed
code is better than non-well-formed code.

Whenever there is an error the consumer is stuck between a rock and a
hard place. On one hand rendering nothing and just aborting is
unlikely to be what the consumer wants. And it's definitely not what
the producer intended since then he could have just sent an empty
page. But doing any error correction means basically guessing as to
what the producer meant, and it's a guess we're bound to get wrong a
lot of the time.

The error correction for misnested tags proposed in the HTML spec is
IMHO the lesser of two evils. (the two being, do error correction or
don't do error correction). But it's definitely still an evil.
I do however not think it brings as many problems as others have
expressed concern about; for example I don't think it puts us at a
disadvantage compared to technologies such as Flash and Silverlight,
more on that below. But it's certainly not free of problems.

I apologize if I gave the impression that I thought that erroneous
content is not a problem that should be avoided.

> The unclosed element in that example is treated as a parsing error by the
> HTML5 parsing algorithm, as it should be. But other things that are equally
> problematic from the standpoint of SVG's original format, XML, are not
> treated in the same way, e.g. unquoted attributes.
>
> Many people are uncomfortable with error-correction happening at all. I'm
> not asking that no error-correction take place; I'm simply asking that it be
> treated as what it is: error-correction. The mechanism to do that would be
> to report parsing errors, when they are corrected.

What goal, or goals, are you attempting to achieve by making unquoted
attributes a parse error?

Assuming that we can't make quotes required even on HTML elements it
seems to me that the advantage with requiring quoted attributes is
that people that validate their contents and ensure that it is fully
conforming will have an easier time copying SVG contents out from the
network stream directly into software that consumes XML-SVG.

The disadvantage is that people that validate their contents and
ensure that it is fully conforming will have to deal with a pretty big
inconsistency between HTML element markup and SVG element markup.

I'm really concerned that if we make the requirements around authoring
SVG stricter than the ones around authoring HTML, then SVG will become
a second class citizen in the HTML world, used by only a minority of
developers.

I think we can get most of the advantage of quoted attributes
(copyability into XML-SVG) in two ways.
First off, the validator can give recommendations for changes needed
in order for SVG fragments to be copyable into XML-SVG. This could
include checks to make sure that there aren't stylesheets outside the
SVG fragment that affect the SVG and that contents in <script>s use
<![CDATA[]]> etc. This way people that want to ensure that their
markup can be copied directly can get help doing so.

Second, we can encourage tools that have a "view-source" feature to
also allow serializing any SVG fragments in a pretty-printed fashion
that is valid XML-SVG. And encourage UAs that support "save image" to
write a file that is a valid XML-SVG file. This could even do things
like insert stylesheets into the SVG fragment that produce the same
styling as stylesheets that are otherwise outside the SVG fragment.
This last thing might not be done initially in UAs, since it can be
fairly complex. But I'd expect it to happen eventually.

I still don't think that we should *require* this in order for a UA to
be a conforming HTML5 client. But I don't see a reason why UAs that
support "view-source" and "save image" wouldn't do this.

I think these things would be much more efficient at solving the
copy-SVG-in-HTML-into-XML-SVG problem than requiring quotes will.
Mostly because requiring quotes will only affect people that run their
source through a validator, which based on the state of the web seems
to be a minority.

I think it's more efficient to put the power of ensuring that a valid
XML-SVG fragment can be extracted in the hands of the people that want
to extract the SVG, rather than in the hands of the people that
produce the fragment. The former have a much greater incentive to make
the extraction work.

>> From my understanding the main remaining disagreements centered around
>> what should be considered parse errors in a validating implementation,
>> and what isn't. For example if casing different from what SVG-in-XML
>> uses, or unquoted attribute values, is a parse error or not.
>
> That's part of it. The bone of contention is how to come back from those
> parse errors.

As Cameron McCormack pointed out, I think the bone of contention is if
these things should be parse errors or not.

There seems to be agreement that misnested tags should be a parse
error and actively discouraged. My reason for believing that it should
be an error is that generally there is very little reason to misnest
tags, and when it happens it's an indication that someone made a
mistake. We can do a bunch of guessing as to what the author wants,
but it is just that, guessing, so we're bound to get it wrong
sometimes (or even often).

I don't feel like the same applies to unquoted attributes. For example
<script defer> is easier to read than <script defer="">. And
<input type=file> seems as easy to parse without guessing as
<input type="file">.

The same thing goes for case insensitive tag names. It seems strange
to me to argue that tag names should be case sensitive, while at the
same time we would never think of introducing two elements with names
only differing in what casing they use.

But most importantly, it seems very confusing to me to have a markup
language where some tags are parsed severely differently from other
tags. So unless we can make HTML require quotes, I think it would be
very inconsistent to require that from SVG-in-HTML.

> Implementers seems to have rejected out of hand (violently in
> some cases) the notion of a UI feature to allow access to the
> error-corrected resulting code (though maybe an API might be acceptable?).

I didn't see anyone objecting to having a UI feature that allowed
access to the error-corrected code. Though I might have missed it of
course. What I, and I think others, objected to was having the spec
*require* these features. Just as I would object to having the spec
require access to the off-the-wire code. In my experience, when W3C or
IETF specs in the past have required UI, the results have been bad.
I'm all for putting informative language in the spec that encourages
access to the error-corrected code in UAs. But I don't want to spend
resources on debating and researching what the optimal UI should be.
That I think is better left up to time and implementors.

> Indication of error is another option, and if it's consistent, maybe the
> error console is enough... but right now, it's not consistent.

Note that I don't think any UA is considering reporting any parse
errors. Not even misnested tags. Henri has better data, but checking
for just the parse errors that are in the spec now requires
significantly more resources than just recovering from them.

> While
> unclosed elements are tagged as parse errors, unquoted attributes are not;
> someone made the argument that since browsers *can* recover from unquoted
> attributes, it shouldn't be a parse error, but that same argument can be
> made for unclosed attributes, or understood <html:tbody> elements, or any
> other markup convention that is error-corrected when it's put into the DOM.
> At what point does "parse error" mean anything at all?

I'm not sure I agree with you here. Unclosed attributes we can't and
in fact don't recover from. For example

...<input value="should end here> label here
<table><tr><td>and here is a table</tr></td></table>

in this case, everything from the "should end here" until the end of
the file (assuming it ends after the table) will be put inside the
@value attribute on the input element.

As for understood tbody elements. I'm definitely not a fan of the
implicit tags in HTML. But it's something that HTML4 says is valid and
so it's hard to make HTML5 claim it is invalid. So it's not a parse
error and so there is technically no error-correction happening.

I really do think that unquoted attributes are pretty easy to
"recover" from as the syntax isn't ambiguous. It has after all been
supported in HTML4 for a long time.

>>> I'm not trying to be negative. I genuinely hope that this helps spread
>>> SVG. But I fear that it will have exactly the opposite effect. Languages
>>> which are more tightly structured and tightly controlled (such as Adobe
>>> Air and MS Silverlight), but which have supporting infrastructures of
>>> authoring tools, commercial promotion, and aggressive distribution
>>> networks will have the advantage that they are dependable and
>>> intuitively predictable, while this makes SVG less intuitively
>>> predictable, and failing that infrastructure, makes it less attractive.
>>
>> This particular downside of not using more fatal error handling I am
>> not worried about at all.
>>
>> The work flow that Flash/Air/Silverlight offers is something like this:
>>
>> * Use a tool to create content. This tool is often proprietory.
>> * Use a tool to open other peoples content to see how they did it.
>>
>> This exact behavior works just as well with HTML and tag soup. As long
>> as you use tools, it does not matter how un-well-formed your content
>> is, as long as all tools parse it into the same DOM. In fact, as long
>> as you are using tooling, as a user you don't even need to notice that
>> the content you opened in your tool is riddled with parse errors.
>
> I'm neither supporting nor rejecting the idea that "tools will save us".
> But the fact of the matter is that we do have to take existing tools into
> account.
>
> Right now, there are authoring tools and SVG viewers that could read SVG
> "excerpts" (that is, inline SVG fragments that are removed from their HTML
> markup) if the content is in the canonical XML format. Maybe in the future,
> Inkscape, Illustrator, CorelDraw, and BitFlash, Ikivo, and Renesis viewers
> could read the more lax syntax proposed. In the meantime, we could rely on
> tools that extract and error-correct the SVG code. But where are the rules
> for those conversion tools to know how to do the error-correction? Isn't
> that a class of UA that also needs to be taken into account for the HTML5
> spec, if we're talking about something that is meant to fit into a large
> ecosystem?
>
> If so, it seems pretty clear to me that HTML5 should make all non-XML
> conventions used for XML-original code as parse errors, for at least some
> class of UA.

I think we have the same goal, but we're taking two different paths to
reaching that goal.

The goal is to allow for SVG fragments to be extractable from HTML
documents and edited in existing tools.

The path you are arguing for, if I understand it correctly, is to
encourage people to write SVG-in-HTML that also is valid XML-SVG.

The path that I am arguing for, is to allow the receiver of
SVG-in-HTML to easily extract valid XML-SVG.

The problem I see with your suggested path is twofold.

First of all I don't think declaring things parse errors and invalid
HTML is going to have much of an effect. Data indicates that very few
people bother to validate their markup. I forget the exact number, but
the percentage of pages on the web that are valid HTML is very low. I
see no reason this would be different for SVG-in-HTML. Especially if
we have stricter rules for SVG-in-HTML than we do for HTML.

Second, it comes at a cost for publishers of content since it means
that within a single HTML document you'll have two vastly different
rules for what's valid and what's not. This especially punishes the
people that actually validate their contents. For example it would
mean that we couldn't use the same rules for parsing SVG scripts as
for HTML scripts.

There's also the fact that many sites don't *want* users to be able to
extract contents from their site. There are lots of sites that use
tricks that disable the context menu in order to prevent people from
using the "view-source" options in the context menu.
This is of course fairly ineffective since the same menu item exists
in the normal menus in most browsers, but that doesn't prevent sites
from doing all they can still. Such a site has every incentive to
purposely make their SVG-in-HTML fragments non-valid XML-SVG.

By using the path I propose of making it the UA's responsibility to
allow its users to extract valid XML-SVG out of SVG-in-HTML fragments
you work around these issues. I agree you lose the ability to
copy-n-paste out of the normal view-source that exists today, but I
think the cost there is worth the benefit of having more consistent
syntax throughout HTML. I also think browsers will eventually create
much better tools than the raw off-the-network view-source in order to
better support SVG editing.

It's a competitive advantage for a browser to allow its users to
extract valid XML-SVG, so I have no doubt that they will do this.

> Or is HTML5 just a desktop browser spec?

Can we *please* get out of this argument? At this point this argument
actually feels quite inflammatory. Do you really think anyone is
arguing this?

First of all, with every single major browser engine out there now
running on mobile platforms, optimizing for just desktop browsers
would make no sense at all.

Second, tooling is something that I think would help every actor that
cares about HTML. The fact that Flash has awesome tools is possibly
the most significant reason it is used as much as it is. In order for
HTML to compete we need better tools.

>> And of course there are downsides of requiring well formed content,
>> such as making using a text editor or server side scripts to create
>> content less likely to produce pages that don't render as intended.
>
> I'm not sure what you mean by this.

I just mean that there are downsides with having strict conformance
criteria such as quoted attributes or correct casing. Namely, it makes
it harder for people that use string technology to produce the SVG
content. I.e. people that use a text editor (i.e. editing a long
string) or that use string concatenation to dynamically generate
content, such as PHP, ASP.net, etc.

/ Jonas
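
A rough illustration of the extraction path argued for in the message
above: a receiving tool takes lax SVG-in-HTML markup and writes out
well-formed XML-SVG. This is only a sketch under stated assumptions.
The names LaxSvgToXml and to_xml are hypothetical, Python's html.parser
is not an HTML5-conforming parser (it lowercases tag and attribute
names, so camelCase SVG attributes would be mangled), and text content
is ignored for brevity.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


class LaxSvgToXml(HTMLParser):
    """Hypothetical helper: builds an element tree from lax SVG markup."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Quoted and unquoted attribute values arrive here identically.
        attrib = {name: (value if value is not None else "")
                  for name, value in attrs}
        if self.stack:
            elem = ET.SubElement(self.stack[-1], tag, attrib)
        else:
            elem = ET.Element(tag, attrib)
            if self.root is None:
                self.root = elem
        self.stack.append(elem)

    def handle_endtag(self, tag):
        # Assumed recovery rule: an end tag also closes any children
        # that were left open; a stray end tag is ignored.
        if tag in [elem.tag for elem in self.stack]:
            while self.stack.pop().tag != tag:
                pass


def to_xml(fragment):
    """Serialize one lax SVG fragment as well-formed XML."""
    parser = LaxSvgToXml()
    parser.feed(fragment)
    parser.close()
    root = parser.root
    # Add the SVG namespace declaration if the author left it out.
    root.attrib.setdefault("xmlns", SVG_NS)
    return ET.tostring(root, encoding="unicode")


strict = to_xml("<svg xmlns='http://www.w3.org/2000/svg'><circle r='20'/></svg>")
lax = to_xml("<svg><circle r=20></svg>")
print(lax)
print(strict == lax)
```

Both inputs from the start of the thread serialize to the same string
here, with quoted attributes and an explicit namespace, which is the
property an extraction tool would need whatever the validator says
about the source markup.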
Received on Wednesday, 1 April 2009 03:56:21 UTC