Error handling and legacy content

I have a hard time understanding what people actually have against  
handling legacy content. This doesn't affect authors. Authors are  
currently not allowed to write "<em> <strong> </em>" and HTML5 will not  
allow them to do that either. (Also, to be clear, HTML5 defines what authors,  
user agents, scripting enabled user agents, scripting disabled user  
agents, interactive user agents, non-interactive user agents, etc. have to  
do. It's certainly not just "desktop browsers".)

This is about user agents. User agents have indicated that when facing  
content which should be X they also want to know what they should do when  
it is Y. For instance, when they encounter "<em> <strong> </em>" or <html  
foo-bar="123">. Defining this in a way that is compatible with the legacy  
content out there has the advantage that the overall HTML language stays  
relatively simple. This matters for new user agents entering the  
market, for instance, and also if we want to read that content  
decades from now.
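
To make "defining what happens when it is Y" concrete, here is a minimal sketch of a parser with one fixed recovery rule. It is deliberately a toy and not the algorithm from the HTML5 draft: the names TolerantTreeBuilder and parse are made up for illustration, and the rule (an end tag implicitly closes everything opened after its matching start tag; a stray end tag is ignored; unknown attributes are kept) is an assumption chosen for brevity. The point is only that every user agent following the same written rule produces the same result for the same non-conforming input.

```python
from html.parser import HTMLParser


class TolerantTreeBuilder(HTMLParser):
    """Toy recovery rule, not the HTML5 algorithm (hypothetical class)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, innermost last
        self.events = []   # deterministic record of what the parser did

    def handle_starttag(self, tag, attrs):
        # Unknown attributes such as foo-bar are kept, never treated as fatal.
        self.stack.append(tag)
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag not in self.stack:
            # Stray end tag: record and ignore it instead of aborting.
            self.events.append(("ignored", tag, {}))
            return
        # Implicitly close anything still open inside the matching element.
        while self.stack:
            closed = self.stack.pop()
            self.events.append(("close", closed, {}))
            if closed == tag:
                break


def parse(markup):
    """Feed markup to a fresh builder and return the recorded events."""
    builder = TolerantTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.events


if __name__ == "__main__":
    for event in parse("<em> <strong> </em>"):
        print(event)
    for event in parse('<html foo-bar="123">'):
        print(event)
```

With this rule, "<em> <strong> </em>" always yields the unclosed strong element being closed just before em, and foo-bar="123" is simply carried along as an attribute. A real specification would pick rules that match what legacy content already relies on, which this sketch makes no attempt to do.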

Another option is introducing a new version for HTML5 and defining the  
"legacy HTML" in a separate specification. I don't think there are enough  
weird quirks to warrant that. Maybe for Internet Explorer, but I don't  
think that's true for the other user agents and new user agents entering  
the market. Having two versions of HTML, on top of a specific set of  
rendering quirks for the older one, introduces a lot of cost in terms of  
browser maintenance and teaching authors what is different. (It's also not  
entirely clear how this would work as we would have to implement a new  
version and fix the older version as well to get interoperability there.)

Yet another option is leaving it undefined what happens when you face  
non-conforming content. You could crash, you could do the same as what you  
did for the previous version of HTML, stop rendering, etc. If you don't  
step away from that soon you effectively kill the language I think, unless  
user agents reverse engineer the market leader (as history has shown for  
HTML, CSS, SVG). Because when you don't define what happens with  
non-conforming content it becomes really hard to introduce extensions to  
the language. Introducing versioning as a solution for extensibility  
creates the problem that it's unclear what new content will do in older  
versions.


What would help me a lot I think is people proposing specifically what  
they would like and how they would solve the issues. To me it seems that  
some people just argue against things which may initially sound like a bad  
idea, without actually thinking about why it was done that way. Believe it  
or not, a lot of the stuff in the WHATWG drafts has a pretty solid  
rationale and wasn't just some idea written down on a Sunday afternoon.


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Wednesday, 2 May 2007 08:18:57 UTC