
Re: Opera reparses as HTML when XML parse fails

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 2 Jan 2012 06:03:59 +0100
To: Karl Dubost <karld@opera.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, Noah Mendelsohn <nrm@arcanedomain.com>, www-tag@w3.org
Message-ID: <20120102060359709765.406fb505@xn--mlform-iua.no>
Karl Dubost, on Wed, 14 Dec 2011 00:32:29 -0500 replied:
13 December 2011 at 22:55, Noah Mendelsohn:
>> … to those wrestling with HTML/XML interop: […]
>> Opera is, at least in some cases, punting back to a forgiving
>> HTML parse when strict XML rules result in an error.

> Note that it is an issue which is happening in other configurations 
> with HTTP headers, Javascript user agent sniffing, CSS vendor 
> extensions, etc. 

Karl, that assessment seems quite accurate, unlike many other 
statements from Opera on this issue, which have left the impression 
that Opera wants to turn this example into a poster child for the bad 
effects of XML draconian errors rather than tackle the problem that 
you describe above: sniffing.

Sniffing is a User-Agent string issue. In other words: The issue could 
have been solved by "updating" the user agent string to something that 
doesn't trigger the effect. For XML, you have an "elegant" solution: 
just "fall back" to HTML. But what do you do for the other cases you 
mention?

Karl Dubost, on Wed, 14 Dec 2011 00:32:29 -0500 also replied:
> I'm partly behind that decision […]

> 3. Here you have Open The Web people at Opera who go "wtf?". […]
> 5. We spent a lot of time and energy to try to solve this 
>    issue by contacting people, by trying to change things. […]

What about time & energy within House-of-Opera on the User-Agent string?

In that regard, in 2011, Webkit changed its User-Agent string, partly 
to "increase compatibility with Internet Explorer". [1] And it seems 
that Mozilla's UA string change was motivated by similar issues. [2] I 
note in particular that Mozilla explicitly says that not only 
Firefox will include the "Firefox/*" string, but also other products 
based on Gecko. And a quick look shows that Webkit browsers follow the 
same practice - just consider how Chrome includes the "Safari/*" string 
too. [3] No wonder: Compatibility with the Web often seems to mean 
compatibility with IE, and compatibility with the Web seems more 
important than insisting on excluding strings that refer to a competing 
product.

In that regard, another impression spread by Opera on this issue is 
that it somehow is singled out by ASP, as if ASP tries to be mean to 
Opera in particular. However, more than 10,000 User-Agent strings have 
been documented. [4] And more than 81% of those strings follow the 
unofficial User-Agent format. [5] Meaning, at a minimum, that all the 
strings in the 81% group start with the string "Mozilla/\d.\d" (in a 
few cases, they merely include that string somewhere else). Opera, 
however, for some reason is amongst the 19% which neither start with 
nor include the "Mozilla/\d.\d" string.

THUS: Even if Opera had added "Firefox/0" or "MSIE 0.0" or "Safari/0", 
in order to please ASP, it would not have worked, unless you also had 
added "Mozilla/0.0" to that string, somewhere. 

The goal of the HTML5 effort has been described as defining HTML in 
such a way that anyone could build a new Web browser. And, in that 
regard: Opera has now created the impression that it is necessary to 
fall back to HTML in order to avoid XML fatal errors. Would it not have 
been better for the open Web if the Open The Web team had used its time 
and energy on the de-facto User-Agent string requirements? 

This is what ASP "requires" in order to not send the page as XML:

 (1) String 'Mozilla/\d.\d' somewhere.
 (2) String "Safari/\d\+" OR "Firefox/\d\+" OR "MSIE 0.0" somewhere.
     (No wonder that Mozilla and Webkit recommend those strings to be 
     used in any product that is based on Gecko or Webkit.)
 (3) When the first token is not 'Mozilla/\d\+.\d\+', then it must
     have the form "Word/\d+". Thus "Word/\d\.\d\d",
     as Opera now uses (Opera/9.80), does not work.

Based on the above, I made this user-agent string, which works nicely:

Opera/11 Version/11.60 Presto/2.2.15 (boilerplate:Mozilla/5.0/msie 10.0)
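
The three rules listed above can be sketched in Python roughly as 
follows. This is only an illustration of the rules as stated in this 
message, not ASP's actual detection code. The function name, the exact 
regular expressions (multi-digit version numbers, "MSIE" followed by 
any version) and the case-insensitive matching are assumptions, the 
last one made because the lowercase "msie" in the string above is 
reported to work.

```python
import re

# Patterns transcribed from rules (1)-(3) above. The precise syntax and
# the case-insensitivity are assumptions, not ASP's real logic.
MOZILLA = re.compile(r"Mozilla/\d+\.\d+", re.IGNORECASE)
PRODUCT = re.compile(r"Safari/\d+|Firefox/\d+|MSIE \d+\.\d+", re.IGNORECASE)
WORD_INT = re.compile(r"[A-Za-z]+/\d+")  # "Word/\d+": integer version only


def passes_described_rules(user_agent: str) -> bool:
    """Return True if the string satisfies all three rules as described."""
    tokens = user_agent.split()
    if not tokens:
        return False
    first = tokens[0]

    # (1) "Mozilla/x.y" anywhere in the string (unanchored search).
    rule1 = MOZILLA.search(user_agent) is not None
    # (2) One of the Safari / Firefox / MSIE product tokens anywhere.
    rule2 = PRODUCT.search(user_agent) is not None
    # (3) The first token is either "Mozilla/x.y" or "Word/<integer>".
    rule3 = (MOZILLA.fullmatch(first) is not None
             or WORD_INT.fullmatch(first) is not None)
    return rule1 and rule2 and rule3


if __name__ == "__main__":
    crafted = ("Opera/11 Version/11.60 Presto/2.2.15 "
               "(boilerplate:Mozilla/5.0/msie 10.0)")
    print(passes_described_rules(crafted))
```

Under these assumptions the crafted string above passes all three 
checks, while a string whose first token is "Opera/9.80" fails rule 
(3) whatever else it contains.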

DEMO: Visit <http://home.mcafee.com/Default.aspx> with a browser (such 
as iCab) that allows you to easily change the UA string to see.

NOTE: The "required" tokens can be made longer [and thus obfuscating 
what they 'mean'] by adding characters in front of them - like 
"FooMozilla", "FooMSIE", "FooSafari", and I took modest advantage of 
that in the boilerplate part of the string, by writing "/msie".

CONCLUSION: Here we have 4 other vendors - Apple, Google, Mozilla and 
Microsoft - which have formed their User-Agent strings according to the 
terrain, in order (probably) to make them work as smoothly as possible 
for the users. If they wanted, they could instead have insisted on 
their right to freely pick the user-agent string, and used that as a 
justification for parsing XML as HTML. I really wonder why Opera must 
follow another approach. If Opera wants to distinguish itself from the 
rest, then seeking to document how user-agent strings must look in 
order to work optimally would have been a better thing to do. Perhaps 
we will see that happen - and this new change reverted? If a new UA 
string keeps the error from happening, then I guess the fallback 
behaviour would not be needed anymore? At least, not any more than it 
is needed in the other browsers.

PS: Opera's decision has created reactions amongst dedicated Opera 
users as well. [6]

[1] http://www.webkit.org/blog/1580/user-agent-string-changes-on-webkit-trunk/

[2] http://hacks.mozilla.org/2010/09/final-user-agent-string-for-firefox-4/

[3] http://www.useragentstring.com/pages/Chrome/

[4] http://www.useragentstring.com/pages/All/

[5] http://www.useragentstring.com/pages/Opera/

[6] http://my.opera.com/community/forums/topic.dml?id=1204542&t=1325478931

-- 
Leif Halvard Silli
Received on Monday, 2 January 2012 05:04:35 GMT