Re: HTML and XML

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 11 Feb 2009 11:35:01 +0200
Cc: "Anne van Kesteren" <annevk@opera.com>, "David Orchard" <orchard@pacificspirit.com>, www-tag@w3.org
Message-Id: <B755BD77-84F0-4A01-8A80-0689553AD478@iki.fi>
To: Henry S. Thompson <ht@inf.ed.ac.uk>

On Feb 10, 2009, at 22:26, Henry S. Thompson wrote:

> And there's good reason for that:  XML actually _is_ usable by
> authors and authoring well-formed XML is _not_ hard.

However, writing XML-outputting software whose output is always
well-formed even in the case of malicious input is hard.
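
To illustrate one part of why this is hard: escaping "&" and "<" is not enough, because input can also contain code points that the XML 1.0 Char production does not allow anywhere in a document (for example U+0000 or a lone surrogate). The following is a minimal sketch, not code from any of the systems discussed here. The names is_xml_char and escape_text are hypothetical, and replacing forbidden characters with U+FFFD is just one possible policy.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_text(s: str) -> str:
    """Escape s for use as XML element content.

    Code points that XML 1.0 forbids are replaced with U+FFFD
    (an assumed policy; rejecting the input is another option).
    """
    out = []
    for ch in s:
        if not is_xml_char(ord(ch)):
            out.append("\uFFFD")
        elif ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            out.append(ch)
    return "".join(out)
```

A serializer that skips the is_xml_char check emits a document that a conforming XML parser must reject as soon as an attacker submits, say, a NUL character.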

> b) points to a piece of broken _software_;
[..]
> one article that points to a page in which someone trying to  
> introduce an _intentional_ markup error made the wrong error.

It is a pretty significant problem if an attacker can intentionally  
introduce a markup error into a system so that the administrator of  
the system is denied service when trying to use a browser-based UI for  
managing the system (and all other users are denied service, too).

> Hardly a compelling set of evidence that well-formed XML is too hard  
> for ordinary mortals.

So far Philip Taylor (the author of
http://lists.w3.org/Archives/Public/www-archive/2009Feb/0058.html)
has found well-formedness holes in every XML-outputting system he
has cared to try.

He even managed to make Validator.nu produce ill-formed output. The  
bug was in the Xalan serializer--a widely distributed library written  
by experts. (Astral characters were serialized as two numeric  
character references for the corresponding surrogates.)
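
The class of bug described above can be sketched as follows. This is an illustration only and not Xalan's actual code; the function names are hypothetical, and escaping of "&" and "<" is assumed to happen elsewhere. A serializer that walks UTF-16 code units and emits one numeric character reference per unit produces references to surrogates, which violate the XML "Legal Character" well-formedness constraint. Emitting one reference per code point avoids the problem.

```python
def ncr(cp: int) -> str:
    """Format a numeric character reference for a code point."""
    return f"&#x{cp:X};"


def escape_non_ascii(s: str) -> str:
    """Correct approach: one reference per Unicode code point."""
    return "".join(ch if ord(ch) < 0x80 else ncr(ord(ch)) for ch in s)


def buggy_escape_utf16(s: str) -> str:
    """Reproduces the bug class: one reference per UTF-16 code unit.

    An astral character occupies two code units (a surrogate pair),
    so it comes out as two references, neither of which is legal XML.
    """
    units = s.encode("utf-16-be")
    out = []
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], "big")
        out.append(chr(unit) if unit < 0x80 else ncr(unit))
    return "".join(out)


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, an astral character
print(escape_non_ascii(clef))    # &#x1D11E;
print(buggy_escape_utf16(clef))  # &#xD834;&#xDD1E;
```

For characters in the Basic Multilingual Plane the two functions agree, which is presumably why a bug like this can survive ordinary testing.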

I can brag that Philip hasn't found an ill-formedness-inducing bug in  
any XML serialization code written entirely by me. However, he has  
still found *a* bug (though not an ill-formedness-inducing one) in my XML
serializer, too. (I replaced the Xalan serializer with one that I  
wrote myself.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 11 February 2009 09:35:45 GMT
