Re: HTML and XML

On Feb 10, 2009, at 22:26, Henry S. Thompson wrote:

> And there's good reason for that:  XML actually _is_ usable by
> authors and authoring well-formed XML is _not_ hard.

However, writing XML-outputting software whose output is always
well-formed even in the case of malicious input is hard.
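
To illustrate why escaping the markup characters is not enough, here is a minimal Python sketch. It is not code from any system discussed in this thread; safe_text and the choice to substitute U+FFFD are my assumptions for the example. Characters outside the XML 1.0 Char production have to be dealt with as well, because they make the output ill-formed even when &, < and > are escaped.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 Char production (most C0 controls, lone
# surrogates, U+FFFE, U+FFFF) cannot appear in a well-formed document,
# not even as a numeric character reference.
_NOT_XML_CHAR = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def safe_text(untrusted):
    """Hypothetical helper: make untrusted text safe as element content."""
    # Replace forbidden characters first (assumed policy: U+FFFD) ...
    cleaned = _NOT_XML_CHAR.sub("\uFFFD", untrusted)
    # ... then escape the markup-significant characters &, < and >.
    return escape(cleaned)

# Hostile input: a vertical tab, markup characters and a lone surrogate.
hostile = "x\x0b<y>&\ud800"
doc = "<p>" + safe_text(hostile) + "</p>"
parsed_text = ET.fromstring(doc).text  # parses without error
```

This only covers element content. Attribute values, comments, processing instructions and element names each have their own rules, which is part of why getting every output path right is hard.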

> b) points to a piece of broken _software_;
> one article that points to a page in which someone trying to  
> introduce an _intentional_ markup error made the wrong error.

It is a pretty significant problem if an attacker can intentionally  
introduce a markup error into a system so that the administrator of  
the system is denied service when trying to use a browser-based UI for  
managing the system (and all other users are denied service, too).

> Hardly a compelling set of evidence that well-formed XML is too hard  
> for ordinary mortals.

So far Philip Taylor (the author of 
  ) has found well-formedness holes in every XML-outputting system he  
has cared to try.

He even managed to make produce ill-formed output. The  
bug was in the Xalan serializer--a widely distributed library written  
by experts. (Astral characters were serialized as two numeric  
character references for the corresponding surrogates.)
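
For readers who want to see that class of bug concretely, here is a small Python sketch. It only mimics the behavior described above and is not the Xalan code; buggy_ncr, correct_ncr and is_well_formed are names made up for this example.

```python
import xml.etree.ElementTree as ET

def buggy_ncr(ch):
    """Mimic the bug: emit one reference per UTF-16 code unit."""
    units = ch.encode("utf-16-be")
    return "".join(
        "&#x%X;" % int.from_bytes(units[i:i + 2], "big")
        for i in range(0, len(units), 2)
    )

def correct_ncr(ch):
    """Emit a single reference for the code point itself."""
    return "&#x%X;" % ord(ch)

def is_well_formed(doc):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

clef = "\U0001D11E"  # an astral character (MUSICAL SYMBOL G CLEF)
bad = "<p>" + buggy_ncr(clef) + "</p>"     # <p>&#xD834;&#xDD1E;</p>
good = "<p>" + correct_ncr(clef) + "</p>"  # <p>&#x1D11E;</p>
```

The first document is rejected because a character reference must refer to a legal XML character, and surrogate code points are not legal XML characters. The second one parses fine.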

I can brag that Philip hasn't found an ill-formedness-inducing bug in
any XML serialization code written entirely by me. However, he has
still found *a* bug (not an ill-formedness-inducing one) in my XML
serializer, too. (I replaced the Xalan serializer with one that I
wrote myself.)

Henri Sivonen

Wednesday, 11 February 2009