Re: HTML and XML

Henri Sivonen wrote:
> On Feb 10, 2009, at 22:26, Henry S. Thompson wrote:
> 
>> And there's good reason for that:  XML actually _is_ usable by
>> authors and authoring well-formed XML is _not_ hard.
> 
> However, writing XML-outputting software whose output is always 
> well-formed even in the case of malicious input is hard.

Again, not when using the proper tools.
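
A minimal sketch of the difference, assuming Python's standard
xml.etree.ElementTree as a stand-in for "the proper tools" (the thread
names no particular library, and the function names here are
hypothetical). It contrasts string concatenation with a real serializer
on hostile input:

```python
import xml.etree.ElementTree as ET


def naive_comment(user_text: str) -> str:
    # String concatenation: any markup in user_text leaks into the output.
    return "<comment>" + user_text + "</comment>"


def serialized_comment(user_text: str) -> str:
    # Let the serializer escape markup-significant characters.
    elem = ET.Element("comment")
    elem.text = user_text
    return ET.tostring(elem, encoding="unicode")


def is_well_formed(xml_text: str) -> bool:
    # A document is treated as well-formed if the parser accepts it.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


malicious = "</comment><b>unclosed"
print(is_well_formed(naive_comment(malicious)))
print(is_well_formed(serialized_comment(malicious)))
```

This covers only markup injection; it does not show that a given
serializer handles every other case correctly.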

Of course, there are issues with some existing implementations (see 
below), and it really doesn't help that the DOM API allows building 
documents that can't be serialized.
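
A minimal sketch of that DOM problem, assuming Python's
xml.dom.minidom as the DOM implementation (an illustrative choice;
other DOM implementations may behave differently, and the function
names are hypothetical). The API accepts a text node holding a
character that XML 1.0 forbids, and the serialized result no longer
parses:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text: str) -> bool:
    # A document is treated as well-formed if the parser accepts it.
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


def build_document(text: str) -> str:
    # Build <root>text</root> through DOM calls and serialize it.
    doc = minidom.Document()
    root = doc.createElement("root")
    doc.appendChild(root)
    # No check is made here: the DOM call succeeds for any string.
    root.appendChild(doc.createTextNode(text))
    return doc.toxml()


# U+000B (vertical tab) is not a legal XML 1.0 character.
print(is_well_formed(build_document("harmless")))
print(is_well_formed(build_document("bad\x0bchar")))
```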

>> b) points to a piece of broken _software_;
> [..]
>> one article that points to a page in which someone trying to introduce 
>> an _intentional_ markup error made the wrong error.
> 
> It is a pretty significant problem if an attacker can intentionally 
> introduce a markup error into a system so that the administrator of the 
> system is denied service when trying to use a browser-based UI for 
> managing the system (and all other users are denied service, too).

I agree with that, but would call that a bug in that system.

>> Hardly a compelling set of evidence that well-formed XML is too hard 
>> for ordinary mortals.
> 
> So far Philip Taylor (the author of 
> http://lists.w3.org/Archives/Public/www-archive/2009Feb/0058.html ) has 
> found well-formedness holes in every XML-outputting system he has cared 
> to try.
> 
> He even managed to make Validator.nu produce ill-formed output. The bug 
> was in the Xalan serializer--a widely distributed library written by 
> experts. (Astral characters were serialized as two numeric character 
> references for the corresponding surrogates.)

And there are similar bugs in the Sun JDK, at least as of 1.4.
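
A minimal sketch of the surrogate bug described above, written in
Python for illustration (this is not the Xalan or JDK code, and the
function names are hypothetical). A serializer that walks UTF-16 code
units emits one character reference per surrogate, and each of those
references names a code point that is not a legal XML character:

```python
import xml.etree.ElementTree as ET


def correct_char_ref(ch: str) -> str:
    # One reference for the whole code point.
    return f"&#x{ord(ch):X};"


def surrogate_char_refs(ch: str) -> str:
    # Reproduces the bug: one reference per UTF-16 code unit.
    data = ch.encode("utf-16-be")
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"&#x{unit:X};" for unit in units)


def is_well_formed(xml_text: str) -> bool:
    # A document is treated as well-formed if the parser accepts it.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


g_clef = "\U0001D11E"  # an astral character (outside the BMP)
print(correct_char_ref(g_clef))     # &#x1D11E;
print(surrogate_char_refs(g_clef))  # &#xD834;&#xDD1E;
print(is_well_formed("<p>" + correct_char_ref(g_clef) + "</p>"))
print(is_well_formed("<p>" + surrogate_char_refs(g_clef) + "</p>"))
```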

> I can brag that Philip hasn't found an ill-formedness-inducing bug in 
> any XML serialization code written entirely by me. However, he has still 
> found *a* bug (not ill-formedness-inducing one) in my XML serializer, 
> too. (I replaced the Xalan serializer with one that I wrote myself.)

I agree that writing a robust serializer isn't trivial (I had to write 
my own as well, many years ago, to avoid using Xerces).

BR, Julian

Received on Wednesday, 11 February 2009 09:43:32 UTC