
Re: HTML and XML

From: Robert J Burns <rob@robburns.com>
Date: Wed, 11 Feb 2009 22:56:39 -0600
Message-Id: <23B1CC72-F2F8-4012-93B7-9F06042DDBDB@robburns.com>
To: www-tag@w3.org
Cc: www-archive <www-archive@w3.org>

Hi Henri,

Henri Sivonen wrote:
> On Feb 10, 2009, at 22:26, Henry S. Thompson wrote:
>
> > And there's good reason for that:  XML actually _is_ usable by
> > authors and authoring well-formed XML is _not_ hard.
>
> However, writing XML-outputting software whose output is always
> well-formed even in the case of malicious input is hard.
>
> > b) points to a piece of broken _software_;
> [..]
> > one article that points to a page in which someone trying to
> > introduce an _intentional_ markup error made the wrong error.
>
> It is a pretty significant problem if an attacker can intentionally
> introduce a markup error into a system so that the administrator of
> the system is denied service when trying to use a browser-based UI for
> managing the system (and all other users are denied service, too).
>
> > Hardly a compelling set of evidence that well-formed XML is too hard
> > for ordinary mortals.
>
> So far Philip Taylor (the author of
> http://lists.w3.org/Archives/Public/www-archive/2009Feb/0058.html) has
> found well-formedness holes in every XML-outputting system he has
> cared to try.
>
> He even managed to make Validator.nu produce ill-formed output. The
> bug was in the Xalan serializer--a widely distributed library written
> by experts. (Astral characters were serialized as two numeric
> character references for the corresponding surrogates.)

I have to say this is severely overstating things. It is not clear  
from the XML recommendation that such surrogate pairs are not  
permitted. Several of the XML parsers I'm familiar with accept them.  
I wouldn't be surprised to hear of others that do not, and that even  
trigger fatal errors, but could you point some out?
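The serialization difference at issue can be sketched in Python. This is a minimal illustration, not Xalan's actual code: `ncr_per_code_unit` is a hypothetical serializer that emits one character reference per UTF-16 code unit, which for a character outside the Basic Multilingual Plane yields two references to surrogates, while the equally hypothetical `ncr_per_code_point` emits one reference per code point.

```python
def ncr_per_code_unit(text: str) -> str:
    """Hypothetical buggy serializer: one reference per UTF-16 code unit."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    units = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return "".join(f"&#x{u:X};" for u in units)


def ncr_per_code_point(text: str) -> str:
    """Hypothetical serializer: one reference per Unicode code point."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


if __name__ == "__main__":
    astral = "\U0001D504"  # a character outside the Basic Multilingual Plane
    print(ncr_per_code_unit(astral))   # &#xD835;&#xDD04;
    print(ncr_per_code_point(astral))  # &#x1D504;
```

A real serializer would escape only the characters that need it; escaping every character here just makes the difference visible.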

The XML recommendation says:

> Well-formedness constraint: Legal Character
> Characters referred to using character references must match the  
> production for Char.

While this clearly means that &#x000; is not permitted, it would be  
fair to say that a pair of character references forming a valid  
surrogate pair would match the production for Char. If the  
recommendation instead said "A character referred to using a character  
reference must match the production for Char," then you would have a  
stronger case. If this is indeed the only error you've found, then I  
would say you haven't yet found an error. Or is there another part of  
the recommendation you're reading that makes this a well-formedness  
error?
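The two readings of the Legal Character constraint can be contrasted with a small Python sketch. `is_xml_char` transcribes XML 1.0's Char production (production [2]), which leaves out the surrogate block #xD800-#xDFFF. The regex and the helper names are hypothetical, and neither function is a complete well-formedness checker: `each_reference_is_char` applies the constraint to every reference on its own, while `pairs_combined_are_char` first joins a textually adjacent high/low surrogate pair into one code point.

```python
import re

# Hypothetical pattern for hexadecimal and decimal character references.
NCR = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")


def is_xml_char(cp: int) -> bool:
    """Production [2] Char of XML 1.0; note the gap at #xD800-#xDFFF."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def _value(m: re.Match) -> int:
    """Numeric value of one matched reference."""
    hex_digits, dec_digits = m.group(1), m.group(2)
    return int(hex_digits, 16) if hex_digits is not None else int(dec_digits)


def each_reference_is_char(text: str) -> bool:
    """Reading 1: every reference, on its own, must match Char."""
    return all(is_xml_char(_value(m)) for m in NCR.finditer(text))


def pairs_combined_are_char(text: str) -> bool:
    """Reading 2: a high/low surrogate pair written as two adjacent
    references is combined into one code point before checking Char."""
    refs = list(NCR.finditer(text))
    i = 0
    while i < len(refs):
        cp = _value(refs[i])
        nxt = refs[i + 1] if i + 1 < len(refs) else None
        if (
            0xD800 <= cp <= 0xDBFF
            and nxt is not None
            and nxt.start() == refs[i].end()  # nothing between the two references
            and 0xDC00 <= _value(nxt) <= 0xDFFF
        ):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (_value(nxt) - 0xDC00)
            i += 1  # the low surrogate has been consumed
        if not is_xml_char(cp):
            return False
        i += 1
    return True


if __name__ == "__main__":
    pair = "&#xD835;&#xDD04;"
    print(each_reference_is_char(pair), pairs_combined_are_char(pair))  # False True
    print(each_reference_is_char("&#x000;"))  # False
```

Under the first reading `&#xD835;&#xDD04;` is rejected and under the second it is accepted; `&#x000;` is rejected under both.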

Take care,
Rob
Received on Thursday, 12 February 2009 04:57:20 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Thursday, 26 April 2012 12:48:12 GMT