Re: Should we say anything on security? from Liam R E Quin on 2012-09-12 (public-microxml@w3.org from September 2012)

From: Liam R E Quin <liam@w3.org>
Date: Wed, 12 Sep 2012 01:37:47 -0400
To: James Clark <jjc@jclark.com>
Cc: public-microxml@w3.org
Message-ID: <1347428267.23437.107.camel@localhost.localdomain>

On Wed, 2012-09-12 at 11:18 +0700, James Clark wrote:
> On Tue, Sep 11, 2012 at 8:25 PM, Uche Ogbuji <uche@ogbuji.net> wrote:
> 
> > MicroXML happens to close some of the more notorious security holes
> > associated with XML: the billion laughs attack from nested entity
> > processing and CDATA injection.  Is it worth making a statement in the spec
> > that we believe its simplifications also improve security in dealing with
> > MicroXML on networks?
> >
> 
> I think that's worth saying in the intro as part of describing the value
> proposition of MicroXML.

+1


>  We can also say that another factor that may make
> it more suitable for protocols is that it allows you to follow the
> long-standing IETF tradition of being liberal in what you accept.

I'm reluctant there. XML doesn't forbid error recovery either - it only
forbids *silent* error recovery. If a document isn't XML you can't claim
it's XML, but you can turn it into XML and process the result.

> I would suggest the following points:
> 
> 1. We use UTF-8 so the security considerations of RFC 3629 apply

Augmented slightly perhaps by "obfuscatory" character references.

> 2. We should say something about the applicability of XML Digital
> Signatures to MicroXML.
> a) You need to use a MicroXML parser not an XML parser to construct the XML
> DSig data model, because newlines in attribute values aren't normalized in
> MicroXML

Is this really enough of a reason to abandon the XML parser? Well,
that's maybe a separate discussion.
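
For readers who want to see the normalization difference in point 2a, here is a minimal sketch using Python's standard-library XML parser (an assumption for illustration; the thread names no particular parser). It shows an XML parser reporting a literal newline in an attribute value as a space, whereas a MicroXML parser, per the point above, would keep the newline.

```python
import xml.etree.ElementTree as ET

# A literal newline inside an attribute value.
doc = '<e a="line1\nline2"/>'

# An XML parser applies attribute-value normalization (XML 1.0,
# section 3.3.3), so the literal newline is reported as a space.
value = ET.fromstring(doc).get("a")
print(repr(value))

# A newline written as a character reference is not normalized away.
ref_value = ET.fromstring('<e a="line1&#10;line2"/>').get("a")
print(repr(ref_value))
```

A signature computed over the XML parser's view of the first document would therefore cover a space, not the newline the author wrote.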

> b) The XML C14N of a MicroXML document will be XML but may not be MicroXML,
> because MicroXML requires > to be quoted in attribute values.  This
> typically doesn't matter, because the only thing you usually do to the C14N
> output is feed it into a hash algorithm
> 
> 3. I think it's worth pointing out that you can construct documents that
> cause even a streaming parser to use memory proportional to the size of the
> input by using large attribute values, large attribute/element names, large
> numbers of attributes or deep element nesting. So in resource-constrained
> security-sensitive situations parsers may want to put hard limits on such
> things to reduce the possibility of DoS attacks.

Yes. µXML at least doesn't suffer the problem that you can't in general
process attributes on an element until you've seen them all, in case the
very last one is a namespace declaration...
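
The hard limits suggested in point 3 could look roughly like the sketch below. It is only an illustration: it uses Python's expat binding (an XML parser, not a MicroXML one), and the function name, exception class, and default limit values are all hypothetical. Note also that expat has already buffered an element's attributes by the time the handler runs, so a real implementation would need to enforce such limits inside the tokenizer to bound memory.

```python
import xml.parsers.expat


class LimitExceeded(Exception):
    """Raised when the input exceeds one of the configured limits."""


def parse_with_limits(data, max_depth=64, max_attrs=32,
                      max_name=256, max_value=4096):
    """Parse data, rejecting it if any limit is exceeded.

    Returns the deepest element nesting level seen.
    All names and default values here are illustrative assumptions.
    """
    depth = 0
    deepest = 0

    def start(name, attrs):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)
        if depth > max_depth:
            raise LimitExceeded("element nesting too deep")
        if len(name) > max_name:
            raise LimitExceeded("element name too long")
        if len(attrs) > max_attrs:
            raise LimitExceeded("too many attributes")
        for attr_name, attr_value in attrs.items():
            if len(attr_name) > max_name:
                raise LimitExceeded("attribute name too long")
            if len(attr_value) > max_value:
                raise LimitExceeded("attribute value too long")

    def end(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)  # True marks the final chunk of input
    return deepest
```

Calling `parse_with_limits("<a>" * 100 + "</a>" * 100)` with the defaults raises `LimitExceeded` as soon as the 65th nested element opens, without processing the rest of the input.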

> RFC 1874 has this:
> 
> > SGML entities contain information to be parsed and processed by the
> > recipient's SGML system. Those entities may contain and such systems may
> > permit explicit system level commands to be executed while processing the
> > data. To the extent that an SGML system will execute arbitrary command
> > strings recipients of SGML entities may be at risk.
> 
> 
> and RFC3023 says essentially the same thing.  I don't find this
> particularly helpful.

English understatement? :-)

The NDATA entity examples of SGML (or were they just in the handbook? I
forget) did suggest this behaviour, to be fair.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Co-author, 5th edition of "Beginning XML", Wrox, July 2012

Received on Wednesday, 12 September 2012 05:38:16 UTC