Re: Safe ways of implementing limits on buffer sizes in the parser from Henri Sivonen on 2009-07-02 (public-html@w3.org from July 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 2 Jul 2009 11:30:58 +0300
To: Ian Hickson <ian@hixie.ch>
Cc: "public-html@w3.org WG" <public-html@w3.org>
Message-Id: <019998D6-76F2-4B55-8086-8626DDBEA8AA@iki.fi>

On Jul 2, 2009, at 08:18, Ian Hickson wrote:

> On Mon, 8 Jun 2009, Henri Sivonen wrote:
>>
>> The spec allows implementations to place limits on the sizes of
>> various things in HTML in order to avoid exhausting resources.
>>
>> There are various buffers in the HTML5 parser, all of which a remote
>> site can fill with arbitrarily much data by choosing suitable input.
>> Has someone already pondered the security implications of the
>> following strategies? That is, is either of these safe?
>>
>> 1) Truncating a buffer from the end and leaving U+FFFD as the last
>> character in the buffer.
>>
>> 2) Truncating a buffer from the beginning and leaving U+FFFD as the
>> first character in the buffer.
>>
>> (It seems that dropping the buffer entirely is inconvenient, e.g.
>> when the buffer is an element name, although I guess it's an option
>> for attribute values and element content.)
>
> Both options seem reasonable; personally I implemented the former

The latter would make the implementation hit the buffer-full condition
less often, since on overflow it could simply empty the buffer and put
the truncation marker at the start of the buffer, leaving room for more
input before the buffer fills up again.
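To make the difference between the two strategies concrete, here is a minimal Python sketch of a fixed-capacity character buffer with both overflow policies. The class `BoundedBuffer`, the helper `fill`, and the `full_hits` counter are hypothetical names for illustration only, not the API of any real parser. The "start" policy restarts with U+FFFD plus the incoming character, so what remains is a suffix of the input, though not necessarily the longest one that would fit.

```python
# U+FFFD REPLACEMENT CHARACTER, used here as the truncation marker.
REPLACEMENT = "\uFFFD"


class BoundedBuffer:
    """Toy character buffer with a hard capacity (hypothetical, illustrative).

    policy="end":   keep the beginning; once full, the last slot holds U+FFFD
                    and every further character is dropped.
    policy="start": when full, discard the contents and restart the buffer
                    with U+FFFD followed by the incoming character.
    """

    def __init__(self, capacity: int, policy: str) -> None:
        if capacity < 2:
            raise ValueError("capacity must fit the marker plus one character")
        if policy not in ("end", "start"):
            raise ValueError("policy must be 'end' or 'start'")
        self.capacity = capacity
        self.policy = policy
        self.chars: list[str] = []
        self.full_hits = 0  # times the buffer-full branch was taken

    def append(self, c: str) -> None:
        if len(self.chars) < self.capacity:
            self.chars.append(c)  # fast path: room left
            return
        self.full_hits += 1  # buffer-full condition
        if self.policy == "end":
            # The buffer stays full, so every later character lands here again.
            self.chars[-1] = REPLACEMENT
        else:
            # Emptying the buffer leaves capacity - 2 free slots before the next hit.
            self.chars = [REPLACEMENT, c]

    def value(self) -> str:
        return "".join(self.chars)


def fill(capacity: int, policy: str, text: str) -> BoundedBuffer:
    buf = BoundedBuffer(capacity, policy)
    for c in text:
        buf.append(c)
    return buf


if __name__ == "__main__":
    for policy in ("end", "start"):
        buf = fill(5, policy, "abcdefghij")
        print(policy, repr(buf.value()), buf.full_hits)
```

With a capacity of 5 and the input "abcdefghij", the "end" policy yields "abcd\uFFFD" after taking the full branch 5 times (once per overflowing character), while the "start" policy yields "\uFFFDj" after only 2 hits, which illustrates the point about hitting the buffer-full condition less often.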

> (though if I recall correctly, I used "... truncated", with a space,
> rather than U+FFFD, since that way it couldn't clash with a
> non-truncated attribute).

Putting a space in there scares me, because it would split the token
when serializing and reparsing. I can't think of a concrete attack,
though, and when truncating from the beginning, any truncation marker
that doesn't start with an ASCII letter would cause radically different
reparsing anyway (an element name would no longer start with a letter).
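The reparsing concern can be illustrated with a toy model of just two states of the HTML tokenizer as I understand them: in the tag open state, a start tag begins only if "<" is followed by an ASCII letter; in the tag name state, whitespace, "/" or ">" ends the name. The functions `serialize_start_tag` and `reparse_tag_name` are hypothetical illustrations, not a real parser; attributes, end tags, comments, and NULL handling are deliberately left out.

```python
from typing import Optional

# Characters that end a tag name: tab, LF, FF, space, "/" and ">".
TAG_NAME_TERMINATORS = "\t\n\x0c />"


def serialize_start_tag(name: str) -> str:
    # Naive serializer: no attributes, name emitted verbatim.
    return "<" + name + ">"


def reparse_tag_name(markup: str) -> Optional[str]:
    """Return the tag name a reparse would produce, or None if markup
    would not begin a start tag.

    Toy model of only the tag open and tag name tokenizer states.
    """
    if len(markup) < 2 or markup[0] != "<":
        return None
    first = markup[1]
    if not (first.isascii() and first.isalpha()):
        # No start tag here; for most characters the "<" ends up as text.
        return None
    name = []
    for c in markup[1:]:
        if c in TAG_NAME_TERMINATORS:
            break  # whitespace, "/" or ">" ends the tag name
        # Only ASCII uppercase letters are lowercased.
        name.append(c.lower() if "A" <= c <= "Z" else c)
    return "".join(name)


if __name__ == "__main__":
    for truncated in ("div\u2026 truncated", "div\uFFFD", "\uFFFDiv"):
        markup = serialize_start_tag(truncated)
        print(repr(truncated), "->", repr(reparse_tag_name(markup)))
```

In this model, a name truncated with "… truncated" comes back as just "div…" (a real tokenizer would then treat "truncated" as an attribute), a name ending in U+FFFD round-trips unchanged, and a name starting with U+FFFD is not reparsed as a tag at all.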

Maybe the truncation marker could safely be the three-character string
ellipsis (U+2026), space, U+FFFD. (I really don't want anything
localization-sensitive, such as the word "truncated", in there.)
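As a sketch of how such a marker might be applied under either truncation strategy, assuming a hypothetical helper `truncate` that caps a value at `limit` characters including the marker (this is only an illustration of the proposal, not a claim that the marker is safe):

```python
# Ellipsis (U+2026), space, U+FFFD: contains no words, so nothing to localize.
MARKER = "\u2026 \uFFFD"


def truncate(value: str, limit: int, at: str = "end") -> str:
    """Cap value at limit characters, marking the cut with MARKER.

    at="end" keeps the beginning of value; at="start" keeps its end.
    """
    if limit < len(MARKER):
        raise ValueError("limit must leave room for the marker")
    if len(value) <= limit:
        return value  # fits as-is, no marker needed
    keep = limit - len(MARKER)  # characters of the original to keep
    if at == "end":
        return value[:keep] + MARKER
    if at == "start":
        # Slice from len - keep so that keep == 0 yields "" rather than the whole string.
        return MARKER + value[len(value) - keep:]
    raise ValueError("at must be 'end' or 'start'")


if __name__ == "__main__":
    print(repr(truncate("abcdefghij", 6)))
    print(repr(truncate("abcdefghij", 6, at="start")))
```

With a limit of 6, "abcdefghij" becomes "abc… \uFFFD" when cut at the end and "… \uFFFDhij" when cut at the start; in both cases the result is exactly 6 characters long.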

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 2 July 2009 08:31:40 UTC