Re: Safe ways of implementing limits on buffer sizes in the parser from Ian Hickson on 2009-07-02 (public-html@w3.org from July 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 2 Jul 2009 19:41:01 +0000 (UTC)
To: Henri Sivonen <hsivonen@iki.fi>
Cc: "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0907021939350.1053@hixie.dreamhostps.com>

On Thu, 2 Jul 2009, Henri Sivonen wrote:
> 
> > (though if I recall correctly, I used "... truncated", with a space, 
> > rather than U+FFFD, since that way it couldn't clash with a 
> > non-truncated attribute).
> 
> Putting a space in there scares me, because it would split the token 
> when serializing and reparsing. I can't think of a concrete attack, 
> though, and using any truncation marker that doesn't start with an ASCII 
> letter at the start of the buffer would cause radically different 
> reparsing anyway.
> 
> Maybe the truncation marker could safely be the three-character string 
> ellipsis, space, U+FFFD. (I really don't want anything 
> localization-sensitive such as the word "truncated" in there.)

Exactly what an implementation can do more or less depends on how serious 
the limitation is; e.g. if it hits an OOM error then it might be unable to 
do much at all. So I don't want to specify this explicitly in the spec.

In general, all the approaches discussed here are reasonable, IMHO. Spaces 
might be a bit more scary for a Web browser environment than what I was 
doing, so maybe your original ideas are better for that.
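
For illustration only, here is a minimal sketch of the three-character 
marker suggested above (ellipsis, space, U+FFFD) applied to an over-long 
attribute value. The names truncate_value, limit and MARKER are 
hypothetical and are not taken from the spec or from any implementation; 
the sketch also assumes the whole value is available before truncation, 
which a parser that is genuinely short on memory might not manage.

```python
# Hypothetical marker: U+2026 HORIZONTAL ELLIPSIS, U+0020 SPACE,
# U+FFFD REPLACEMENT CHARACTER. Contains no localizable words.
MARKER = "\u2026 \ufffd"


def truncate_value(value: str, limit: int, marker: str = MARKER) -> str:
    """Return value unchanged if it fits within limit code points;
    otherwise cut it so that the result, marker included, is exactly
    limit code points long."""
    if limit < len(marker):
        raise ValueError("limit is too small to hold the truncation marker")
    if len(value) <= limit:
        return value
    # Reserve room for the marker at the end of the buffer.
    return value[: limit - len(marker)] + marker
```

Note that the marker still contains a space, so the concern about the 
token splitting when serialized and reparsed applies to this sketch too.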

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 2 July 2009 19:41:39 UTC