Re: Limiting the size of the @charset byte sequence

On Tue, Jan 28, 2014 at 1:31 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> Arbitrary limits are bad design and often harder to implement correctly
> than something without arbitrary limits.

I agree with this attitude in general, but in this case, an arbitrary
limit is absolutely required and I was shocked to discover that we
didn't already have one.

When CSS is delivered over the network as a discrete resource (instead
of being embedded in a larger document), UAs need to be able to decide
on the encoding before the entire resource has been delivered, so that
they can begin parsing the style sheet as quickly as possible.  (If
you are about to quibble with that presupposition, be aware that
complex webapps may involve tens of megabytes of machine-generated
CSS.)  When the encoding directive is in-band, that means chopping off
the first N bytes of the document and handing them to the special
@charset parser.  If the standard does not specify an exact value for
N, UAs may disagree on the interpretation of style sheets, and worse,
may be inconsistent *with themselves* depending on network latency;
i.e. an encoding directive that sits too deep in the document might be
honored on one page load but not on the next, just because the second
packet of the HTTP response took too long to arrive that time.  I am
not aware of this having been an actual problem for CSS, but it
definitely was for HTML, and that is where Henri is coming from (he
wrote Gecko's current HTML parser).
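
To make that concrete, here is a rough sketch of what the @charset
sniffer does with those first N bytes (Python, with made-up names; the
exact byte-matching rules are my paraphrase of the css-syntax draft,
not a quotation of it):

   CHARSET_PREFIX = b'@charset "'
   SNIFF_LIMIT = 1024   # the N under discussion

   def sniff_charset(first_bytes):
       # Return the encoding label from a leading @charset rule, or
       # None if there is no well-formed directive within the limit.
       head = first_bytes[:SNIFF_LIMIT]
       if not head.startswith(CHARSET_PREFIX):
           return None
       end = head.find(b'";', len(CHARSET_PREFIX))
       if end == -1:
           return None   # directive truncated by the limit: ignore it
       try:
           return head[len(CHARSET_PREFIX):end].decode('ascii')
       except UnicodeDecodeError:
           return None

The point of the limit is precisely that the truncation branch is
well-defined: a directive that straddles the 1024-byte boundary is
ignored the same way on every load.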

Your comments about DFAs, etc. miss the mark because the limit is
being imposed by the network layer, not the parser.  The
implementation is something like

   on network receive {
       append current packet to buffer
       if (encoding not yet decided && len(buffer) >= 1024) {
           invoke @charset parser on first 1024 bytes of buffer
           mark encoding as decided
           begin streaming buffered data to full parser
       }
   }
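
Spelled out a little more concretely (again Python with hypothetical
names; the parser object and its set_encoding/feed/finish methods are
stand-ins, not any real UA API), including the case the sketch above
glosses over, where the stream ends before 1024 bytes have arrived:

   class CharsetSniffingStream:
       # Buffers incoming packets until SNIFF_LIMIT bytes have arrived
       # or the stream ends, decides the encoding exactly once using
       # sniff_charset() from the earlier sketch, then streams
       # everything to the real parser.
       def __init__(self, parser):
           self.parser = parser
           self.buffer = b''
           self.decided = False

       def on_receive(self, packet):
           if self.decided:
               self.parser.feed(packet)
               return
           self.buffer += packet
           if len(self.buffer) >= SNIFF_LIMIT:
               self._decide()

       def on_end_of_stream(self):
           if not self.decided:
               self._decide()       # short style sheet: decide now
           self.parser.finish()

       def _decide(self):
           self.decided = True
           self.parser.set_encoding(sniff_charset(self.buffer))
           self.parser.feed(self.buffer)
           self.buffer = b''

Whatever the exact shape, the decision is made exactly once, at a byte
position that does not depend on how the response happened to be
packetized.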

zw

Received on Wednesday, 29 January 2014 23:09:07 UTC