Re: error recovery from Noah Mendelsohn on 2012-02-19 (public-xml-er@w3.org from February 2012)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Sun, 19 Feb 2012 14:00:05 -0500
To: liam@w3.org
CC: David Lee <David.Lee@marklogic.com>, Norman Walsh <ndw@nwalsh.com>, W3C XML-ER Community Group <public-xml-er@w3.org>
Message-ID: <4F4146B5.7000509@arcanedomain.com>

On 2/19/2012 1:13 PM, Liam R E Quin wrote:
> Actually the worst case I've encountered in XML is
> <a b:att1="v1" b:att2="v2" ... [a gigabyte of attributes followed by]
>       b:attFFFF="vFFFF" xmlns:b="http://example.org/"  />
>
> You may have to buffer all the attributes until you get to the namespace
> declaration. In practice this isn't really an issue for a Web browser,
> or for anything else constructing a tree, because you have to keep them
> anyway.

Yeah, when we built our high-performance XML parser a few years ago, this
was something we spent a lot of time designing around, albeit as an edge
case. It's also one of the reasons that things like LALR parsers tend to
have trouble dealing with XML, as I recall.  A related example is:

<b:a att1="v1" att2="v2" ... [a gigabyte of attributes followed by] 
attFFFF="vFFFF" xmlns:b="http://example.org/"  />

This really screws up naive approaches to streaming the matching of content
models, because even the element's own name can't be resolved until the
last attribute has been read.
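
To make the buffering concrete, here is a minimal sketch in Python. It is an
illustration only, not the design of the parser mentioned above. The name
resolve_start_tag and the (qname, value) token shape are hypothetical, and
the predeclared "xml" prefix is ignored for brevity.

```python
def resolve_start_tag(qname, attrs, inherited=None):
    """Resolve namespace prefixes for a single start tag.

    qname     -- the element name as written, e.g. "b:a"
    attrs     -- list of (qname, value) pairs in document order
    inherited -- prefix -> URI bindings from ancestor elements (assumed input)

    Returns ((uri, local), [((uri, local), value), ...]).
    """
    scope = dict(inherited or {})
    pending = []  # ordinary attributes wait here until the tag is fully read

    for name, value in attrs:
        if name == "xmlns":
            scope[""] = value            # default namespace declaration
        elif name.startswith("xmlns:"):
            scope[name[len("xmlns:"):]] = value
        else:
            pending.append((name, value))

    def expand(name, use_default):
        prefix, sep, local = name.partition(":")
        if not sep:
            # Unprefixed attributes are in no namespace; unprefixed
            # elements take the default namespace, if one is in scope.
            uri = (scope.get("") or None) if use_default else None
            return (uri, name)
        return (scope[prefix], local)    # KeyError means an unbound prefix

    # Only now, after the whole tag has been seen, can anything be resolved.
    element = expand(qname, use_default=True)
    resolved = [(expand(n, use_default=False), v) for n, v in pending]
    return element, resolved
```

Called with "b:a" and a trailing ("xmlns:b", "http://example.org/") pair, the
element name resolves only after the loop has consumed every attribute, which
is why nothing about the element can be handed downstream any earlier.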

Noah

Received on Sunday, 19 February 2012 19:00:30 UTC