Re: HTML interpreter vs. HTML user agent

On Thu, 28 May 2009 15:42:56 +0200, Henri Sivonen <hsivonen@iki.fi> wrote:
> On May 28, 2009, at 16:15, Sam Ruby wrote:
>> Anybody care to identify any more specifics?
>
> My understanding is that search engines that process massive amounts of  
> data may want to do so with a streaming parser that doesn't abort on  
> errors for which compliant recovery isn't streamable. It seems possible  
> to perform indexing usefully without complying with the spec in the  
> non-streamable cases.
>
> I don't have first-hand experience of working on a search engine, I'm  
> not sure how much of a concern full streamability actually is, and I'm  
> not sure if it's worthwhile to address this case in the spec.
>
> (It's inconceivable to expect browsers to switch to streamable recovery,  
> so that's not an option.)

Yeah, I recall this being discussed on IRC at some point.

I think there was also discussion of actually defining what streaming  
APIs should do when they have no tree-like representation and do not  
want to abort on errors whose "recovery" requires one.

Such an algorithm could also be useful for highly optimized data  
extraction, e.g. pulling out <title> / <link> / <meta> etc.
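
A minimal sketch of that kind of extraction, assuming Python's standard
html.parser module as the streaming tokenizer. It reacts to tag events
only and never builds a tree, so it does not follow the spec's recovery
rules for misnested markup. The names HeadExtractor and extract_head are
hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collects <title>, <link> and <meta> data from tag events only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []   # text seen between <title> and </title>
        self.links = []         # one attribute dict per <link>
        self.metas = []         # one attribute dict per <meta>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from html.parser.
        if tag == "title":
            self._in_title = True
        elif tag == "link":
            self.links.append(dict(attrs))
        elif tag == "meta":
            self.metas.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may be delivered in several pieces, so accumulate it.
        if self._in_title:
            self.title_parts.append(data)

    @property
    def title(self):
        return "".join(self.title_parts).strip()


def extract_head(chunks):
    """Feed the document chunk by chunk; no tree is ever built."""
    parser = HeadExtractor()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser


# Chunks deliberately split tags in half and end with misnested markup.
doc_chunks = [
    "<html><head><ti",
    "tle>Hello wor",
    "ld</title><meta name=desc",
    "ription content='x'><link rel=stylesheet href=a.css>",
    "<p><b>unclosed <i>mis</b>nested</i>",
]
result = extract_head(doc_chunks)
print(result.title, result.metas, result.links)
```

The misnested <b>/<i> at the end is the sort of input where compliant
recovery needs a tree; this sketch just ignores those events, which is
tolerable for indexing but is not what a browser would do.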


-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Friday, 29 May 2009 10:06:25 UTC