Re: HTML interpreter vs. HTML user agent from Anne van Kesteren on 2009-05-29 (public-html@w3.org from May 2009)

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 29 May 2009 12:05:32 +0200
To: "Henri Sivonen" <hsivonen@iki.fi>, "Sam Ruby" <rubys@intertwingly.net>
Cc: "HTML WG" <public-html@w3.org>
Message-ID: <op.uuornikq64w2qv@anne-van-kesterens-macbook.local>

On Thu, 28 May 2009 15:42:56 +0200, Henri Sivonen <hsivonen@iki.fi> wrote:
> On May 28, 2009, at 16:15, Sam Ruby wrote:
>> Anybody care to identify any more specifics?
>
> My understanding is that search engines that process massive amounts of  
> data may want to do so with a streaming parser that doesn't abort on  
> errors for which compliant recovery isn't streamable. It seems possible  
> to perform indexing usefully without complying with the spec in the  
> non-streamable cases.
>
> I don't have first-hand experience of working on a search engine, I'm  
> not sure how much of a concern full streamability actually is, and I'm  
> not sure if it's worthwhile to address this case in the spec.
>
> (It's inconceivable to expect browsers to switch to streamable recovery,  
> so that's not an option.)

Yeah, I recall this being discussed on IRC at some point.

I think there was also discussion about actually defining what exactly  
streaming APIs should do when they have no tree-like representation and  
do not want to abort on errors that require a tree-like representation  
to "recover" from.

Such an algorithm could also be useful for highly optimized data  
extraction, e.g. pulling out <title> / <link> / <meta> etc.
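
To illustrate the kind of tree-less extraction meant here, a minimal sketch in Python follows. It uses the standard library's event-based html.parser, which is tolerant of broken markup but does not implement the HTML5 parsing algorithm, so it only stands in for whatever streaming algorithm might eventually be defined. The names HeadExtractor and extract_head are made up for this example.

```python
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collects <title> text and <link>/<meta> attributes from a stream
    of markup without ever building a tree (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.links = []
        self.metas = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value).
        if tag == "title":
            self._in_title = True
        elif tag == "link":
            self.links.append(dict(attrs))
        elif tag == "meta":
            self.metas.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may be delivered in several pieces, so accumulate it.
        if self._in_title:
            self.title_parts.append(data)

    @property
    def title(self):
        return "".join(self.title_parts).strip()


def extract_head(chunks):
    """Feed markup chunk by chunk; nothing is buffered beyond what the
    parser needs to complete a partially received tag."""
    parser = HeadExtractor()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser


if __name__ == "__main__":
    # Chunk boundaries deliberately fall inside text and inside a tag,
    # and the body is left unclosed to show that nothing aborts.
    result = extract_head([
        "<html><head><title>Hello ",
        'world</title><link rel="stylesheet" href="a.css"><me',
        'ta name="description" content="x"></head>',
        "<body><p>unclosed <b>text",
    ])
    print(result.title, result.links, result.metas)
```

The only state kept is a flag and three small lists, which is the appeal for bulk processing. What it cannot do is the non-streamable recovery the spec requires, since there is no tree to go back and rearrange.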

-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Friday, 29 May 2009 10:06:25 UTC