Re: Support Existing Content from Anne van Kesteren on 2007-05-04 (public-html@w3.org from May 2007)

From: Anne van Kesteren <annevk@opera.com>
Date: Fri, 04 May 2007 12:32:08 +0200
To: "Julian Reschke" <julian.reschke@gmx.de>, "Maciej Stachowiak" <mjs@apple.com>
Cc: "Gareth Hay" <gazhay@gmail.com>, matt@builtfromsource.com, public-html@w3.org
Message-ID: <op.trssvux664w2qv@id-c0020>

On Fri, 04 May 2007 12:24:49 +0200, Julian Reschke <julian.reschke@gmx.de>  
wrote:
>> OK, but what's the actual harm of doing so? Can you describe it in  
>> words? You've said repeatedly that you think nonconforming content is  
>> really bad, but you haven't once explained how its existence hurts  
>> anyone, or how wiping it out would help anyone.
>> ...
>
> It hurts those who want to parse HTML, but do not want to implement a  
> full user agent (think metadata extraction, microformats, crawling,  
> indexing...).

A few people (including myself) have written a small library in Python  
to do just that: http://code.google.com/p/html5lib/

It should be pretty trivial to port to other languages. Parsing HTML5 is  
one of the least complicated parts of the specification.
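
To make the metadata-extraction case concrete, here is a minimal sketch of  
pulling a title and meta fields out of nonconforming markup without a full  
user agent. Note the assumptions: it uses the lenient html.parser module  
from the Python standard library rather than html5lib, so it does not  
implement the HTML5 parsing algorithm, and the names MetadataExtractor,  
extract_metadata and SAMPLE are made up for this illustration.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs.

    Hypothetical helper for illustration; not part of html5lib.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text seen between <title> and </title> is kept.
        if self._in_title:
            self.title += data


def extract_metadata(markup):
    """Return (title, meta_dict) for a string of possibly broken HTML."""
    parser = MetadataExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return parser.title.strip(), parser.meta


# Nonconforming input: no doctype, an unquoted attribute, unclosed elements.
SAMPLE = "<title>Tag soup</title><meta name=author content='A. Nonymous'><p>Hello<b>world"
```

Calling extract_metadata(SAMPLE) returns the title and a dictionary of meta  
fields even though the input is far from conforming. A parser that follows  
the HTML5 algorithm would additionally guarantee the same tree as browsers  
build, which this sketch does not attempt.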


> Now I understand that that's what the well-defined HTML5 parsing is  
> for... So this sort of proves to me that the distinction between  
> "conformant" and "parseable" documents really is meaningless.


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Friday, 4 May 2007 10:32:28 UTC