Re: Some random ideas around (broken) XML from Julian Reschke on 2009-11-18 (www-archive@w3.org from November 2009)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 18 Nov 2009 15:15:53 +0100
To: Karl Dubost <karl+w3c@la-grange.net>
CC: www-archive <www-archive@w3.org>
Message-ID: <4B040199.2020700@gmx.de>

Karl Dubost wrote:
> ...
> On Tue, 10 Nov 2009 22:47:52 GMT
> In XML - Dive Into Python 3
> At http://diveintopython3.org/xml.html#xml-custom-parser
> 
> Some people (myself included) believe that it was
> a mistake for the inventors of XML to mandate
> draconian error handling. Don’t get me wrong; I
> can certainly see the allure of simplifying the
> error handling rules. But in practice, the concept
> of “wellformedness” is trickier than it sounds,
> especially for XML documents (like Atom feeds)
> that are published on the web and served over
> HTTP. Despite the maturity of XML, which
> standardized on draconian error handling in 1997,
> surveys continually show a significant fraction of
> Atom feeds on the web are plagued with
> wellformedness errors.
> 
> 
> Universal Feed Parser
> http://www.feedparser.org/
> ...

The Universal Feed Parser is part of the problem. As far as I recall, 
the author was proposing non-draconian Atom parsing even before the Atom 
spec was done.

So what I'd like to see is data about the *current* state of *Atom* 
feeds, not RSS. My understanding (see also Sam's comment) is that there 
are several popular consumers getting away with using proper XML parsers 
(except for the RFC3023 issue), which would indicate that the *actual* 
percentage of broken content is smaller than some people think it is.

BR, Julian

Received on Wednesday, 18 November 2009 14:16:38 UTC