Re: Some random ideas around (broken) XML from Karl Dubost on 2009-11-18 (www-archive@w3.org from November 2009)

From: Karl Dubost <karl+w3c@la-grange.net>
Date: Wed, 18 Nov 2009 09:01:10 -0500
To: Julian Reschke <julian.reschke@gmx.de>
Cc: www-archive <www-archive@w3.org>
Message-Id: <891AFE32-A3F2-4053-891D-E1C38B3A81DC@la-grange.net>

On 18 Nov 2009 at 00:17, Julian Reschke wrote:
> Karl Dubost wrote:
>> ...
>> # PRODUCING BROKEN XML
>> The fact is that many Atom feeds are broken, for many reasons:
>> * edited by hand
>> * created by templating tools which are not XML producers
>> * mixing content from different sources (HTML, DB, XML) with
>> different encodings
>> This means that when designing an Atom feed consumer, implementers are
>> forced to recover the broken content to make it usable
>> by the crowd (social impact). This is the second part of Postel's law: "Be
>> liberal in what you accept".
>> ...
>
> Are you *really* sure about that? My understanding is that there are
> popular Atom consumers that require proper XML (except for the
> RFC3023 issue), and that falling back to handle broken XML is
> actually not needed (as opposed to RSS).
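The "be liberal in what you accept" recovery described in the quoted message can be sketched in Python, purely as an illustration: try a strict XML parse first, and only if it fails apply a repair and try again, flagging the result as recovered. The single repair shown (escaping bare `&` characters, a typical mistake in hand-edited or template-generated feeds), the UTF-8 assumption and the name `parse_liberally` are all hypothetical; a real feed consumer would need far more extensive recovery.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repair: escape "&" that does not start an entity or
# character reference, e.g. "Tom & Jerry" -> "Tom &amp; Jerry".
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_liberally(data: bytes):
    """Return (root, recovered): strict parse first, one repair pass second.

    Raises ET.ParseError if the repaired document is still not well-formed.
    """
    try:
        return ET.fromstring(data), False  # draconian path: XML as specified
    except ET.ParseError:
        pass
    # Assumption: the bytes are UTF-8; undecodable bytes become U+FFFD.
    text = data.decode("utf-8", errors="replace")
    return ET.fromstring(_BARE_AMP.sub("&amp;", text)), True


if __name__ == "__main__":
    root, recovered = parse_liberally(b"<feed><title>Tom & Jerry</title></feed>")
    print(root.findtext("title"), recovered)  # Tom & Jerry True
```

Returning the `recovered` flag keeps the strict and liberal paths distinguishable, which is what you would need to measure how often recovery is actually required.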

For services like Technorati or Bloglines, it would be possible to check
the exact figures.
I wonder whether Opera's MAMA project has details about that, or whether
its team would be willing to look for them:
http://dev.opera.com/articles/view/mama/
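Getting such figures essentially means running a strict parser over a sample of feeds and counting failures. A minimal sketch, assuming the feed bodies have already been fetched as bytes; the sample and the name `wellformed_ratio` are hypothetical. Python's expat-based `xml.etree.ElementTree` is non-validating, so it only checks well-formedness, and this sketch ignores the HTTP charset (the RFC3023 issue), so it only approximates what a strict consumer would see.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def wellformed_ratio(documents: Iterable[bytes]) -> float:
    """Fraction of documents a strict (draconian) XML parser accepts."""
    total = accepted = 0
    for data in documents:
        total += 1
        try:
            ET.fromstring(data)
        except ET.ParseError:
            continue  # not well-formed: counted in total only
        accepted += 1
    return accepted / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample; real figures would need a crawl of actual feeds.
    samples = [
        b"<feed/>",
        b"<feed><title>A & B</title></feed>",  # bare ampersand
        b"<feed>",                             # unclosed element
        b"<feed></feed>",
    ]
    print(wellformed_ratio(samples))  # 0.5
```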

On Thu, 22 Oct 2009 23:02:55 GMT
In Official Google Reader Blog: XML Errors in Feeds
At http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html

XML Errors in Feeds
Friday, December 23, 2005 by Mihai Parparita

Dealing with the millions of RSS and Atom feeds
out there is hard work. We're not trying to make
you feel sorry for the Reader team, but as anyone
who has attempted to implement a feed parser
knows, there are many subtle deviations from the
spec that you have to handle if you want to have
any hope of satisfying the needs of your users
(who shouldn't have to care about such

On Tue, 10 Nov 2009 22:47:52 GMT
In XML - Dive Into Python 3
At http://diveintopython3.org/xml.html#xml-custom-parser

Some people (myself included) believe that it was
a mistake for the inventors of XML to mandate
draconian error handling. Don’t get me wrong; I
can certainly see the allure of simplifying the
error handling rules. But in practice, the concept
of “wellformedness” is trickier than it sounds,
especially for XML documents (like Atom feeds)
that are published on the web and served over
HTTP. Despite the maturity of XML, which
standardized on draconian error handling in 1997,
surveys continually show a significant fraction of
Atom feeds on the web are plagued with
wellformedness errors.
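The point about feeds "served over HTTP" can be made concrete with the RFC3023 issue mentioned earlier in the thread: under RFC 3023, a charset parameter in the HTTP Content-Type header takes precedence over the XML declaration, and a `text/xml` (or other `text/*`) response without one is to be treated as US-ASCII. The following is a simplified Python sketch of those rules, not a complete implementation: the function names are hypothetical, and encoding detection is reduced to a BOM check plus the XML declaration. It shows how the same bytes can be well-formed as `application/atom+xml` yet not well-formed as `text/xml`.

```python
import re
import xml.etree.ElementTree as ET

# Encoding name from an XML declaration at the very start of the document.
_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def rfc3023_encoding(content_type: str, body: bytes) -> str:
    """Pick the character encoding following (simplified) RFC 3023 rules."""
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            # An explicit charset parameter wins over the document itself.
            return value.strip().strip('"').lower()
    if media_type.startswith("text/"):
        # text/xml, text/*+xml without charset: US-ASCII, whatever the XML says.
        return "us-ascii"
    # application/xml, application/atom+xml, ...: fall back to the document.
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    match = _DECL_ENCODING.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when there is no BOM or declaration


def wellformed_over_http(content_type: str, body: bytes) -> bool:
    """True if the body decodes with the RFC 3023 encoding and parses strictly."""
    try:
        text = body.decode(rfc3023_encoding(content_type, body)).lstrip("\ufeff")
    except (UnicodeDecodeError, LookupError):
        return False
    try:
        ET.fromstring(text)  # strict parse of the already-decoded text
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    body = '<?xml version="1.0" encoding="utf-8"?><feed><title>Montréal</title></feed>'.encode("utf-8")
    print(wellformed_over_http("application/atom+xml", body))  # True
    print(wellformed_over_http("text/xml", body))  # False
```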

Universal Feed Parser
http://www.feedparser.org/

-- 
Karl Dubost
Montréal, QC, Canada
http://www.la-grange.net/karl/

Received on Wednesday, 18 November 2009 14:01:14 UTC