- From: Geoffrey Sneddon <gsneddon@opera.com>
- Date: Wed, 18 Nov 2009 11:30:24 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- CC: Karl Dubost <karl+w3c@la-grange.net>, www-archive <www-archive@w3.org>
Julian Reschke wrote:
> Karl Dubost wrote:
>> ...
>> # PRODUCING BROKEN XML
>>
>> The fact is that many atom feeds are broken for many reasons.
>>
>> * edited by hand
>> * created by templating tools which are not XML producers
>> * mixing content from different sources (html, db, xml) with
>>   different encodings
>>
>> It means when designing an atom feed consumer, implementers are
>> forced to recover the broken content to be able to make it usable
>> by the crowd (social impact). Second part of the postel laws "Be
>> liberal in what you accept".
>> ...
>
> Are you *really* sure about that? My understanding is that there are
> popular Atom consumers that require proper XML (except for the
> RFC3023 issue), and that falling back to handle broken XML is
> actually not needed (opposed to RSS).

Almost all of them violate the following requirement of the XML
specification, as doing so is needed for compatibility:

> It is a fatal error if an XML entity is determined (via default,
> encoding declaration, or higher-level protocol) to be in a certain
> encoding but contains byte sequences that are not legal in that
> encoding.

Quite a lot of feed readers use the same processor for both Atom and
RSS, though, and I imagine that many don't want to maintain a separate
processor for each. So if you really want to be strict for Atom, you
probably have to convince people that it is in their interest to be
strict for RSS as well (and for any commercial product, I expect the
cost of poorer compatibility is greater than anything gained by being
strict).

Probably the only thing really needed for RSS but not needed for Atom
is the predefined entities (those present in RSS 0.91 (Netscape)),
which arguably should be solved just by increasing the number of
predefined entities in XML.

Out of incidental interest, I did try shipping a release of SimplePie
(which, combined with downstream users, has millions of users) that
was strict with character encodings, but that turned out quite quickly
to be unworkable on the real web.
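
To illustrate the difference between strict and lenient handling of
character encodings mentioned above, here is a minimal Python sketch.
It is not SimplePie's actual code (SimplePie is written in PHP); the
function name decode_feed and the sample bytes are hypothetical and
chosen only for illustration.

```python
def decode_feed(raw: bytes, encoding: str, strict: bool) -> str:
    """Decode feed bytes using the declared encoding.

    strict=True treats illegal byte sequences as fatal, as the XML
    rule quoted above requires. strict=False recovers by substituting
    U+FFFD for each undecodable sequence.
    """
    errors = "strict" if strict else "replace"
    return raw.decode(encoding, errors=errors)


# Hypothetical feed fragment: declared as UTF-8, but it contains a
# Latin-1 byte (0xE9), which is not a legal UTF-8 sequence here.
broken = b"<title>Caf\xe9</title>"

try:
    decode_feed(broken, "utf-8", strict=True)
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False  # a strict consumer rejects the whole feed

# A lenient consumer keeps going and marks the damaged character.
lenient = decode_feed(broken, "utf-8", strict=False)
```

With strict decoding, a single stray byte makes the entire feed
unusable; with lenient decoding, the reader loses one character and
the rest of the feed is still displayed.
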
SimplePie is, to this day, strict with entities, and that causes
around one bug report/support issue per month. I have on plenty of
occasions been tempted to prefix all documents with a DOCTYPE
containing the entities present in RSS 0.91 (Netscape), though I have
always found some technical reason not to, given the implementation
complexity.

-- 
Geoffrey Sneddon — Opera Software
<http://gsnedders.com/>
<http://www.opera.com/>
Received on Wednesday, 18 November 2009 10:31:06 UTC