- From: Geoffrey Sneddon <foolistbar@googlemail.com>
- Date: Wed, 11 Feb 2009 15:49:49 +0000
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Anne van Kesteren <annevk@opera.com>, www-tag@w3.org
To speak as I so rarely do on W3C lists, as a feed parsing library vendor:

On 11 Feb 2009, at 11:01, Julian Reschke wrote:

> As far as I can tell, the Atom feed format gets away with draconian
> error handling (minus the RFC3023 thingy) quite well.

I wouldn't say that is entirely true: besides the RFC3023 issues, Atom also hits the character decoding restriction, i.e.,

> It is a fatal error if an XML entity is determined (via default,
> encoding declaration, or higher-level protocol) to be in a certain
> encoding but contains byte sequences that are not legal in that
> encoding.

It is fairly common for feeds to contain invalid byte sequences. Also, note that until work on Acid3 got underway, the only major browser to implement this was IE.

Finally, the only other issue I can think of is that something along the lines of <http://www.w3.org/TR/2008/WD-html5-20080610/parsing.html#character0> is also needed for compatibility with the real web.

The issues with RSS aren't much greater: mainly just more people having HTML entities within the XML output, and a small number of cases of bogus content beyond the end of the document.

Revising RFC 3023, changing the requirement that any encoding error is fatal, and defining how to compare encoding names (perhaps even as leniently as Unicode TR22 defines), along with mapping some encodings to others (such as ISO-8859-1 to Windows-1252), would make it far more possible for feed readers to comply with the relevant specifications. Until that happens, I'll be amazed if any major feed readers fully comply with them.

--
Geoffrey Sneddon
<http://gsnedders.com/>
<http://simplepie.org/>
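
To make the suggested leniency concrete, here is a minimal Python sketch of the three pieces mentioned above: loose comparison of encoding names (roughly the alias-matching rule from Unicode TR22), treating ISO-8859-1 as Windows-1252, and replacing illegal byte sequences rather than treating them as fatal. The names `normalize_label`, `lenient_decode` and `OVERRIDES` are made up for illustration, and the override table holds only the one mapping named in the message; this is not what any particular feed parser does.

```python
import codecs
import re

# Hypothetical override table, keyed by loosely normalized label.
# Only the ISO-8859-1 -> Windows-1252 mapping from the message is included.
OVERRIDES = {"iso88591": "windows-1252"}


def normalize_label(label: str) -> str:
    """Loose charset-name matching in the spirit of Unicode TR22:
    keep only ASCII letters and digits, lowercase them, and drop
    zeros that are not preceded by a digit."""
    alnum = re.sub(r"[^A-Za-z0-9]", "", label).lower()
    return re.sub(r"(?<![0-9])0+", "", alnum)


def lenient_decode(data: bytes, declared: str) -> str:
    """Decode feed bytes without treating encoding errors as fatal."""
    name = OVERRIDES.get(normalize_label(declared), declared)
    codec = codecs.lookup(name)  # raises LookupError for unknown labels
    # Illegal byte sequences become U+FFFD instead of aborting the parse.
    return data.decode(codec.name, errors="replace")
```

With this sketch, `lenient_decode(b"caf\xe9 \x93hi\x94", "ISO_8859-1")` decodes the bytes 0x93 and 0x94 as curly quotes, as Windows-1252 would, and a stray 0xFF byte in a feed declared as UTF-8 becomes a replacement character rather than a fatal error.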
Received on Wednesday, 11 February 2009 15:55:33 UTC