- From: Sam Ruby <rubys@intertwingly.net>
- Date: Thu, 06 Jan 2011 17:01:36 -0500
- To: Anne van Kesteren <annevk@opera.com>
- CC: Henri Sivonen <hsivonen@iki.fi>, public-html-xml@w3.org
On 01/06/2011 04:18 PM, Anne van Kesteren wrote:
> On Thu, 06 Jan 2011 21:45:40 +0100, Sam Ruby <rubys@intertwingly.net>
> wrote:
>> On 01/06/2011 02:27 PM, Anne van Kesteren wrote:
>>> Isn't one of the problems with RSS that you do not know whether it is
>>> HTML or XML? E.g. what "&gt;" means? I am not sure how we can solve
>>> that here.
>>
>> RSS 2.0 has many problems. Many of them outside the scope of this task
>> force. The existence of problems outside of the scope of this task
>> force doesn't make the problems that do affect the topics that this
>> task force is intended to address.
>
> So what is an example of an RSS document this task force could do
> something about?

I assert that from time to time one will come across a document
fragment which has become disassociated from its media type. The
example I provided was the RSS 2.0 description element.

Henri asked if Atom solves this. While it is correct that Atom provides
a means to identify such content unambiguously, I further assert that
we can't assume either that RSS 2.0 is going away or that RSS 2.0 will
be corrected in any reasonable period of time.

>>>> As long as we have both application/xhtml+xml and text/html, we will
>>>> always have at least two ways to interpret documents. The two possible
>>>> strategies for mitigating this would be to either minimize or maximize
>>>> the set of documents which can be successfully parsed as either.
>>>>
>>>> Given that HTML5 doesn't make a practice of rejecting any input, only
>>>> one of those two paths is viable.
>>>
>>> I would not mind changing XML.
>>
>> I'm not sure why you are bringing this up in this context.
>
> I read your statement as XML being the limiting factor as it rejects way
> more input. So to maximize the set of documents which can be
> successfully parsed as either (i.e. no rejection happening) we would
> have to change XML.
>
>> Would you suggest changing XML in a way that reduces this down to one
>> path? In particular, how would the XML that you envision parse the
>> following fragment?
>>
>> <rss version="2.0">
>> <channel>
>> <title>Scripting News</title>
>> <link>http://scripting.com/</link>
>>
>> I mention this as we recently discussed how HTML5 parses link tags:
>>
>> http://lists.w3.org/Archives/Public/public-html-xml/2011Jan/0107.html
>
> Per XML5 rules.

Changing XML in such a way would NOT reduce this down to one path. For
reference:

$ python test.py '<div xmlns="http://www.w3.org/1999/xhtml"><para>This is some<link>text</link></para></div>'
#document
| <div> (, div, http://www.w3.org/1999/xhtml)
|   xmlns="http://www.w3.org/1999/xhtml" (, xmlns, http://www.w3.org/2000/xmlns/)
|   <para> (, para, http://www.w3.org/1999/xhtml)
|     "This is some"
|     <link> (, link, http://www.w3.org/1999/xhtml)
|       "text"

The only way that adopting XML5 rules would reduce this down to one
path is if HTML5 were also changed in a way that would break the web.

- Sam Ruby
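
To illustrate the description-element ambiguity discussed above, here is a minimal sketch using only the Python standard library. The feed fragment and the helper names (FRAGMENT, TextCollector, as_plain_text, as_escaped_html) are made up for this illustration and do not come from the thread. The sketch assumes exactly two candidate readings of the element content: plain text, or HTML that was escaped once.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical feed fragment, invented for this illustration.
FRAGMENT = "<description>1 &amp;lt; 2 and &lt;b&gt;bold&lt;/b&gt;</description>"


class TextCollector(HTMLParser):
    """Collects character data from HTML input, discarding the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def as_plain_text(fragment):
    # Reading 1: after XML unescaping, the content is plain text.
    return ET.fromstring(fragment).text


def as_escaped_html(fragment):
    # Reading 2: after XML unescaping, the content is HTML markup.
    collector = TextCollector()
    collector.feed(ET.fromstring(fragment).text)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(as_plain_text(FRAGMENT))
    print(as_escaped_html(FRAGMENT))
```

Both readings start from the same bytes, and nothing in the fragment itself says which one the publisher intended. That is the sense in which the fragment has become disassociated from its media type.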
Received on Thursday, 6 January 2011 22:16:36 UTC