Re: Use cases from Sam Ruby on 2011-01-06 (public-html-xml@w3.org from January 2011)

From: Sam Ruby <rubys@intertwingly.net>
Date: Thu, 06 Jan 2011 17:01:36 -0500
To: Anne van Kesteren <annevk@opera.com>
CC: Henri Sivonen <hsivonen@iki.fi>, public-html-xml@w3.org
Message-ID: <4D263BC0.4000606@intertwingly.net>

On 01/06/2011 04:18 PM, Anne van Kesteren wrote:
> On Thu, 06 Jan 2011 21:45:40 +0100, Sam Ruby <rubys@intertwingly.net>
> wrote:
>> On 01/06/2011 02:27 PM, Anne van Kesteren wrote:
>>> Isn't one of the problems with RSS that you do not know whether it is
>>> HTML or XML? E.g. what "&amp;gt;" means? I am not sure how we can solve
>>> that here.
>>
>> RSS 2.0 has many problems. Many of them are outside the scope of this
>> task force. The existence of problems outside of the scope of this
>> task force doesn't eliminate the problems that do affect the topics
>> that this task force is intended to address.
>
> So what is an example of an RSS document this task force could do
> something about?

I assert that from time to time one will come across a document fragment 
which has become disassociated from its media type.  I provided as an 
example of this the RSS 2.0 description element.  Henri asked if Atom 
solves this.  While it is correct that Atom provides a means to identify 
such content unambiguously, I further assert that we can't assume 
that either RSS 2.0 is going away or that RSS 2.0 will be corrected in 
any reasonable period of time.
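
To make the ambiguity concrete, here is a minimal sketch using only the Python standard library. The feed text, the item content, and the function name description_readings are made up for illustration; the point is only that, once the XML layer is decoded, nothing in RSS 2.0 says whether the remaining string is plain text or HTML.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed; the description content is an assumption for illustration.
RSS = """<rss version="2.0">
<channel>
<title>Example</title>
<item><description>a &amp;gt; b</description></item>
</channel>
</rss>"""


def description_readings(rss_text):
    """Return the description read as plain text and as HTML."""
    root = ET.fromstring(rss_text)
    # The XML parser decodes one level of escaping: "&amp;gt;" becomes "&gt;".
    raw = root.find("./channel/item/description").text
    as_text = raw                 # reading 1: the string is already plain text
    as_html = html.unescape(raw)  # reading 2: the string is HTML to be decoded
    return as_text, as_html


as_text, as_html = description_readings(RSS)
print(repr(as_text))
print(repr(as_html))
```

Both readings are defensible for the same bytes, which is the sense in which the fragment has become disassociated from its media type.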

>>>> As long as we have both application/xhtml+xml and text/html, we will
>>>> always have at least two ways to interpret documents. The two possible
>>>> strategies for mitigating this would be to either minimize or maximize
>>>> the set of documents which can be successfully parsed as either.
>>>>
>>>> Given that HTML5 doesn't make a practice of rejecting any input, only
>>>> one of those two paths is viable.
>>>
>>> I would not mind changing XML.
>>
>> I'm not sure why you are bringing this up in this context.
>
> I read your statement as XML being the limiting factor as it rejects way
> more input. So to maximize the set of documents which can be
> successfully parsed as either (i.e. no rejection happening) we would
> have to change XML.
>
>> Would you suggest changing XML in a way that reduces this down to one
>> path? In particular, how would the XML that you envision parse the
>> following fragment?
>>
>> <rss version="2.0">
>> <channel>
>> <title>Scripting News</title>
>> <link>http://scripting.com/</link>
>>
>> I mention this as we recently discussed how HTML5 parses link tags:
>>
>> http://lists.w3.org/Archives/Public/public-html-xml/2011Jan/0107.html
>
> Per XML5 rules.

Changing XML in such a way would NOT reduce this down to one path.

For reference:

   $ python test.py '<div xmlns="http://www.w3.org/1999/xhtml"><para>This is some<link>text</link></para></div>'
   #document
   | <div> (, div, http://www.w3.org/1999/xhtml)
   |   xmlns="http://www.w3.org/1999/xhtml" (, xmlns, http://www.w3.org/2000/xmlns/)
   |   <para> (, para, http://www.w3.org/1999/xhtml)
   |     "This is some"
   |     <link> (, link, http://www.w3.org/1999/xhtml)
   |       "text"

The only way that adopting XML5 rules would reduce this down to one path 
is if HTML5 were also changed in a way that would break the web.
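
The two paths can be sketched with the Python standard library alone. This is not test.py (which is not shown here), nor a real HTML5 or XML5 parser: the HtmlOutline class below is a toy that models only one HTML behavior, namely that link is a void element and so never receives children. The names xml_outline, HtmlOutline, html_outline and VOID_ELEMENTS are all made up for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

FRAGMENT = (
    '<div xmlns="http://www.w3.org/1999/xhtml"><para>This is '
    'some<link>text</link></para></div>'
)

# Assumption: only the one void element that matters for this fragment.
VOID_ELEMENTS = {"link"}


def xml_outline(elem, depth=0):
    """Indented outline of an ElementTree element, namespaces dropped."""
    lines = ["  " * depth + "<" + elem.tag.split("}")[-1] + ">"]
    if elem.text:
        lines.append("  " * (depth + 1) + repr(elem.text))
    for child in elem:
        lines.extend(xml_outline(child, depth + 1))
        if child.tail:
            lines.append("  " * (depth + 1) + repr(child.tail))
    return lines


class HtmlOutline(HTMLParser):
    """Toy tree builder: void elements never receive children."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + "<" + tag + ">")
        # A void element is closed immediately, so the depth is unchanged.
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        # An end tag for a void element is ignored.
        if tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        self.lines.append("  " * self.depth + repr(data))


def html_outline(text):
    parser = HtmlOutline()
    parser.feed(text)
    parser.close()
    return parser.lines


xml_lines = xml_outline(ET.fromstring(FRAGMENT))
html_lines = html_outline(FRAGMENT)
print("\n".join(xml_lines))
print("\n".join(html_lines))
```

In the XML outline "text" is a child of link; in the HTML-style outline it is a sibling of link inside para. Under these assumptions the same fragment still yields two different trees.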

- Sam Ruby

Received on Thursday, 6 January 2011 22:16:36 UTC