Re: Draft

On 2/20/2012 8:17 PM, Noah Mendelsohn wrote:
>
>
> I don't think so. I think we want to distinguish content that is 
> correct or preferred from that which is tolerated. For the moment, I 
> would assume that the "correct" content is well-formed XML. We might 
> loosen that a bit to include some additional constructs like unquoted 
> attributes, or perhaps names that use other than XML name characters. 
> In general, though, I think we do want to identify a class of correct 
> input, and I think that will be very close in spirit, if not 
> necessarily in all details, to XML.

I don't want to presuppose any solution here.  Surely if the goal is 
that any input, regardless of how broken, is going to produce a tree, 
then anyone is going to be able to create a bad example.  I don't think 
we really need to worry about whether examples are "good" or not.  I 
would rather focus upon the details of how bad input is predictably 
transformed and let these "bad" chips fall where they may.

<rant>

On the other hand, I actually don't think it is a great idea to 
transform any input, regardless of how broken.  Some things are just NOT 
XML.  Those things are probably NOT XML-ER either.

For example, the string "The quick brown fox jumped over the lazy dog." 
is NOT XML, and I can't imagine that it is XML-ER either.  It wouldn't 
make any sense to me if the XML-ER rules said that a document consisting 
of that string is transformed into a tree by saying it is a text node 
that is enclosed in an anonymous element node.  I would prefer that an 
XML-ER parser that was handed something really broken fail predictably.  
Encouraging the parsing of stuff that is really broken is how HTML got 
so messed up in the first place.

</rant>
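
To make the two behaviors concrete, here is a minimal sketch in Python. 
It is only an illustration: the function names parse_strict and 
parse_tolerant and the element name "anonymous" are made up for this 
example, and nothing here reflects any agreed XML-ER rule. The strict 
version relies on the standard library's xml.etree.ElementTree, which 
rejects input that is not well-formed XML.

```python
import xml.etree.ElementTree as ET


def parse_strict(text):
    """Fail predictably: return the root element, or None if the
    input is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def parse_tolerant(text):
    """Hypothetical 'always produce a tree' behavior: if the input is
    not well-formed, wrap it as text inside an invented element."""
    root = parse_strict(text)
    if root is None:
        root = ET.Element("anonymous")  # invented name, not from any spec
        root.text = text
    return root


fox = "The quick brown fox jumped over the lazy dog."
strict_result = parse_strict(fox)      # None: the input is rejected
tolerant_result = parse_tolerant(fox)  # a tree is produced regardless
```

With the strict function a caller learns right away that the input was 
not XML. With the tolerant one the same string silently becomes a tree, 
which is the outcome argued against above.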

-- 
Shane McCarron
Managing Director, Applied Testing and Technology, Inc.
+1 763 786 8160 x120

Received on Tuesday, 21 February 2012 03:56:10 UTC