Re: Draft from Noah Mendelsohn on 2012-02-21 (public-xml-er@w3.org from February 2012)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Mon, 20 Feb 2012 21:17:38 -0500
To: David Carlisle <davidc@nag.co.uk>
CC: public-xml-er@w3.org
Message-ID: <4F42FEC2.2000609@arcanedomain.com>

On 2/20/2012 8:22 PM, David Carlisle wrote:
> I agree that the input shouldn't be described as "XML" but it needn't
> purport to be XML either. If I choose to parse "<foo>a</bar>" with this
> parser I don't need to (or get the document to ) purport that is XML, I
> just want an XML-compatible result so I can bash it with XSLT (typically)

Are you sure you want to do that with your example? It's really not clear 
what a user intended here. Most likely XML-ER will produce some tree out of 
this input, but if the author intended anything like what we know as XML, 
the results of any fixup have at least a 50/50 chance of not being 
"correct" (did the user mean a "foo" element, a "bar" element, or something 
else?).

Of course, once the XML-ER spec is written, there will be some answer. 
Let's say the answer it gives is to assume that the </bar> was meant to be 
a </foo>. OK, do we really want to tell users to write <foo>a</bar> as a 
first class way of getting a <foo> element?
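To make that hypothetical rule concrete, here is a toy recovery parser in Python in which any end tag simply closes the innermost open element, so </bar> is taken to close <foo>. This is a sketch under stated assumptions, not the XML-ER algorithm: the function name `recover`, the wrapper element `recovered`, and the restriction to bare start tags, end tags, and text (no attributes, comments, or entity references) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy tokenizer: bare start tags, end tags, and text only.
# A "<" that does not begin a recognizable tag is kept as literal text.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+|<)")


def recover(source: str) -> ET.Element:
    """Build a tree from possibly malformed markup (sketch, not XML-ER)."""
    doc = ET.Element("recovered")  # hypothetical wrapper for top-level nodes
    stack = [doc]
    for m in TOKEN.finditer(source):
        slash, name, text = m.groups()
        if text is not None:
            parent = stack[-1]
            if len(parent):
                # Text after a child element belongs in that child's tail.
                last = parent[-1]
                last.tail = (last.tail or "") + text
            else:
                parent.text = (parent.text or "") + text
        elif not slash:
            stack.append(ET.SubElement(stack[-1], name))
        elif len(stack) > 1:
            # Assumed fixup: an end tag closes the innermost open element
            # whether or not its name matches, so </bar> closes <foo>.
            stack.pop()
        # An end tag with no open element is silently dropped.
    return doc
```

With this rule, `ET.tostring(recover("<foo>a</bar>")[0], encoding="unicode")` gives `<foo>a</foo>`. A different but equally plausible rule (say, renaming the element to match its end tag) would give `<bar>a</bar>` instead, which is exactly the guess the parser is forced to make. Either way, getting a tree back does not make `<foo>a</bar>` a good way to write a `foo` element.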

I don't think so. I think we want to distinguish content that is correct or 
preferred from that which is tolerated. For the moment, I would assume that 
the "correct" content is well-formed XML. We might loosen that a bit to 
include some additional constructs like unquoted attributes, or perhaps 
names that use characters other than XML name characters. In general, though, I think 
we do want to identify a class of correct input, and I think that will be 
very close in spirit, if not necessarily in all details, to XML.
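One way to picture the split between correct and tolerated input: treat anything a conforming XML parser accepts as "correct" and route everything else to a recovery path. The sketch below assumes, as the message does for the moment, that "correct" means well-formed XML; the `classify` function and its two labels are hypothetical, and it relies only on the standard library's `xml.etree.ElementTree.fromstring`, which raises `ParseError` on input that is not well-formed.

```python
import xml.etree.ElementTree as ET


def classify(source: str) -> str:
    """Label input "correct" if it is well-formed XML, else "tolerated".

    Hypothetical labels: "tolerated" only means a recovery parser would
    have to guess at what the author intended.
    """
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return "tolerated"
    return "correct"


for sample in ["<foo>a</foo>", "<foo>a</bar>", "<foo x=1/>"]:
    print(sample, classify(sample))
# <foo>a</foo> correct
# <foo>a</bar> tolerated
# <foo x=1/> tolerated
```

Loosening the correct class to admit unquoted attributes or non-XML name characters would mean swapping the strict parser check for a somewhat broader grammar; the strict version is shown only because it is the baseline the message starts from.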

Noah

Received on Tuesday, 21 February 2012 02:18:05 UTC