- From: David Carlisle <davidc@nag.co.uk>
- Date: Tue, 21 Feb 2012 10:29:35 +0000
- To: public-xml-er@w3.org
On 21/02/2012 02:17, Noah Mendelsohn wrote:
>
> On 2/20/2012 8:22 PM, David Carlisle wrote:
>> I agree that the input shouldn't be described as "XML" but it
>> needn't purport to be XML either. If I choose to parse
>> "<foo>a</bar>" with this parser I don't need to (or get the
>> document to) purport that it is XML, I just want an XML-compatible
>> result so I can bash it with XSLT (typically)
>
> Are you sure you want to do that with your example.

Yes I think so, or more exactly I don't want the requirements drafted in
such a way that prevents us deciding we want that.

> It's really not clear what a user intended here. Most likely XML-ER
> will produce some tree out of this input, but if the author intended
> anything like what we know as XML, the results of any fixup have at
> least a 50/50 chance of not being "correct" (did the user mean a
> "foo" element, a "bar" element, or something else.

I think thinking of it as fixup doesn't really work. If I _choose_ to use
this parser rather than an XML one then I'm asserting that I want to get a
DOM (or XDM or whatever) tree out of some input. I don't intend to edit or
in any way fix the input into being XML. I'd view it the same way as taking
a SAX parser for GEDCOM (from Michael's XSLT book) or CSV or JSON. You parse
the input, which needn't look like XML at all, get an XML-compatible parse
tree, and then the following applications work with it as if it were XML.
Asking at what point in the file the JSON input wasn't well formed XML isn't
very useful.

I know I'm overstating the point, as xml-er "looks like" xml and has the
requirement that if the input happens to _be_ xml then parsing with xml-er
or xml should have the same result, but I think comparing it to using a
non-xml parser is a more useful idiom than comparing it to a syntactic fixup
followed by an xml parse.

>
> Of course, once the XML-ER spec is written, there will be some
> answer. Let's say the answer it gives is to assume that the </bar>
> was meant to be a </foo>.

No, it won't say that </bar> was meant to be </foo> any more than the XML 1.0
spec says that <foo a = 'b' > was meant to be <foo a="b">. The grammar of
XML 1.0 just results in those two things being equivalent; it doesn't need
to make a judgement about which is more correct or that one is changed into
the other. An xml-er parser (might) make <foo a=b> have the same result, but
again we don't need to use language that implies that <foo a=b> is "fixed"
in any way.

> OK, do we really want to tell users to
> write <foo>a</bar> as a first class way of getting a <foo> element?
>
> I don't think so. I think we want to distinguish content that is
> correct or preferred from that which is tolerated. For the moment, I
> would assume that the "correct" content is well-formed XML.

Some things we can agree are flagged as parse errors for xml-er (and my
mis-matched end tag would be so flagged in Anne's draft), but the list of
things that are not flagged might end up being very long, and so in the end
the assumption that an xml-er document that does not generate a parse error
will be well formed XML will be (or might be) far from the truth.

> We might loosen that a bit to include some additional constructs like
> unquoted attributes, or perhaps names that use other than XML name
> characters. In general, though, I think we do want to identify a
> class of correct input, and I think that will be very close in
> spirit, if not necessarily in all details, to XML.
>
> Noah
>
>

If we define things such that every xml-er document that does not generate
a parse error is well formed xml, then you can mechanically pass such a
document (unparsed) into an xml pipeline. If there are _any_ cases where
this does not hold then you cannot, and so I don't personally feel it is
particularly useful to know that if there are no xml-er parse errors the
document is "almost" xml.

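To make the two points above concrete, here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree as a stand-in for "an XML parser". The helper names parse_or_none and canonical are made up for this illustration and come from no draft. It shows that the two quoting styles yield the same tree, and that the unquoted attribute and the mismatched end tag are rejected outright, which is why such input could not be handed unparsed to an XML pipeline.

```python
import xml.etree.ElementTree as ET


def parse_or_none(text):
    """Return the parsed root element, or None if text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def canonical(text):
    """Serialize the parsed tree, or return None when the XML parser rejects the input."""
    root = parse_or_none(text)
    return None if root is None else ET.tostring(root, encoding="unicode")


# Two spellings that the XML 1.0 grammar treats as equivalent:
# neither is "fixed" into the other, they simply produce the same tree.
single_quoted = canonical("<foo a = 'b' >x</foo>")
double_quoted = canonical('<foo a="b">x</foo>')

# Inputs that an xml-er parser might accept, but that an XML parser rejects,
# so they cannot be passed unparsed into an XML pipeline.
unquoted_attribute = canonical("<foo a=b>x</foo>")
mismatched_end_tag = canonical("<foo>a</bar>")

print(single_quoted == double_quoted)
print(unquoted_attribute, mismatched_end_tag)
```

An xml-er parser would presumably return a tree for the last two inputs as well; what that tree looks like is exactly what the spec would have to define.
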
I think, as Shane said, we should just define xml-er parsing in a way that
makes sense in that context, and then just see at the end how far it differs
from XML given non well formed input.

David
Received on Tuesday, 21 February 2012 10:30:00 UTC