- From: Dave Pawson <dave.pawson@gmail.com>
- Date: Tue, 4 Jan 2022 07:49:29 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: Steven Pemberton <steven.pemberton@cwi.nl>, ixml <public-ixml@w3.org>
On Mon, 3 Jan 2022 at 15:57, C. M. Sperberg-McQueen
<cmsmcq@blackmesatech.com> wrote:
>
> >> On Monday 03 January 2022 11:19:25 (+01:00), Dave Pawson wrote:
> >>
> >>> On Mon, 3 Jan 2022 at 10:04, Steven Pemberton <steven.pemberton@cwi.nl> wrote:
> >>>> Output, on the other hand, is a different issue, because of the XML
> >>>> misfeature of excluding most C0 characters from content. (If I were
> >>>> redesigning XML, I would allow those characters, but only expressed in
> >>>> encoded form.)
> >>>>
> >>>> So, it is just fine to accept XML illegal characters in the input, as
> >>>> long as they don't end up in the output:
> >
> > Then what are you saying above?
> > I provide C0 char in, "it doesn't end up in the output"
> > IMHO that is modifying my data as given to the application?

I still assert you (Michael) are saying the same thing?

>
> Let me try phrasing it differently.

> So the mapping from non-XML data to XML data cannot always
> succeed. What happens when it doesn’t? Is it OK? Is it an error
> in the processor? Is it an error in the grammar? Is it an error in the
> input? It doesn’t fit the short description of ixml, because we had
> input and a grammar and we did not get XML out at the other end.
> The spec needs a story of some kind. What should that story be?
>
> The position Steven is suggesting is (as I understand it):

...

>
> - In order to ensure that the output is in fact XML, the grammar must
> see to it that any non-XML characters in the input do not get
> written out as data in an XML document. The obvious way to do
> this is to mark the relevant terminals as hidden, as in Steven’s example
>
> -[#0 - #1F]

And if the user a) has such input contained within the input and
b) has no such rule in the grammar?

>
> There may be other ways to write the grammar so as to ensure that
> a U+0001 in the input does not end up making it impossible for the
> processor to produce XML output, although I cannot think of any off hand.

Thanks Michael, at least I understand the mechanism (and that the onus
rests simply on the user!)

> - Steven’s remark "And assuring those characters don't get through to
> the output is the grammar author's responsibility” leads to a story in
> which an attempt to write out a non-XML character in ixml output is
> an error in the grammar. Possibly, like other cases that have been
> brought up, it’s what I would call a “run-time error in the grammar” —
> that is, an error in the grammar that may be caught only for some
> inputs, and which a processor is not obligated to detect in other
> cases.

Either way, the user has made an error and surely must be told of it.

>
> It might be nicer to require the processor to detect the error regardless
> of the input, but it might be very tricky to analyse a grammar and prove that
> no possible input would ever cause an attempt to write a non-XML
> character to the output. I would not swear that there is not a theorem
> proving that it cannot be done, or that it’s equivalent to the Halting
> Problem. All I know is that it doesn’t look easy.

This bit I don't understand. The grammar can tell you to map C0 chars
to something (nothing? omitted from the output?), yet you couldn't spot
them otherwise? Is this what the para above says?

>
> So: Steven is not proposing that input containing U+0001 be
> illegal, nor that it be modified silently to change the character to
> something else. He is observing that the grammar writer already
> has the responsibility of saying what parts of the input get written
> out to the XML output and is thus in a position to write a grammar
> that ensures that non-XML characters do not appear in the output.

(Now) understood... I'd put a pound to a penny that many other
potential users will fall foul of this aspect of ixml.
[Sorry Steven if I misinterpreted your initial comment]

>
> Those things could of course be proposed — you did propose,
> if I understood you correctly, that ixml just specify that all inputs
> have to be streams of XML characters, and I think that would make
> life simpler for me as an implementor. No one that I know of has
> proposed that non-XML characters in the input be legal but
> silently changed to something else.

It was my misinterpretation.

I'm wondering why Norm insists this is a good thing?
What is a user going to do with an ... invalid? non-well-formed? XML file
produced when he/she omits such a rule from the grammar and his/her
customer includes such a character in an input file? Chaos, Norm?
How many XML parsers can point to character n as being bad in this way?

> I hope this helps.
>
> Michael

Yes, thanks Michael. I still prefer the simplicity of XML valid input
constraints (if that can be checked)

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
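
On the question of whether the input constraint can be checked, and whether the offending character can be pointed to: a minimal sketch in Python, assuming that "XML valid input" means every character matches the Char production of XML 1.0. The function names `is_xml_char` and `first_non_xml_char` are hypothetical and are not part of ixml or of any ixml processor.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_non_xml_char(text: str):
    """Return the offset of the first character that XML 1.0 does not
    allow, or None if every character is allowed."""
    for offset, ch in enumerate(text):
        if not is_xml_char(ch):
            return offset
    return None


if __name__ == "__main__":
    sample = "abc\x01def"              # U+0001 at offset 3
    print(first_non_xml_char(sample))
```

Tab, line feed and carriage return sit inside the C0 range but are legal XML characters, so a check like this treats them differently from the rest of #0 to #1F.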
Received on Tuesday, 4 January 2022 07:50:53 UTC