- From: Dave Pawson <dave.pawson@gmail.com>
- Date: Mon, 3 Jan 2022 16:40:36 +0000
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: Steven Pemberton <steven.pemberton@cwi.nl>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
On Mon, 3 Jan 2022 at 16:31, Norm Tovey-Walsh <norm@saxonica.com> wrote:
>
> Then what are you saying above?
>
> I provide C0 char in, "it doesn't end up in the output"
>
> IMHO that is modifying my data as given to the application?
>
> But modifying data is what ixml *is for*.

I think we have a different interpretation of modifying?

>
> You write a grammar that translates some non-XML format into XML. Along
> the way, you decide what items in the non-XML format get turned into
> attributes, what items get turned into elements, what items get output
> as characters, and what items get omitted.

Agreed. But changing Fred into Jane (I think) is not part of the bargain?
OK, I'm being more crude than you, but I hope you can see my objection?

>
> All Steven is saying is that if you write a grammar that accepts input
> that contains C0 control characters, you better make sure all the C0
> control characters get omitted if you’re going to make XML at the end of
> the day.

Which is at the heart of my objection.

>
> Consider this grammar for amounts of money in GBP (written on the fly
> and untested, YMMV):
>
> cost: "£"? digit+ ("." digit+)? .
> -digit: ["0"-"9"] .
>
> If you parse “£1234.56” with that grammar, you get
>
> <cost>£1234.56</cost>

No problem, you've not messed with my input data.

>
> Suppose for the sake of argument that “£” was not a valid XML character.
> Then that XML output would be invalid. And that would be because *you*
> wrote a grammar that generated something invalid!

Halt, reset and load. I *want* this to be wrong | in error | prohibited?
i.e. in the 'should not happen' category (or reported as an error, etc.)

>
> You could instead have written the grammar like this:
>
> cost: -"£"? digit+ ("." digit+)? .
> -digit: ["0"-"9"] .
>
> And then you’d get
>
> <cost>1234.56</cost>
>
> That logic applies for all characters (actually) not valid in XML.
>
> Does that help?

No Norm, because it seems you're accepting the spec, with non-XML characters as 'good to go'.
I'm saying it needs changing. If I'm interpreting Michael's comments correctly, it would make your job easier too?

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
Received on Monday, 3 January 2022 16:41:00 UTC