- From: Dave Pawson <dave.pawson@gmail.com>
- Date: Mon, 3 Jan 2022 16:40:36 +0000
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: Steven Pemberton <steven.pemberton@cwi.nl>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
On Mon, 3 Jan 2022 at 16:31, Norm Tovey-Walsh <norm@saxonica.com> wrote:
> > Then what are you saying above?
> > I provide C0 char in, "it doesn't end up in the output"
> > IMHO that is modifying my data as given to the application?
>
> But modifying data is what ixml *is for*.
I think we have a different interpretation of modifying?
>
> You write a grammar that translates some non-XML format into XML. Along
> the way, you decide what items in the non-XML format get turned into
> attributes, what items get turned into elements, what items get output
> as characters, and what items get omitted.
Agreed. But changing Fred into Jane (I think) is not part of the bargain?
OK, I'm being more crude than you, but I hope you can see my
objection?
>
> All Steven is saying is that if you write a grammar that accepts input
> that contains C0 control characters, you better make sure all the C0
> control characters get omitted if you’re going to make XML at the end of
> the day.
Which is at the heart of my objection.
>
> Consider this grammar for amounts of money in GBP (written on the fly
> and untested, YMMV):
>
> cost: "£"? digit+ ("." digit+)? .
> -digit: ["0"-"9"] .
>
> If you parse “£1234.56” with that grammar, you get
>
> <cost>£1234.56</cost>
No problem, you've not messed with my input data.
>
> Suppose for the sake of argument that “£” was not a valid XML character.
> Then that XML output would be invalid. And that would be because *you*
> wrote a grammar that generated something invalid!
Halt reset and load.
I *want* this to be wrong, in error, or prohibited,
i.e. in the 'should not happen' category (or reported as an error, etc.).
>
> You could instead have written the grammar like this:
>
> cost: -"£"? digit+ ("." digit+)? .
> -digit: ["0"-"9"] .
>
> And then you’d get
>
> <cost>1234.56</cost>
>
> That logic applies for all characters (actually) not valid in XML.
>
> Does that help?
No Norm, because it seems you're accepting the spec, with non-XML
characters as 'good to go'. I'm saying it needs changing.
If I'm interpreting Michael's comments correctly, it would make your job
easier too?
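
To make "reported as an error" concrete, here is a rough sketch of the
kind of check I mean, in Python. It is an illustration only: the function
names are invented for this mail and are not part of the ixml spec or of
any processor. The ranges are the Char production from XML 1.0.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_serializable(text: str) -> None:
    """Hypothetical helper: raise instead of silently emitting bad XML."""
    for offset, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            raise ValueError(
                f"character U+{ord(ch):04X} at offset {offset} "
                "is not allowed in XML 1.0"
            )


check_serializable("£1234.56")        # fine, nothing is raised
try:
    check_serializable("a\x01b")      # C0 control character in the data
except ValueError as err:
    print(err)
```

With something like that in place, a C0 character that reaches the
output is flagged, rather than left to the grammar author to notice.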
regards
--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
Received on Monday, 3 January 2022 16:41:00 UTC