Re: non-XML characters (e.g. #1)

On Mon, 3 Jan 2022 at 16:31, Norm Tovey-Walsh <norm@saxonica.com> wrote:

> > Then what are you saying above?
> > I provide a C0 char as input, and "it doesn't end up in the output".
> > IMHO that is modifying my data as given to the application?
>
> But modifying data is what ixml *is for*.

I think we have different interpretations of "modifying"?


>
> You write a grammar that translates some non-XML format into XML. Along
> the way, you decide what items in the non-XML format get turned into
> attributes, what items get turned into elements, what items get output
> as characters, and what items get omitted.

Agreed. But changing Fred into Jane (I think) is not part of the bargain?

OK, I'm being more crude than you, but I hope you can see my
objection?



>
> All Steven is saying is that if you write a grammar that accepts input
> that contains C0 control characters, you better make sure all the C0
> control characters get omitted if you’re going to make XML at the end of
> the day.

Which is at the heart of my objection.


>
> Consider this grammar for amounts of money in GBP (written on the fly
> and untested, YMMV):
>
> cost: "£"? digit+ ("." digit+)? .
> -digit: ["0"-"9"] .
>
> If you parse “£1234.56” with that grammar, you get
>
> <cost>£1234.56</cost>

No problem, you've not messed with my input data.

>
> Suppose for the sake of argument that “£” was not a valid XML character.
> Then that XML output would be invalid. And that would be because *you*
> wrote a grammar that generated something invalid!

Halt, reset and load.
   I *want* this to be wrong / in error / prohibited,
i.e. in the 'should not happen' category (or reported as an error, etc.).
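To make that concrete, here is a minimal sketch (Python, purely illustrative) of the behaviour I mean: the processor reports a non-XML character as an error instead of silently dropping it or passing it through. The character test follows the XML 1.0 `Char` production; the function and exception names are hypothetical and not taken from the ixml spec or any processor.

```python
# Sketch only: is_xml_char, check_output and NonXmlCharError are hypothetical names.
# XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]


def is_xml_char(c: str) -> bool:
    """True if the single character c is allowed by the XML 1.0 Char production."""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


class NonXmlCharError(ValueError):
    """Hypothetical error raised when output would contain a non-XML character."""


def check_output(text: str) -> str:
    """Return text unchanged, or raise on the first non-XML character (no silent edits)."""
    for i, c in enumerate(text):
        if not is_xml_char(c):
            raise NonXmlCharError(
                f"U+{ord(c):04X} at offset {i} is not a valid XML character"
            )
    return text


if __name__ == "__main__":
    print(check_output("<cost>£1234.56</cost>"))  # valid: passed through unchanged
    try:
        check_output("<cost>\u00011234.56</cost>")  # C0 control character U+0001
    except NonXmlCharError as e:
        print("error:", e)
```

The point of raising rather than stripping is that my data is never changed behind my back: either it goes through as given, or I am told why it cannot.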


>
> You could instead have written the grammar like this:
>
> cost: -"£"? digit+ ("." digit+)? .
> -digit: ["0"-"9"] .
>
> And then you’d get
>
> <cost>1234.56</cost>
>
> That logic applies to all characters that are (actually) not valid in XML.
>
> Does that help?

No, Norm, because it seems you're accepting the spec, with non-XML
characters, as 'good to go'. I'm saying it needs changing.
If I'm interpreting Michael's comments correctly, it would make your job
easier too?

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.

Received on Monday, 3 January 2022 16:41:00 UTC