Re: non-XML characters (e.g. #1)

Another scope issue, Michael?
Your point about "For processors which build their data structures direct from
the ixml form," puts a 'dent' in a simple scoping statement (input
must be UTF-8 within XML character constraints).

Define it as 'user error' (you asked for it, you got it)? I don't like that.

Report it as a 'warning' with a reason? I think this would be my preference.
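
To make the 'warning' option concrete, here is a minimal sketch in Python,
not taken from any existing ixml processor. The ranges are the XML 1.0 Char
production; the names is_xml10_char and check_hex_ref are made up for
illustration, and I am assuming a hex reference arrives as a string such
as "#1".

```python
from typing import List

# XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR_RANGES = [
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml10_char(cp: int) -> bool:
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return any(lo <= cp <= hi for lo, hi in XML10_CHAR_RANGES)


def check_hex_ref(ref: str) -> List[str]:
    """Hypothetical helper: return warnings (with reasons) for a hex
    reference such as '#1'. An empty list means nothing to report."""
    if not ref.startswith("#"):
        raise ValueError("expected a hex reference like '#1', got %r" % ref)
    cp = int(ref[1:], 16)
    if cp > 0x10FFFF:
        return ["%s: beyond U+10FFFF, not a Unicode code point" % ref]
    if not is_xml10_char(cp):
        return ["%s: U+%04X is not an XML 1.0 character, so it cannot "
                "appear as a character in XML output" % (ref, cp)]
    return []


if __name__ == "__main__":
    # The two ends of the range in test hex3.
    for ref in ("#1", "#7e"):
        for warning in check_hex_ref(ref):
            print("warning:", warning)
```

The check only reports; it does not reject the grammar. That matches the
'warning with a reason' idea and leaves the policy question (error or not)
open.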

regards


On Sat, 1 Jan 2022 at 22:29, C. M. Sperberg-McQueen
<cmsmcq@blackmesatech.com> wrote:
>
> Working through Steven’s tests (and making more corrections
> in the expected results in tests-SP-MSM), I run across an
> interesting policy issue:  what should a processor do with a
> reference in a grammar to character #1?
>
> It’s not an XML 1.0 character (the only C0 control characters
> allowed in XML 1.0 are U+0009, U+000A, and U+000D),
> so it cannot be represented in the XML form of the grammar.
>
> For processors which build their data structures direct from
> the ixml form, and which have no trouble with character U+0001,
> a reference to #1 need cause no trouble, unless the user asks
> the parser to turn that grammar itself into XML.  (And even
> then, it may only matter in some contexts.)  At which point
> we are back to an issue raised already: what happens when the
> combination of input plus grammar produces non-well-formed
> output?
>
> And of course at least some processors which can handle #1
> will not be able to handle #0.
>
> What happens in my processor is that when I create the XML
> form of the grammar in test hex3, all is well and I get the XML
>
> <ixml>
>   <rule name="hex">:<alt>
>       <literal dstring="a"/>,<inclusion>[<range from="#1" to="#7e">-</range>]</inclusion>,<literal dstring="b"/>
>     </alt>.</rule>
> </ixml>
>
> (As you can see, I have not yet updated my internal copy of
> the ixml grammar, so colons and semicolons and such are
> appearing as literals.)
>
> When I compile the grammar,  the code naively attempts to
> turn #1 into a character, and compilation fails.
>
> If it’s a run-time error in the grammar, and the implicit claim is
> that an error-free ixml grammar will never produce ill-formed
> output on any input, then we have a run-time error in the
> grammar for ixml grammars, since it does not forbid hex
> references to non-XML (or indeed non-Unicode) characters.
>
> What do people think?
>
> What do we do about this?
>
> Is [#1 - #7e] a legal range?
>
> Michael
>
>


-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.

Received on Sunday, 2 January 2022 07:57:18 UTC