- From: Dave Pawson <dave.pawson@gmail.com>
- Date: Sun, 2 Jan 2022 07:56:54 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: ixml <public-ixml@w3.org>
Another scope issue, Michael?

Your point about "For processors which build their data structures
direct from the ixml form," raises a 'dent' in a simple scoping
statement (input must be UTF-8 within XML character constraints).

Define it as 'user error' (you asked for it, you got it)?
I don't like that.
Report it as a 'warning' with a reason? I think this would be my preference.

regards

On Sat, 1 Jan 2022 at 22:29, C. M. Sperberg-McQueen
<cmsmcq@blackmesatech.com> wrote:
>
> Working through Steven’s tests (and making more corrections
> in the expected results in tests-SP-MSM), I run across an
> interesting policy issue: what should a processor do with a
> reference in a grammar to character #1?
>
> It’s not an XML 1.0 character (the only C0 control characters
> allowed in XML 1.0 are U+0009, U+000A, and U+000D),
> so it cannot be represented in the XML form of the grammar.
>
> For processors which build their data structures direct from
> the ixml form, and which have no trouble with character U+0001,
> a reference to #1 need cause no trouble, unless the user asks
> the parser to turn that grammar itself into XML. (And even
> then, it may only matter in some contexts.) At which point
> we are back to an issue raised already: what happens when the
> combination of input plus grammar produces non-well-formed
> output?
>
> And of course at least some processors which can handle #1
> will not be able to handle #0.
>
> What happens in my processor is that when I create the XML
> form of the grammar in test hex3, all is well and I get the XML
>
> <ixml>
>    <rule name="hex">:<alt>
>          <literal dstring="a"/>,<inclusion>[<range from="#1" to="#7e">-</range>]</inclusion>,<literal dstring="b"/>
>       </alt>.</rule>
> </ixml>
>
> (As you can see, I have not yet updated my internal copy of
> the ixml grammar, so colons and semicolons and such are
> appearing as literals.)
>
> When I compile the grammar, the code naively attempts to
> turn #1 into a character, and compilation fails.
>
> If it’s a run-time error in the grammar, and the implicit claim is
> that an error-free ixml grammar will never produce ill-formed
> output on any input, then we have a run-time error in the
> grammar for ixml grammars, since it does not forbid hex
> references to non-XML (or indeed non-Unicode) characters.
>
> What do people think?
>
> What do we do about this?
>
> Is [#1 - #7e] a legal range?
>
> Michael
>
>

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
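
To make the 'warning' option concrete, here is a minimal sketch of a check a processor might run over a character range such as [#1 - #7e]. It is only an illustration, not how any existing ixml processor behaves: the function names is_xml10_char and check_range are hypothetical, and the only thing taken from a spec is the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]).

```python
def is_xml10_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_range(lo, hi):
    """Hypothetical helper: return a list of warnings for the range #lo-#hi.

    The range is accepted either way; the warning only explains why
    serializing matched characters as XML may fail.
    """
    bad = [cp for cp in range(lo, hi + 1) if not is_xml10_char(cp)]
    if not bad:
        return []
    return [
        "range #%x-#%x includes %d code point(s) that are not "
        "XML 1.0 characters, e.g. #%x" % (lo, hi, len(bad), bad[0])
    ]


# The range from test hex3: the grammar still compiles, but is flagged.
for message in check_range(0x1, 0x7E):
    print("warning:", message)
```

Under this approach [#1 - #7e] stays a legal range and the user is told why output may not be well-formed; whether that is the right policy is exactly the open question in this thread.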
Received on Sunday, 2 January 2022 07:57:18 UTC