- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Sun, 2 Jan 2022 11:13:38 -0700
- To: Dave Pawson <dave.pawson@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
> On 2 Jan 2022, at 12:56 AM, Dave Pawson <dave.pawson@gmail.com> wrote:
>
> Another scope issue Michael?
> Your point about "For processors which build their data structures direct from
> the ixml form," raises a 'dent' in a simple scoping statement (input
> must be utf-8 within XML
> character constraints).

At the moment, I don’t think we do have a rule requiring that
input fall within XML constraints. If we did, I could simply update
the test case to require the result ‘this is not a conforming grammar’.

> Define it as 'user error' (you asked for it, you got it)? I don't like that.
>
> Report it as a 'warning' with a reason? I think this would be my preference?

I’m not sure what it would mean to call this or other things user
errors. I think we get to define rules for grammars and processors,
not users.

Michael

>
> regards
>
>
> On Sat, 1 Jan 2022 at 22:29, C. M. Sperberg-McQueen
> <cmsmcq@blackmesatech.com> wrote:
>>
>> Working through Steven’s tests (and making more corrections
>> in the expected results in tests-SP-MSM), I run across an
>> interesting policy issue: what should a processor do with a
>> reference in a grammar to character #1?
>>
>> It’s not an XML 1.0 character (the only C0 control characters
>> allowed in XML 1.0 are U+0009, U+000A, and U+000D),
>> so it cannot be represented in the XML form of the grammar.
>>
>> For processors which build their data structures direct from
>> the ixml form, and which have no trouble with character U+0001,
>> a reference to #1 need cause no trouble, unless the user asks
>> the parser to turn that grammar itself into XML. (And even
>> then, it may only matter in some contexts.) At which point
>> we are back to an issue raised already: what happens when the
>> combination of input plus grammar produces non-well-formed
>> output?
>>
>> And of course at least some processors which can handle #1
>> will not be able to handle #0.
>>
>> What happens in my processor is that when I create the XML
>> form of the grammar in test hex3, all is well and I get the XML
>>
>>   <ixml>
>>     <rule name="hex">:<alt>
>>       <literal dstring="a"/>,<inclusion>[<range from="#1" to="#7e">-</range>]</inclusion>,<literal dstring="b"/>
>>     </alt>.</rule>
>>   </ixml>
>>
>> (As you can see, I have not yet updated my internal copy of
>> the ixml grammar, so colons and semicolons and such are
>> appearing as literals.)
>>
>> When I compile the grammar, the code naively attempts to
>> turn #1 into a character, and compilation fails.
>>
>> If it’s a run-time error in the grammar, and the implicit claim is
>> that an error-free ixml grammar will never produce ill-formed
>> output on any input, then we have a run-time error in the
>> grammar for ixml grammars, since it does not forbid hex
>> references to non-XML (or indeed non-Unicode) characters.
>>
>> What do people think?
>>
>> What do we do about this?
>>
>> Is [#1 - #7e] a legal range?
>>
>> Michael
>>
>>
>
>
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.
>
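
For illustration only: whether [#1 - #7e] can survive a round trip through the XML form of a grammar depends on which code points the XML 1.0 Char production admits. The Python sketch below tests a hex range against that production. It is not taken from the ixml specification or from any processor mentioned in this thread, and the names is_xml10_char, parse_hex_ref, and range_has_non_xml_char are hypothetical.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parse_hex_ref(ref: str) -> int:
    """Turn an ixml-style hex reference such as '#7e' into an integer.

    Assumption: the reference is '#' followed by hex digits; anything
    else raises ValueError.
    """
    if not ref.startswith("#"):
        raise ValueError(f"not a hex reference: {ref!r}")
    return int(ref[1:], 16)


def range_has_non_xml_char(from_ref: str, to_ref: str) -> bool:
    """Return True if any code point in the inclusive range from_ref..to_ref
    falls outside the XML 1.0 Char production."""
    lo, hi = parse_hex_ref(from_ref), parse_hex_ref(to_ref)
    return any(not is_xml10_char(cp) for cp in range(lo, hi + 1))


if __name__ == "__main__":
    # The range from test hex3 starts at U+0001, which XML 1.0 excludes.
    print(range_has_non_xml_char("#1", "#7e"))
    # Starting at U+0020 instead keeps the range within XML 1.0.
    print(range_has_non_xml_char("#20", "#7e"))
```

A processor could use a check of this kind either to reject the grammar outright or to issue a warning. Which of those is right is the policy question under discussion, and the sketch takes no position on it.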
Received on Sunday, 2 January 2022 18:14:03 UTC