- From: Dave Pawson <dave.pawson@gmail.com>
- Date: Sun, 2 Jan 2022 18:19:12 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: ixml <public-ixml@w3.org>
- Message-ID: <CAEncD4cYnLC1cJmgG6dpxu8g5e8dQsicfZLb7q3yQYWmaEGVQg@mail.gmail.com>
On Sun, 2 Jan 2022 at 18:13, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
> > On 2 Jan 2022, at 12:56 AM, Dave Pawson <dave.pawson@gmail.com> wrote:
> >
> > Another scope issue, Michael?
> > Your point about "For processors which build their data structures
> > direct from the ixml form," raises a 'dent' in a simple scoping
> > statement (input must be UTF-8 within XML character constraints).
>
> At the moment, I don’t think we do have a rule requiring that input
> fall within XML constraints. If we did, I could simply update the
> test case to require the result ’this is not a conforming grammar’.

No, I was suggesting such an addition.

> >
> > Define it as 'user error' (you asked for it, you got it)? I don't like that.
> >
> > Report it as a 'warning' with a reason? I think this would be my preference?
>
> I’m not sure what it would mean to call this or other things user
> errors. I think we get to define rules for grammars and processors,
> not users.
>

A) The user asked for XML output and gave 'bad' input, hence IMHO a user error.
B) I would hate it if the processor modified my input without telling me.

Regards

> Michael
> >
> > regards
> >
> > On Sat, 1 Jan 2022 at 22:29, C. M. Sperberg-McQueen
> > <cmsmcq@blackmesatech.com> wrote:
> >>
> >> Working through Steven’s tests (and making more corrections
> >> in the expected results in tests-SP-MSM), I ran across an
> >> interesting policy issue: what should a processor do with a
> >> reference in a grammar to character #1?
> >>
> >> It’s not an XML 1.0 character (the only C0 control characters
> >> allowed in XML 1.0 are U+0009, U+000A, and U+000D),
> >> so it cannot be represented in the XML form of the grammar.
> >>
> >> For processors which build their data structures direct from
> >> the ixml form, and which have no trouble with character U+0001,
> >> a reference to #1 need cause no trouble, unless the user asks
> >> the parser to turn that grammar itself into XML. (And even
> >> then, it may only matter in some contexts.) At which point
> >> we are back to an issue raised already: what happens when the
> >> combination of input plus grammar produces non-well-formed
> >> output?
> >>
> >> And of course at least some processors which can handle #1
> >> will not be able to handle #0.
> >>
> >> What happens in my processor is that when I create the XML
> >> form of the grammar in test hex3, all is well and I get the XML
> >>
> >> <ixml>
> >> <rule name="hex">:<alt>
> >> <literal dstring="a"/>,<inclusion>[<range from="#1" to="#7e">-</range>]</inclusion>,<literal dstring="b"/>
> >> </alt>.</rule>
> >> </ixml>
> >>
> >> (As you can see, I have not yet updated my internal copy of
> >> the ixml grammar, so colons and semicolons and such are
> >> appearing as literals.)
> >>
> >> When I compile the grammar, the code naively attempts to
> >> turn #1 into a character, and compilation fails.
> >>
> >> If it’s a run-time error in the grammar, and the implicit claim is
> >> that an error-free ixml grammar will never produce ill-formed
> >> output on any input, then we have a run-time error in the
> >> grammar for ixml grammars, since it does not forbid hex
> >> references to non-XML (or indeed non-Unicode) characters.
> >>
> >> What do people think?
> >>
> >> What do we do about this?
> >>
> >> Is [#1 - #7e] a legal range?
> >>
> >> Michael
> >>
> >
> > --
> > Dave Pawson
> > XSLT XSL-FO FAQ.
> > Docbook FAQ.

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
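The question of whether [#1 - #7e] is a legal range, and the suggestion to report problems as a warning with a reason rather than silently changing the input, can be made concrete with a small check against the XML 1.0 `Char` production. The following is a minimal sketch under stated assumptions: it is written in Python, it is not taken from any existing ixml processor, and the names `parse_hex_ref`, `is_xml10_char`, `count_xml10_chars` and `check_range` are hypothetical. It also assumes that a reversed range (from > to) should be reported rather than interpreted, a choice this thread does not settle.

```python
# Hypothetical helpers (not from any existing ixml processor) that check
# ixml-style hex references against the XML 1.0 Char production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

# Disjoint, inclusive intervals of code points allowed in XML 1.0.
XML10_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def parse_hex_ref(ref: str) -> int:
    """Convert a hex reference such as '#7e' into an integer code point."""
    if not ref.startswith("#") or len(ref) < 2:
        raise ValueError(f"not a hex reference: {ref!r}")
    return int(ref[1:], 16)


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp may appear in an XML 1.0 document."""
    return any(lo <= cp <= hi for lo, hi in XML10_CHAR_RANGES)


def count_xml10_chars(lo: int, hi: int) -> int:
    """Count the code points in [lo, hi] that are XML 1.0 characters."""
    return sum(
        max(0, min(hi, b) - max(lo, a) + 1) for a, b in XML10_CHAR_RANGES
    )


def check_range(start: str, end: str) -> list[str]:
    """Return warnings, each with a reason, for a range like [#1 - #7e].

    The range is neither rejected nor silently altered; the caller
    decides whether any warning should be treated as fatal.
    """
    lo, hi = parse_hex_ref(start), parse_hex_ref(end)
    # Assumption: a reversed range is reported rather than interpreted.
    if lo > hi:
        return [f"reversed range: {start} > {end}"]

    warnings = []
    # Flag endpoints that cannot be serialized in XML 1.0 output.
    for ref, cp in ((start, lo), (end, hi)):
        if not is_xml10_char(cp):
            warnings.append(f"{ref} (U+{cp:04X}) cannot appear in XML 1.0 output")

    # Flag non-XML characters anywhere inside the range, not just at its ends.
    excluded = (hi - lo + 1) - count_xml10_chars(lo, hi)
    if excluded:
        warnings.append(
            f"[{start} - {end}] includes {excluded} code point(s) "
            "that are not XML 1.0 characters"
        )
    return warnings


if __name__ == "__main__":
    for warning in check_range("#1", "#7e"):
        print("warning:", warning)
```

Applied to the range from test hex3, the sketch reports two warnings: one for the endpoint #1, and one noting that 28 of the range's 126 code points (U+0001 to U+001F, minus tab, line feed and carriage return) are not XML 1.0 characters. Whether such a range should instead be a grammar error is the question the thread leaves open; the sketch only shows what a warning with a reason could look like.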
Received on Sunday, 2 January 2022 18:19:36 UTC