- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Sat, 1 Jan 2022 15:28:50 -0700
- To: ixml <public-ixml@w3.org>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Working through Steven’s tests (and making more corrections in the expected results in tests-SP-MSM), I have run across an interesting policy issue: what should a processor do with a reference in a grammar to character #1?

It’s not an XML 1.0 character (the only C0 control characters allowed in XML 1.0 are U+0009, U+000A, and U+000D), so it cannot be represented in the XML form of the grammar.

For processors which build their data structures directly from the ixml form, and which have no trouble with character U+0001, a reference to #1 need cause no trouble, unless the user asks the parser to turn that grammar itself into XML. (And even then, it may only matter in some contexts.) At that point we are back to an issue raised already: what happens when the combination of input plus grammar produces non-well-formed output? And of course at least some processors which can handle #1 will not be able to handle #0.

What happens in my processor is that when I create the XML form of the grammar in test hex3, all is well and I get the XML

   <ixml>
      <rule name="hex">:<alt>
         <literal dstring="a"/>,<inclusion>[<range from="#1" to="#7e">-</range>]</inclusion>,<literal dstring="b"/>
      </alt>.</rule>
   </ixml>

(As you can see, I have not yet updated my internal copy of the ixml grammar, so colons and semicolons and such are appearing as literals.)

When I compile the grammar, the code naively attempts to turn #1 into a character, and compilation fails.

If it’s a run-time error in the grammar, and the implicit claim is that an error-free ixml grammar will never produce ill-formed output on any input, then we have a run-time error in the grammar for ixml grammars, since it does not forbid hex references to non-XML (or indeed non-Unicode) characters.

What do people think? What do we do about this? Is [#1 - #7e] a legal range?

Michael
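The question of which hex references are problematic comes down to the XML 1.0 `Char` production. As a minimal sketch (the helper names `is_xml10_char`, `parse_hex_ref`, and `non_xml_chars_in_range` are hypothetical and not taken from any ixml processor), a grammar checker could test every code point in a range such as [#1 - #7e] against that production:

```python
from __future__ import annotations

# Hypothetical helpers (not part of any ixml processor): test code points
# against the XML 1.0 (Fifth Edition) Char production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return any(lo <= cp <= hi for lo, hi in XML10_CHAR_RANGES)


def parse_hex_ref(ref: str) -> int:
    """Turn an ixml hex reference such as '#7e' into an integer code point."""
    if not ref.startswith("#"):
        raise ValueError(f"not a hex reference: {ref!r}")
    return int(ref[1:], 16)


def non_xml_chars_in_range(start_ref: str, end_ref: str) -> list[int]:
    """List the code points in [start_ref - end_ref] that are not XML 1.0 characters."""
    start, end = parse_hex_ref(start_ref), parse_hex_ref(end_ref)
    return [cp for cp in range(start, end + 1) if not is_xml10_char(cp)]


# The range from test hex3: report the members that cannot appear in XML 1.0 output.
bad = non_xml_chars_in_range("#1", "#7e")
print([f"#{cp:x}" for cp in bad])
```

Under this check, [#1 - #7e] contains 28 code points (#1 to #8, #b, #c, and #e to #1f) that cannot appear in XML 1.0 output. Whether a processor should reject such a range when compiling the grammar, or only fail when one of those characters would actually be serialized, is the policy question itself; the sketch only shows that the test is cheap to perform either way.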
Received on Saturday, 1 January 2022 22:29:12 UTC