- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 15 Jun 2021 10:40:54 -0600
- To: Tom Hillman <tom@expertml.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-ixml@w3.org, Steven Pemberton <steven.pemberton@cwi.nl>
> On 15,Jun2021, at 3:26 AM, Tom Hillman <tom@expertml.com> wrote:

>> I anticipate Tom being unhappy

> Not about the non-conforming grammars bit: your previous email had
> convinced me.

> My biggest concern is the requirement to reject non-conforming
> grammars:

These two sentences together summarize very neatly, I think, why I
am now confused. As indeed Tom predicted.

My state of confusion may mean that my responses to the rest of your
note are unhelpful, in which case I apologize for wasting everyone’s
time.

> I've been trying (with some partial success) to convince myself that a
> requirement for a parser to validate the input grammar isn't as big a
> deal for my XSLT parser as I initially thought; the process has
> brought up some questions.

> How do parsers validate grammars as conforming? Is it enough to assert
> that a given (non-xml) grammar can be parsed using the ixml grammar to
> produce an XML document? How do we validate xml grammars? Do we need
> one or more schema to do so?

In the draft of 9 June at [1], the description of conformance for
grammars is:

    An ixml grammar in ixml form conforms to this specification if

    - it is described by the grammar given in this specification, and
    - it satisfies all the other requirements specified for ixml
      grammars.

    An ixml grammar in XML form conforms to this specification if

    - it can be derived from an ixml grammar in ixml form by parsing
      as described in this specification, and
    - it satisfies all the other requirements specified for ixml
      grammars.

[1] https://homepages.cwi.nl/~steven/ixml/ixml-specification.html

So I think the answers to your questions are

- For ixml grammars, by checking that they are generated by the ixml
  grammar in the spec; for XML grammars, by checking that there is an
  ixml grammar that generates them; for all grammars, by checking the
  extra-grammatical rules (helpfully listed in the conformance
  section, though with no guarantee of completeness).
- No, it is not enough; it is required that you check that no
  nonterminal has multiple definitions, that there are no references
  to undefined nonterminals, that serialized names are legal XML, etc.

- My plan is to write a schema that guarantees that the XML is
  generated by a grammatical ixml input; I have not written it yet.
  Or rather: I plan to replace my current hand-written schema with one
  generated programmatically from the ixml grammar for ixml grammars.

- I think that schemas in various notations (DTD, RNG, XSD,
  Schematron) would likely be helpful. I'm currently not fussed over
  whether they are normative or not, and I am currently not fussed
  over whether they are part of the main spec or published in other
  documents. My current leaning is to make them non-normative and
  publish them separately.

I don't currently have a clear view on whether there are conformance
requirements that cannot be handled in one or more of these languages;
at first glance, they all look doable without difficulty, though I
have not looked at the grammar rules to see whether they all translate
easily into deterministic content models. The most complicated bit,
as far as I can see at the moment, is the rule about grammars with
hidden roots; the sneakiest difficulty is that the simplest way to
make a useful DTD is to make rule/@name an ID attribute and
nonterminal/@name an IDREF attribute, which restricts all names to
legal XML names, not just names that get serialized. And, of course,
the constraints you mention on text nodes in the current grammar,
which are as far as I can see easily checked with Schematron rules.

> My main (perhaps selfish) concern is that writing a
> validator in XSLT as well as a parser will vastly inflate the
> complexity of the task. If we can say that successfully parsing a
> given grammar using the IXML grammar to an XML instance is enough,
> then that is good news for me: it may mean that my parser can't accept
> XML grammars, but at least it means I don't have to implement a
> validator or serialiser to be a conforming processor.

I think it does increase the size of the task, but only modestly, not
vastly.

> However, if we are allowing parsers to accept XML format grammars, and
> requiring those parsers to validate those grammars, should we also be
> publishing a schema for them to do so?

Yes, probably. But I would like to spend more time working on
Aparecium and less time working on ancillary problems, so I am not
going to work on this now. Adding routines to translate an ixml
grammar into a schema is on my mental to-do list for my Gingersnap
library, although I see that it is not in fact in the one that is
written down at [2]. Generating an up-to-date schema for ixml is,
however, on both the mental and the written lists.

[2] https://github.com/cmsmcq/gingersnap/blob/main/A/todo.md

If it would help anyone, I can take the RNG schema for ixml that I
generated by hand last July and put it into the lib directory of my
ixml test suite project [3]; let me know.

[3] https://github.com/cmsmcq/ixml-tests

> I took a look yesterday at what that might look like with my go-to
> grammatical schema (RelaxNG), and almost immediately encountered
> problems around controlling the value of text nodes (in Relax, an
> element's content can be complex (and allow mixed text nodes) or
> simple (and allow value patterns), but not both). Possibly we could
> work around that with Schematron, or perhaps this is a good reason to
> tweak the marks on the IXML grammar to bypass this restriction by
> removing superfluous mixed text nodes (e.g. ":" or "=" symbols from
> rule definitions).

I think that for the case of ixml, at least, this is easy to do with
ad hoc checks; for a more general solution, I think Schematron is the
way to go.

********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
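
For illustration only: two of the extra-grammatical checks named in the message above (no nonterminal with multiple definitions, no references to undefined nonterminals) could be sketched in Python roughly as follows. This is not code from Aparecium or Gingersnap. The function name `check_grammar` and the two sample grammars are hypothetical, and the sketch assumes an XML-form grammar in which definitions are `rule` elements with a `name` attribute and references are `nonterminal` elements with a `name` attribute; the `literal` element's attribute name is likewise an assumption and plays no part in the checks.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_grammar(xml_text):
    """Return a list of problems found in an ixml grammar in XML form.

    Hypothetical helper: it covers only two of the extra-grammatical
    rules (duplicate definitions, undefined references), nothing else.
    """
    root = ET.fromstring(xml_text)

    # Names defined by rule/@name, in document order.
    defined = [rule.get("name") for rule in root.iter("rule")]

    problems = []
    for name, count in Counter(defined).items():
        if count > 1:
            problems.append(f"nonterminal '{name}' has {count} definitions")

    # Names referenced by nonterminal/@name anywhere in the grammar.
    referenced = {nt.get("name") for nt in root.iter("nonterminal")}
    for name in sorted(referenced - set(defined)):
        problems.append(f"reference to undefined nonterminal '{name}'")

    return problems


# Assumed sample input: every reference resolves, every name is defined once.
GOOD = """<ixml>
  <rule name="s"><alt><nonterminal name="a"/></alt></rule>
  <rule name="a"><alt><literal string="x"/></alt></rule>
</ixml>"""

# Assumed sample input: 'a' is defined twice and 'b' is never defined.
BAD = """<ixml>
  <rule name="s"><alt><nonterminal name="a"/><nonterminal name="b"/></alt></rule>
  <rule name="a"><alt><literal string="x"/></alt></rule>
  <rule name="a"><alt><literal string="y"/></alt></rule>
</ixml>"""

if __name__ == "__main__":
    print(check_grammar(GOOD))
    for problem in check_grammar(BAD):
        print(problem)
```

A check of this kind runs on the XML form only, so a processor that accepts ixml-form grammars would apply it to the result of parsing them. The remaining rules (legal XML names for serialized nonterminals, hidden roots, constraints on text nodes) are deliberately left out of the sketch.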
Received on Tuesday, 15 June 2021 16:42:09 UTC