Re: interoperability and extensibility (was: Re: review of conformance section and conformance language) from C. M. Sperberg-McQueen on 2021-06-15 (public-ixml@w3.org from June 2021)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 15 Jun 2021 10:40:54 -0600
To: Tom Hillman <tom@expertml.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, public-ixml@w3.org, Steven Pemberton <steven.pemberton@cwi.nl>
Message-Id: <6BF8BCAC-3340-47DA-B2F1-267DB3DCD98E@blackmesatech.com>
> On 15,Jun2021, at 3:26 AM, Tom Hillman <tom@expertml.com> wrote:

>> I anticipate Tom being unhappy

> Not about the non-conforming grammars bit: your previous email had
> convinced me.

> My biggest concern is the requirement to reject non-conforming
> grammars:

These two sentences together summarize very neatly, I think, why I am
now confused. As indeed Tom predicted.

My state of confusion may mean that my responses to the rest of your
note are unhelpful, in which case I apologize for wasting everyone’s
time.

> I've been trying (with some partial success) to convince myself that a
> requirement for a parser to validate the input grammar isn't as big a
> deal for my XSLT parser as I initially thought; the process has
> brought up some questions.

> How do parsers validate grammars as conforming? Is it enough to assert
> that a given (non-xml) grammar can be parsed using the ixml grammar to
> produce an XML document? How do we validate xml grammars? Do we need
> one or more schema to do so?

In the draft of 9 June at [1], the description of conformance for grammars is:

    An ixml grammar in ixml form conforms to this specification if

      - it is described by the grammar given in this specification, and
      - it satisfies all the other requirements specified for ixml grammars.

    An ixml grammar in XML form conforms to this specification if

      - it can be derived from an ixml grammar in ixml form by parsing as 
        described in this specification, and
      - it satisfies all the other requirements specified for ixml grammars.

[1] https://homepages.cwi.nl/~steven/ixml/ixml-specification.html

So I think the answers to your questions are

  - For ixml grammars, by checking that they are generated by the ixml
    grammar in the spec; for XML grammars, by checking that there is an
    ixml grammar that generates them; for all grammars, by checking the
    extra-grammatical rules (helpfully listed in the conformance
    section, though with no guarantee of completeness).
    
  - No, it is not enough; it is required that you check that no nonterminal
    has multiple definitions, that there are no references to undefined
    nonterminals, that serialized names are legal XML, etc.

  - My plan is to write a schema that guarantees that the XML is
    generated by a grammatical ixml input; I have not written it yet. Or
    rather: I plan to replace my current hand-written schema with one
    generated programmatically from the ixml grammar for ixml grammars.

  - I think that schemas in various notations (DTD, RNG, XSD,
    Schematron) would likely be helpful.

    I'm currently not fussed over whether they are normative or not,
    nor over whether they are part of the main spec or published in
    other documents. My current leaning is to make them non-normative
    and publish them separately.
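
A minimal sketch, in Python, of two of the extra-grammatical checks named
in the second answer above (no nonterminal with multiple definitions, no
references to undefined nonterminals), run against a grammar in XML form.
It assumes only that rules appear as rule elements and references as
nonterminal elements, each with a name attribute; the function name
check_names and the message wording are hypothetical, and the sketch
makes no claim to cover the full list of requirements.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_names(grammar_xml: str) -> list[str]:
    """Return error messages for duplicate or undefined nonterminals."""
    root = ET.fromstring(grammar_xml)

    # Count how many rule elements define each name.
    defined = Counter(
        r.get("name") for r in root.iter("rule") if r.get("name") is not None
    )
    errors = [
        f"multiple definitions of {name}"
        for name, count in defined.items()
        if count > 1
    ]

    # Collect every name that is referenced somewhere in the grammar.
    referenced = {
        n.get("name") for n in root.iter("nonterminal") if n.get("name") is not None
    }
    for name in sorted(referenced - set(defined)):
        errors.append(f"undefined nonterminal {name}")
    return errors
```

An empty result means only that these two checks passed; the grammar may
still violate other requirements.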

I don't currently have a clear view on whether there are conformance
requirements that cannot be handled in one or more of these languages;
at first glance, they all look doable without difficulty, though I have
not looked at the grammar rules to see whether they all translate easily
into deterministic content models. The most complicated bit, as far as I
can see at the moment, is the rule about grammars with hidden roots; the
sneakiest difficulty is that the simplest way to make a useful DTD is to
make rule/@name an ID attribute and nonterminal/@name an IDREF
attribute, which restricts all names to legal XML names, not just names
that get serialized.
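
Outside a DTD, the XML-name check can be applied only to the names that
actually get serialized. The sketch below, in Python, tests a string
against the Name production of XML 1.0 (fifth edition). Deciding which
names are serialized is left to the caller; the function names
is_xml_name and illegal_serialized_names are hypothetical.

```python
import re

# Character classes transcribed from the NameStartChar and NameChar
# productions of XML 1.0 (fifth edition).
_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# The trailing "-" is a literal hyphen, placed last in the class.
_REST = _START + "0-9.\u00B7\u0300-\u036F\u203F-\u2040-"
_XML_NAME = re.compile(f"[{_START}][{_REST}]*")


def is_xml_name(candidate: str) -> bool:
    """True if candidate matches the XML 1.0 Name production."""
    return _XML_NAME.fullmatch(candidate) is not None


def illegal_serialized_names(serialized_names):
    """Return, in sorted order, the names that are not legal XML names."""
    return sorted(n for n in set(serialized_names) if not is_xml_name(n))
```

Unlike the ID/IDREF approach, this leaves names that are never
serialized unconstrained.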

And, of course, the constraints you mention on text nodes in the current
grammar, which are as far as I can see easily checked with Schematron
rules.

> My main (perhaps selfish) concern is that writing a
> validator in XSLT as well as a parser will vastly inflate the
> complexity of the task. If we can say that successfully parsing a
> given grammar using the IXML grammar to an XML instance is enough,
> then that is good news for me: it may mean that my parser can't accept
> XML grammars, but at least it means I don't have to implement a
> validator or serialiser to be a conforming processor.

I think it does increase the size of the task, but only modestly, not
vastly.

> However, if we are allowing parsers to accept XML format grammars, and
> requiring those parsers to validate those grammars, should we also be
> publishing a schema for them to do so?

Yes, probably. But I would like to spend more time working on Aparecium
and less time working on ancillary problems, so I am not going to work
on this now.

Adding routines to translate an ixml grammar into a schema is on my
mental to-do list for my Gingersnap library, although I see that it is
not in fact in the one that is written down at [2]. Generating an
up-to-date schema for ixml is, however, on both the mental and the written
lists.

[2] https://github.com/cmsmcq/gingersnap/blob/main/A/todo.md

If it would help anyone, I can take the RNG schema for ixml that I
generated by hand last July and put it into the lib directory of my ixml
test suite project [3]; let me know.

[3] https://github.com/cmsmcq/ixml-tests

> I took a look yesterday at what that might look like with my go-to
> grammatical schema (RelaxNG), and almost immediately encountered
> problems around controlling the value of text nodes (in Relax, an
> element's content can be complex (and allow mixed text nodes) or
> simple (and allow value patterns), but not both). Possibly we could
> work around that with Schematron, or perhaps this is a good reason to
> tweak the marks on the IXML grammar to bypass this restriction by
> removing superfluous mixed text nodes (e.g. ":" or "=" symbols from
> rule definitions).

I think that for the case of ixml, at least, this is easy to do with ad
hoc checks; for a more general solution, I think Schematron is the way
to go.
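
One possible shape for such an ad hoc check, sketched in Python: gather
the text nodes that sit directly inside each rule element and compare
them with a pattern. The pattern used here ("=" or ":" followed by ".",
with all whitespace ignored) is purely an assumption for illustration
and is not taken from the spec; the names RULE_TEXT, own_text and
rules_with_bad_text are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION (illustrative only): the text directly inside a rule
# element is "=" or ":" followed by ".", once whitespace is removed.
RULE_TEXT = re.compile(r"[=:]\.")


def own_text(elem: ET.Element) -> str:
    """Concatenate the text nodes that are direct children of elem."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    # Drop all whitespace so that indentation does not matter.
    return "".join("".join(parts).split())


def rules_with_bad_text(grammar_xml: str) -> list[str]:
    """Return names of rules whose own text does not match RULE_TEXT."""
    root = ET.fromstring(grammar_xml)
    return [
        rule.get("name", "?")
        for rule in root.iter("rule")
        if RULE_TEXT.fullmatch(own_text(rule)) is None
    ]
```

A grammatical schema can then describe the element structure while a
check of this kind covers the text values.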



********************************************
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
cmsmcq@blackmesatech.com
http://www.blackmesatech.com
********************************************
Received on Tuesday, 15 June 2021 16:42:09 UTC