desiderata for the schema(s) for ixml grammars in XML

In a recent meeting I think I mentioned my goal of writing a transform
that will read an ixml grammar and produce a schema describing the XML
documents that can be produced by parsing input against that grammar.
One use of such a schema, for me, is to allow syntax-directed editing
of data described by an ixml grammar — notably including ixml itself.

If we want to encourage or require conforming processors to accept
grammars in XML, having an authoritative schema describing the set of
grammars they should or must accept might be helpful.

In practice, however, I find that when I work with ixml grammars in
XML, I frequently want to annotate them; often I build a pipeline of
XSLT transforms that begins with some straightforward annotations
(e.g. making lists of all possible ancestors or descendants of a
nonterminal, or recording whether a nonterminal generates the empty
string) and then creates a related grammar based in part on those
annotations.  A schema that matches *only* documents that could be
produced by parsing an ixml grammar is no good to me, because it
doesn't handle my annotations.

It's easy enough to extend a standard schema for ixml with rules
saying that namespaced non-ixml attributes are valid on any element,
and that namespaced non-ixml elements can appear anywhere.  So it's
not a requirement that a standard schema for ixml allow extension
attributes and extension elements.  But I suspect that the desire for
a schema allowing extension attributes and extension elements may not
be limited to me.

For purposes of discussion, then, I make the following proposal
regarding a schema for ixml grammars in XML form:

1. We should have a standard schema pointed to from the spec.

2.  In fact, we should have two:

    * one that describes as closely as possible the set of XML
      documents that can be generated by parsing an ixml grammar
      against the standard ixml.ixml, and
      
    * one that also allows extension attributes and extension
      elements.

I'll call these the narrow schema and the broad schema.  (Some may
prefer 'strict' and 'lax'.)

3.  Exactly what counts as an extension attribute or extension element
is TBD.

We might require declaration of extension namespaces in the style of
XSLT.  For the moment, my proposal would be: any namespace-qualified
attribute or element whose namespace is not the ixml namespace can
occur at any position as a child or attribute of a standard ixml
element.  (Possible exception for comments?)  That's easy to achieve
with wildcards.

What can occur inside an extension element is a matter for those who
define it; in particular, ixml does not forbid extension elements from
having children or attributes in the ixml namespace defined by the
ixml spec.

Under the pragmas proposal Tom Hillman and I are working on, some
non-ixml elements, attributes, and processing instructions in an XML
grammar will count as pragmas, but not necessarily all.  (Pragmas
will, we hope, have the property that they can be written out in ixml
form without loss of information; that is not guaranteed to be true of
other extension elements.)

4.  The spec should say that:

    * Conforming processors MUST (or SHOULD -- open question, I guess)
      accept grammars in XML that conform to the narrow schema.

    * Conforming processors SHOULD (or MUST?) accept grammars that
      conform to the broad schema.

The standard interpretation of a broad-schema grammar is the same as
the interpretation of the narrow-schema grammar that would result if
we removed all extension elements (with all their contents) and all
extension attributes.
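
To make that interpretation rule concrete, here is a minimal sketch in
Python using the standard library's ElementTree.  It is an illustration
under assumptions, not a proposed implementation: the value of IXML_NS,
the helper names is_extension and strip_extensions, and the "my"
annotation namespace are all hypothetical, and the sketch assumes the
root element is itself a standard ixml element.

```python
import xml.etree.ElementTree as ET

# Assumption: the ixml namespace URI; substitute whatever the spec defines.
IXML_NS = "http://invisiblexml.org/NS"


def is_extension(name):
    """True for a namespace-qualified name whose namespace is not IXML_NS."""
    # ElementTree spells qualified names "{uri}local"; unqualified names
    # have no brace.  Comment and PI nodes have a non-string tag.
    if not isinstance(name, str) or not name.startswith("{"):
        return False
    uri = name[1:].split("}", 1)[0]
    return uri != IXML_NS


def strip_extensions(elem):
    """Remove extension attributes and extension elements, in place.

    Extension elements are dropped with all their contents, including
    any ixml-looking children they may contain.
    """
    for attr in [a for a in elem.attrib if is_extension(a)]:
        del elem.attrib[attr]
    prev = None
    for child in list(elem):
        if is_extension(child.tag):
            # Keep the text that followed the removed element.
            if child.tail:
                if prev is None:
                    elem.text = (elem.text or "") + child.tail
                else:
                    prev.tail = (prev.tail or "") + child.tail
            elem.remove(child)
        else:
            strip_extensions(child)
            prev = child
    return elem


# A broad-schema grammar with one extension attribute and one extension
# element (hypothetical annotation namespace).
broad = ET.fromstring(
    '<ixml xmlns:my="http://example.org/annotations">'
    '<rule name="a" my:nullable="false">'
    '<my:note><rule name="ignored"/></my:note>'
    '<alt><literal string="x"/></alt>'
    '</rule></ixml>'
)
narrow = strip_extensions(broad)
print(ET.tostring(narrow, encoding="unicode"))
```

Note that the rule element nested inside my:note disappears along with
its parent, which matches "with all their contents" above.  Comments
and processing instructions are not handled here, since ElementTree's
default parser discards them.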

I wonder what other people think.

Michael

Received on Saturday, 4 December 2021 18:57:56 UTC