Re: What about this grammar?

Norm Tovey-Walsh <norm@saxonica.com> writes:

>> I'm not seeing much upside to allowing literal control characters not
>> permitted in XML in the grammar via some additional notational
>> mechanism.

Like Graydon, I see no upside to this.  The discrepancies that already
exist between ixml and XML (e.g. in the definition of identifiers) don't
make ixml a better or more attractive language; they only set a trap for
users.  Consistency among specs would be more valuable here than letting
a hundred flowers bloom and letting a thousand definitions of
identifier contend for dominance.  (We already know what happens with
that, from existing programming languages:  programmers who work in
multiple languages tend to restrict themselves to identifiers which are
legal in all of the languages they work in -- or so I have heard it
claimed, allegedly on the basis of empirical studies.  If that is so,
then the carefully designed nuances of the different definitions of
identifier bring in the end no benefit to users of the language.)

But that's a design error we have already committed, apparently on the
theory that there will be people interested in ixml who are not actually
interested in using XML, and that we really ought to cater to their
preferences.  If there are such people, of course, I am not a good guide
to what they will find useful or eccentric.

Because the design error has already been committed, and in the case of
identifiers was explicitly discussed, I don't think we can plausibly fix
this with an erratum.

> Anyway. I can create an iXML file that has a literal U+0013 in it.
>
> If that’s forbidden, that’s fine. If it’s allowed but not required, I
> think that introduces an interoperability issue.

It does.  Any ixml processor built using XSLT or XQuery or XPath will be
treating its input as unparsed text, and won't handle inputs with
non-XML characters no matter what the spec says.  Any ixml processor
which constructs a representation in which nonterminal names from the
grammar are used to name elements or attributes will not handle grammars
with nonterminals which are not legal XML names, no matter what the spec
says.  Aparecium is in both of those classes.
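
To illustrate the first class of processor, here is a minimal sketch in
Python of the check such a processor effectively performs on its input.
It assumes the Char production of XML 1.0; the function names
is_xml10_char and first_non_xml_char are hypothetical and not taken
from Aparecium or any other processor.

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_non_xml_char(text: str):
    """Return (offset, character) of the first non-XML character, or None."""
    for offset, ch in enumerate(text):
        if not is_xml10_char(ch):
            return offset, ch
    return None
```

A grammar containing a literal U+0013 is reported at the offset of that
character, while the same grammar written with the hex encoding #13
contains only XML characters and passes.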

In real life, every person I know who has dealt seriously with
character-set and character-encoding issues would write the ixml grammar
in question with #13, not with a literal control-S, even if they did
not plan to transmit it over the network.  So far the only grammars I
have seen that exercise this interoperability problem are in the test
suite, and at least half of those grammars were written by me, so I
don't think many real users will be affected.


I suppose, in the end, my position is:

  - The ways in which ixml deviates from XML as regards allowable
    characters and allowable names are design errors.

  - I would be happy to vote for a proposal to repair those design
    errors in the obvious ways.

  - What I think is the obvious solution is to say explicitly in the
    spec that in input grammars and input strings conforming processors
    are required to accept any characters that would be legal in XML
    1.0, and in input grammars they are required to accept any
    nonterminals which are XML names, and to add that conforming
    processors MAY accept other characters in input and MAY accept
    nonterminals which are not XML names.

  - But I don't think we are likely to reach consensus on such a
    proposal, and I think any time spent discussing it is likely to be
    wasted and possibly counter-productive.
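
The nonterminal half of that proposal could be tested roughly as
follows. This is a sketch only, assuming the Name production of XML 1.0
(Fifth Edition); the names XML_NAME and is_xml_name are hypothetical,
and nothing here is drawn from the ixml spec or from any existing
processor.

```python
import re

# NameStartChar ranges from XML 1.0 (Fifth Edition).
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, U+00B7 and two further ranges.
_NAME_CHAR = _NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

XML_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def is_xml_name(nonterminal: str) -> bool:
    """True if the nonterminal could be used unchanged as an XML name."""
    return XML_NAME.fullmatch(nonterminal) is not None
```

Under the proposal, a conforming processor would be required to accept
every nonterminal for which is_xml_name returns True, and would be free
to accept or reject the rest.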

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Monday, 12 September 2022 16:48:00 UTC