On pragmas…

Hello,

Since I said we should try to have some email discussion of pragmas
before the next meeting, let me start.

Where are we on pragmas? Here’s my perspective.

Some of this recapitulates parts of the pragmas proposal from Michael
and Tom. I hope it’s helpful, and not distracting, for me to capture
this perspective in an email message.

1. What are pragmas?

They’re a mechanism for a grammar author to signal to a processor that
something is special about some part of the grammar. What constitutes
“special” is open-ended. If the processor doesn’t know what that kind
of special is, it can just ignore the pragma.

A pragma might apply to the grammar as a whole, to a particular rule,
or to a particular symbol. (It’s possible to go further than that, and
we might want to go a little further, but that’s sufficient for the
moment.)

Lots of languages make things we might call pragmas from a form that
looks a bit like a comment. In line-oriented languages, that’s kind of
ok.

//#ifdef some-condition
…
//#endif

It’s not ideal in a lot of ways, but it works well enough. It doesn’t
work especially well for things that need to be more closely bound to
the pragma. Java uses annotations:

@Test
public void thisIsATest() { … }

So there’s clearly precedent for non-comment syntaxes.

For languages that aren’t line-oriented, like XML, this doesn’t really
work at all. Comments are effectively unusable because they can’t be
nested.

XML has a CDATA pragma built in:

<![CDATA[ … ]]>

but it carries with it the awkward constraint that “]]>” is forbidden
in content. In principle other pragmas of the form <![TOKEN[ could be
invented (there were several others in SGML), but XML provides no
mechanism for them to be defined.

In XML, I expect most pragma-like things boil down to being processing
instructions:

<?test?>
<function name="thisIsATest">…</function>

Unfortunately, PIs don’t have a start and an end, so you need to
manage them as tombstones, which is more than a little inconvenient.

2. Are pragmas comments?

Sure, inasmuch as, like comments, you can ignore them if you don’t
care about pragmas, or if you encounter a pragma you don’t recognize,
or if the moon is full.

3. Are pragmas comments?

No, not really. Even if we settle on a syntax that makes them look
like a kind of comment, it’s not sufficient to just leave them as
comments in the grammar. It’s easy to demonstrate that with an
example. Consider:

{[example grammar pragma]}

{[example rule pragma]}
symbol: A .

A: {[example symbol 'a' pragma]} 'a', {[example symbol B pragma]} B.
B: .

If you parse this with an ixml grammar that knows nothing about
pragmas, those are comments, and the result is:

<ixml>
   <comment>[example grammar pragma]</comment>
   <comment>[example rule pragma]</comment>
   <rule name="symbol">
      <alt>
         <nonterminal name="A"/>
      </alt>
   </rule>
   <rule name="A">
      <comment>[example symbol 'a' pragma]</comment>
      <alt>
         <literal string="a"/>
         <comment>[example symbol B pragma]</comment>
         <nonterminal name="B"/>
      </alt>
   </rule>
   <rule name="B">
      <alt/>
   </rule>
</ixml>

This is unsatisfactory in a couple of ways. First, it’s impossible to
distinguish between the pragmas that are intended to apply to the
grammar as a whole and the pragmas that are supposed to apply to the
first rule. Second, the pragma is not associated with its target. It
looks like the association rule would have to be “next symbol”, but
that doesn’t work for the ‘a’ pragma because what follows it is the
<alt>, not the literal.
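
The association problem can be checked mechanically. Here is a minimal
sketch in Python, assuming the XML above is what a pragma-unaware
processor hands back; the function name next_sibling_of_comments is
mine for illustration, not part of any ixml tool. It pairs each comment
with the element that follows it:

```python
import xml.etree.ElementTree as ET

# The XML shown above: pragmas parsed as ordinary comments.
COMMENT_XML = """<ixml>
   <comment>[example grammar pragma]</comment>
   <comment>[example rule pragma]</comment>
   <rule name="symbol">
      <alt>
         <nonterminal name="A"/>
      </alt>
   </rule>
   <rule name="A">
      <comment>[example symbol 'a' pragma]</comment>
      <alt>
         <literal string="a"/>
         <comment>[example symbol B pragma]</comment>
         <nonterminal name="B"/>
      </alt>
   </rule>
   <rule name="B">
      <alt/>
   </rule>
</ixml>"""


def next_sibling_of_comments(root):
    """Pair each <comment> with the tag of the element that follows it."""
    pairs = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == "comment":
                following = children[i + 1].tag if i + 1 < len(children) else None
                pairs.append((child.text, following))
    return pairs


pairs = next_sibling_of_comments(ET.fromstring(COMMENT_XML))
for text, following in pairs:
    print(f"{text} is followed by <{following}>")
```

The grammar pragma is followed by another comment, the rule pragma by a
rule, and the ‘a’ pragma by an alt, so no single “next element” rule
recovers the intended targets.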

If, instead, we extend the ixml grammar to have pragmas as distinct
from comments, we can do much better:

<ixml>
   <prolog>
      <pragma name="example">
         <pragma-data>grammar pragma</pragma-data>
      </pragma>
   </prolog>
   <rule name="symbol">
      <pragma name="example">
         <pragma-data>rule pragma</pragma-data>
      </pragma>
      <alt>
         <nonterminal name="A"/>
      </alt>
   </rule>
   <rule name="A">
      <alt>
         <literal string="a">
            <pragma name="example">
               <pragma-data>symbol 'a' pragma</pragma-data>
            </pragma>
         </literal>
         <nonterminal name="B">
            <pragma name="example">
               <pragma-data>symbol B pragma</pragma-data>
            </pragma>
         </nonterminal>
      </alt>
   </rule>
   <rule name="B">
      <alt/>
   </rule>
</ixml>

Now pragmas are properly contained by the elements to which they
apply. In fairness, the input for that example is very slightly
different from the ixml shown above. It begins:

  {[example grammar pragma]} .

We need the slight extension of a full stop after pragmas that apply
to the whole grammar in order to distinguish them from pragmas for the
first rule.
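
With containment, a consumer can find a pragma’s target just by looking
at its parent. A minimal sketch in Python, assuming the pragma-aware
XML shown above; the function name pragma_targets is hypothetical:

```python
import xml.etree.ElementTree as ET

# The pragma-aware XML shown above.
PRAGMA_XML = """<ixml>
   <prolog>
      <pragma name="example">
         <pragma-data>grammar pragma</pragma-data>
      </pragma>
   </prolog>
   <rule name="symbol">
      <pragma name="example">
         <pragma-data>rule pragma</pragma-data>
      </pragma>
      <alt>
         <nonterminal name="A"/>
      </alt>
   </rule>
   <rule name="A">
      <alt>
         <literal string="a">
            <pragma name="example">
               <pragma-data>symbol 'a' pragma</pragma-data>
            </pragma>
         </literal>
         <nonterminal name="B">
            <pragma name="example">
               <pragma-data>symbol B pragma</pragma-data>
            </pragma>
         </nonterminal>
      </alt>
   </rule>
   <rule name="B">
      <alt/>
   </rule>
</ixml>"""


def pragma_targets(root):
    """Return (target tag, pragma name, pragma data) for every <pragma>."""
    found = []
    for parent in root.iter():
        for child in parent:
            if child.tag == "pragma":
                data = child.findtext("pragma-data", default="")
                found.append((parent.tag, child.get("name"), data))
    return found


targets = pragma_targets(ET.fromstring(PRAGMA_XML))
for target, name, data in targets:
    print(f"{name} pragma on <{target}>: {data}")
```

A pragma whose parent is the prolog applies to the grammar as a whole;
every other pragma applies to the rule or symbol that contains it.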

4. How much change is required to the ixml grammar?

Not a lot, actually. The output above comes from a grammar that
differs from the current ixml grammar only in the following rules:

       prolog: s, (ppragma+s, s)?. 
     -ppragma: pragma, s, -'.'.
      comment: -"{", ((comment; ~["[]{}"]), (cchar; comment)*)?, -"}".
         rule: annotation, name, s, -["=:"], s, -alts, (pragma, sp)?, -".". 
  nonterminal: annotation, name, s.
  -annotation: (pragma, sp)?, (mark, sp)?.
          -sp: (whitespace; comment; pragma)*.
      -quoted: tannotation, string, s.
 -tannotation: (pragma, sp)?, (tmark, sp)?.
     -encoded: tannotation, -"#", hex, s.
    inclusion: tannotation,          set.
    exclusion: tannotation, -"~", s, set.
       pragma: -"{[", @pmark?, @name, (whitespace, pragma-data)?, -"]}" . 
       @pmark: ["@^?"].
  pragma-data: (-pragma-char; -bracket-pair)*.
 -pragma-char: ~["{}"].
-bracket-pair: '{', -pragma-data, '}'.

Note: if we decided to support QNames for pragma names, there would be
a couple of additional rules.
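
To show what the pragma, pmark, pragma-data, and bracket-pair rules
accept, here is a hand-written recognizer in Python. It’s a sketch
under assumptions, not how any implementation does it: the NAME pattern
is a simplified stand-in for the ixml name rule, Python’s isspace() is
broader than ixml whitespace, and the pragma name “rx” used below is
made up.

```python
import re

# Simplified stand-in for ixml's name rule (an assumption; the real rule
# allows a much wider range of characters).
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
PMARKS = "@^?"


def parse_pragma(text):
    """Parse one pragma occupying all of `text`.

    Returns (pmark, name, data), with pmark and data None when absent.
    Raises ValueError if the text does not match the pragma rule.
    """
    if not text.startswith("{["):
        raise ValueError("pragma must start with '{['")
    pos = 2

    # pragma: -"{[", @pmark?, @name, ...
    pmark = None
    if pos < len(text) and text[pos] in PMARKS:
        pmark = text[pos]
        pos += 1
    match = NAME.match(text, pos)
    if match is None:
        raise ValueError("pragma name expected")
    name = match.group()
    pos = match.end()

    # ... (whitespace, pragma-data)?, -"]}"
    data = None
    if pos < len(text) and text[pos].isspace():
        start = pos + 1              # one whitespace character is consumed
        depth = 0                    # nesting level of bracket-pair
        end = None
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                if depth == 0:       # an unbalanced '}' must close the pragma
                    end = i
                    break
                depth -= 1
        if end is None or end - 1 < start or text[end - 1] != "]":
            raise ValueError("pragma must end with ']}'")
        data = text[start:end - 1]
        pos = end + 1
    else:
        if not text.startswith("]}", pos):
            raise ValueError("pragma must end with ']}'")
        pos += 2

    if pos != len(text):
        raise ValueError("unexpected text after pragma")
    return pmark, name, data


print(parse_pragma("{[example grammar pragma]}"))
print(parse_pragma("{[@rx greedy {a{b}} x]}"))
```

The point of bracket-pair is visible in the second call: balanced
braces are allowed inside the pragma data, so only an unbalanced “}”
preceded by “]” ends the pragma.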

5. Do we really need pragmas?

Yes, I think we do. I think the use cases provided in the pragmas
proposal make a compelling case. I’ve already implemented a few,
including a pragma that allows the parser to use a greedy regex match
for symbols. I think that has real potential to improve performance.

6. What about things that should be part of the language?

It’s possible that implementors will use pragmas to support features
not otherwise in the language. I don’t think that’s a compelling
argument against pragmas. When the language supports those features
natively, the pragmas will fade away. In the meantime, it’s a way to
experiment with designs that might usefully inform future versions of
the language.

Implementors will implement. It seems like it’s better to provide an
extensibility mechanism that’s broadly interoperable than to have all
extensibility be done by incompatible grammar extension.

For what it’s worth, I’m currently using the pragmas grammar as
described above by default in my implementation. For a grammar that
doesn’t use pragmas, it makes no difference. For a grammar that does
use pragmas, the implementation either does what it’s been programmed
to do with them or it ignores them.
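
The “do it or ignore it” behaviour amounts to a lookup with a silent
fallback. A minimal sketch in Python, not taken from my implementation;
apply_pragmas, the handler table, and the pragma names are all
hypothetical:

```python
def apply_pragmas(pragmas, handlers):
    """Run the handler for each recognized pragma and skip the rest.

    `pragmas` is a list of (name, data) pairs; `handlers` maps a pragma
    name to a callable taking the data. Returns the names that were
    ignored, which a processor might log but must not treat as errors.
    """
    ignored = []
    for name, data in pragmas:
        handler = handlers.get(name)
        if handler is None:
            ignored.append(name)     # unknown pragma: treat it like a comment
            continue
        handler(data)
    return ignored


# Hypothetical pragma names, for illustration only.
seen = []
handlers = {"example": seen.append}
ignored = apply_pragmas(
    [("example", "rule pragma"), ("unknown-vendor-thing", "whatever")],
    handlers,
)
print("handled:", seen)
print("ignored:", ignored)
```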

7. Open questions

I think there are two open questions, beyond “are we going to support
them?”, which I hope will not be open much longer!

a. Are pragmas going to be identified by name or by QName? Personally,
I think we should just use @name until such time as @qname is allowed
in other places in the grammar, like the names of nonterminals. If
ever, since I’ve come to the conclusion we shouldn’t do that :-)

b. Precisely how should the grammar include pragmas? The grammar from
the pragmas proposal is good enough, I think. That’s basically what you
see above, except that I think I had to tweak it in a couple of small
ways to remove ambiguity.

It might be worth exploring a few small changes, so that a pragma can
apply to a following alts group instead of only a following symbol,
but I don’t think it’s a showstopper if we decide not to.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Thursday, 24 March 2022 14:26:50 UTC