Re: Terminology proposals

Bethan Tovey-Walsh writes:

> Attached is an edit of Dave's terminology document. I've proposed some
> general revisions, and added some terms that I think/hope others may
> find useful.

> In particular, I've been finding it hard to work out how to
> distinguish between the direct XML output of the ixml processor,
> before any extra processing to turn it into a desired output
> format. It can't really be an "ixml document", because that risks
> confusion with the ixml input. And just calling it the "ixml output"
> was becoming extremely frustrating when I wanted to find a way to
> distinguish it from the result of post-processing it, e.g. to produce
> JSON, or a different flavour of XML, or whatever.

In a pipeline with post-processing steps, the ixml output is presumably
just the output of the ixml step, which need not be the same as the
final output (or as the output of any other step).

Since the ixml spec doesn't define any postprocessing, perhaps our terms
for the results of other processing can be relatively loose and
informal?  I see that that is roughly where you ended up, though I think
your frustration is audible in the sternness with which you admonish the
reader not to refer to downstream output as ixml output.


> I look forward to your comments and criticisms when we meet, and thank
> you in advance for corrections to any mistakes.

Some comments follow.

The document says

> ## ixml parser
>
> An *ixml parser* is a parser constructed from an *ixml input grammar*.

This seems to reflect assumptions about the internal workings of an ixml
processor which I think are not universal and should not be baked into
our way of talking about things.

A parser is (I think) generally understood to be an executable program.

There are plenty of ways of building parsers which take a grammar G as
input and produce as output a parser, i.e. an executable program that
parses input against G.  Yacc and other parser generators work this
way.  

There are other ways of parsing input that involve a general-purpose
parser parsing input against a grammar, where both the input string and
the grammar are input to the parser and no separate parser is
constructed from or for the grammar.
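
A minimal sketch in Python may make the contrast concrete.  Everything
in it is an assumption made for illustration: the toy grammar (balanced
parentheses), the function names, and the use of a naive backtracking
recognizer rather than anything a real ixml processor would use.  It
only recognizes (answers yes or no) and builds no parse tree.  The
first function takes grammar and input together; the second translates
the grammar into program text and compiles it, in the manner of a
parser generator.

```python
from typing import Callable, Dict, Iterator, List

# Nonterminal -> list of alternatives; any symbol that is not a key
# is treated as a literal string.  (Hypothetical toy representation.)
Grammar = Dict[str, List[List[str]]]

# Toy grammar: S -> "(" S ")" S | empty
GRAMMAR: Grammar = {"S": [["(", "S", ")", "S"], []]}


def ends(grammar: Grammar, symbol: str, text: str, pos: int) -> Iterator[int]:
    """Yield each position at which `symbol`, matched from `pos`, can end."""
    if symbol not in grammar:  # literal
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in grammar[symbol]:
        positions = [pos]
        for part in alternative:
            positions = [e for p in positions
                         for e in ends(grammar, part, text, p)]
        yield from positions


def recognize(grammar: Grammar, start: str, text: str) -> bool:
    """General-purpose approach: grammar and input are both arguments."""
    return len(text) in ends(grammar, start, text, 0)


def generate_parser(grammar: Grammar, start: str) -> Callable[[str], bool]:
    """Generator approach: emit source code specific to `grammar`, then
    compile it.  Assumes nonterminal names are valid Python identifiers,
    every nonterminal has at least one alternative, and there is no
    left recursion."""
    lines: List[str] = []
    for name, alternatives in grammar.items():
        lines.append(f"def parse_{name}(text, pos):")
        for alternative in alternatives:
            lines.append("    positions = [pos]")
            for part in alternative:
                if part in grammar:
                    lines.append(
                        "    positions = [e for p in positions "
                        f"for e in parse_{part}(text, p)]")
                else:
                    lines.append(
                        f"    positions = [p + {len(part)} for p in positions "
                        f"if text.startswith({part!r}, p)]")
            lines.append("    yield from positions")
    lines.append("def accepts(text):")
    lines.append(f"    return len(text) in parse_{start}(text, 0)")
    namespace: dict = {}
    exec("\n".join(lines), namespace)
    return namespace["accepts"]
```

Here `recognize(GRAMMAR, "S", text)` never constructs anything from the
grammar, while `generate_parser(GRAMMAR, "S")` returns a separate
program that can be applied to any number of inputs.  Both accept the
same strings.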

Both approaches are imaginable for ixml processors.  If I ever get
around to writing a program that reads an ixml grammar G and translates
it into Mercury notation, so that I can compile the Mercury program and
have a parser that reads input matching G and produces XML, then an ixml
processor built around that will first generate an ixml parser (in the
sense indicated in your document) and then use it to parse the user's
input.

But that is not how an Earley parser works.  An Earley parser works for
any grammar; it does not generate a separate parser for each grammar,
and it is not, itself, generated from any grammar.  So under the
definition proposed, no Earley parser is an ixml parser, even when it is
used to parse input against an ixml grammar.  I don't think that's a
helpful terminological pattern.

In practice, I expect people's natural inclination will be to treat
"ixml parser" and "ixml processor" as extensionally equivalent.

Unless we change the spec to talk about parses that cover only part of
the input, I think the "additional terminology" section may be
unnecessary.  After the discussion last week I am skeptical of the idea
of making such a change.

I suspect that in discussions of parsing, a term like "partial parse" is
used to refer to a state of analysis in which part of the input string
has been analysed (and, in a left-to-right online parsing algorithm,
consumed) and part remains unanalysed, so that it is not yet clear
whether the input is or is not a sentence in the language defined by the
grammar.  Using the term to refer instead to a complete parse of a
prefix of the input seems likely to lead to confusion in the long run.

Also, the formulations in the additional terminology section seem to
focus quite narrowly on sentence generation starting with a grammar and
ending with a sentence, and not on parsing construed as the process of
starting with a sentence and ending with a parse tree.  The definitions
would feel less procedural to me if they spoke in terms of strings
being, or not being, sentences in a language, rather than in terms of
the rewriting of sentential forms.
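
To illustrate what I mean by speaking in terms of strings being, or not
being, sentences, here is a small Python sketch.  The toy language
(balanced parentheses) and all the function names are assumptions made
for illustration; none of them comes from the spec or from the
terminology document.  It separates three questions that "partial
parse" tends to blur: whether the whole string is a sentence, which
prefixes of it are themselves sentences, and whether the string read so
far could still be extended to a sentence.

```python
from typing import List


def is_sentence(text: str) -> bool:
    """True if `text` is a sentence of the toy language
    of balanced parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0


def prefixes_that_are_sentences(text: str) -> List[int]:
    """Lengths n for which text[:n] is itself a sentence, i.e. for which
    a complete parse of a prefix of the input exists."""
    return [n for n in range(len(text) + 1) if is_sentence(text[:n])]


def can_be_extended_to_sentence(consumed: str) -> bool:
    """True if some continuation of `consumed` is a sentence: the state
    of a left-to-right analysis that has not yet succeeded or failed."""
    depth = 0
    for ch in consumed:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return True
```

For the input `"())("`, `is_sentence` is false and
`prefixes_that_are_sentences` returns `[0, 2]`: the empty string and
`"()"` are sentences, though the input as a whole is not.  For `"(()"`,
`is_sentence` is false but `can_be_extended_to_sentence` is true, which
is the situation I would expect "partial parse" to suggest to most
readers.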

I hope this helps.  

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Wednesday, 26 January 2022 14:48:58 UTC