Re: Terminology proposals from Bethan Tovey-Walsh on 2022-01-27 (public-ixml@w3.org from January 2022)

From: Bethan Tovey-Walsh <accounts@bethan.wales>
Date: Thu, 27 Jan 2022 15:37:39 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: public-ixml@w3.org
Message-Id: <9F2945AB-EB91-4F00-B673-7581A4235BD4@bethan.wales>
Thank you, Michael; this is extremely useful. I’ve removed the additional terminology section, and rewritten a number of things.

In particular, I’ve reworked the definition of “ixml parser”. If you (and others) have a chance to review it, I’d be very grateful.

Could I also ask for opinions on using “ixml grammar” as opposed to my earlier suggestion of “ixml input grammar”? I’ve noticed a few uses of “input grammar” in our recent discussions, and I wonder whether it would be a good idea to reintroduce it.

Very bests,
BTW


___________________________________________________ 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 26 Jan 2022, at 14:48, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
> 
> 
> Bethan Tovey-Walsh writes:
> 
>> Attached is an edit of Dave's terminology document. I've proposed some
>> general revisions, and added some terms that I think/hope others may
>> find useful.
> 
>> In particular, I've been finding it hard to work out how to
>> refer to the direct XML output of the ixml processor,
>> before any extra processing to turn it into a desired output
>> format. It can't really be an "ixml document", because that risks
>> confusion with the ixml input. And just calling it the "ixml output"
>> was becoming extremely frustrating when I wanted to find a way to
>> distinguish it from the result of post-processing it, e.g. to produce
>> JSON, or a different flavour of XML, or whatever.
> 
> In a pipeline with post-processing steps, perhaps the ixml output is not
> the same as the final output (or as the output of any other step).
> 
> Since the ixml spec doesn't define any postprocessing, perhaps our terms
> for the results of other processing can be relatively loose and
> informal?  I see that that is roughly where you ended up, though I think
> your frustration is audible in the sternness with which you admonish the
> reader not to refer to downstream output as ixml output.
> 
> 
>> I look forward to your comments and criticisms when we meet, and thank
>> you in advance for corrections to any mistakes.
> 
> Some comments follow.
> 
> The document says
> 
>> ## ixml parser
>> 
>> An *ixml parser* is a parser constructed from an *ixml input grammar*.
> 
> This seems to reflect assumptions about the internal workings of an ixml
> processor which I think are not universal and should not be baked into
> our way of talking about things.
> 
> A parser is (I think) generally understood to be an executable program.
> 
> There are plenty of ways of building parsers which take a grammar G as
> input and produce as output a parser, i.e. an executable program that
> parses input against G.  Yacc and other parser generators work this
> way.  
> 
> There are other ways of parsing input that involve a general-purpose
> parser parsing input against a grammar, where both the input string and
> the grammar are input to the parser and no separate parser is
> constructed from or for the grammar.
> 
> Both approaches are imaginable for ixml processors.  If I ever get
> around to writing a program that reads an ixml grammar G and translates
> it into Mercury notation, so that I can compile the Mercury program and
> have a parser that reads input matching G and produces XML, then an ixml
> processor built around that will first generate an ixml parser (in the
> sense indicated in your document) and then use it to parse the user's
> input.
> 
> But that is not how an Earley parser works.  An Earley parser works for
> any grammar; it does not generate a separate parser for each grammar,
> and it is not, itself, generated from any grammar.  So under the
> definition proposed, no Earley parser is an ixml parser, even when it is
> used to parse input against an ixml grammar.  I don't think that's a
> helpful terminological pattern.
> 
> In practice, I expect people's natural inclination will be to treat
> "ixml parser" and "ixml processor" as extensionally equivalent.
> 
> Unless we change the spec to talk about parses that cover only part of
> the input, I think the "additional terminology" section may be
> unnecessary.  After the discussion last week I am skeptical of the idea
> of making such a change.
> 
> I suspect that in discussions of parsing, a term like "partial parse" is
> used to refer to a state of analysis in which part of the input string
> has been analysed (and, in a left-to-right online parsing algorithm,
> consumed) and part remains unanalysed, so that it is not yet clear
> whether the input is or is not a sentence in the language defined by the
> grammar.  Using the term to refer instead to a complete parse of a
> prefix of the input seems likely to lead to confusion in the long run.
> Also, the formulations in the additional terminology section seem to
> focus quite narrowly on sentence generation starting with a grammar and
> ending with a sentence, and not on parsing construed as the process of
> starting with a sentence and ending with a parse tree.  The definitions
> would feel less procedural to me if they spoke in terms of strings
> being, or not being, sentences in a language, rather than in terms of
> the rewriting of sentential forms.
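
The distinction drawn above, between an analysis that is still in progress and a complete parse of a prefix, can be made concrete with a small Python sketch phrased in terms of strings being, or not being, sentences. The example language { a^n b^n : n >= 1 } and all three function names are assumptions chosen for illustration; nothing here comes from the ixml spec.

```python
def is_sentence(s):
    """True if s is a sentence of the example language { a^n b^n : n >= 1 }."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def sentence_prefixes(s):
    """Lengths k for which the prefix s[:k] is, on its own, a sentence.

    This is the 'complete parse of a prefix of the input' reading.
    """
    return [k for k in range(1, len(s) + 1) if is_sentence(s[:k])]


def could_become_sentence(s):
    """True if some continuation of s (possibly empty) is a sentence.

    This is the 'analysis still in progress' reading: the part of the input
    read so far has not ruled out membership in the language.
    """
    leading_as = len(s) - len(s.lstrip("a"))
    rest = s[leading_as:]
    return set(rest) <= {"b"} and len(rest) <= leading_as


for text in ["aab", "aabb", "aabbx"]:
    print(text, is_sentence(text), sentence_prefixes(text),
          could_become_sentence(text))
```

For `"aab"` no prefix is a sentence, but the string can still be completed to one; for `"aabbx"` the prefix `"aabb"` is a sentence, yet the whole input is not and cannot become one. The two readings of "partial parse" pick out different strings, which is why using one term for both invites confusion.
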
> 
> I hope this helps.  
> 
> -- 
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
>
Received on Thursday, 27 January 2022 15:37:56 UTC