Re: Terminology proposals from Bethan Tovey-Walsh on 2022-01-27 (public-ixml@w3.org from January 2022)

From: Bethan Tovey-Walsh <accounts@bethan.wales>
Date: Thu, 27 Jan 2022 15:38:35 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <647B3CC6-4706-44B7-AE02-E29B002B80F6@bethan.wales>
And this time, I’ll attach the document! Apologies, all.

BTW
___________________________________________________ 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
Croeso i chi ysgrifennu ataf yn y Gymraeg. | You are welcome to write to me in Welsh.

> On 27 Jan 2022, at 15:37, Bethan Tovey-Walsh <accounts@bethan.wales> wrote:
> 
> Thank you, Michael; this is extremely useful. I’ve removed the additional terminology section, and rewritten a number of things.
> 
> In particular, I’ve reworked the definition of “ixml parser”. If you (and others) have a chance to review it, I’d be very grateful.
> 
> Could I also ask for opinions on using “ixml grammar” as opposed to my earlier suggestion of “ixml input grammar”? I’ve noticed a few uses of “input grammar” in our recent discussions, and I wonder whether it would be a good idea to reintroduce it.
> 
> Very bests,
> BTW
> 
> 
> 
>> On 26 Jan 2022, at 14:48, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>> 
>> 
>> Bethan Tovey-Walsh writes:
>> 
>>> Attached is an edit of Dave's terminology document. I've proposed some
>>> general revisions, and added some terms that I think/hope others may
>>> find useful.
>> 
>>> In particular, I've been finding it hard to work out how to refer to
>>> the direct XML output of the ixml processor, as distinct from the
>>> result of any extra processing that turns it into a desired output
>>> format. It can't really be an "ixml document", because that risks
>>> confusion with the ixml input. And just calling it the "ixml output"
>>> was becoming extremely frustrating when I wanted to find a way to
>>> distinguish it from the result of post-processing it, e.g. to produce
>>> JSON, or a different flavour of XML, or whatever.
>> 
>> In a pipeline with post-processing steps, perhaps the ixml output is not
>> the same as the final output (or as the output of any other step).
>> 
>> Since the ixml spec doesn't define any postprocessing, perhaps our terms
>> for the results of other processing can be relatively loose and
>> informal?  I see that that is roughly where you ended up, though I think
>> your frustration is audible in the sternness with which you admonish the
>> reader not to refer to downstream output as ixml output.
>> 
>> 
>>> I look forward to your comments and criticisms when we meet, and thank
>>> you in advance for corrections to any mistakes.
>> 
>> Some comments follow.
>> 
>> The document says
>> 
>>> ## ixml parser
>>> 
>>> An *ixml parser* is a parser constructed from an *ixml input grammar*.
>> 
>> This seems to reflect assumptions about the internal workings of an ixml
>> processor which I think are not universal and should not be baked into
>> our way of talking about things.
>> 
>> A parser is (I think) generally understood to be an executable program.
>> 
>> There are plenty of ways of building programs which take a grammar G
>> as input and produce as output a parser, i.e. an executable program
>> that parses input against G.  Yacc and other parser generators work
>> this way.
>> 
>> There are other ways of parsing input that involve a general-purpose
>> parser parsing input against a grammar, where both the input string and
>> the grammar are input to the parser and no separate parser is
>> constructed from or for the grammar.
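The two approaches described above can be contrasted in a short Python sketch. Everything in it is an illustrative assumption, not ixml syntax or any real tool's API: the dictionary grammar format, the example grammar, and the names `recognize`, `generate_parser_source`, and `build_parser`. Both matchers are naive top-down recognizers: they only answer whether the input is a sentence, build no parse tree, and would not terminate on a left-recursive grammar.

```python
from typing import Callable, Dict, List

# Hypothetical toy grammar format (not ixml syntax): each nonterminal maps to
# a list of alternatives, each alternative a list of symbols. A symbol that is
# not a key is a single terminal character. Nonterminal names are assumed to
# be valid Python identifiers. Left recursion is not supported.
Grammar = Dict[str, List[List[str]]]

GRAMMAR: Grammar = {
    "list": [["item", ",", "list"], ["item"]],
    "item": [["a"], ["b"]],
}


# Approach 1: a general-purpose parser. Grammar and input are both arguments;
# nothing grammar-specific is built.
def ends(grammar: Grammar, symbol: str, text: str, pos: int) -> List[int]:
    """All positions where a match of `symbol` starting at `pos` can end."""
    if symbol not in grammar:
        return [pos + 1] if text[pos:pos + 1] == symbol else []
    result: List[int] = []
    for alternative in grammar[symbol]:
        positions = [pos]
        for sym in alternative:
            positions = [e for p in positions for e in ends(grammar, sym, text, p)]
        result.extend(positions)
    return result


def recognize(grammar: Grammar, start: str, text: str) -> bool:
    return len(text) in ends(grammar, start, text, 0)


# Approach 2: a parser generator. The grammar is translated into source code
# for a grammar-specific program, which afterwards reads only the input.
def generate_parser_source(grammar: Grammar, start: str) -> str:
    lines: List[str] = []
    for nt, alternatives in grammar.items():
        lines.append(f"def parse_{nt}(text, pos):")
        lines.append("    result = []")
        for alternative in alternatives:
            lines.append("    ps = [pos]")
            for sym in alternative:
                if sym in grammar:
                    lines.append(f"    ps = [e for p in ps for e in parse_{sym}(text, p)]")
                else:
                    lines.append(f"    ps = [p + 1 for p in ps if text[p:p + 1] == {sym!r}]")
            lines.append("    result.extend(ps)")
        lines.append("    return result")
    lines.append("def recognize(text):")
    lines.append(f"    return len(text) in parse_{start}(text, 0)")
    return "\n".join(lines)


def build_parser(grammar: Grammar, start: str) -> Callable[[str], bool]:
    namespace: dict = {}
    exec(generate_parser_source(grammar, start), namespace)  # the "compile" step
    return namespace["recognize"]
```

Both routes accept exactly the same strings: `recognize(GRAMMAR, "list", "a,b,a")` and `build_parser(GRAMMAR, "list")("a,b,a")` both return `True`. The only difference is whether a grammar-specific program is constructed before any input is read.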
>> 
>> Both approaches are imaginable for ixml processors.  If I ever get
>> around to writing a program that reads an ixml grammar G and translates
>> it into Mercury notation, so that I can compile the Mercury program and
>> have a parser that reads input matching G and produces XML, then an ixml
>> processor built around that will first generate an ixml parser (in the
>> sense indicated in your document) and then use it to parse the user's
>> input.
>> 
>> But that is not how an Earley parser works.  An Earley parser works for
>> any grammar; it does not generate a separate parser for each grammar,
>> and it is not, itself, generated from any grammar.  So under the
>> definition proposed, no Earley parser is an ixml parser, even when it is
>> used to parse input against an ixml grammar.  I don't think that's a
>> helpful terminological pattern.
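A minimal Earley recognizer illustrates the point: one general-purpose function takes both the grammar and the input string, and no grammar-specific parser is generated. This is a sketch under stated assumptions: the tuple-based grammar format, the example grammars, and the names `earley_recognize` and `nullable_set` are hypothetical. It only recognizes (no parse trees, no ambiguity reporting), and empty alternatives are handled by also advancing past a nullable nonterminal at prediction time, the fix proposed by Aycock and Horspool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical grammar format: nonterminal -> list of alternatives; each
# alternative is a tuple of symbols. Symbols that are not keys are single
# terminal characters. An empty tuple is an empty alternative.
Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot, origin)


def nullable_set(grammar: Grammar) -> Set[str]:
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs not in nullable and any(
                all(sym in nullable for sym in alt) for alt in alternatives
            ):
                nullable.add(lhs)
                changed = True
    return nullable


def earley_recognize(grammar: Grammar, start: str, text: str) -> bool:
    """Return True if text is a sentence of the language of grammar."""
    nullable = nullable_set(grammar)
    chart: List[Set[Item]] = [set() for _ in range(len(text) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(text) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new: List[Item] = []
            if dot == len(rhs):
                # Complete: advance every item in chart[origin] waiting for lhs.
                new = [(plhs, prhs, pdot + 1, porigin)
                       for (plhs, prhs, pdot, porigin) in chart[origin]
                       if pdot < len(prhs) and prhs[pdot] == lhs]
            elif rhs[dot] in grammar:
                # Predict the nonterminal after the dot; if it is nullable,
                # also step over it (Aycock-Horspool).
                sym = rhs[dot]
                new = [(sym, alt, 0, i) for alt in grammar[sym]]
                if sym in nullable:
                    new.append((lhs, rhs, dot + 1, origin))
            elif i < len(text) and text[i] == rhs[dot]:
                # Scan: the terminal matches, so the item moves to chart[i + 1].
                chart[i + 1].add((lhs, rhs, dot + 1, origin))
            for item in new:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(ilhs == start and idot == len(irhs) and iorigin == 0
               for (ilhs, irhs, idot, iorigin) in chart[len(text)])


# A left-recursive grammar and a grammar with an empty alternative;
# no grammar-specific code is generated for either.
EXPR: Grammar = {"expr": [("expr", "+", "term"), ("term",)],
                 "term": [("x",), ("y",)]}
PARENS: Grammar = {"S": [("(", "S", ")", "S"), ()]}
```

The same unchanged function accepts `x+y+x` under the left-recursive `EXPR` grammar and `(()())` under `PARENS`, and rejects `x+` and `(()`.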
>> 
>> In practice, I expect people's natural inclination will be to treat
>> "ixml parser" and "ixml processor" as extensionally equivalent.
>> 
>> Unless we change the spec to talk about parses that cover only part of
>> the input, I think the "additional terminology" section may be
>> unnecessary.  After the discussion last week I am skeptical of the idea
>> of making such a change.
>> 
>> I suspect that in discussions of parsing, a term like "partial parse" is
>> used to refer to a state of analysis in which part of the input string
>> has been analysed (and, in a left-to-right online parsing algorithm,
>> consumed) and part remains unanalysed, so that it is not yet clear
>> whether the input is or is not a sentence in the language defined by the
>> grammar.  Using the term to refer instead to a complete parse of a
>> prefix of the input seems likely to lead to confusion in the long run.
>> Also, the formulations in the additional terminology section seem to
>> focus quite narrowly on sentence generation starting with a grammar and
>> ending with a sentence, and not on parsing construed as the process of
>> starting with a sentence and ending with a parse tree.  The definitions
>> would feel less procedural to me if they spoke in terms of strings
>> being, or not being, sentences in a language, rather than in terms of
>> the rewriting of sentential forms.
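The two senses of "partial parse" can be contrasted on a deliberately small example, phrased in terms of strings being or not being sentences of a language. The language and the names `classify_prefixes` and `longest_sentence_prefix` are assumptions for illustration only: balanced parentheses, generated by S -> "(" S ")" S | (empty), a language for which a simple depth counter decides membership without a general parser.

```python
from typing import List

# Assumed example language (illustration only): balanced parentheses,
# generated by S -> "(" S ")" S | (empty). For this language a depth
# counter is enough to decide whether a string is a sentence.


def classify_prefixes(text: str) -> List[str]:
    """For k = 0..len(text), classify the prefix text[:k] as:
    "sentence"  - the prefix is itself a sentence of the language;
    "undecided" - not a sentence, but it can still be extended to one;
    "dead"      - no extension of the prefix is a sentence."""
    statuses = ["sentence"]  # the empty string is a sentence here
    depth, dead = 0, False
    for ch in text:
        if not dead:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            dead = ch not in "()" or depth < 0
        if dead:
            statuses.append("dead")
        elif depth == 0:
            statuses.append("sentence")
        else:
            statuses.append("undecided")
    return statuses


def longest_sentence_prefix(text: str) -> int:
    """Length of the longest prefix of text that is a complete sentence."""
    statuses = classify_prefixes(text)
    return max(k for k, s in enumerate(statuses) if s == "sentence")
```

For the input `(()))(`, the prefix `(()` is undecided, a partial parse in the usual sense; `(())`, of length 4, is the longest prefix that is a complete sentence; and the input as a whole is not a sentence.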
>> 
>> I hope this helps.  
>> 
>> -- 
>> C. M. Sperberg-McQueen
>> Black Mesa Technologies LLC
>> http://blackmesatech.com
>> 
>
Attachments

text/markdown attachment: terminology_220127.md
Received on Thursday, 27 January 2022 15:38:49 UTC