- From: Bethan Tovey-Walsh <accounts@bethan.wales>
- Date: Thu, 27 Jan 2022 15:37:39 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
- Message-Id: <9F2945AB-EB91-4F00-B673-7581A4235BD4@bethan.wales>
Thank you, Michael; this is extremely useful.

I’ve removed the additional terminology section, and rewritten a number of things. In particular, I’ve reworked the definition of “ixml parser”. If you (and others) have a chance to review it, I’d be very grateful.

Could I also ask for opinions on using “ixml grammar” as opposed to my earlier suggestion of “ixml input grammar”? I’ve noticed a few uses of “input grammar” in our recent discussions, and I wonder whether it would be a good idea to reintroduce it.

Very bests,
BTW

___________________________________________________
Dr. Bethan Tovey-Walsh
Myfyrwraig PhD | PhD Student
CorCenCC
Prifysgol Abertawe | Swansea University

Croeso i chi ysgrifennu ataf yn y Gymraeg. (You are welcome to write to me in Welsh.)

> On 26 Jan 2022, at 14:48, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
>
> Bethan Tovey-Walsh writes:
>
>> Attached is an edit of Dave's terminology document. I've proposed some
>> general revisions, and added some terms that I think/hope others may
>> find useful.
>
>> In particular, I've been finding it hard to work out how to
>> distinguish between the direct XML output of the ixml processor,
>> before any extra processing to turn it into a desired output
>> format. It can't really be an "ixml document", because that risks
>> confusion with the ixml input. And just calling it the "ixml output"
>> was becoming extremely frustrating when I wanted to find a way to
>> distinguish it from the result of post-processing it, e.g. to produce
>> JSON, or a different flavour of XML, or whatever.
>
> In a pipeline with post-processing steps, perhaps the ixml output is not
> the same as the final output (or as the output of any other step).
>
> Since the ixml spec doesn't define any postprocessing, perhaps our terms
> for the results of other processing can be relatively loose and
> informal? I see that that is roughly where you ended up, though I think
> your frustration is audible in the sternness with which you admonish the
> reader not to refer to downstream output as ixml output.
>
>
>> I look forward to your comments and criticisms when we meet, and thank
>> you in advance for corrections to any mistakes.
>
> Some comments follow.
>
> The document says
>
>> ## ixml parser
>>
>> An *ixml parser* is a parser constructed from an *ixml input grammar*.
>
> This seems to reflect assumptions about the internal workings of an ixml
> processor which I think are not universal and should not be baked into
> our way of talking about things.
>
> A parser is (I think) generally understood to be an executable program.
>
> There are plenty of ways of building parsers which take a grammar G as
> input and produce as output a parser, i.e. an executable program that
> parses input against G. Yacc and other parser generators work this
> way.
>
> There are other ways of parsing input that involve a general-purpose
> parser parsing input against a grammar, where both the input string and
> the grammar are input to the parser and no separate parser is
> constructed from or for the grammar.
>
> Both approaches are imaginable for ixml processors. If I ever get
> around to writing a program that reads an ixml grammar G and translates
> it into Mercury notation, so that I can compile the Mercury program and
> have a parser that reads input matching G and produces XML, then an ixml
> processor built around that will first generate an ixml parser (in the
> sense indicated in your document) and then use it to parse the user's
> input.
>
> But that is not how an Earley parser works. An Earley parser works for
> any grammar; it does not generate a separate parser for each grammar,
> and it is not, itself, generated from any grammar. So under the
> definition proposed, no Earley parser is an ixml parser, even when it is
> used to parse input against an ixml grammar. I don't think that's a
> helpful terminological pattern.
>
> In practice, I expect people's natural inclination will be to treat
> "ixml parser" and "ixml processor" as extensionally equivalent.
>
> Unless we change the spec to talk about parses that cover only part of
> the input, I think the "additional terminology" section may be
> unnecessary. After the discussion last week I am skeptical of the idea
> of making such a change.
>
> I suspect that in discussions of parsing, a term like "partial parse" is
> used to refer to a state of analysis in which part of the input string
> has been analysed (and, in a left-to-right online parsing algorithm,
> consumed) and part remains unanalysed, so that it is not yet clear
> whether the input is or is not a sentence in the language defined by the
> grammar. Using the term to refer instead to a complete parse of a
> prefix of the input seems likely to lead to confusion in the long run.
> Also, the formulations in the additional terminology section seem to
> focus quite narrowly on sentence generation starting with a grammar and
> ending with a sentence, and not on parsing construed as the process of
> starting with a sentence and ending with a parse tree. The definitions
> would feel less procedural to me if they spoke in terms of strings
> being, or not being, sentences in a language, rather than in terms of
> the rewriting of sentential forms.
>
> I hope this helps.
>
> --
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
Received on Thursday, 27 January 2022 15:37:56 UTC