- From: Bethan Tovey-Walsh <accounts@bethan.wales>
- Date: Thu, 27 Jan 2022 15:38:35 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: ixml <public-ixml@w3.org>
- Message-Id: <647B3CC6-4706-44B7-AE02-E29B002B80F6@bethan.wales>
And this time, I’ll attach the document! Apologies, all.

BTW

___________________________________________________
Dr. Bethan Tovey-Walsh
Myfyrwraig PhD | PhD Student CorCenCC
Prifysgol Abertawe | Swansea University
Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 27 Jan 2022, at 15:37, Bethan Tovey-Walsh <accounts@bethan.wales> wrote:
>
> Thank you, Michael; this is extremely useful. I’ve removed the additional terminology section, and rewritten a number of things.
>
> In particular, I’ve reworked the definition of “ixml parser”. If you (and others) have a chance to review it, I’d be very grateful.
>
> Could I also ask for opinions on using “ixml grammar” as opposed to my earlier suggestion of “ixml input grammar”? I’ve noticed a few uses of “input grammar” in our recent discussions, and I wonder whether it would be a good idea to reintroduce it.
>
> Very bests,
> BTW
>
>
> ___________________________________________________
> Dr. Bethan Tovey-Walsh
> Myfyrwraig PhD | PhD Student CorCenCC
> Prifysgol Abertawe | Swansea University
> Croeso i chi ysgrifennu ataf yn y Gymraeg.
>
>> On 26 Jan 2022, at 14:48, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>>
>>
>> Bethan Tovey-Walsh writes:
>>
>>> Attached is an edit of Dave's terminology document. I've proposed some
>>> general revisions, and added some terms that I think/hope others may
>>> find useful.
>>
>>> In particular, I've been finding it hard to work out how to
>>> distinguish between the direct XML output of the ixml processor,
>>> before any extra processing to turn it into a desired output
>>> format. It can't really be an "ixml document", because that risks
>>> confusion with the ixml input. And just calling it the "ixml output"
>>> was becoming extremely frustrating when I wanted to find a way to
>>> distinguish it from the result of post-processing it, e.g. to produce
>>> JSON, or a different flavour of XML, or whatever.
>>
>> In a pipeline with post-processing steps, perhaps the ixml output is not
>> the same as the final output (or as the output of any other step).
>>
>> Since the ixml spec doesn't define any postprocessing, perhaps our terms
>> for the results of other processing can be relatively loose and
>> informal? I see that that is roughly where you ended up, though I think
>> your frustration is audible in the sternness with which you admonish the
>> reader not to refer to downstream output as ixml output.
>>
>>
>>> I look forward to your comments and criticisms when we meet, and thank
>>> you in advance for corrections to any mistakes.
>>
>> Some comments follow.
>>
>> The document says
>>
>>> ## ixml parser
>>>
>>> An *ixml parser* is a parser constructed from an *ixml input grammar*.
>>
>> This seems to reflect assumptions about the internal workings of an ixml
>> processor which I think are not universal and should not be baked into
>> our way of talking about things.
>>
>> A parser is (I think) generally understood to be an executable program.
>>
>> There are plenty of ways of building parsers which take a grammar G as
>> input and produce as output a parser, i.e. an executable program that
>> parses input against G. Yacc and other parser generators work this
>> way.
>>
>> There are other ways of parsing input that involve a general-purpose
>> parser parsing input against a grammar, where both the input string and
>> the grammar are input to the parser and no separate parser is
>> constructed from or for the grammar.
>>
>> Both approaches are imaginable for ixml processors. If I ever get
>> around to writing a program that reads an ixml grammar G and translates
>> it into Mercury notation, so that I can compile the Mercury program and
>> have a parser that reads input matching G and produces XML, then an ixml
>> processor built around that will first generate an ixml parser (in the
>> sense indicated in your document) and then use it to parse the user's
>> input.
>>
>> But that is not how an Earley parser works. An Earley parser works for
>> any grammar; it does not generate a separate parser for each grammar,
>> and it is not, itself, generated from any grammar. So under the
>> definition proposed, no Earley parser is an ixml parser, even when it is
>> used to parse input against an ixml grammar. I don't think that's a
>> helpful terminological pattern.
>>
>> In practice, I expect people's natural inclination will be to treat
>> "ixml parser" and "ixml processor" as extensionally equivalent.
>>
>> Unless we change the spec to talk about parses that cover only part of
>> the input, I think the "additional terminology" section may be
>> unnecessary. After the discussion last week I am skeptical of the idea
>> of making such a change.
>>
>> I suspect that in discussions of parsing, a term like "partial parse" is
>> used to refer to a state of analysis in which part of the input string
>> has been analysed (and, in a left-to-right online parsing algorithm,
>> consumed) and part remains unanalysed, so that it is not yet clear
>> whether the input is or is not a sentence in the language defined by the
>> grammar. Using the term to refer instead to a complete parse of a
>> prefix of the input seems likely to lead to confusion in the long run.
>> Also, the formulations in the additional terminology section seem to
>> focus quite narrowly on sentence generation starting with a grammar and
>> ending with a sentence, and not on parsing construed as the process of
>> starting with a sentence and ending with a parse tree. The definitions
>> would feel less procedural to me if they spoke in terms of strings
>> being, or not being, sentences in a language, rather than in terms of
>> the rewriting of sentential forms.
>>
>> I hope this helps.
>>
>> --
>> C. M. Sperberg-McQueen
>> Black Mesa Technologies LLC
>> http://blackmesatech.com
>>
>
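The quoted message contrasts two shapes a parser can take: a general-purpose parser that receives both the grammar and the input string, and a parser that is constructed from one particular grammar. The following Python sketch is only an illustration of that contrast, not anything taken from the ixml spec or from any ixml implementation. The toy grammar, the dictionary encoding of rules, and the names `earley_recognize` and `generate_parser` are all assumptions made for the example. The recognizer is a minimal Earley-style recognizer that assumes no rule has an empty right-hand side, and the "generator" merely closes over the grammar, whereas a real parser generator such as Yacc would emit grammar-specific code.

```python
from typing import Callable, Dict, List, Tuple

# A grammar maps each nonterminal to its alternatives; any symbol that is
# not a key of the dictionary is treated as a one-character terminal.
Grammar = Dict[str, List[Tuple[str, ...]]]

# Hypothetical toy grammar (left-recursive on purpose): expr: expr "+" term | term
GRAMMAR: Grammar = {
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("a",), ("b",)],
}


def earley_recognize(grammar: Grammar, start: str, text: str) -> bool:
    """General-purpose recognizer: grammar and input are both arguments.

    Assumes no alternative is empty (no nullable nonterminals).
    """
    # An item is (lhs, rhs, dot position, origin position).
    chart = [set() for _ in range(len(text) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(text) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: start every alternative of sym at position i.
                    for alt in grammar[sym]:
                        new = (sym, alt, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
                elif i < len(text) and text[i] == sym:
                    # Scan: the terminal matches the next input character.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance items that were waiting for lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[len(text)]
    )


def generate_parser(grammar: Grammar, start: str) -> Callable[[str], bool]:
    """Generator-style API: returns a recognizer tied to one grammar."""
    def parser(text: str) -> bool:
        return earley_recognize(grammar, start, text)
    return parser
```

With these definitions, `earley_recognize(GRAMMAR, "expr", "a+b+a")` takes the grammar on every call, while `generate_parser(GRAMMAR, "expr")` returns a callable that takes only the input string. Both accept exactly the same strings, which is one way to see why the message expects "ixml parser" and "ixml processor" to be treated as extensionally equivalent in practice.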
Attachments
- text/html attachment: stored
- text/markdown attachment: terminology_220127.md
- text/html attachment: stored
Received on Thursday, 27 January 2022 15:38:49 UTC