- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 26 Jan 2022 07:48:34 -0700
- To: Bethan Tovey-Walsh <accounts@bethan.wales>
- Cc: public-ixml@w3.org
Bethan Tovey-Walsh writes:

> Attached is an edit of Dave's terminology document. I've proposed some
> general revisions, and added some terms that I think/hope others may
> find useful.

> In particular, I've been finding it hard to work out how to
> distinguish between the direct XML output of the ixml processor,
> before any extra processing to turn it into a desired output
> format. It can't really be an "ixml document", because that risks
> confusion with the ixml input. And just calling it the "ixml output"
> was becoming extremely frustrating when I wanted to find a way to
> distinguish it from the result of post-processing it, e.g. to produce
> JSON, or a different flavour of XML, or whatever.

In a pipeline with post-processing steps, perhaps the ixml output is not the same as the final output (or as the output of any other step).

Since the ixml spec doesn't define any postprocessing, perhaps our terms for the results of other processing can be relatively loose and informal? I see that that is roughly where you ended up, though I think your frustration is audible in the sternness with which you admonish the reader not to refer to downstream output as ixml output.

> I look forward to your comments and criticisms when we meet, and thank
> you in advance for corrections to any mistakes.

Some comments follow.

The document says

> ## ixml parser
>
> An *ixml parser* is a parser constructed from an *ixml input grammar*.

This seems to reflect assumptions about the internal workings of an ixml processor which I think are not universal and should not be baked into our way of talking about things.

A parser is (I think) generally understood to be an executable program. There are plenty of ways of building parsers which take a grammar G as input and produce as output a parser, i.e. an executable program that parses input against G. Yacc and other parser generators work this way.
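
As an illustration only, the difference in shape between "grammar in, parser out" and "grammar and input both in" might be sketched in Python as follows. Everything here is hypothetical: the names `Grammar`, `parse`, and `generate_parser`, the dictionary encoding of a grammar, and the naive backtracking recognizer are assumptions made for the sketch, not anything taken from the ixml spec or from any ixml processor. A real generator such as yacc emits program code; a closure stands in for that here.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical encoding: nonterminal -> list of alternatives,
# each alternative a tuple of symbols. Symbols that are not keys
# of the dictionary are treated as literal terminals.
Grammar = Dict[str, List[Tuple[str, ...]]]

def _ends(grammar: Grammar, symbol: str, text: str, pos: int) -> Iterator[int]:
    """Yield each position where `symbol` can end if it starts at `pos`.

    Naive backtracking; it will not terminate on left-recursive grammars.
    """
    if symbol not in grammar:              # terminal: match literally
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in grammar[symbol]:
        positions = [pos]
        for part in alternative:
            positions = [e for p in positions
                         for e in _ends(grammar, part, text, p)]
        yield from positions

def parse(grammar: Grammar, start: str, text: str) -> bool:
    """General-purpose style: the grammar and the input are both arguments."""
    return any(end == len(text) for end in _ends(grammar, start, text, 0))

def generate_parser(grammar: Grammar, start: str) -> Callable[[str], bool]:
    """Generator style: takes a grammar G, returns a parser specific to G."""
    def parser(text: str) -> bool:
        return parse(grammar, start, text)
    return parser

# Usage: a grammar for a^n b^n (S: "a", S, "b"; empty.)
G: Grammar = {"S": [("a", "S", "b"), ()]}
anbn = generate_parser(G, "S")    # a parser constructed from G
print(anbn("aabb"))               # True
print(parse(G, "S", "aab"))       # False; no parser was constructed
```

Both calls recognize the same language; only the first involves an object that could be described as "constructed from" the grammar.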
There are other ways of parsing input that involve a general-purpose parser parsing input against a grammar, where both the input string and the grammar are input to the parser and no separate parser is constructed from or for the grammar.

Both approaches are imaginable for ixml processors. If I ever get around to writing a program that reads an ixml grammar G and translates it into Mercury notation, so that I can compile the Mercury program and have a parser that reads input matching G and produces XML, then an ixml processor built around that will first generate an ixml parser (in the sense indicated in your document) and then use it to parse the user's input.

But that is not how an Earley parser works. An Earley parser works for any grammar; it does not generate a separate parser for each grammar, and it is not, itself, generated from any grammar. So under the definition proposed, no Earley parser is an ixml parser, even when it is used to parse input against an ixml grammar. I don't think that's a helpful terminological pattern.

In practice, I expect people's natural inclination will be to treat "ixml parser" and "ixml processor" as extensionally equivalent.

Unless we change the spec to talk about parses that cover only part of the input, I think the "additional terminology" section may be unnecessary. After the discussion last week I am skeptical of the idea of making such a change.

I suspect that in discussions of parsing, a term like "partial parse" is used to refer to a state of analysis in which part of the input string has been analysed (and, in a left-to-right online parsing algorithm, consumed) and part remains unanalysed, so that it is not yet clear whether the input is or is not a sentence in the language defined by the grammar. Using the term to refer instead to a complete parse of a prefix of the input seems likely to lead to confusion in the long run.
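
The two senses of "partial parse" can be kept apart in a small Python sketch. It is only an illustration under stated assumptions: the language is balanced parentheses (chosen because a left-to-right scan of it needs nothing more than a counter), and the names `ScanState`, `scan`, and `sentence_prefixes` are hypothetical. `scan` models the first sense, an analysis stopped partway through the input; `sentence_prefixes` models the second, prefixes that are themselves complete sentences.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ScanState:
    consumed: int        # number of characters analysed so far
    depth: int           # parentheses opened but not yet closed
    dead: bool = False   # True once no continuation can be a sentence

def advance(state: ScanState, ch: str) -> ScanState:
    """Consume one more character, left to right."""
    if state.dead:
        return ScanState(state.consumed + 1, state.depth, True)
    if ch == "(":
        return ScanState(state.consumed + 1, state.depth + 1)
    if ch == ")" and state.depth > 0:
        return ScanState(state.consumed + 1, state.depth - 1)
    # Unmatched ")" or a foreign character: the input cannot be a sentence.
    return ScanState(state.consumed + 1, state.depth, True)

def scan(text: str, upto: int) -> ScanState:
    """Sense 1: the state of analysis after consuming `upto` characters."""
    state = ScanState(0, 0)
    for ch in text[:upto]:
        state = advance(state, ch)
    return state

def is_sentence(state: ScanState) -> bool:
    """True if the characters consumed so far form a complete sentence."""
    return not state.dead and state.depth == 0

def sentence_prefixes(text: str) -> List[int]:
    """Sense 2: lengths of the prefixes that are themselves sentences."""
    return [n for n in range(len(text) + 1) if is_sentence(scan(text, n))]

text = "()(("
print(scan(text, 3))            # ScanState(consumed=3, depth=1, dead=False)
print(sentence_prefixes(text))  # [0, 2]
```

After three characters the analysis is neither a success nor a failure, which is the first sense; the prefix of length 2 is a sentence in its own right, which is the second.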
Also, the formulations in the additional terminology section seem to focus quite narrowly on sentence generation starting with a grammar and ending with a sentence, and not on parsing construed as the process of starting with a sentence and ending with a parse tree. The definitions would feel less procedural to me if they spoke in terms of strings being, or not being, sentences in a language, rather than in terms of the rewriting of sentential forms.

I hope this helps.

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Wednesday, 26 January 2022 14:48:58 UTC