- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Fri, 04 Feb 2022 06:35:29 -0700
- To: Dave Pawson <dave.pawson@gmail.com>
- Cc: Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
Dave Pawson writes:
> @MSM - do you mean 'in all circumstances'?
> As Norm says, I've met a confirmed error, should I continue (could I
> even continue parsing) to the end?
There is a formal and an informal answer.
Informally, I would say: no, of course a parser need not waste time
and cycles and memory on i/o of characters that cannot make a difference
to the result. Once a parser knows that the input is not a sentence in
the language defined by the grammar, it is of course free to return its
result, which is that the input is not a sentence in the grammar (or:
failed to parse, if you prefer).
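As a rough illustration of the informal answer, here is a minimal Earley recognizer sketch in Python. It rests on several assumptions. The grammar encoding, the toy grammar `GRAMMAR`, and the names `recognize` and `_close` are hypothetical and do not come from the ixml spec or from any ixml processor, and empty rules are not supported. The recognizer stops reading as soon as no item in the current state set can accept the next character, because at that point no continuation of the input can yield a sentence.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy grammar encoding (not ixml syntax): each nonterminal maps to
# a list of right-hand sides; any symbol that is not a key is a one-character
# terminal. Empty (epsilon) rules are deliberately unsupported, which keeps
# the completion step simple.
Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot position, origin)


def _close(grammar: Grammar, chart: List[Set[Item]], k: int) -> None:
    """Apply prediction and completion to state set k until nothing changes."""
    items = chart[k]
    work = list(items)
    while work:
        lhs, rhs, dot, origin = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            # Predict: start every alternative of the expected nonterminal at k.
            new_items = [(rhs[dot], alt, 0, k) for alt in grammar[rhs[dot]]]
        elif dot == len(rhs):
            # Complete: advance items in the origin set that were waiting for lhs.
            new_items = [(l, r, d + 1, o) for (l, r, d, o) in chart[origin]
                         if d < len(r) and r[d] == lhs]
        else:
            continue  # dot is before a terminal; the scan step handles it
        for item in new_items:
            if item not in items:
                items.add(item)
                work.append(item)


def recognize(grammar: Grammar, start: str, text: Iterable[str]) -> Tuple[bool, int]:
    """Return (is_sentence, characters_read), stopping early on a dead end."""
    chart: List[Set[Item]] = [{(start, alt, 0, 0) for alt in grammar[start]}]
    _close(grammar, chart, 0)
    read = 0
    for ch in text:
        read += 1
        scanned = {(l, r, d + 1, o) for (l, r, d, o) in chart[-1]
                   if d < len(r) and r[d] not in grammar and r[d] == ch}
        if not scanned:
            # No item can continue: no sentence begins with the characters
            # read so far, so the rest of the input is never examined.
            return False, read
        chart.append(scanned)
        _close(grammar, chart, len(chart) - 1)
    # Accept only if a start rule spans the entire input.
    accepted = any(l == start and d == len(r) and o == 0
                   for (l, r, d, o) in chart[-1])
    return accepted, read


GRAMMAR: Grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("a",), ("(", "E", ")")],
}

print(recognize(GRAMMAR, "E", "a+(a+a)"))           # (True, 7)
print(recognize(GRAMMAR, "E", "a+)" + "a" * 1000))  # (False, 3)
```

Both answers are correct results. The second is reached after three characters, and the remaining thousand characters could not have changed it.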
The question formulated in issue #24 has only ever been about whether a
'successful parse' (however we define that) has to consume / cover /
parse the entire input or not. I do not understand why answering "yes"
should suggest to anyone that the question has any bearing on the case
of non-matching input.
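The difference at stake in issue #24 can be shown by analogy with Python's `re` module, where `match` accepts a matching prefix and `fullmatch` requires the whole string to match. This is only an analogy: ixml grammars are context-free, not regular, and the pattern and function names below are hypothetical. Note that for input with no matching prefix, both readings give the same answer.

```python
import re

# Hypothetical pattern standing in for a grammar (an analogy only).
PATTERN = re.compile(r"a+")


def prefix_parse(s: str) -> bool:
    # Succeeds if some prefix of s matches, ignoring whatever follows.
    return PATTERN.match(s) is not None


def full_parse(s: str) -> bool:
    # Succeeds only if the match covers the entire input.
    return PATTERN.fullmatch(s) is not None


print(prefix_parse("aab"), full_parse("aab"))  # True False
print(prefix_parse("baa"), full_parse("baa"))  # False False
```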
More formally: the current spec (I paraphrase, from memory) describes a
conforming ixml processor as being presented with an input grammar and
an input string (in some form) and doing one of three things:
- returning an XML representation of a parse tree resulting from
parsing the string against the grammar
- informing the user that there is no such parse tree
- failing for some other reason
There have been proposals to remove the third item, but I don't believe
we have discussed them yet, so I hope it's still there.
Nothing in this description, and nothing in the more detailed
description of how ixml and ixml processors work, provides a definition
for what it means to "consume input".
So the formal answer to the question "do you mean that a processor must
consume all the input even in case the input string is not a sentence in
the language?" is the question "how would you even know?"
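The three outcomes paraphrased above can be sketched as a Python result type. The class and function names are hypothetical and not drawn from the spec. The point of the sketch is that no variant carries any record of how much input was read, so the observable result gives no way to tell whether a processor "consumed" everything.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ParseTree:
    xml: str  # XML representation of the parse tree


@dataclass(frozen=True)
class NoParse:
    """The input is not a sentence of the language defined by the grammar."""


@dataclass(frozen=True)
class ProcessorFailure:
    reason: str  # the processor failed for some other reason


# Hypothetical model of what a conforming processor may return; note that
# nothing here records how many characters of input were examined.
Outcome = Union[ParseTree, NoParse, ProcessorFailure]


def describe(outcome: Outcome) -> str:
    if isinstance(outcome, ParseTree):
        return "parse tree: " + outcome.xml
    if isinstance(outcome, NoParse):
        return "no parse tree"
    return "processor failure: " + outcome.reason
```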
> Doesn't sound like a sensible option from the outside? Would a user
> be interested? In many cases the first error compounds later ones etc?
It is quite true that in the output from parsers which attempt to recover
from parsing errors, a first error is frequently followed by a flood of
other errors (often because the recovery was imperfect). It is also
true that the 'first error' -- that is, the location where the parser
was first aware that there would be no parse tree -- is sometimes some
distance away from the point at which the user finds the typo.
But the ixml spec says nothing about what kind of error recovery a
processor should perform, or what kind of diagnostic information a
processor is to provide in case the input does not parse. I think that
is the correct thing for the spec to say.
> Parse to the end of the input string... unless errors are found? Is
> that a reasonable caveat?
Any process that correctly detects that the string is not a sentence in
the language is producing a correct result. If "parsing to the end of
the input string" means successively placing each character of the input
in a register for examination and doing so even when the correct result
of the calculation is already clear, then yes, but it's an observation,
not a caveat.
Michael
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Friday, 4 February 2022 13:35:52 UTC