Re: issue #24 does an ixml processor have to match everything? from C. M. Sperberg-McQueen on 2022-02-04 (public-ixml@w3.org from February 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 04 Feb 2022 06:35:29 -0700
To: Dave Pawson <dave.pawson@gmail.com>
Cc: Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
Message-ID: <87r18ioiu6.fsf@blackmesatech.com>

Dave Pawson writes:

> @MSM - do you mean 'in all circumstances'?
> As Norm says, I've met a confirmed error, should I continue (could I
> even continue parsing)
> to the end?

There is a formal and an informal answer.

Informally, I would say:  no, of course a parser need not waste time
and cycles and memory on i/o of characters that cannot make a difference
to the result.  Once a parser knows that the input is not a sentence in
the language defined by the grammar, it is of course free to return its
result, which is that the input is not a sentence in the language (or:
failed to parse, if you prefer).

The question formulated in issue #24 has only ever been about whether a
'successful parse' (however we define that) has to consume / cover /
parse the entire input or not.  I do not understand why the answer "yes"
should suggest to anyone that the answer has any relevance to the case
of non-matching input.

More formally: the current spec (I paraphrase, from memory) describes a
conforming ixml processor as being presented with an input grammar and
an input string (in some form) and doing one of three things:

  - returning an XML representation of a parse tree resulting from
    parsing the string against the grammar

  - informing the user that there is no such parse tree

  - failing for some other reason

There have been proposals to remove the third item, but I don't believe
we have yet discussed them, so I hope it's still there.
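
As a rough illustration of the three outcomes listed above, a
processor's result could be modelled as a small tagged type.  This is
only a sketch in Python: the names Outcome, Result, and run_processor
are hypothetical and do not come from the ixml spec, and the `parse`
argument stands in for whatever parsing machinery a processor actually
uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    PARSE_TREE = "parse tree"        # an XML representation is returned
    NO_PARSE = "no parse tree"       # input is not a sentence of the language
    OTHER_FAILURE = "other failure"  # the processor failed for some other reason


@dataclass
class Result:
    outcome: Outcome
    xml: Optional[str] = None     # set only for PARSE_TREE
    detail: Optional[str] = None  # optional diagnostic; its content is up to the processor


def run_processor(parse: Callable[[str], Optional[str]], text: str) -> Result:
    """Wrap a hypothetical parse function (returns XML, or None for
    'no parse') in the three outcomes."""
    try:
        xml = parse(text)
    except Exception as exc:  # anything that is not a plain "no parse"
        return Result(Outcome.OTHER_FAILURE, detail=str(exc))
    if xml is None:
        return Result(Outcome.NO_PARSE)
    return Result(Outcome.PARSE_TREE, xml=xml)
```

Note that nothing in such a result records how much of the input was
examined before the outcome was known.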

Nothing in this description, and nothing in the more detailed
description of how ixml and ixml processors work, provides a definition
for what it means to "consume input".

So the formal answer to the question "do you mean that a processor must
consume all the input even in case the input string is not a sentence in
the language?" is the question "how would you even know?"
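
To make "how would you even know?" concrete, here is a minimal sketch
in Python of two recognizers for a toy language (one or more
occurrences of the letter a).  The language, the function names, and
the character counter are all assumptions made up for illustration;
none of this is ixml.  One recognizer stops at the first character that
rules out a parse, the other keeps examining characters to the end.

```python
from typing import Tuple


def recognize_eager(text: str) -> Tuple[bool, int]:
    """Toy language: one or more 'a'.  Stop as soon as the answer is known.
    Returns (is_sentence, characters_examined)."""
    examined = 0
    for ch in text:
        examined += 1
        if ch != "a":
            return False, examined  # no later character can change the result
    return examined > 0, examined


def recognize_exhaustive(text: str) -> Tuple[bool, int]:
    """Same language, but every character is examined even after the
    result is already determined."""
    ok = len(text) > 0
    examined = 0
    for ch in text:
        examined += 1
        if ch != "a":
            ok = False
    return ok, examined


print(recognize_eager("aaXaaaa"))       # (False, 3)
print(recognize_exhaustive("aaXaaaa"))  # (False, 7)
```

The counter exists only so that the difference can be seen at all.  The
first component of each pair, which is the only thing the description
of a conforming processor talks about, is the same for both.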

>    Doesn't sound like a sensible option from the outside? Would a user
> be interested? In many
> cases the first error compounds later ones etc?

It is quite true that in the output from parses which attempt to recover
from parsing errors, a first error is frequently followed by a flood of
other errors (often because the recovery was imperfect).  It is also
true that the 'first error' -- that is, the location where the parser
was first aware that there would be no parse tree -- is sometimes some
distance away from the point at which the user finds the typo.

But the ixml spec says nothing about what kind of error recovery a
processor should perform, or what kind of diagnostic information a
processor is to provide in case the input does not parse.  I think that
is the correct thing for the spec to say.

> Parse to the end of the input string... unless errors are found? Is
> that a reasonable caveat?

Any process that correctly detects that the string is not a sentence in
the language is producing a correct result.  If "parsing to the end of
the input string" means successively placing each character of the input
in a register for examination and doing so even when the correct result
of the calculation is already clear, then yes, but it's an observation,
not a caveat.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Friday, 4 February 2022 13:35:52 UTC