Re: issue #24 does an ixml processor have to match everything? from Norm Tovey-Walsh on 2022-02-04 (public-ixml@w3.org from February 2022)

From: Norm Tovey-Walsh <norm@saxonica.com>
Date: Fri, 04 Feb 2022 08:37:31 +0000
To: Dave Pawson <dave.pawson@gmail.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
Message-ID: <m2o83n57em.fsf@saxonica.com>

> As Norm says, I've met a confirmed error, should I continue (could I
> even continue parsing)
> to the end?

You can absolutely continue parsing to the end. Suppose you were asked
to parse “a*” and the input was “abcde”. Well, “a” matches “a*”, but
“ab” doesn’t, and “abc” doesn’t, and “abcd” doesn’t, etc. When you run
out of input, you’ll have failed to find a sentence.
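
As a minimal sketch of the example above, the following uses Python's
re module as a stand-in for an ixml grammar (an assumption made purely
for illustration; it is not how an ixml processor works internally). The
helper names matching_prefixes and is_sentence are hypothetical. It lists
which prefixes of the input match “a*” and shows that the whole input is
not a sentence.

```python
import re

def matching_prefixes(pattern, text):
    """Return every non-empty prefix of text that the pattern matches in full."""
    return [text[:i] for i in range(1, len(text) + 1)
            if re.fullmatch(pattern, text[:i])]

def is_sentence(pattern, text):
    """True only if the entire input matches, which is what a parse requires."""
    return re.fullmatch(pattern, text) is not None

# Only the prefix "a" matches; "ab", "abc", "abcd" and "abcde" do not.
print(matching_prefixes("a*", "abcde"))
# The input as a whole is therefore not a sentence of the grammar.
print(is_sentence("a*", "abcde"))
```

Checking every prefix this way reads the input to the end, which is
allowed; stopping earlier is an optimization.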

I don’t think an ixml parser is required to be able to tell that it’s
failed to find a sentence anywhere before it reaches the last character.

It happens that in my parser, I can tell that “b” didn’t match and by
about “b” or “c”, I can tell there will never be a match.

>    Doesn't sound like a sensible option from the outside? Would a user
> be interested? In many
> cases the first error compounds later ones etc?

It’s not quite that simple, technically. In my previous paragraph, you
might ask why can’t I tell we’re done at “b”, what’s this “by about ‘b’
or ‘c’” business?

Well, the parser may have made predictions about what might come next
(because there were other nonterminals in the grammar). Having failed to
match the “b”, it won’t be making any new predictions, but the parser
can’t know it’s failed until it’s consumed any other predictions that
might have been made.

I implemented the slight variation on Earley that was developed by Scott
to construct a single graph containing all the possible parses of an
input against a grammar. In my parser, if you ask it to recognize “ab”
or “abbbbbbbbb”, there’s a weird “dry spell” in the state chart starting
at the second “b” and continuing until the predictions for “abbbbbbbbb”
succeed or fail.

(I believe, in Earley’s original algorithm, each row in the state chart
is updated for the successive “b”s in abbb… so if you get an empty row
in his chart, you’re doomed. Scott puts predictions in a separate list so
it’s possible to get a sequence of empty rows and then start populating
rows again.)
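
To make the “empty row means you’re doomed” point concrete, here is a
small sketch of a textbook Earley recognizer. It is not Scott’s variant
and not the parser described in this message. The function names, the
grammar encoding (a dict from nonterminal to a list of tuples) and the
rewriting of “a*” as S -> "a" S | empty are all assumptions made for
illustration. The recognizer stops as soon as scanning produces an empty
row.

```python
def earley_chart(grammar, start, text):
    """Build Earley state sets, stopping at the first empty one.

    An item is (lhs, rhs, dot, origin). Symbols that are keys of
    `grammar` are nonterminals; anything else is a terminal character.
    """
    chart = [{(start, rhs, 0, 0) for rhs in grammar[start]}]
    for k in range(len(text) + 1):
        # Predict and complete until row k stops growing.
        changed = True
        while changed:
            changed = False
            for (lhs, rhs, dot, origin) in list(chart[k]):
                if dot < len(rhs) and rhs[dot] in grammar:
                    # Predictor: expect every alternative of the nonterminal.
                    new = {(rhs[dot], alt, 0, k) for alt in grammar[rhs[dot]]}
                elif dot == len(rhs):
                    # Completer: advance the items that were waiting for lhs.
                    new = {(l, r, d + 1, o) for (l, r, d, o) in chart[origin]
                           if d < len(r) and r[d] == lhs}
                else:
                    new = set()
                if not new <= chart[k]:
                    chart[k] |= new
                    changed = True
        if k == len(text):
            break
        # Scanner: advance the items whose next symbol is the input character.
        row = {(l, r, d + 1, o) for (l, r, d, o) in chart[k]
               if d < len(r) and r[d] not in grammar and r[d] == text[k]}
        chart.append(row)
        if not row:
            break  # empty row: no later input can rescue the parse
    return chart

def recognizes(grammar, start, text):
    chart = earley_chart(grammar, start, text)
    return len(chart) == len(text) + 1 and any(
        lhs == start and dot == len(rhs) and origin == 0
        for (lhs, rhs, dot, origin) in chart[-1])

# "a*" written as a recursive rule: S -> "a" S | (empty)
A_STAR = {"S": [("a", "S"), ()]}
chart = earley_chart(A_STAR, "S", "abcde")
print(len(chart), chart[-1])   # the chart ends at the row for "b", which is empty
print(recognizes(A_STAR, "S", "abcde"), recognizes(A_STAR, "S", "aaa"))
```

In this sketch the failure is visible right after the “b” is scanned. A
parser that keeps predictions elsewhere, as described above, may need to
read further before it can be sure.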

> Parse to the end of the input string... unless errors are found? Is
> that a reasonable caveat?

In parsers of the sort we’re using for ixml, not finding a match isn’t
really an error, exactly. It just means your input isn’t a sentence in
the grammar.

If a parser can determine that it will never succeed at some point
before it’s consumed all of the input, then it can return a failure at
that point. But that’s a quality of implementation concern, not a
conformance one. It’s perfectly reasonable for the parser to consume all
the input.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Friday, 4 February 2022 09:05:24 UTC