- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Fri, 04 Feb 2022 08:37:31 +0000
- To: Dave Pawson <dave.pawson@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
- Message-ID: <m2o83n57em.fsf@saxonica.com>
> As Norm says, I've met a confirmed error, should I continue (could I
> even continue parsing)
> to the end?

You can absolutely continue parsing to the end. Suppose you were asked
to parse “a*” and the input was “abcde”. Well, “a” matches “a*”, but
“ab” doesn’t, and “abc” doesn’t, and “abcd” doesn’t, etc. When you run
out of input, you’ll have failed to find a sentence.

I don’t think an ixml parser is required to be able to tell that it’s
failed to find a sentence anywhere before it reaches the last
character. It happens that in my parser, I can tell that “b” didn’t
match and by about “b” or “c”, I can tell there will never be a match.

> Doesn't sound like a sensible option from the outside? Would a user
> be interested? In many
> cases the first error compounds later ones etc?

It’s not quite that simple, technically. In my previous paragraph, you
might ask why can’t I tell we’re done at “b”, what’s this “by about ‘b’
or ‘c’” business? Well, the parser may have made predictions about what
might come next (because there were other nonterminals in the grammar).
Having failed to match “b”, it won’t be making any new predictions, but
the parser can’t know it’s failed until it’s consumed any other
predictions that might have been made.

I implemented the slight variation on Earley that was developed by
Scott to construct a single graph containing all the possible parses of
an input against a grammar. In my parser, if you ask it to recognize
“ab” or “abbbbbbbbb”, there’s a weird “dry spell” in the state chart
starting at the second “b” and continuing until the predictions for
“abbbbbbbbb” succeed or fail.

(I believe, in Earley’s original algorithm, each row in the state chart
is updated for the successive “b”s in abbb… so if you get an empty row
in his chart, you’re doomed. Scott puts predictions in a separate list
so it’s possible to get a sequence of empty rows and then start
populating rows again.)

> Parse to the end of the input string... unless errors are found? Is
> that a reasonable caveat?

In parsers of the sort we’re using for ixml, not finding a match isn’t
really an error, exactly. It just means your input isn’t a sentence in
the grammar. If a parser can determine that it will never succeed at
some point before it’s consumed all of the input, then it can return a
failure at that point. But that’s a quality of implementation concern,
not a conformance one. It’s perfectly reasonable for the parser to
consume all the input.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica
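
To make the “stop early or consume everything” point concrete, here is a minimal sketch. It is not the parser described above and it is not Earley or Scott’s variant: it is a toy state-set recognizer, and every name in it (`recognize`, `A_STAR`, `stop_early`) is made up for illustration. It only shows that both behaviours give the same answer, and that an empty state set plays the role of the “empty row” in a chart where rows can never be repopulated.

```python
def recognize(transitions, start, accepting, text, stop_early=True):
    """Toy state-set recognizer (hypothetical helper, not an ixml parser).

    Returns (matched, chars_consumed).
    """
    states = {start}
    consumed = 0
    for ch in text:
        # Advance every live state over the next character.
        states = {
            transitions[(s, ch)] for s in states if (s, ch) in transitions
        }
        consumed += 1
        # An empty set can never recover in this toy model, so we may stop.
        if not states and stop_early:
            return False, consumed
    return bool(states & accepting), consumed


# Grammar "a*": a single state that loops on "a" and is accepting.
A_STAR = {("S", "a"): "S"}

print(recognize(A_STAR, "S", {"S"}, "abcde"))                    # (False, 2)
print(recognize(A_STAR, "S", {"S"}, "abcde", stop_early=False))  # (False, 5)
print(recognize(A_STAR, "S", {"S"}, "aaa"))                      # (True, 3)
```

Both calls on “abcde” report no match; the only difference is how much input was read before saying so, which is the quality-of-implementation question rather than a conformance one.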
Received on Friday, 4 February 2022 09:05:24 UTC