Re: Error definition

So if we have an input *i* and a grammar *g*, a potential error *e* and an output *o*:

• If *i* is a sentence in the grammar *g*, you get *o* as expected (XML nodes in whatever representation)
• If there is a problem in *g*, potentially among other situations, you get *e*
• What should a parser return if everything works correctly, but *i* is not a sentence in the grammar?  How does the parser say "sorry, mate"?

Whilst I understand the distinction that Norm and Michael are making, I'm not sure how a processor should make that situation known to the user or calling function (along with any other information that would be useful downstream, such as "I was expecting a ';'"): throwing a catchable error seems a pragmatic solution...
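
One way a processor could say "sorry, mate" without throwing is to return a value that is either the output *o* or a description of where matching stopped, and to reserve exceptions for a problem with *g* itself. This is a minimal Java sketch (Java 17 or later) under loudly stated assumptions: the "grammar" is a toy (letters followed by a terminator character), and `ParseOutcomeDemo`, `ParseResult`, `Success`, `Failure` and `parse` are hypothetical names, not the API of any ixml processor.

```java
import java.util.List;

public class ParseOutcomeDemo {

    // Hypothetical outcome type: "no parse" is an ordinary return value, not an exception.
    sealed interface ParseResult permits Success, Failure {}

    // The output o: serialized XML.
    record Success(String xml) implements ParseResult {}

    // i is not a sentence of g: where matching stopped and what would have let it continue.
    record Failure(int offset, String expected) implements ParseResult {}

    // Toy grammar g: one or more letters followed by a terminator character.
    // A terminator that is itself a letter makes g unusable: that is the genuine error e.
    static ParseResult parse(String input, char terminator) {
        if (Character.isLetter(terminator)) {
            throw new IllegalArgumentException("bad grammar: terminator must not be a letter");
        }
        int i = 0;
        while (i < input.length() && Character.isLetter(input.charAt(i))) {
            i++;
        }
        if (i == 0) {
            return new Failure(0, "a letter");
        }
        if (i < input.length() && input.charAt(i) == terminator) {
            if (i + 1 == input.length()) {
                return new Success("<stmt>" + input.substring(0, i) + "</stmt>");
            }
            return new Failure(i + 1, "end of input");
        }
        return new Failure(i, "'" + terminator + "'");
    }

    public static void main(String[] args) {
        for (String input : List.of("hello;", "hello")) {
            ParseResult result = parse(input, ';');
            if (result instanceof Success s) {
                System.out.println("output: " + s.xml());
            } else if (result instanceof Failure f) {
                System.out.println("no parse: " + f.expected() + " expected at offset " + f.offset());
            }
        }
        // A broken grammar is reported by throwing.
        try {
            parse("x;", 'x');
        } catch (IllegalArgumentException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

Run as written, `hello;` produces `<stmt>hello</stmt>`, `hello` is reported as "';' expected at offset 5", and only the broken grammar raises an exception.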

Tom

_________________
Tomos Hillman
eXpertML Ltd
+44 7793 242058
On 7 Feb 2022, 07:43 +0000, Norm Tovey-Walsh <norm@saxonica.com>, wrote:
> > My view? It is an error.
> > The input.txt file is 'in error'
> > Hopefully the author of the processor will say @line 25 etc.
>
> One reason we’re struggling with this topic is, perhaps, that we have
> differing views about how a processor might be built.
>
> Let’s look at what a Java compiler does, as a concrete example.
> If you feed this program into a Java compiler:
>
> package com.nwalsh;
>
> public class MyProgram {
>     public static void main(String[] args) {
>         System.out.println("Hello, world")
>     }
> }
>
> it will dutifully report
>
> /tmp/com/nwalsh/MyProgram.java:5: error: ';' expected
>         System.out.println("Hello, world")
>                                           ^
>
> This, I assume, would satisfy Dave’s view that the program is “in
> error”.
>
> But there’s more going on here than just what the user sees when viewing
> the entire compiler as a black box.
>
> Internally, there’s some code for reading files off disk. If that code
> failed (if the file didn’t exist, or had permissions that prevented the
> process from reading it), that would be an exceptional circumstance. The
> code would be unable to function and an error would be raised.
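
For contrast, a minimal Java sketch (Java 11 or later) of that file-reading stage. `ReadStageDemo` and `readSource` are hypothetical names; `Files.readString` is the standard library call, and it throws `NoSuchFileException` (a subclass of `IOException`) when the file is missing, so the stage cannot return anything meaningful and the failure propagates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ReadStageDemo {

    // Reading the source is where a genuinely exceptional circumstance can occur:
    // a missing or unreadable file stops this stage from doing its job at all,
    // so the failure is raised as an IOException rather than returned as a value.
    static String readSource(Path path) throws IOException {
        return Files.readString(path);
    }

    public static void main(String[] args) throws IOException {
        // A file that exists: reading succeeds and returns its contents.
        Path tmp = Files.createTempFile("MyProgram", ".java");
        Files.writeString(tmp, "package com.nwalsh;\n");
        System.out.println("read " + readSource(tmp).length() + " characters");
        Files.delete(tmp);

        // A file that does not exist: reading raises an exception.
        try {
            readSource(Path.of("no-such-dir", "MyProgram.java"));
        } catch (NoSuchFileException e) {
            System.out.println("I/O error: no such file " + e.getFile());
        }
    }
}
```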
>
> Assuming the code was successfully read, it would next be handed to a
> parser that would attempt to turn the characters of the program into an
> abstract syntax tree (AST). The parts of the compiler that come next
> (the optimizer, the byte code generator, etc.) don’t want to deal with
> words like “public” or “{” delimiters. They want an abstraction that has
> cleared away the syntactic cruft.
>
> The parser is going to report, “Sorry, mate, I couldn’t build an AST. I
> got as far as about the end of line five before I reached a point where
> I couldn’t match the input. You know what, I could have kept going if
> there’d been a ‘;’ there.”
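
A sketch of how a parser might package that report as data rather than as an exception, assuming Java 16 or later for records. `FailureReportDemo`, `ParseFailure` and `failureAt` are hypothetical names; the sketch only shows turning the offset where matching stopped into the line and column a user would see, applied to the `MyProgram` example above.

```java
import java.util.Set;

public class FailureReportDemo {

    // Hypothetical report returned when no AST can be built:
    // where matching stopped and which tokens would have let it continue.
    record ParseFailure(int line, int column, Set<String> expected) {}

    // Translate a character offset into a 1-based line and column.
    static ParseFailure failureAt(String source, int offset, Set<String> expected) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset; i++) {
            if (source.charAt(i) == '\n') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return new ParseFailure(line, column, expected);
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "package com.nwalsh;",
                "",
                "public class MyProgram {",
                "    public static void main(String[] args) {",
                "        System.out.println(\"Hello, world\")",
                "    }",
                "}");
        // Matching stops just after the closing parenthesis on line 5.
        int offset = source.indexOf("\")") + 2;
        ParseFailure failure = failureAt(source, offset, Set.of("';'"));
        System.out.println("line " + failure.line() + ", column " + failure.column()
                + ": expected one of " + failure.expected());
    }
}
```

For this input it reports line 5, column 43, with `';'` as the only expected token; nothing has been thrown, and the caller decides what to tell the user.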
>
> Critically for this discussion, observe that the parser didn’t encounter
> any kind of exceptional circumstance. It wasn’t prevented in any way from
> completing its function. No error has occurred. The input doesn’t match
> the grammar for Java, but that’s a common and completely expected
> result. (If you don’t think that’s common, just watch me writing Java.)
>
> We aren’t going to make the meaning of the word “error” any clearer or
> more precise by trying to make it do double duty. Asserting that failing
> to find a parse is “an error” reduces the value of the word “error” as a
> technical term.
>
> The *user* can still be told that an error occurred. That’s fine.
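
Telling the user is a separate decision from what the parser returns. In this hedged Java sketch (Java 17 or later, for sealed interfaces), a hypothetical boundary method `xmlOrThrow` converts a "no parse" outcome into a checked exception for an application that wants to present it as an error; `Outcome`, `Parsed`, `NoParse` and `NoParseException` are illustrative names only.

```java
public class ReportToUserDemo {

    // Hypothetical parser outcome: XML output, or where and why matching stopped.
    sealed interface Outcome permits Parsed, NoParse {}
    record Parsed(String xml) implements Outcome {}
    record NoParse(int line, String expected) implements Outcome {}

    // Hypothetical exception used only at the user-facing boundary.
    static class NoParseException extends Exception {
        NoParseException(String message) {
            super(message);
        }
    }

    // The application, not the parser, decides that "no parse" is reported as an error.
    static String xmlOrThrow(Outcome outcome) throws NoParseException {
        if (outcome instanceof Parsed p) {
            return p.xml();
        }
        NoParse n = (NoParse) outcome;
        throw new NoParseException("line " + n.line() + ": " + n.expected() + " expected");
    }

    public static void main(String[] args) {
        Outcome[] outcomes = { new Parsed("<greeting/>"), new NoParse(5, "';'") };
        for (Outcome o : outcomes) {
            try {
                System.out.println("output: " + xmlOrThrow(o));
            } catch (NoParseException e) {
                System.out.println("error: " + e.getMessage());
            }
        }
    }
}
```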
>
> But the ixml CG is mostly focused on the part of a larger program that
> turns input grammars into XML. Given a valid ixml grammar, discovering
> that the input doesn’t match the grammar simply isn’t “an error”. It’s a
> common and completely expected result.
>
> One could imagine a Java compiler that would stick the semicolon in and
> then hand the input back to the parser to try again. It might get
> further this time, until it reached a different dead end in the
> grammar, or it might even succeed.
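
That kind of recovery can be sketched as a loop around a parser that reports what it expected. This is a toy, not how any real Java compiler recovers: the grammar (statements of letters, each terminated by `;`) and the names `RepairDemo`, `parse` and `parseWithRepair` are assumptions for illustration, and the loop only ever tries inserting `;`. Java 17 or later is assumed.

```java
public class RepairDemo {

    sealed interface Result permits Parsed, NoParse {}
    record Parsed(int statements) implements Result {}
    record NoParse(int offset, String expected) implements Result {}

    // Toy grammar: zero or more statements, each one or more letters followed by ';'.
    static Result parse(String s) {
        int i = 0;
        int statements = 0;
        while (i < s.length()) {
            int start = i;
            while (i < s.length() && Character.isLetter(s.charAt(i))) {
                i++;
            }
            if (i == start) {
                return new NoParse(i, "letter");
            }
            if (i == s.length() || s.charAt(i) != ';') {
                return new NoParse(i, ";");
            }
            i++; // consume the ';'
            statements++;
        }
        return new Parsed(statements);
    }

    // Hypothetical recovery loop: whenever the parser says a ';' would have let it
    // continue, insert one at that offset and parse again, at most maxRepairs times.
    static Result parseWithRepair(String s, int maxRepairs) {
        String current = s;
        for (int repairs = 0; ; repairs++) {
            Result result = parse(current);
            if (result instanceof NoParse np && np.expected().equals(";") && repairs < maxRepairs) {
                current = current.substring(0, np.offset()) + ";" + current.substring(np.offset());
                continue;
            }
            return result;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWithRepair("print;print", 3)); // missing final ';'
        System.out.println(parseWithRepair("ab cd;", 3));      // a different dead end
    }
}
```

Here `print;print` parses after one insertion, while `ab cd;` gets past the inserted `;` and then stops at offset 3, where the toy grammar needs a letter. Either way the parser itself only ever returns results.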
>
> That’s just completely different from an I/O error reading the file.
>
> These kinds of subtle distinctions are useful for making specification
> language clear and precise. If you lump everything that could possibly
> go wrong under the term “error”, then “error” becomes less meaningful.
>
> Again, this has nothing directly to do with what the user is told by the
> program they were running.
>
> To take a completely different example, consider
>
> System.out.println(Long.MAX_VALUE + 1);
>
> If the answer -9223372036854775808 surprises you, you might say “that’s
> an error!” But it isn’t. It simply isn’t. It’s a consequence of how
> two’s complement numbers are stored in 64-bit blocks and what happens
> when numeric overflow occurs.
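
Java's own library draws the same line: ordinary `long` addition wraps around by definition, and a caller who wants overflow treated as exceptional has to opt in, for example with `Math.addExact`, which throws `ArithmeticException` on overflow. `OverflowDemo`, `wrappingIncrement` and `checkedIncrement` are illustrative names.

```java
public class OverflowDemo {

    // Plain addition: wraps around silently, as two's complement arithmetic defines.
    static long wrappingIncrement(long x) {
        return x + 1;
    }

    // Opt-in checked addition: Math.addExact throws ArithmeticException on overflow.
    static long checkedIncrement(long x) {
        return Math.addExact(x, 1L);
    }

    public static void main(String[] args) {
        long wrapped = wrappingIncrement(Long.MAX_VALUE);
        System.out.println(wrapped);                   // -9223372036854775808
        System.out.println(wrapped == Long.MIN_VALUE); // true
        try {
            checkedIncrement(Long.MAX_VALUE);
        } catch (ArithmeticException e) {
            System.out.println("addExact raised ArithmeticException");
        }
    }
}
```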
>
> Hoping that helps.
>
> Be seeing you,
> norm
>
> --
> Norm Tovey-Walsh
> Saxonica

Received on Monday, 7 February 2022 10:30:43 UTC