- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Mon, 07 Feb 2022 07:04:41 +0000
- To: Dave Pawson <dave.pawson@gmail.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Bethan Tovey-Walsh <accounts@bethan.wales>, ixml <public-ixml@w3.org>
- Message-ID: <m21r0fnmuh.fsf@saxonica.com>
> My view? It is an error.
> The input.txt file is 'in error'
> Hopefully the author of the processor will say @line 25 etc.

One reason we’re struggling with this topic is, perhaps, that we have differing views about how a processor might be built. Let’s look at what a Java compiler does, as a concrete example. If you feed this program into a Java compiler:

  package com.nwalsh;

  public class MyProgram {
    public static void main(String[] args) {
      System.out.println("Hello, world")
    }
  }

it will dutifully report

  /tmp/com/nwalsh/MyProgram.java:5: error: ';' expected
      System.out.println("Hello, world")
                                        ^

This, I assume, would satisfy Dave’s view that the program is “in error”. But there’s more going on here than just what the user sees viewing the entire compiler as a black box.

Internally, there’s some code for reading files off disk. If that code failed, if the file didn’t exist or had permissions that prevented the process from reading it, that would be an exceptional circumstance. The code would be unable to function and an error would be raised.

Assuming the code was successfully read, it would next be handed to a parser that would attempt to turn the characters of the program into an abstract syntax tree (AST). The parts of the compiler that come next, the optimizer, the byte code generator, etc., don’t want to deal with words like “public” or “{” delimiters. They want an abstraction that’s cleared away the syntactic cruft.

The parser is going to report, “Sorry, mate, I couldn’t build an AST. I got as far as about the end of line five before I reached a point where I couldn’t match the input. You know what, I could have kept going if there’d been a ‘;’ there.”

Critically for this discussion, observe that the parser didn’t encounter any kind of exceptional circumstance. It wasn’t prevented in any way from completing its function. No error has occurred. The input doesn’t match the grammar for Java, but that’s a common and completely expected result.
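
One way to picture the distinction, as a minimal Java sketch rather than a description of how any real compiler or ixml processor is built: every name here (ParseDemo, ParseOutcome, readSource, parse) is hypothetical, and the toy “grammar” only checks for a trailing semicolon. Failing to read the file surfaces as an exception; failing to find a parse comes back as an ordinary value.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical names throughout; a toy, not any real compiler's or ixml processor's API.
class ParseDemo {

    // The outcome of a parse attempt: either a tree, or a note on where matching stopped.
    // Both cases are ordinary return values.
    record ParseOutcome(Optional<String> tree, int stoppedAt, String expected) {
        boolean matched() { return tree.isPresent(); }
    }

    // Reading the file can fail exceptionally: the code cannot do its job at all.
    static String readSource(Path path) throws IOException {
        return Files.readString(path);
    }

    // Toy "grammar" (an assumption for illustration): a statement must end with ';'.
    static ParseOutcome parse(String input) {
        String trimmed = input.stripTrailing();
        if (trimmed.endsWith(";")) {
            return new ParseOutcome(Optional.of("stmt(" + trimmed + ")"), trimmed.length(), "");
        }
        return new ParseOutcome(Optional.empty(), trimmed.length(), ";");
    }

    public static void main(String[] args) {
        ParseOutcome outcome = parse("System.out.println(\"Hello, world\")");
        if (!outcome.matched()) {
            // What the *user* is told is a separate decision from how the parser reports.
            System.out.println("no parse at offset " + outcome.stoppedAt()
                    + "; expected '" + outcome.expected() + "'");
        }
        try {
            readSource(Path.of("/no/such/dir/MyProgram.java"));
        } catch (IOException e) {
            // Here the reader really was prevented from completing its function.
            System.out.println("could not read input: " + e);
        }
    }
}
```

A caller is free to turn an unmatched ParseOutcome into a message that says “error” to the user. The only design choice in the sketch is that the no-match outcome is an ordinary return value, which fits how common that outcome is.
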
(If you don’t think that’s common, just watch me writing Java.)

We aren’t going to make the meaning of the word “error” any clearer or more precise by trying to make it do double duty. Asserting that failing to find a parse is “an error” reduces the value of the word “error” as a technical term.

The *user* can still be told that an error occurred. That’s fine. But the ixml CG is mostly focused on the part of a larger program that turns input grammars into vxml. Given a valid ixml grammar, discovering that the input doesn’t match the grammar simply isn’t “an error”. It’s a common and completely expected result.

One could imagine a Java compiler that would stick the semicolon in and then hand the input back to the parser to try again. It might get further this time, until a different grammar-matching dead end, or even until it succeeds. That’s just completely different from an I/O error reading the file.

These kinds of subtle distinctions are useful to make specification language clear and precise. If you lump everything that could possibly go wrong under the term “error”, then “error” becomes less meaningful. Again, this has nothing directly to do with what the user is told by the program they were running.

To take a completely different example, consider

  System.out.println(Long.MAX_VALUE + 1);

If the answer

  -9223372036854775808

surprises you, you might say “that’s an error!” But it isn’t. It simply isn’t. It’s a consequence of how two’s complement numbers are stored in 64-bit blocks and what happens when numeric overflow occurs.

Hoping that helps.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica
Received on Monday, 7 February 2022 07:43:39 UTC