- From: Tomos Hillman <yamahito@gmail.com>
- Date: Mon, 7 Feb 2022 10:30:23 +0000
- To: Dave Pawson <dave.pawson@gmail.com>, Norm Tovey-Walsh <norm@saxonica.com>
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Bethan Tovey-Walsh <accounts@bethan.wales>, ixml <public-ixml@w3.org>
- Message-ID: <28444fd6-56ac-4536-b6a4-2b32367fcb5d@Spark>
So if we have an input *i* and a grammar *g*, a potential error *e* and an output *o*:

• If *i* is a sentence in the grammar *g*, you get *o* as expected (xml nodes in whatever representation)
• if there is a problem in *g*, potentially among other situations, you get *e*
• What should a parser return if everything works correctly, but *i* is not a sentence in the grammar? How does the parser say "sorry, mate"?

Whilst I understand the distinction that Norm and Michael are making, I'm not sure how a processor should make that situation known to the user/calling function (along with any other information that would be useful downstream, such as "I was expecting a ';'"): throwing a catchable error seems a pragmatic solution...

Tom

_________________
Tomos Hillman
eXpertML Ltd
+44 7793 242058

On 7 Feb 2022, 07:43 +0000, Norm Tovey-Walsh <norm@saxonica.com>, wrote:
> > My view? It is an error.
> > The input.txt file is 'in error'
> > Hopefully the author of the processor will say @line 25 etc.
>
> One reason we’re struggling with this topic is, perhaps, that we have
> differing views about how a processor might be built.
>
> Let’s look at what a Java compiler does, as a concrete example.
> If you feed this program into a Java compiler:
>
> package com.nwalsh;
>
> public class MyProgram {
>     public static void main(String[] args) {
>         System.out.println("Hello, world")
>     }
> }
>
> it will dutifully report
>
> /tmp/com/nwalsh/MyProgram.java:5: error: ';' expected
>         System.out.println("Hello, world")
>                                           ^
>
> This, I assume, would satisfy Dave’s view that the program is “in
> error”.
>
> But there’s more going on here than just what the user sees viewing the
> entire compiler as a black box.
>
> Internally, there’s some code for reading files off disk. If that code
> failed, if the file didn’t exist or had permissions that prevented the
> process from reading it, that would be an exceptional circumstance. The
> code would be unable to function and an error would be raised.
>
> Assuming the code was successfully read, it would next be handed to a
> parser that would attempt to turn the characters of the program into an
> abstract syntax tree (AST). The parts of the compiler that come next,
> the optimizer, the byte code generator, etc. don’t want to deal with
> words like “public” or “{” delimiters. They want an abstraction that’s
> cleared away the syntactic cruft.
>
> The parser is going to report, “Sorry, mate, I couldn’t build an AST. I
> got as far as about the end of line five before I reached a point where
> I couldn’t match the input. You know what, I could have kept going if
> there’d been a “;” there.”
>
> Critically for this discussion, observe that the parser didn’t encounter
> any kind of exceptional circumstance. It wasn’t prevented in any way from
> completing its function. No error has occurred. The input doesn’t match
> the grammar for Java, but that’s a common and completely expected
> result. (If you don’t think that’s common, just watch me writing Java.)
>
> We aren’t going to make the meaning of the word “error” any clearer or
> more precise by trying to make it do double duty. Asserting that failing
> to find a parse is “an error” reduces the value of the word “error” as a
> technical term.
>
> The *user* can still be told that an error occurred. That’s fine.
>
> But the ixml CG is mostly focused on the part of a larger program that
> turns input grammars into vxml. Given a valid ixml grammar, discovering
> that the input doesn’t match the grammar simply isn’t “an error”. It’s a
> common and completely expected result.
>
> One could imagine a Java compiler that would stick the semicolon in and
> then hand the input back to the parser to try again. It might get
> further this time, until a different grammar matching dead end, or even
> until it succeeds.
>
> That’s just completely different from an I/O error reading the file.
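
A minimal Java sketch of the distinction drawn above, assuming a toy grammar in which every non-blank line must end with ';'. All names here (ParseSketch, ParseResult, Match, NoMatch, parse, parseFile, report) are hypothetical and are not part of any ixml processor's API. A failed match comes back as an ordinary value carrying the line and the expected token; only the file read can throw.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical names throughout; an illustration, not an ixml API.
class ParseSketch {
    // Both outcomes are ordinary return values of the parser.
    interface ParseResult {}
    record Match(String tree) implements ParseResult {}
    record NoMatch(int line, String expected) implements ParseResult {}

    // Toy "grammar" (an assumption): every non-blank line ends with ';'.
    static ParseResult parse(String input) {
        String[] lines = input.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].strip();
            if (!line.isEmpty() && !line.endsWith(";")) {
                // Not exceptional: the parser did its job and found no parse.
                return new NoMatch(i + 1, ";");
            }
        }
        return new Match("<statements/>");
    }

    // Failing to read the file is exceptional: the parser never ran.
    static ParseResult parseFile(Path file) throws IOException {
        return parse(Files.readString(file));
    }

    // What the *user* is told is a separate decision from what the parser returns.
    static String report(ParseResult r) {
        if (r instanceof Match m) {
            return "ok: " + m.tree();
        }
        NoMatch n = (NoMatch) r;
        return "line " + n.line() + ": '" + n.expected() + "' expected";
    }

    public static void main(String[] args) {
        System.out.println(report(parse("a = 1;\nb = 2")));  // line 2: ';' expected
        System.out.println(report(parse("a = 1;\nb = 2;"))); // ok: <statements/>
    }
}
```

A caller that prefers a catchable error can still convert NoMatch into an exception at its own boundary; the sketch only keeps that choice out of the parser itself.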
>
> These kinds of subtle distinctions are useful to make specification
> language clear and precise. If you lump everything that could possibly
> go wrong under the term “error”, then “error” becomes less meaningful.
>
> Again, this has nothing directly to do with what the user is told by the
> program they were running.
>
> To take a completely different example, consider
>
>   System.out.println(Long.MAX_VALUE + 1);
>
> If the answer -9223372036854775808 surprises you, you might say “that’s
> an error!” But it isn’t. It simply isn’t. It’s a consequence of how
> two’s complement numbers are stored in 64 bit blocks and what happens
> when numeric overflow occurs.
>
> Hoping that helps.
>
> Be seeing you,
> norm
>
> --
> Norm Tovey-Walsh
> Saxonica
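
For the Long.MAX_VALUE + 1 example in the quoted message, a short sketch of the same arithmetic done two ways. The class and method names (OverflowSketch, wrappingIncrement, checkedIncrement) are made up for illustration; Math.addExact is the standard library call that throws ArithmeticException on overflow. Plain `+` wraps silently because that is how the language defines it, while addExact is the opt-in for treating overflow as an exceptional circumstance.

```java
// Hypothetical class and method names; only Math.addExact is a real JDK call.
class OverflowSketch {
    // Ordinary long addition: defined to wrap around in two's complement.
    static long wrappingIncrement(long x) {
        return x + 1;
    }

    // Opt-in checked addition: overflow is raised as an ArithmeticException.
    static String checkedIncrement(long x) {
        try {
            return Long.toString(Math.addExact(x, 1L));
        } catch (ArithmeticException e) {
            return "overflow";
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingIncrement(Long.MAX_VALUE)); // -9223372036854775808
        System.out.println(checkedIncrement(Long.MAX_VALUE));  // overflow
        System.out.println(checkedIncrement(41L));             // 42
    }
}
```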
Received on Monday, 7 February 2022 10:30:43 UTC