Re: Error definition from Bethan Tovey-Walsh on 2022-02-07 (public-ixml@w3.org from February 2022)

From: Bethan Tovey-Walsh <accounts@bethan.wales>
Date: Mon, 7 Feb 2022 11:19:25 +0000
To: Tom Hillman <yamahito@gmail.com>
Cc: Dave Pawson <dave.pawson@gmail.com>, Norm Tovey-Walsh <norm@saxonica.com>, "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, ixml <public-ixml@w3.org>
Message-Id: <8F2F3096-F485-454F-A765-AF2FF2BA4C7D@bethan.wales>
> Whilst I understand the distinction that Norm and Michael are making, I'm not sure how a processor should make that situation known to the user/calling function (along with any other information that would be useful downstream such as "I was expecting a ';'"): throwing a catchable error seems a pragmatic solution...

I don’t disagree, but I do think it’s a mistake to make a formal statement that that’s an “error”. I’d be happy to agree to wording saying that processors should report that no derivation was found for an input, but not to call it an error. As I said, the parser has functioned correctly by not producing any output. The user may or may not be expecting vxml output from every input string. As a user who, say, is trying a bunch of inputs to see which ones match, I’d be irritated if the processor kept reporting errors for the non-matching inputs. I’d far rather see “This input couldn’t be parsed using this grammar”. 

For me, it’s like reporting the ambiguous parses. You *could* claim that it’s an “error”, in the general sense of the word - there’s no “right” answer for the parser to produce, and there will certainly be many cases in which a user expects an unambiguous parse. But the parser has done its job correctly, and there’s no point assuming that a user does or does not want an unambiguous result. We just need to say “ambiguous parses must be reported to the user”. Why could we not have a similar statement for an input string that does not parse? 
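
A minimal sketch of that idea in Java, assuming a hypothetical API (the names `Outcome`, `Parsed`, `NoParse`, `parse` and `describe` are illustrative and not taken from any ixml processor, and the toy grammar "one or more digits, then ';'" merely stands in for a real one). "No parse" and "ambiguous" are ordinary return values that get reported to the caller; an exception is kept for the case where the parser is genuinely prevented from working.

```java
import java.util.List;

public class ParseOutcomeSketch {

    // Hypothetical result type: both variants are normal return values.
    sealed interface Outcome permits Parsed, NoParse {}

    // A derivation was found; 'ambiguous' is something to report, not a failure.
    record Parsed(String output, boolean ambiguous) implements Outcome {}

    // No derivation was found; carries the detail a user would want to see.
    record NoParse(int position, String expected) implements Outcome {}

    // Toy stand-in for a processor. Assumed grammar: one or more digits, then ';'.
    static Outcome parse(String input) {
        if (input == null) {
            // A genuine error: the parser was prevented from doing its job.
            throw new IllegalArgumentException("no input supplied");
        }
        int i = 0;
        while (i < input.length() && Character.isDigit(input.charAt(i))) {
            i++;
        }
        if (i == 0) {
            return new NoParse(0, "a digit");
        }
        if (i == input.length() || input.charAt(i) != ';') {
            return new NoParse(i, "';'");
        }
        if (i + 1 != input.length()) {
            return new NoParse(i + 1, "end of input");
        }
        // This toy grammar is unambiguous, so the flag is always false here.
        return new Parsed("<number>" + input.substring(0, i) + "</number>", false);
    }

    // Turns an outcome into a message for the user, without raising anything.
    static String describe(String input) {
        Outcome outcome = parse(input);
        if (outcome instanceof Parsed p) {
            return p.output() + (p.ambiguous() ? " (ambiguous)" : "");
        }
        NoParse n = (NoParse) outcome;
        return "no parse at " + n.position() + ": expected " + n.expected();
    }

    public static void main(String[] args) {
        // Trying a bunch of inputs to see which ones match.
        for (String input : List.of("42;", "42", "x;")) {
            System.out.println(input + " -> " + describe(input));
        }
    }
}
```

With this shape, a user trying many inputs gets a plain report for each non-matching one, and a calling function that prefers a catchable error can still wrap `NoParse` in an exception of its own.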


___________________________________________________ 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 7 Feb 2022, at 10:30, Tomos Hillman <yamahito@gmail.com> wrote:
> 
> So if we have an input *i* and a grammar *g*, a potential error *e* and an output *o*:
> If *i* is a sentence in the grammar *g*, you get *o* as expected (xml nodes in whatever representation)
> if there is a problem in *g*, potentially among other situations, you get *e*
> What should a parser return if everything works correctly, but *i* is not a sentence in the grammar?  How does the parser say "sorry, mate"?
> Whilst I understand the distinction that Norm and Michael are making, I'm not sure how a processor should make that situation known to the user/calling function (along with any other information that would be useful downstream such as "I was expecting a ';'"): throwing a catchable error seems a pragmatic solution...
> 
> Tom
> 
> _________________
> Tomos Hillman
> eXpertML Ltd
> +44 7793 242058
> On 7 Feb 2022, 07:43 +0000, Norm Tovey-Walsh <norm@saxonica.com>, wrote:
>>> My view? It is an error.
>>> The input.txt file is 'in error'
>>> Hopefully the author of the processor will say @line 25 etc.
>> 
>> One reason we’re struggling with this topic is, perhaps, that we have
>> differing views about how a processor might be built.
>> 
>> Let’s look at what a Java compiler does, as a concrete example.
>> If you feed this program into a Java compiler:
>> 
>> package com.nwalsh;
>> 
>> public class MyProgram {
>>     public static void main(String[] args) {
>>         System.out.println("Hello, world")
>>     }
>> }
>> 
>> it will dutifully report
>> 
>> /tmp/com/nwalsh/MyProgram.java:5: error: ';' expected
>> System.out.println("Hello, world")
>> ^
>> 
>> This, I assume, would satisfy Dave’s view that the program is “in
>> error”.
>> 
>> But there’s more going on here than just what the user sees viewing the
>> entire compiler as a black box.
>> 
>> Internally, there’s some code for reading files off disk. If that code
>> failed, if the file didn’t exist or had permissions that prevented the
>> process from reading it, that would be an exceptional circumstance. The
>> code would be unable to function and an error would be raised.
>> 
>> Assuming the code was successfully read, it would next be handed to a
>> parser that would attempt to turn the characters of the program into an
>> abstract syntax tree (AST). The parts of the compiler that come next,
>> the optimizer, the byte code generator, etc. don’t want to deal with
>> words like “public” or “{” delimiters. They want an abstraction that’s
>> cleared away the syntactic cruft.
>> 
>> The parser is going to report, “Sorry, mate, I couldn’t build an AST. I
>> got as far as about the end of line five before I reached a point where
>> I couldn’t match the input. You know what, I could have kept going if
>> there’d been a “;” there.”
>> 
>> Critically for this discussion, observe that the parser didn’t encounter
>> any kind of exceptional circumstance. It wasn’t prevented in any way from
>> completing its function. No error has occurred. The input doesn’t match
>> the grammar for Java, but that’s a common and completely expected
>> result. (If you don’t think that’s common, just watch me writing Java.)
>> 
>> We aren’t going to make the meaning of the word “error” any clearer or
>> more precise by trying to make it do double duty. Asserting that failing
>> to find a parse is “an error” reduces the value of the word “error” as a
>> technical term.
>> 
>> The *user* can still be told that an error occurred. That’s fine.
>> 
>> But the ixml CG is mostly focused on the part of a larger program that
>> turns input grammars into vxml. Given a valid ixml grammar, discovering
>> that the input doesn’t match the grammar simply isn’t “an error”. It’s a
>> common and completely expected result.
>> 
>> One could imagine a Java compiler that would stick the semicolon in and
>> then hand the input back to the parser to try again. It might get
>> further this time, until a different grammar-matching dead end, or even
>> until it succeeds.
>> 
>> That’s just completely different from an I/O error reading the file.
>> 
>> These kinds of subtle distinctions are useful to make specification
>> language clear and precise. If you lump everything that could possibly
>> go wrong under the term “error”, then “error” becomes less meaningful.
>> 
>> Again, this has nothing directly to do with what the user is told by the
>> program they were running.
>> 
>> To take a completely different example, consider
>> 
>> System.out.println(Long.MAX_VALUE + 1);
>> 
>> If the answer -9223372036854775808 surprises you, you might say “that’s
>> an error!” But it isn’t. It simply isn’t. It’s a consequence of how
>> two’s complement numbers are stored in 64-bit blocks and what happens
>> when numeric overflow occurs.
>> 
>> Hoping that helps.
>> 
>> Be seeing you,
>> norm
>> 
>> --
>> Norm Tovey-Walsh
>> Saxonica
Received on Monday, 7 February 2022 11:19:43 UTC