Re: Error definition from Bethan Tovey-Walsh on 2022-02-07 (public-ixml@w3.org from February 2022)

From: Bethan Tovey-Walsh <accounts@bethan.wales>
Date: Mon, 7 Feb 2022 12:29:13 +0000
To: Dave Pawson <dave.pawson@gmail.com>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, Norm Tovey-Walsh <norm@saxonica.com>, ixml <public-ixml@w3.org>
Message-Id: <DE8F7773-E2DF-40D2-B74A-25692EBEF3A6@bethan.wales>
>> The spec is only concerned with what a processor has to do in order to behave in a conformant way.
> 
> And (since I now must assume  it is fixed in concrete) the spec is immutable?
> So Steven can go to Prague in June and present, with the CG name behind him?


No, I don’t think Steven should or would do that.

Let me rephrase. The point of *any* programming language specification is to define the syntax and semantics of that language, so that anyone implementing the language knows what they need to do to build a conforming processor, and anyone using the language understands the expected behaviour of code written in the language.

In ixml, the parser is *expected* to return no vxml output if the input string can’t be matched to the input grammar. It’s reasonable, imo, to recommend or even require that there be some way of informing the user that no match was found. It’s *not* reasonable for the spec to define the lack of a match as an “error”. Doing so forces implementors to treat that outcome as an error in order to conform to the spec.

> I do wonder why the word 'error' has brought such a strong reaction?

Apologies if I’m teaching my grandmother to suck eggs, but I think it’s helpful to define what “error” means as a term of art. In programming, “error” can mean one of a few specific things: a syntax error (something in the program doesn’t follow the syntax of the programming language - e.g. a missing parenthesis, a misspelled command); a logic error (e.g. accidentally creating an endless loop, using the wrong operator); a runtime error (e.g. resource leaks).

Your view seems to be that the production of no output by the parser is a logic error: the program produces unexpected output because the programmer made an error. A simple logic error would be if I wrote “a = 5 + 4 / 2” and expected the value of a to be 4.5. In most languages, operator precedence means the division is evaluated first, so the value of a is 7. What I should have written was “a = (5+4)/2”.

Logic errors don’t cause crashes, so they can be hard to spot and fix. I’d say 99% of the difficult bugs I come across are logic errors (i.e., I was a doofus). They don’t trigger formal error reports, either, because they’re only errors in context: I might have intended for the value of a to be 7, in which case there’s no error in writing “a = 5 + 4 / 2”. 
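To make the precedence example concrete, here is a minimal sketch in Python (my choice of language for illustration; nothing in the argument depends on it). Note that Python's "/" always yields a float, so the value prints as 7.0 rather than 7.

```python
# Division binds more tightly than addition, so 4 / 2 is evaluated first.
a = 5 + 4 / 2

# Parentheses force the addition to happen before the division.
b = (5 + 4) / 2

print(a)  # 7.0
print(b)  # 4.5
```

Both lines run without complaint. Which of them is "wrong" depends entirely on what I meant to compute.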

A language specification is intended to define the syntax and semantics of a language, and the expected behaviour of code written in that language. The specification will make it clear (explicitly or implicitly) that writing “a = (5 + 4 / 2” is an error, because it doesn’t conform to the language’s syntax (parentheses must be balanced). It won’t indicate anything about “a = 5 + 4 / 2”, except that it’s syntactically correct and assigns the value “7” to the variable “a”.
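As a rough illustration of that difference, the sketch below asks Python's built-in compile() whether a piece of source text conforms to the language's syntax. The helper name is_syntactically_valid is my own invention for this example.

```python
def is_syntactically_valid(source: str) -> bool:
    """Return True if `source` conforms to Python's syntax.

    Hypothetical helper for illustration: it only checks syntax,
    and says nothing about whether the code does what was intended.
    """
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Unbalanced parenthesis: the language itself defines this as an error.
print(is_syntactically_valid("a = (5 + 4 / 2"))  # False

# Syntactically fine: the language has no opinion on whether 7 was intended.
print(is_syntactically_valid("a = 5 + 4 / 2"))   # True
```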

In the unlikely case that logic errors caused by ignoring operator precedence will be a frequent problem for users of the language, the implementor may choose to catch calculations of this sort and ask the user whether they’ve made a mistake (“a = 5 + 4 / 2 may need parentheses for correct functioning: did you mean a = (5 + 4) / 2?” or something of that sort). But that’s a decision for the implementor to make, contextually.

Trying to anticipate common logic errors, and provide useful feedback about them, is part of being a good programmer. I can imagine a variety of logic errors that an ixml implementor might want to catch: providing an empty input string; providing an input string with trailing whitespace characters; providing a binary file as an input string. But none of these is an error as regards the language itself. ixml doesn’t care if you give it a binary file or an empty input or a string with trailing whitespace - but there’s a good chance that the user didn’t intend to do these things, and will get unexpected output from them.
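The kind of optional check I have in mind might look something like the following sketch. The function name input_warnings and the message wording are assumptions of mine, not anything the spec or any existing implementation defines, and treating undecodable UTF-8 or a NUL character as "probably binary" is only a heuristic.

```python
from typing import List


def input_warnings(data: bytes) -> List[str]:
    """Collect advisory warnings about an input; never raises.

    Hypothetical helper: these are hints for the user, not errors
    of the language. The input is still handed to the parser.
    """
    warnings: List[str] = []
    if not data:
        warnings.append("input is empty")
        return warnings
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Heuristic only: bytes that are not UTF-8 are probably binary.
        warnings.append("input is not valid UTF-8; is it a binary file?")
        return warnings
    if "\x00" in text:
        warnings.append("input contains NUL characters; is it a binary file?")
    if text != text.rstrip():
        warnings.append("input ends with whitespace")
    return warnings


print(input_warnings(b""))
print(input_warnings(b"2022-02-07\n"))
print(input_warnings(b"\xff\xfe\x00\x01"))
print(input_warnings(b"2022-02-07"))
```

Whether to run checks like these at all, and how to phrase them, is exactly the sort of thing I'd leave to each implementor.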

It’s perhaps reasonable for a language specification to have some additional constraints, beyond those of the language’s syntax. We could, for example, require that implementations inform the user if there is no output from the parser. Calling that result an error, though, would be confusing because it would imply that it violates the syntax of ixml, which it doesn’t. Unless we actually change the language and require that an ixml input string be a valid sentence of the corresponding ixml input grammar, the lack of a match isn’t a syntax error. It makes a lot of sense, imo, for the spec to reserve the word “error” for cases where some structure of the language itself has been violated, and to refer to unexpected/undesired outputs of a properly functioning ixml processor in some other way.
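Here is a minimal sketch of what "inform the user, but don't call it an error" could look like in an implementation. Everything in it is hypothetical: ParseResult and parse_input are names I made up, and a regular expression stands in for a real ixml grammar purely to keep the example short.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParseResult:
    matched: bool
    output: Optional[str]  # serialized XML when matched, otherwise None
    message: str           # information for the user in either case


# Stand-in for an ixml grammar: dates of the form YYYY-MM-DD.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_input(text: str) -> ParseResult:
    """Parse `text`; a non-match is a normal result, not an exception."""
    m = DATE.fullmatch(text)
    if m is None:
        return ParseResult(False, None, "input does not match the grammar")
    year, month, day = m.groups()
    xml = f"<date><year>{year}</year><month>{month}</month><day>{day}</day></date>"
    return ParseResult(True, xml, "ok")


# Several inputs are processed in one run; a non-match does not stop the loop.
for text in ["2022-02-07", "7 February 2022"]:
    result = parse_input(text)
    print(text, "->", result.output if result.matched else result.message)
```

The processor has done its job correctly in both cases; the caller simply gets told which inputs matched and which didn't.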

I’m not speaking for others, but that’s why I’m opposed to the term “error” being used in the spec to refer to the class of logic errors.



___________________________________________________ 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 7 Feb 2022, at 07:43, Dave Pawson <dave.pawson@gmail.com> wrote:
> 
> On Sun, 6 Feb 2022 at 20:27, Bethan Tovey-Walsh <accounts@bethan.wales> wrote:
>> 
>> Dave, I think what Michael might be getting at is this:
>> 
>> The spec isn’t involved with details of how an implementor communicates with a user. The spec isn’t concerned with details of how an implementor chooses to implement the grammar. The spec is only concerned with what a processor has to do in order to behave in a conformant way.
> 
> And (since I now must assume  it is fixed in concrete) the spec is immutable?
> So Steven can go to Prague in June and present, with the CG name behind him?
> 
> I thought otherwise.
> 
> I do wonder why the word 'error' has brought such a strong reaction?
> 
> Bye.
> 
> 
>> 
>> So, for example, Michael’s and Norm’s and Steven’s and Tom’s implementations will all differ in various ways. They’ll have different interfaces, different ways for users to specify input files, and so on. Users can expect them to produce the same result from a given input grammar and input string if they’re all conforming to the spec. But there’s no expectation that they’ll behave identically while producing that result, either behind the scenes or in communication with the user.
>> 
>> Any implementation worth its bytes is going to have ways to communicate things like “your input string won’t parse against your grammar”, or “you’ve given me a directory name as an input, but I can only accept a filename”. But deciding the exact details of how and when to communicate those things is implementation-dependent because they don’t relate to the correct functioning of the language itself.
>> 
>> Let’s say an ixml implementation allows you to specify multiple input files to the processor. You as the user might know quite well that only some files will match against the grammar, so you just need a note of which ones succeeded and which didn’t. It would make sense to report that some files didn’t produce output, without calling that an error and stopping the whole process.
>> 
>> The point is, the ixml parser has completed its job successfully, regardless of whether it outputs vxml or not - because its job is to parse the input string against the input grammar and to output vxml *if a valid parse is found*. It would be in error if it a) failed to output vxml for a valid input string, or b) output vxml for an invalid input string. Tom would be daft not to communicate the matching and non-matching inputs to the user, but not because the non-matching inputs are in any sense “errors”.
>> 
>> I don’t think Michael’s implying that implementors shouldn’t communicate clearly about whether the input string was valid against the input grammar; only that it isn’t an error of the ixml language if the input string isn’t valid against the grammar.
>> 
>> 
>>> 
>>> On 6 Feb 2022, at 14:55, Dave Pawson <dave.pawson@gmail.com> wrote:
>>> 
>>> On Sun, 6 Feb 2022 at 14:15, C. M. Sperberg-McQueen
>>> <cmsmcq@blackmesatech.com> wrote:
>>>> 
>>>> 
>>>> Dave Pawson writes:
>>>> 
>>>>> On Sat, 5 Feb 2022 at 17:26, C. M. Sperberg-McQueen
>>>>> <cmsmcq@blackmesatech.com> wrote:
>>>> 
>>>>> IMHO a bug in the processor does not give me 3, hence it is an error.
>>> 
>>>> On your view, in the situation you describe, who made the mistake? Who
>>>> committed the error? What rule did they violate?
>>> 
>>> The processor author. I then rely on them to inform the user that 'something
>>> went wrong'. Still an error ( in the users view). Why? Because I did
>>> not get output.xml
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> If a processor fails to produce an XML parse tree for the input and
>>>> instead produces diagnostic information saying something like "this
>>>> input does not match the grammar; further details below ...", does that
>>>> suffice for your purposes?  Or is it necessary that the word "error" be
>>>> used in the message?
>>> 
>>> My view? It is an error.
>>> The input.txt file is 'in error'
>>> Hopefully the author of the processor will say @line 25 etc.
>>> 
>>> 
>>>> 
>>>> Is it necessary for your goals with respect to ixml that the spec use
>>>> the word "error" to describe the situation in which you do not get your
>>>> expected output?
>>> 
>>> My view? It would be helpful to an end user.
>>> Equally, classes of error (see XSLT rec) to assist debugging user code.
>>> 
>>> 
>>>> 
>>>> Do problems arise if the word "error" is not used in the spec when
>>>> describing that situation?
>>> 
>>> Problems of clarity?
>>> 
>>> 
>>> 
>>>>> Do you wish to build a playground for devs only?
>>>> 
>>>> Not particularly.  I would like a playground that is open to all and not
>>>> marked as closed off to me.
>>> 
>>> I would hope that your debug code is removed / not executing by
>>> the time the user has it?
>>> 
>>> 
>>> regards
>>> 
>>> --
>>> Dave Pawson
>>> XSLT XSL-FO FAQ.
>>> Docbook FAQ.
>> 
> 
> 
> -- 
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.
Received on Monday, 7 February 2022 12:29:34 UTC