Re: what I learned from today's discussion of delimiters from Bethan Tovey-Walsh on 2022-01-27 (public-ixml@w3.org from January 2022)

From: Bethan Tovey-Walsh <accounts@bethan.wales>
Date: Thu, 27 Jan 2022 20:12:22 +0000
To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <E1B59799-EE92-4513-A1F7-C30F543E27A2@bethan.wales>
> Okay, I guess, though I very much want to label the notion that particular
> characters have addressees as metaphorical not literal.
> 
> Any delimiter in a formal grammar can be (is) used by the processor to
> know where there are notional boundaries in the input and what the
> things on either side of the boundaries are.  In that sense, all
> delimiters are addressed to the processor.  And, because they can also
> be and are used by human readers for the same purpose, they are also
> addressed to the human reader.

Yes, okay; I see what you mean. But I believe that the requirements of the human reader are incidental. If I wrote an ixml grammar, and wrote on one line:

(ignore this next production if you don’t want to output the “date” element)

or

— N.B. this next one can be ignored if you don’t want a “date” element

or

** ignore subsequent line if date element not required **

a human reader should have no problem interpreting any of those lines as I intended. A human reader would probably accept that any line not having the form of a syntactically correct production rule, and particularly any such line containing free prose (as in the examples), is not intended to be part of the grammar, and (further) may contain a useful instruction to the human reader. But a processor would (should) fall over. We can’t rely on human-style interpretation of context and content when dealing with a processor.

Conversely, we can expect a processor to behave reliably if we “tell” it the syntax for a comment/pragma. We can’t expect the same from a human, who might decide to ignore instructions, or fail to understand them, or skip over one unintentionally.
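
A minimal Python sketch of that point, offered as an illustration only. The function name split_comments is hypothetical; it assumes ixml-style comments delimited by { and } that may nest, and it ignores braces inside quoted strings, which a real processor would have to handle.

```python
def split_comments(source: str) -> tuple[str, list[str]]:
    """Separate brace-delimited comments from the rest of the text.

    Assumption: "{" opens a comment, "}" closes it, and comments may nest.
    Braces inside quoted strings are NOT handled in this sketch.
    """
    kept = []      # characters outside any comment
    comments = []  # text of each top-level comment, outer braces removed
    current = []   # characters of the comment currently being read
    depth = 0      # current nesting depth of comments

    for ch in source:
        if ch == "{":
            if depth > 0:
                current.append(ch)  # nested opener is part of the comment text
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError("unmatched '}'")
            depth -= 1
            if depth == 0:
                comments.append("".join(current))
                current = []
            else:
                current.append(ch)  # nested closer is part of the comment text
        elif depth > 0:
            current.append(ch)
        else:
            kept.append(ch)

    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(kept), comments
```

Given one of the prose lines above, this sketch finds no comment and leaves the prose in the grammar text, where a parser would then reject it; a brace-delimited comment is set aside every time.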

The syntax, delimiters, etc. of formal languages are as much for humans as for processors, because they're a way to express something very precise with zero (or zero-adjacent; humans are weird) ambiguity. Natural language can’t be as reliably precise as a formal language, so it’s a bad fit for expressing e.g. the rules of a formal grammar.

But I’d argue that the same kind of syntactic precision isn’t essential for comments, and that a human would have as little problem correctly interpreting any of the examples I gave as they would a well-formed ixml comment.

So I would argue that, when we delimit comments *in the specific way we do* in programming, we’re doing that for the processor.

In any case, maybe this is a little abstruse.

> I also agree that it's not the delimiters that make things different.
> In (what I think is) the usual case, we choose different delimiters for
> different things not in order to make the things different but to
> reflect the fact that they are already different.

Yes, this does make sense. I’m now having the experience of believing two incompatible things at the same time. It’s like one of those Magic Eye pictures.

Ah well, maybe this was an unproductive rabbit hole.

BTW


___________________________________________________ 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 27 Jan 2022, at 18:57, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
> 
> 
> Bethan Tovey-Walsh writes:
> 
>> What if we think of both pragma delimiters and comment delimiters as
>> notifications to the processor? The content of the pragma is addressed
>> to a processor, whereas the content of the comment is addressed to a
>> human.
> 
> At a first approximation, I think that's likely to be true, but it has
> been pointed out already that exceptions are possible on both sides of
> the line.  If a pragma can be interpreted as a machine-readable
> annotation (like many in the discussion document), then the information
> it provides can be of interest and use to a human reading the grammar as
> well as to software.
> 
> And on the other side, even if the language has pragmas there is no way
> to prevent my deciding that {!}  is a simpler way of annotating rules in
> grammars to mean some particular thing (like 'rewrite this rule' as in
> the rule-rewriting example in the discussion document) or elicit some
> particular behavior in my processor (even something as banal as
> "boldface the left-hand side of this rule in the display").
> 
> But yes, I think most of us, and maybe everyone, will agree that there
> is some distinction between pragmas and comments which are not pragmas,
> though finding words for it that we all agree on may prove tricky.
> 
>> But the *delimiters* in both cases are addressed to the
>> processor, instructing it either a) to ignore a comment, or b) to
>> behave as appropriate for a pragma (which may mean ignoring it, or
>> doing something with it).
> 
> Okay, I guess, though I very much want to label the notion that particular
> characters have addressees as metaphorical not literal.
> 
> Any delimiter in a formal grammar can be (is) used by the processor to
> know where there are notional boundaries in the input and what the
> things on either side of the boundaries are.  In that sense, all
> delimiters are addressed to the processor.  And, because they can also
> be and are used by human readers for the same purpose, they are also
> addressed to the human reader.
> 
>> If we can accept this, ...
> 
> Warning ... see below.
> 
>>                   ... it makes sense to have a basic delimiter
>> meaning “hello processor, this is not a bit of code for you to process
>> as normal”, and one of the following:
> 
>> a) an extra delimiter saying “this one’s a pragma, don’t ignore it (yet)”;
>> b) an extra delimiter saying “this one’s a comment, ignore it”;
>> c) both of the above.
> 
> I am (a) perfectly happy with this paragraph, (a.i) believe it, (a.ii)
> disbelieve it, and am (b) moderately concerned about it.
> 
> It's (a) phrased as a conditional, and I am (a.i) happy to accept that
> conditional: if one accepts the premise, then it does indeed make sense
> to interpret multi-character delimiters as described.  At least, some of
> the time.  At other times, I think (a.ii) "hang on a bit, that
> consequent really doesn't follow from the antecedent".
> 
> The more important concern is the fear that (b) it's not quite meant as
> a material implication of the kind seen in formal logic, in which p -> q
> does not say anything at all about any intrinsic relation between p and
> q, only about the possibility that p is true while q is false, so that
> if q is true, p can be absolutely any proposition at all.
> 
> Concretely, if the line of reasoning you outline leads people to reach
> consensus on a choice of delimiters, it's all to the good.  But I don't
> think it is necessary that we all agree on the premise, or on the
> connection between premise and conclusion.  (And since I don't expect
> everyone to agree on the premise, this fact looms large for me.)
> 
> Our task as a group is to reach agreement, if we can, on a choice of
> delimiters.  Agreement on why it's a good choice is optional and seeking
> it may be counter-productive.
> 
>> I’d vote for b).  ...
> 
>> I know this seems like a 180 from my previous position, but I don’t
>> believe it is. I still think that pragmas and comments are different
>> things; but I no longer think that the *delimiters* are part of what
>> makes them different. In fact, I’m not sure that the delimiters form
>> part of a pragma/comment at all.
> 
> It depends, of course, on how things are defined.
> 
> In the current ixml specification grammar, the delimiters are part of
> the comment, just as the delimiters for quoted strings and character
> sets are part of the strings recognized by the corresponding
> nonterminals.  In XML, the delimiters are, in the grammar, part of the
> start-tag, sole-tag, end-tag, general entity reference, parameter entity
> reference, or numeric character reference.
> 
> But the role played by the delimiters in relation to the comment, the
> tag, the element, the entity reference, the entity, etc., is clearly
> quite different from that played by the characters that fall within the
> delimiters.  For one thing, the delimiters used for one comment are the
> same as the delimiters used for another comment and thus convey no
> information other than 'this is a comment'.  And when we have other ways
> to carry that information (as in the vxml form of a grammar), we don't
> bother to retain the delimiters.
> 
> So I think I agree with the point I think you are wanting to make,
> although I think I disagree with your reasoning.
> 
> I also agree that it's not the delimiters that make things different.
> In (what I think is) the usual case, we choose different delimiters for
> different things not in order to make the things different but to
> reflect the fact that they are already different.
> 
> Michael
> 
> 
>>> On 27 Jan 2022, at 15:59, Bethan Tovey-Walsh <accounts@bethan.wales> wrote:
>>> 
>>> I rather like the idea of using ⦃⦄ (U+2983 and U+2984 - white curly brackets), and offering an ASCII two-character alternative; maybe {[ ]} (or {| |}). So ⦃a:b stuff⦄ and {[a:b stuff]} would be equivalent. It’s not strictly the same as the comment delimiter, but it is still a type of curly bracket. And the two-character form looks very similar to the single-character delimiter, so it’ll be easy for a human reader to recognize them as equivalent.
>>> 
>>> 
>>> ___________________________________________________ 
>>> Dr. Bethan Tovey-Walsh 
>>> Myfyrwraig PhD | PhD Student CorCenCC 
>>> Prifysgol Abertawe | Swansea University 
>>> Croeso i chi ysgrifennu ataf yn y Gymraeg.
>>> 
>>>> On 27 Jan 2022, at 15:23, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>>>> 
>>>> 
>>>> John Lumley writes:
>>>> 
>>>>> At risk of being shot down in flames, there is an ASCII 'bracket' pair
>>>>> that we aren't currently using, neither of which appears, as far as I 
>>>>> can see,  in the IXML grammar,
>>>>> 
>>>>>  viz: '<' and '>'.
>>>>> 
>>>>> Now I know there are other (alright perhaps many) reasons to suggest
>>>>> avoiding them, but they won't currently appear outside strings in any 
>>>>> valid IXML and are seen as 'container pairs', and are certainly ASCII.
>>>> 
>>>>> Just for sake of some completeness....
>>>> 
>>>> You're a brave man, John.
>>>> 
>>>> It has been more than 20 years since Java and XML both made Unicode the
>>>> central character set.  I suspect that by now even { and } are
>>>> transferred correctly between IBM mainframes and the rest of
>>>> the world, although I don't have a convenient way to check.  I think
>>>> it's time we left seven-bit character sets to the lower-level networking
>>>> protocols and used Unicode without apology.
>>>> 
>>>> I won't object on principle to ASCII delimiters, but I decline to view
>>>> being in ASCII as an advantage for any delimiter proposal.
>>>> 
>>>> In any case, convenience of typing and being in ASCII are not really the
>>>> same.  They may be roughly the same on U.S. and for the most part on
>>>> U.K. keyboards, but my recollection is that getting some ASCII
>>>> characters -- in particular < and > -- was much more complicated on
>>>> Norwegian keyboards than I had ever imagined.  (Well, not *that*
>>>> complicated, but I believe it involved both the Alt-Gr key and the shift
>>>> key as well as a third key.)  In Norway, discussions about raw XML or
>>>> HTML being easy to type always rang a little hollow.
>>>> 
>>>> Any Unicode viewer with a search capacity will show a wide range of
>>>> possibilities.  Using Richard Ishida's Uniview [1] and searching 'text'
>>>> for 'bracket' is enlightening.
>>>> 
>>>> [1] https://r12a.github.io/uniview/
>>>> 
>>>> I wonder if we could achieve both (a) a visual echo of the { ... }
>>>> delimiters we use for comments and (b) a single-character pair, by using
>>>> one of Unicode's several variants on curly braces:
>>>> 
>>>> ⎨⎬
>>>> 
>>>> U+23A8 LEFT CURLY BRACKET MIDDLE PIECE
>>>> U+23AC RIGHT CURLY BRACKET MIDDLE PIECE
>>>> 
>>>> or ❴❵
>>>> 
>>>> U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
>>>> U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
>>>> 
>>>> or ⦃⦄
>>>> 
>>>> U+2983 LEFT WHITE CURLY BRACKET
>>>> U+2984 RIGHT WHITE CURLY BRACKET
>>>> 
>>>> or ﹛﹜
>>>> 
>>>> U+FE5B SMALL LEFT CURLY BRACKET
>>>> U+FE5C SMALL RIGHT CURLY BRACKET
>>>> 
>>>> or ｛｝
>>>> 
>>>> U+FF5B FULLWIDTH LEFT CURLY BRACKET
>>>> U+FF5D FULLWIDTH RIGHT CURLY BRACKET
>>>> 
>>>> Unfortunately, in my current font some of these display rather poorly.
>>>> In Richard Ishida's rendering, I quite like U+2983 and U+2984, but they
>>>> are a bit small in the font I'm looking at right now.  Some of the
>>>> square bracket and half-bracket pairs (in Uniview, search text for 'half
>>>> bracket') would perhaps fare better across fonts.
>>>> 
>>>> Of course, for the group to accept this idea, there would have to be
>>>> general acceptance of the view that the choice of delimiters is to be
>>>> made on aesthetic and psychological grounds (what will a given pair
>>>> suggest to the human reader?  how will it feel to use these delimiters
>>>> or those?) because the effect on technical complexity is nil.  I don't
>>>> know if people are willing to accept that conclusion or not.
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> -- 
>>>> C. M. Sperberg-McQueen
>>>> Black Mesa Technologies LLC
>>>> http://blackmesatech.com
>>>> 
>>> 
> 
> 
> -- 
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
Received on Thursday, 27 January 2022 20:12:41 UTC