Re: what I learned from today's discussion of delimiters from C. M. Sperberg-McQueen on 2022-01-27 (public-ixml@w3.org from January 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 27 Jan 2022 11:57:22 -0700
To: Bethan Tovey-Walsh <accounts@bethan.wales>
Cc: public-ixml@w3.org
Message-ID: <87y2313t2l.fsf@blackmesatech.com>
Bethan Tovey-Walsh writes:

> What if we think of both pragma delimiters and comment delimiters as
> notifications to the processor? The content of the pragma is addressed
> to a processor, whereas the content of the comment is addressed to a
> human.

At a first approximation, I think that's likely to be true, but it has
been pointed out already that exceptions are possible on both sides of
the line.  If a pragma can be interpreted as a machine-readable
annotation (like many in the discussion document), then the information
it provides can be of interest and use to a human reading the grammar as
well as to software.

And on the other side, even if the language has pragmas, there is no way
to prevent my deciding that {!} is a simpler way of annotating rules in
grammars to mean some particular thing (like 'rewrite this rule' as in
the rule-rewriting example in the discussion document) or to elicit some
particular behavior in my processor (even something as banal as
"boldface the left-hand side of this rule in the display").

But yes, I think most of us, and maybe everyone, will agree that there
is some distinction between pragmas and comments which are not pragmas,
though finding words for it that we all agree on may prove tricky.

> But the *delimiters* in both cases are addressed to the
> processor, instructing it either a) to ignore a comment, or b) to
> behave as appropriate for a pragma (which may mean ignoring it, or
> doing something with it).

Okay, I guess, though I very much want to label the notion of particular
characters having addressees as metaphorical, not literal.

Any delimiter in a formal grammar can be (is) used by the processor to
know where there are notional boundaries in the input and what the
things on either side of the boundaries are.  In that sense, all
delimiters are addressed to the processor.  And, because they can also
be and are used by human readers for the same purpose, they are also
addressed to the human reader.

> If we can accept this, ...

Warning ... see below.

>                    ... it makes sense to have a basic delimiter
> meaning “hello processor, this is not a bit of code for you to process
> as normal”, and one of the following:

> a) an extra delimiter saying “this one’s a pragma, don’t ignore it (yet)”;
> b) an extra delimiter saying “this one’s a comment, ignore it”;
> c) both of the above.

I am (a) perfectly happy with this paragraph, (a.i) believe it, (a.ii)
disbelieve it, and am (b) moderately concerned about it.

It's (a) phrased as a conditional, and I am (a.i) happy to accept that
conditional: if one accepts the premise, then it does indeed make sense
to interpret multi-character delimiters as described.  At least, some of
the time.  At other times, I think (a.ii) "hang on a bit, that
consequent really doesn't follow from the antecedent".

The more important concern is the fear that (b) it's not quite meant as
a material implication of the kind seen in formal logic, in which p -> q
does not say anything at all about any intrinsic relation between p and
q, only about the possibility that p is true while q is false, so that
if q is true, p can be absolutely any proposition at all.
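For anyone who wants the logic spelled out: material implication is defined purely by truth values. The only row of the standard truth table in which p -> q is false is the one where p is true and q is false; whenever q is true, the implication holds no matter what p says.

```latex
p \to q \;\equiv\; \neg(p \wedge \neg q) \;\equiv\; \neg p \vee q
\qquad
\begin{array}{cc|c}
p & q & p \to q \\ \hline
T & T & T \\
T & F & F \\
F & T & T \\
F & F & T
\end{array}
```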

Concretely, if the line of reasoning you outline leads people to reach
consensus on a choice of delimiters, it's all to the good.  But I don't
think it is necessary that we all agree on the premise, or on the
connection between premise and conclusion.  (And since I don't expect
everyone to agree on the premise, this fact looms large for me.)

Our task as a group is to reach agreement, if we can, on a choice of
delimiters.  Agreement on why it's a good choice is optional and seeking
it may be counter-productive.

> I’d vote for b).  ...

> I know this seems like a 180 from my previous position, but I don’t
> believe it is. I still think that pragmas and comments are different
> things; but I no longer think that the *delimiters* are part of what
> makes them different. In fact, I’m not sure that the delimiters form
> part of a pragma/comment at all.

It depends, of course, on how things are defined.

In the current ixml specification grammar, the delimiters are part of
the comment, just as the delimiters for quoted strings and character
sets are part of the strings recognized by the corresponding
nonterminals.  In XML, the delimiters are, in the grammar, part of the
start-tag, sole-tag, end-tag, general entity reference, parameter entity
reference, or numeric character reference.

But the role played by the delimiters in relation to the comment, the
tag, the element, the entity reference, the entity, etc., is clearly
quite different from that played by the characters that fall within the
delimiters.  For one thing, the delimiters used for one comment are the
same as the delimiters used for another comment and thus convey no
information other than 'this is a comment'.  And when we have other ways
to carry that information (as in the vxml form of a grammar), we don't
bother to retain the delimiters.
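A minimal sketch of that point, assuming ixml-style comments delimited by { and } which may nest (as in the ixml specification grammar). The recognizer must consume the delimiters to find the comment's boundaries, but the value it hands back keeps only what lies between the outermost pair. The function name read_comment is illustrative and not taken from any ixml implementation.

```python
def read_comment(text: str, pos: int) -> tuple[str, int]:
    """Parse a possibly nested comment starting at text[pos] == "{".

    Returns the comment body with the outermost delimiters stripped,
    and the index just past the closing "}".
    """
    if pos >= len(text) or text[pos] != "{":
        raise ValueError(f"expected '{{' at position {pos}")
    depth = 0
    for i in range(pos, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1          # opening delimiter (outer or nested)
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # The outer delimiters only said "this is a comment",
                # so they are not kept in the returned body.
                return text[pos + 1 : i], i + 1
    raise ValueError("unterminated comment")
```

For example, read_comment("{a {nested} b} rule", 0) gives back the body "a {nested} b" together with 14, the index at which ordinary grammar text resumes.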

So I think I agree with the point I think you are wanting to make,
although I think I disagree with your reasoning.

I also agree that it's not the delimiters that make things different.
In (what I think is) the usual case, we choose different delimiters for
different things not in order to make the things different but to
reflect the fact that they are already different.
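To make this concrete for the proposal quoted below (⦃...⦄ for pragmas, with {[...]} as an ASCII equivalent, and plain {...} for comments), here is a hypothetical Python sketch of how a processor might sort the two kinds of thing by their delimiters alone. The regular expression, the function name classify, and the simplifications (no nested comments, no closing delimiter inside a pragma) are assumptions for illustration, not part of any ixml draft.

```python
import re

# Hypothetical tokenizer for the quoted proposal. Assumptions: pragma
# bodies never contain their closing delimiter; comments do not nest.
TOKEN = re.compile(
    r"⦃(?P<p1>.*?)⦄"          # single-character pragma delimiters (U+2983/U+2984)
    r"|\{\[(?P<p2>.*?)\]\}"   # two-character ASCII pragma alternative
    r"|\{(?P<c>[^{}]*)\}",    # any other {...} is an ordinary comment
    re.DOTALL,
)


def classify(grammar: str) -> list[tuple[str, str]]:
    """Return ("pragma" | "comment", body) for each delimited span, in order."""
    out = []
    for m in TOKEN.finditer(grammar):
        if m.group("c") is not None:
            out.append(("comment", m.group("c")))
        else:
            body = m.group("p1") if m.group("p1") is not None else m.group("p2")
            out.append(("pragma", body))
    return out
```

Under these assumptions, classify("⦃a:b stuff⦄ {[a:b stuff]} {just a note}") treats the first two spans as the same pragma and the third as a comment. The sketch also makes one cost visible: a comment whose text begins with [ (for example {[draft]}) would be read as a pragma, although under the current comment syntax it is an ordinary comment.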

Michael


>> On 27 Jan 2022, at 15:59, Bethan Tovey-Walsh <accounts@bethan.wales> wrote:
>> 
>> I rather like the idea of using ⦃⦄ (U+2983 and U+2984 - white curly brackets), and offering an ASCII two-character alternative; maybe {[ ]} (or {| |}). So ⦃a:b stuff⦄ and {[a:b stuff]} would be equivalent. It’s not strictly the same as the comment delimiter, but it is still a type of curly bracket. And the two-character form looks very similar to the single-character delimiter, so it’ll be easy for a human reader to recognize them as equivalent.
>> 
>> 
>> ___________________________________________________ 
>> Dr. Bethan Tovey-Walsh 
>> Myfyrwraig PhD | PhD Student CorCenCC 
>> Prifysgol Abertawe | Swansea University 
>> Croeso i chi ysgrifennu ataf yn y Gymraeg.
>> 
>>> On 27 Jan 2022, at 15:23, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>>> 
>>> 
>>> John Lumley writes:
>>> 
>>>> At risk of being shot down in flames, there is an ASCII 'bracket' pair
>>>> that we aren't currently using, neither of which appears, as far as I 
>>>> can see,  in the IXML grammar,
>>>> 
>>>>   viz: '<' and '>'.
>>>> 
>>>> Now I know there are other (alright perhaps many) reasons to suggest
>>>> avoiding them, but they won't currently appear outside strings in any 
>>>> valid IXML and are seen as 'container pairs', and are certainly ASCII.
>>> 
>>>> Just for sake of some completeness....
>>> 
>>> You're a brave man, John.
>>> 
>>> It has been more than 20 years since Java and XML both made Unicode the
>>> central character set.  I suspect that by now even { and } are
>>> transferred correctly between IBM mainframes and the rest of
>>> the world, although I don't have a convenient way to check.  I think
>>> it's time we left seven-bit character sets to the lower-level networking
>>> protocols and used Unicode without apology.
>>> 
>>> I won't object on principle to ASCII delimiters, but I decline to view
>>> being in ASCII as an advantage for any delimiter proposal.
>>> 
>>> In any case, convenience of typing and being in ASCII are not really the
>>> same.  They may be roughly the same on U.S. and for the most part on
>>> U.K. keyboards, but my recollection is that getting some ASCII
>>> characters -- in particular < and > -- was much more complicated on
>>> Norwegian keyboards than I had ever imagined.  (Well, not *that*
>>> complicated, but I believe it involved both the Alt-Gr key and the shift
>>> key as well as a third key.)  In Norway, discussions about raw XML or
>>> HTML being easy to type always rang a little hollow.
>>> 
>>> Any Unicode viewer with a search function will show a wide range of
>>> possibilities.  Using Richard Ishida's Uniview [1] and searching 'text'
>>> for 'bracket' is enlightening.
>>> 
>>> [1] https://r12a.github.io/uniview/
>>> 
>>> I wonder if we could achieve both (a) a visual echo of the { ... }
>>> delimiters we use for comments and (b) a single-character pair, by using
>>> one of Unicode's several variants on curly braces:
>>> 
>>> ⎨⎬
>>> 
>>> U+23A8 LEFT CURLY BRACKET MIDDLE PIECE
>>> U+23AC RIGHT CURLY BRACKET MIDDLE PIECE
>>>
>>> or ❴❵
>>>
>>> U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
>>> U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
>>>
>>> or ⦃⦄
>>>
>>> U+2983 LEFT WHITE CURLY BRACKET
>>> U+2984 RIGHT WHITE CURLY BRACKET
>>>
>>> or ﹛﹜
>>>
>>> U+FE5B SMALL LEFT CURLY BRACKET
>>> U+FE5C SMALL RIGHT CURLY BRACKET
>>>
>>> or ｛｝
>>>
>>> U+FF5B FULLWIDTH LEFT CURLY BRACKET
>>> U+FF5D FULLWIDTH RIGHT CURLY BRACKET
>>> 
>>> Unfortunately, in my current font some of these display rather poorly.
>>> In Richard Ishida's rendering, I quite like U+2983 and U+2984, but they
>>> are a bit small in the font I'm looking at right now.  Some of the
>>> square bracket and half-bracket pairs (in Uniview, search text for 'half
>>> bracket') would perhaps fare better across fonts.
>>> 
>>> Of course, for the group to accept this idea, there would have to be
>>> general acceptance of the view that the choice of delimiters is to be
>>> made on aesthetic and psychological grounds (what will a given pair
>>> suggest to the human reader?  how will it feel to use these delimiters
>>> or those?) because the effect on technical complexity is nil.  I don't
>>> know if people are willing to accept that conclusion or not.
>>> 
>>> Michael
>>> 
>>> 
>>> -- 
>>> C. M. Sperberg-McQueen
>>> Black Mesa Technologies LLC
>>> http://blackmesatech.com
>>> 
>> 
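The character names in the candidate list quoted above can be checked against the Unicode Character Database; Python's standard unicodedata module is one convenient way to do it. This is a small sketch, and the helper name describe is illustrative. For each candidate pair it prints the code point, the character, its general category, and its formal name, which also shows which candidates Unicode classes as opening and closing punctuation (categories Ps and Pe) and which it classes otherwise.

```python
import unicodedata

# Candidate (open, close) pairs from the quoted message.
CANDIDATES = [
    (0x23A8, 0x23AC),
    (0x2774, 0x2775),
    (0x2983, 0x2984),
    (0xFE5B, 0xFE5C),
    (0xFF5B, 0xFF5D),
]


def describe(cp: int) -> str:
    """Return 'U+XXXX <char> <general category> <formal name>' for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {ch} {unicodedata.category(ch)} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for open_cp, close_cp in CANDIDATES:
        print(describe(open_cp))
        print(describe(close_cp))
```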


-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Thursday, 27 January 2022 18:57:43 UTC