Re: what I learned from today's discussion of delimiters from Bethan Tovey-Walsh on 2022-01-27 (public-ixml@w3.org from January 2022)

From: Bethan Tovey-Walsh <accounts@bethan.wales>
Date: Thu, 27 Jan 2022 16:55:03 +0000
To: ixml <public-ixml@w3.org>
Message-Id: <393629EB-AC38-4816-BD62-3C7008A69344@bethan.wales>
Sorry, more from me - I had an epiphany. It may be nonsense.

What if we think of both pragma delimiters and comment delimiters as notifications to the processor? The content of the pragma is addressed to a processor, whereas the content of the comment is addressed to a human. But the *delimiters* in both cases are addressed to the processor, instructing it either a) to ignore a comment, or b) to behave as appropriate for a pragma (which may mean ignoring it, or doing something with it).

If we can accept this, it makes sense to have a basic delimiter meaning “hello processor, this is not a bit of code for you to process as normal”, and one of the following:

a) an extra delimiter saying “this one’s a pragma, don’t ignore it (yet)”;
b) an extra delimiter saying “this one’s a comment, ignore it”;
c) both of the above.

I’d vote for b). If we require an extra delimiter for comments, we limit the possibility of the kind of error I described previously: a typo in the comment delimiter will mean the processor tries to process the block as a pragma, and either (i) ignores it as a “pragma I can’t process”, or (ii) raises an error for a badly formed pragma, allowing the author to notice the syntax error in the comment and fix it.

Semantically, we could think of it as a sequence of:

- general delimiter, addressed to the processor
- comment delimiter, instructing the processor that what follows is a comment
- comment
- comment end delimiter(?)
- general end delimiter

This makes sense to me: as Michael noted earlier, pragmas can be all kinds of things; but comments are always just comments (as far as a processor is concerned). We always want the same behaviour when it comes to comments, but not necessarily when it comes to pragmas. So the general delimiter plus extra delimiter says “hey processor, this is not a normal bit of code; however, it’s a thing that you can safely ignore, regardless of circumstances”.
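
To make option b) concrete, here is a rough Python sketch of how a processor might tell the two apart. Nothing in it is agreed syntax: the general delimiters {[ and ]}, the comment marker #, and the rule that a pragma starts with a prefix:name token are all placeholders I have assumed purely for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical delimiters, chosen only for this sketch.
OPEN, CLOSE = "{[", "]}"   # general delimiter: "not normal code"
COMMENT_MARK = "#"         # extra delimiter: "this one is a comment, ignore it"

# Assumption: a well-formed pragma begins with a prefix:name token.
PRAGMA_NAME = re.compile(r"^[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*$")


@dataclass
class Block:
    kind: str      # "comment" or "pragma"
    content: str


class MalformedPragma(ValueError):
    """Raised when a block is not marked as a comment and is not a valid pragma."""


def classify(text: str) -> Block:
    """Classify one delimited block as a comment or a pragma."""
    if not (text.startswith(OPEN) and text.endswith(CLOSE)):
        raise ValueError("not a delimited block")
    inner = text[len(OPEN):-len(CLOSE)]

    # The comment marker is addressed to the processor: safe to ignore.
    if inner.startswith(COMMENT_MARK):
        return Block("comment", inner[len(COMMENT_MARK):].strip())

    # Anything else is treated as a pragma, so a mistyped comment
    # marker surfaces here instead of being silently swallowed.
    body = inner.strip()
    name = body.partition(" ")[0]
    if not PRAGMA_NAME.match(name):
        raise MalformedPragma(f"badly formed pragma: {body!r}")
    return Block("pragma", body)
```

With these assumptions, classify("{[# a note]}") is a comment and classify("{[a:b stuff]}") is a pragma, while a comment written with a mistyped marker, such as "{[! a note]}", raises MalformedPragma. A more lenient processor could instead take the other route and skip the block as a pragma it does not understand.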

I know this seems like a 180 from my previous position, but I don’t believe it is. I still think that pragmas and comments are different things; but I no longer think that the *delimiters* are part of what makes them different. In fact, I’m not sure that the delimiters form part of a pragma/comment at all.

BTW
____________________________ 
Dr. Bethan Tovey-Walsh 
Myfyrwraig PhD | PhD Student CorCenCC 
Prifysgol Abertawe | Swansea University 
Croeso i chi ysgrifennu ataf yn y Gymraeg.

> On 27 Jan 2022, at 15:59, Bethan Tovey-Walsh <accounts@bethan.wales> wrote:
> 
> I rather like the idea of using ⦃⦄ (U+2983 and U+2984 - white curly brackets), and offering an ASCII two-character alternative; maybe {[ ]} (or {| |}). So ⦃a:b stuff⦄ and {[a:b stuff]} would be equivalent. It’s not strictly the same as the comment delimiter, but it is still a type of curly bracket. And the two-character form looks very similar to the single-character delimiter, so it’ll be easy for a human reader to recognize them as equivalent.
> 
> 
>> On 27 Jan 2022, at 15:23, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>> 
>> 
>> John Lumley writes:
>> 
>>> At risk of being shot down in flames, there is an ASCII 'bracket' pair
>>> that we aren't currently using, neither of which appears, as far as I 
>>> can see,  in the IXML grammar,
>>> 
>>>   viz: '<' and '>'.
>>> 
>>> Now I know there are other (alright perhaps many) reasons to suggest
>>> avoiding them, but they won't currently appear outside strings in any 
>>> valid IXML and are seen as 'container pairs', and are certainly ASCII.
>> 
>>> Just for sake of some completeness....
>> 
>> You're a brave man, John.
>> 
>> It has been more than 20 years since Java and XML both made Unicode the
>> central character set.  I suspect that by now even { and } are
>> transferred correctly between IBM mainframes and the rest of
>> the world, although I don't have a convenient way to check.  I think
>> it's time we left seven-bit character sets to the lower-level networking
>> protocols and used Unicode without apology.
>> 
>> I won't object on principle to ASCII delimiters, but I decline to view
>> being in ASCII as an advantage for any delimiter proposal.
>> 
>> In any case, convenience of typing and being in ASCII are not really the
>> same.  They may be roughly the same on U.S. and for the most part on
>> U.K. keyboards, but my recollection is that getting some ASCII
>> characters -- in particular < and > -- was much more complicated on
>> Norwegian keyboards than I had ever imagined.  (Well, not *that*
>> complicated, but I believe it involved both the Alt-Gr key and the shift
>> key as well as a third key.)  In Norway, discussions about raw XML or
>> HTML being easy to type always rang a little hollow.
>> 
>> Any Unicode viewer with a search capacity will show a wide range of
>> possibilities.  Using Richard Ishida's Uniview [1] and searching 'text'
>> for 'bracket' is enlightening.
>> 
>> [1] https://r12a.github.io/uniview/
>> 
>> I wonder if we could achieve both (a) a visual echo of the { ... }
>> delimiters we use for comments and (b) a single-character pair, by using
>> one of Unicode's several variants on curly braces:
>> 
>> ⎨⎬
>> 
>> U+23A8 LEFT CURLY BRACKET MIDDLE PIECE
>> U+23AC RIGHT CURLY BRACKET MIDDLE PIECE
>> 
>> or ❴❵
>> 
>> U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
>> U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
>> 
>> or ⦃⦄
>> 
>> U+2983 LEFT WHITE CURLY BRACKET
>> U+2984 RIGHT WHITE CURLY BRACKET
>> 
>> or ﹛﹜
>> 
>> U+FE5B SMALL LEFT CURLY BRACKET
>> U+FE5C SMALL RIGHT CURLY BRACKET
>> 
>> or ｛｝
>> 
>> U+FF5B FULLWIDTH LEFT CURLY BRACKET
>> U+FF5D FULLWIDTH RIGHT CURLY BRACKET
>> 
>> Unfortunately, in my current font some of these display rather poorly.
>> In Richard Ishida's rendering, I quite like U+2983 and U+2984, but they
>> are a bit small in the font I'm looking at right now.  Some of the
>> square bracket and half-bracket pairs (in Uniview, search text for 'half
>> bracket') would perhaps fare better across fonts.
>> 
>> Of course, for the group to accept this idea, there would have to be
>> general acceptance of the view that the choice of delimiters is to be
>> made on aesthetic and psychological grounds (what will a given pair
>> suggest to the human reader?  how will it feel to use these delimiters
>> or those?) because the effect on technical complexity is nil.  I don't
>> know if people are willing to accept that conclusion or not.
>> 
>> Michael
>> 
>> 
>> -- 
>> C. M. Sperberg-McQueen
>> Black Mesa Technologies LLC
>> http://blackmesatech.com
>> 
>
Received on Thursday, 27 January 2022 16:55:20 UTC