Re: what I learned from today's discussion of delimiters from C. M. Sperberg-McQueen on 2022-01-27 (public-ixml@w3.org from January 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Thu, 27 Jan 2022 08:23:42 -0700
To: John Lumley <john@saxonica.com>
Cc: public-ixml@w3.org
Message-ID: <87czkd5hj5.fsf@blackmesatech.com>

John Lumley writes:

> At risk of being shot down in flames, there is an ASCII 'bracket' pair
> that we aren't currently using, neither of which appears, as far as I 
> can see,  in the IXML grammar,
>
>    viz: '<' and '>'.
>
> Now I know there are other (alright perhaps many) reasons to suggest
> avoiding them, but they won't currently appear outside strings in any 
> valid IXML and are seen as 'container pairs', and are certainly ASCII.

> Just for sake of some completeness....

You're a brave man, John.

It has been more than 20 years since Java and XML both made Unicode the
central character set.  I suspect that by now even { and } are
transferred correctly between IBM mainframes and the rest of the
world, although I don't have a convenient way to check.  I think
it's time we left seven-bit character sets to the lower-level networking
protocols and used Unicode without apology.

I won't object on principle to ASCII delimiters, but I decline to view
being in ASCII as an advantage for any delimiter proposal.

In any case, convenience of typing and being in ASCII are not really the
same.  They may be roughly the same on U.S. and for the most part on
U.K. keyboards, but my recollection is that getting some ASCII
characters -- in particular < and > -- was much more complicated on
Norwegian keyboards than I had ever imagined.  (Well, not *that*
complicated, but I believe it involved both the Alt-Gr key and the shift
key as well as a third key.)  In Norway, discussions about raw XML or
HTML being easy to type always rang a little hollow.

Any Unicode viewer with a search capacity will show a wide range of
possibilities.  Using Richard Ishida's Uniview [1] and searching 'text'
for 'bracket' is enlightening.

[1] https://r12a.github.io/uniview/
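
A similar search can be approximated locally with Python's standard
unicodedata module.  This is only a sketch: the function name
search_names is illustrative, and the results depend on the version of
the Unicode database bundled with the interpreter, so they may differ
slightly from what Uniview shows.

```python
import sys
import unicodedata


def search_names(term):
    """Return (code point, name) pairs whose Unicode name contains term.

    The match is case-insensitive because Unicode character names are
    stored in upper case.
    """
    term = term.upper()
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (controls, surrogates, unassigned) yield "".
        name = unicodedata.name(chr(cp), "")
        if term in name:
            hits.append((cp, name))
    return hits


if __name__ == "__main__":
    # Narrow the search to curly brackets; use "bracket" for the full list.
    for cp, name in search_names("curly bracket"):
        print(f"U+{cp:04X}  {chr(cp)}  {name}")
```

Searching for "half bracket" or "square bracket" in the same way gives
the other families of candidates.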

I wonder if we could achieve both (a) a visual echo of the { ... }
delimiters we use for comments and (b) a single-character pair, by using
one of Unicode's several variants on curly braces:

 ⎨⎬

 U+23A8 LEFT CURLY BRACKET MIDDLE PIECE
 U+23AC RIGHT CURLY BRACKET MIDDLE PIECE

or ❴❵

 U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
 U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT

or ⦃⦄

 U+2983 LEFT WHITE CURLY BRACKET
 U+2984 RIGHT WHITE CURLY BRACKET

or ﹛﹜

 U+FE5B SMALL LEFT CURLY BRACKET
 U+FE5C SMALL RIGHT CURLY BRACKET

or ｛｝

 U+FF5B FULLWIDTH LEFT CURLY BRACKET
 U+FF5D FULLWIDTH RIGHT CURLY BRACKET

Unfortunately, in my current font some of these display rather poorly.
In Richard Ishida's rendering, I quite like U+2983 and U+2984, but they
are a bit small in the font I'm looking at right now.  Some of the
square bracket and half-bracket pairs (in Uniview, search text for 'half
bracket') would perhaps fare better across fonts.
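
Since font rendering varies, it can help to inspect the candidate curly
bracket pairs listed above by code point rather than by glyph.  The
following is a minimal sketch using Python's standard unicodedata
module; the names CANDIDATE_PAIRS and describe are illustrative only,
and the general category shown comes from whatever Unicode database
ships with the interpreter.

```python
import unicodedata

# The five curly-bracket pairs listed above, as (opener, closer) code points.
CANDIDATE_PAIRS = [
    (0x23A8, 0x23AC),
    (0x2774, 0x2775),
    (0x2983, 0x2984),
    (0xFE5B, 0xFE5C),
    (0xFF5B, 0xFF5D),
]


def describe(cp):
    """Return a one-line description of a code point.

    Includes the character itself, its Unicode name, its general
    category, and the number of bytes it occupies in UTF-8.
    """
    ch = chr(cp)
    name = unicodedata.name(ch)
    category = unicodedata.category(ch)
    utf8_len = len(ch.encode("utf-8"))
    return f"U+{cp:04X} {ch} {name} [{category}] ({utf8_len} bytes in UTF-8)"


if __name__ == "__main__":
    for opener, closer in CANDIDATE_PAIRS:
        print(describe(opener))
        print(describe(closer))
        print()
```

Each delimiter in these pairs is a single code point, which is what
makes them candidates for a single-character pair.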

Of course, for the group to accept this idea, there would have to be
general acceptance of the view that the choice of delimiters is to be
made on aesthetic and psychological grounds (what will a given pair
suggest to the human reader?  how will it feel to use these delimiters
or those?) because the effect on technical complexity is nil.  I don't
know if people are willing to accept that conclusion or not.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Thursday, 27 January 2022 15:24:02 UTC