- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 27 Jan 2022 08:23:42 -0700
- To: John Lumley <john@saxonica.com>
- Cc: public-ixml@w3.org
John Lumley writes:

> At risk of being shot down in flames, there is an ASCII 'bracket' pair
> that we aren't currently using, neither of which appears, as far as I
> can see, in the IXML grammar,
>
> viz: '<' and '>'.
>
> Now I know there are other (alright perhaps many) reasons to suggest
> avoiding them, but they won't currently appear outside strings in any
> valid IXML and are seen as 'container pairs', and are certainly ASCII.
> Just for sake of some completeness....

You're a brave man, John.

It has been more than 20 years since Java and XML both made Unicode
the central character set. I suspect that by now even { and } are
transferred correctly nowadays between IBM mainframes and the rest of
the world, although I don't have a convenient way to check. I think
it's time we left seven-bit character sets to the lower-level
networking protocols and used Unicode without apology. I won't object
on principle to ASCII delimiters, but I decline to view being in ASCII
as an advantage for any delimiter proposal.

In any case, convenience of typing and being in ASCII are not really
the same. They may be roughly the same on U.S. and for the most part
on U.K. keyboards, but my recollection is that getting some ASCII
characters -- in particular < and > -- was much more complicated on
Norwegian keyboards than I had ever imagined. (Well, not *that*
complicated, but I believe it involved both the Alt-Gr key and the
shift key as well as a third key.) In Norway, discussions about raw
XML or HTML being easy to type always rang a little hollow.

Any Unicode viewer with a search capacity will show a wide range of
possibilities. Using Richard Ishida's Uniview [1] and searching
'text' for 'bracket' is enlightening.

[1] https://r12a.github.io/uniview/

I wonder if we could achieve both (a) a visual echo of the { ... }
delimiters we use for comments and (b) a single-character pair, by
using one of Unicode's several variants on curly braces:

  ⎨⎬
    23A8 LEFT CURLY BRACKET MIDDLE PIECE
    23AC RIGHT CURLY BRACKET MIDDLE PIECE

or

  ❴❵
    2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
    2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT

or

  ⦃⦄
    2983 LEFT WHITE CURLY BRACKET
    2984 RIGHT WHITE CURLY BRACKET

or

  ﹛﹜
    FE5B SMALL LEFT CURLY BRACKET
    FE5C SMALL RIGHT CURLY BRACKET

or

  ｛｝
    FF5B FULLWIDTH LEFT CURLY BRACKET
    FF5D FULLWIDTH RIGHT CURLY BRACKET

Unfortunately, in my current font some of these display rather poorly.
In Richard Ishida's rendering, I quite like U+2983 and U+2984, but
they are a bit small in the font I'm looking at right now. Some of
the square bracket and half-bracket pairs (in Uniview, search text for
'half bracket') would perhaps fare better across fonts.

Of course, for the group to accept this idea, there would have to be
general acceptance of the view that the choice of delimiters is to be
made on aesthetic and psychological grounds (what will a given pair
suggest to the human reader? how will it feel to use these delimiters
or those?) because the effect on technical complexity is nil. I don't
know if people are willing to accept that conclusion or not.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Thursday, 27 January 2022 15:24:02 UTC