- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 27 Jan 2022 11:57:22 -0700
- To: Bethan Tovey-Walsh <accounts@bethan.wales>
- Cc: public-ixml@w3.org
Bethan Tovey-Walsh writes:

> What if we think of both pragma delimiters and comment delimiters as
> notifications to the processor? The content of the pragma is addressed
> to a processor, whereas the content of the comment is addressed to a
> human.

At a first approximation, I think that's likely to be true, but it has been pointed out already that exceptions are possible on both sides of the line.

If a pragma can be interpreted as a machine-readable annotation (like many in the discussion document), then the information it provides can be of interest and use to a human reading the grammar as well as to software.

And on the other side, even if the language has pragmas there is no way to prevent my deciding that {!} is a simpler way of annotating rules in grammars to mean some particular thing (like 'rewrite this rule' as in the rule-rewriting example in the discussion document) or elicit some particular behavior in my processor (even something as banal as "boldface the left-hand side of this rule in the display").

But yes, I think most of us, and maybe everyone, will agree that there is some distinction between pragmas and comments which are not pragmas, though finding words for it that we all agree on may prove tricky.

> But the *delimiters* in both cases are addressed to the
> processor, instructing it either a) to ignore a comment, or b) to
> behave as appropriate for a pragma (which may mean ignoring it, or
> doing something with it).

Okay, I guess, though I very much want to label the notion of particular characters having addressees as metaphorical, not literal. Any delimiter in a formal grammar can be (is) used by the processor to know where there are notional boundaries in the input and what the things on either side of the boundaries are. In that sense, all delimiters are addressed to the processor. And, because they can also be and are used by human readers for the same purpose, they are also addressed to the human reader.

> If we can accept this, ...

Warning ... see below.

> ... it makes sense to have a basic delimiter
> meaning “hello processor, this is not a bit of code for you to process
> as normal”, and one of the following:
> a) an extra delimiter saying “this one’s a pragma, don’t ignore it (yet)”;
> b) an extra delimiter saying “this one’s a comment, ignore it”;
> c) both of the above.

I am (a) perfectly happy with this paragraph, (a.i) believe it, (a.ii) disbelieve it, and am (b) moderately concerned about it.

It's (a) phrased as a conditional, and I am (a.i) happy to accept that conditional: if one accepts the premise, then it does indeed make sense to interpret multi-character delimiters as described. At least, some of the time. At other times, I think (a.ii) "hang on a bit, that consequent really doesn't follow from the antecedent".

The more important concern is the fear that (b) it's not quite meant as a material implication of the kind seen in formal logic, in which p -> q does not say anything at all about any intrinsic relation between p and q, only about the possibility that p is true while q is false, so that if q is true, p can be absolutely any proposition at all.

Concretely, if the line of reasoning you outline leads people to reach consensus on a choice of delimiters, it's all to the good. But I don't think it is necessary that we all agree on the premise, or on the connection between premise and conclusion. (And since I don't expect everyone to agree on the premise, this fact looms large for me.)

Our task as a group is to reach agreement, if we can, on a choice of delimiters. Agreement on why it's a good choice is optional and seeking it may be counter-productive.

> I’d vote for b).
...
> I know this seems like a 180 from my previous position, but I don’t
> believe it is. I still think that pragmas and comments are different
> things; but I no longer think that the *delimiters* are part of what
> makes them different.
> In fact, I’m not sure that the delimiters form
> part of a pragma/comment at all.

It depends, of course, on how things are defined. In the current ixml specification grammar, the delimiters are part of the comment, just as the delimiters for quoted strings and character sets are part of the strings recognized by the corresponding nonterminals. In XML, the delimiters are, in the grammar, part of the start-tag, sole-tag, end-tag, general entity reference, parameter entity reference, or numeric character reference.

But the role played by the delimiters in relation to the comment, the tag, the element, the entity reference, the entity, etc., is clearly quite different from that played by the characters that fall within the delimiters. For one thing, the delimiters used for one comment are the same as the delimiters used for another comment and thus convey no information other than 'this is a comment'. And when we have other ways to carry that information (as in the vxml form of a grammar), we don't bother to retain the delimiters.

So I think I agree with the point I think you are wanting to make, although I think I disagree with your reasoning.

I also agree that it's not the delimiters that make things different. In (what I think is) the usual case, we choose different delimiters for different things not in order to make the things different but to reflect the fact that they are already different.

Michael

>> On 27 Jan 2022, at 15:59, Bethan Tovey-Walsh <accounts@bethan.wales> wrote:
>>
>> I rather like the idea of using ⦃⦄ (U+2983 and U+2984 - white curly brackets), and offering an ASCII two-character alternative; maybe {[ ]} (or {| |}). So ⦃a:b stuff⦄ and {[a:b stuff]} would be equivalent. It’s not strictly the same as the comment delimiter, but it is still a type of curly bracket. And the two-character form looks very similar to the single-character delimiter, so it’ll be easy for a human reader to recognize them as equivalent.
>>
>>
>> ___________________________________________________
>> Dr. Bethan Tovey-Walsh
>> Myfyrwraig PhD | PhD Student CorCenCC
>> Prifysgol Abertawe | Swansea University
>> Croeso i chi ysgrifennu ataf yn y Gymraeg.
>>
>>> On 27 Jan 2022, at 15:23, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>>>
>>>
>>> John Lumley writes:
>>>
>>>> At risk of being shot down in flames, there is an ASCII 'bracket' pair
>>>> that we aren't currently using, neither of which appears, as far as I
>>>> can see, in the IXML grammar,
>>>>
>>>> viz: '<' and '>'.
>>>>
>>>> Now I know there are other (alright perhaps many) reasons to suggest
>>>> avoiding them, but they won't currently appear outside strings in any
>>>> valid IXML and are seen as 'container pairs', and are certainly ASCII.
>>>
>>>> Just for sake of some completeness....
>>>
>>> You're a brave man, John.
>>>
>>> It has been more than 20 years since Java and XML both made Unicode the
>>> central character set. I suspect that by now even { and } are
>>> transferred correctly nowadays between IBM mainframes and the rest of
>>> the world, although I don't have a convenient way to check. I think
>>> it's time we left seven-bit character sets to the lower-level networking
>>> protocols and used Unicode without apology.
>>>
>>> I won't object on principle to ASCII delimiters, but I decline to view
>>> being in ASCII as an advantage for any delimiter proposal.
>>>
>>> In any case, convenience of typing and being in ASCII are not really the
>>> same. They may be roughly the same on U.S. and for the most part on
>>> U.K. keyboards, but my recollection is that getting some ASCII
>>> characters -- in particular < and > -- was much more complicated on
>>> Norwegian keyboards than I had ever imagined. (Well, not *that*
>>> complicated, but I believe it involved both the Alt-Gr key and the shift
>>> key as well as a third key.)
>>> In Norway, discussions about raw XML or
>>> HTML being easy to type always rang a little hollow.
>>>
>>> Any Unicode viewer with a search capacity will show a wide range of
>>> possibilities. Using Richard Ishida's Uniview [1] and searching 'text'
>>> for 'bracket' is enlightening.
>>>
>>> [1] https://r12a.github.io/uniview/
>>>
>>> I wonder if we could achieve both (a) a visual echo of the { ... }
>>> delimiters we use for comments and (b) a single-character pair, by using
>>> one of Unicode's several variants on curly braces:
>>>
>>> ⎨⎬
>>>
>>> 23A8 LEFT CURLY BRACKET MIDDLE PIECE
>>> 23AC RIGHT CURLY BRACKET MIDDLE PIECE
>>>
>>> or ❴❵
>>>
>>> 2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
>>> 2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
>>>
>>> or ⦃⦄
>>>
>>> 2983 LEFT WHITE CURLY BRACKET
>>> 2984 RIGHT WHITE CURLY BRACKET
>>>
>>> or ﹛﹜
>>>
>>> FE5B SMALL LEFT CURLY BRACKET
>>> FE5C SMALL RIGHT CURLY BRACKET
>>>
>>> or ｛｝
>>>
>>> FF5B FULLWIDTH LEFT CURLY BRACKET
>>> FF5D FULLWIDTH RIGHT CURLY BRACKET
>>>
>>> Unfortunately, in my current font some of these display rather poorly.
>>> In Richard Ishida's rendering, I quite like U+2983 and U+2984, but they
>>> are a bit small in the font I'm looking at right now. Some of the
>>> square bracket and half-bracket pairs (in Uniview, search text for 'half
>>> bracket') would perhaps fare better across fonts.
>>>
>>> Of course, for the group to accept this idea, there would have to be
>>> general acceptance of the view that the choice of delimiters is to be
>>> made on aesthetic and psychological grounds (what will a given pair
>>> suggest to the human reader? how will it feel to use these delimiters
>>> or those?) because the effect on technical complexity is nil. I don't
>>> know if people are willing to accept that conclusion or not.
>>>
>>> Michael
>>>
>>>
>>> --
>>> C. M. Sperberg-McQueen
>>> Black Mesa Technologies LLC
>>> http://blackmesatech.com
>>>
>>

--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Thursday, 27 January 2022 18:57:43 UTC