Re: an odd ambiguity

Hi Neil,

I'm just writing to agree and say "it's even worse!"
I fished out ten arXiv examples:
https://hackmd.io/@dginev/S1pGCeOJt

A simple search for braced blocks containing at least two vertical
bars yielded matches in ~7000 papers, or 0.5% of my arXiv collection.
That's a lower bound, but it may be the right order of magnitude.
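
For illustration only, here is a minimal Python sketch of what such a
search could look like on raw TeX source. It is not the query I
actually ran; the function name, the regexes, and the choice of which
tokens count as a "bar" (|, \mid, \vert) are all assumptions made for
this example. It only looks at innermost \{ ... \} groups, so it will
miss cases, which fits the "lower bound" caveat.

```python
import re

# Assumption: a "braced block" is an innermost \{ ... \} group,
# i.e. one that contains no nested \{ or \}.
BRACED = re.compile(r"\\\{((?:(?!\\\{|\\\}).)*)\\\}", re.DOTALL)

# Assumption: bar-like tokens are \mid, \vert, or a literal "|"
# that is not part of the double-bar macro \|.
BAR = re.compile(r"\\mid(?![A-Za-z])|\\vert(?![A-Za-z])|(?<!\\)\|")


def braced_blocks_with_two_bars(tex: str) -> list[str]:
    """Return the contents of braced blocks holding at least two bars."""
    hits = []
    for match in BRACED.finditer(tex):
        body = match.group(1)
        if len(BAR.findall(body)) >= 2:
            hits.append(body.strip())
    return hits


if __name__ == "__main__":
    sample = r"Let $S = \{ x \mid x | 10 \}$ and $T = \{ x \mid x > 0 \}$."
    print(braced_blocks_with_two_bars(sample))
```

A heuristic like this says nothing about what the bars mean; it only
flags candidates for a human to look at.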

First, the braces are not always set constructors. In the quick list I
grabbed, they were also:
- sequences,
- expression fences (e.g. delimiting the arguments of integrals),
- custom symbols introduced by the paper, such as { N | A | d },
- multibrace notation in quantum algebra.

Meanwhile, the vertical bar can be a fence in other complex notations
(e.g. Legendre symbol, bra-kets), or can have a range of independent
meanings such as "divides", "partition", "norm", "absolute value".
There are even more sophisticated structural uses, e.g. "iterated
conditional". And as David pointed out, authors tend to mix these
bravely, as needed.

\mid is used consistently by some authors, but not by others,
especially in papers which are not *about* set-theoretic concepts
but happen to use a set or two while setting up some intermediate
step, say in quantum physics. You'll find some such uses in the
examples I drew from arXiv. Some would say those authors made a
typographic error, others would call them expedient. I can certainly
read the vertical bar (or colon!) in a set constructor as "such that"
both with and without the extra spacing.
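
To make the spacing difference concrete, here is a small LaTeX sketch
of the three spellings. The set itself is a made-up example, not one
taken from the arXiv list. In standard LaTeX, \mid and the colon are
set as relations (with surrounding space), while a bare | is an
ordinary symbol (no extra space).

```latex
\documentclass{article}
\begin{document}
% "such that" with \mid: spaced as a relation
$\{ x \mid x > 0 \}$

% "such that" with a bare |: ordinary symbol, no extra spacing
$\{ x | x > 0 \}$

% "such that" with a colon: also spaced as a relation
$\{ x : x > 0 \}$
\end{document}
```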

With all that laid out, I think aiming for the operator dictionary to
treat Unicode correctly is a good goal, but aiming to resolve
ambiguities may be a bit too ambitious for such a component.

Greetings,
Deyan

On Wed, Aug 4, 2021 at 12:28 AM Neil Soiffer <soiffer@alum.mit.edu> wrote:
>
> We've mentioned how ambiguous "|" can be, but I don't remember seeing anyone mentioning this example:
> { x   ∣ x  ∣ 10}
> The set of all x such that x divides 10.
>
> In one expression are both the low priority separator "such that" and the medium priority relational operator "divides" (both are infix). There are two characters that could be used: vertical bar (U+007C) and divides (U+2223). The Unicode Standard indicates that both should be U+2223 (I'm not sure that equivalence is correct).
>
> In TeX, there seems to be agreement that the first bar should be \mid. However, there seems to be disagreement about what to use for the second bar. Some people suggest \mid, others "|", and still others \divides (which only exists in the MnSymbol package AFAIK). There are spacing differences and maybe height differences. Using different macros means there is a potential semantic distinction if the author actually uses them as opposed to using the ASCII "|". A reason TeX distinguishes them is that the spacing around the vertical bar differs a little. Someone will surely correct me on this if I'm wrong, but the spacing of these two uses is opposite their contextual meaning. TeX considers \mid to be a relational operator, but relational operators return boolean values -- \mid is really a separator/punctuation. On the other hand, \divides really is a relation (m divides n is either true or false), but it is spaced as a binary operator (at least in this context). Typographically, this is what is supposed to happen, but it seems counter-intuitive. Very strange.
>
> What does this mean for MathML? One thing is that in practice, software can't be sure the correct symbol is used in MathML (I leave it to someone else to report what TeX, ASCIIMath, and WYSIWYG editors use). The other issue is what the operator dictionary should say about the spacing and priority for these two symbols. Currently they both have the same spacing and priority, but that seems wrong.
>
> Thoughts?
>
>     Neil

Received on Wednesday, 4 August 2021 12:21:31 UTC