Re: update to xml entities draft

On 20150627 at 143534-0700 "Asmus Freytag (t)" writes:

> Unicode generally does not encode characters by usage. For
> example there's no distinction between period, decimal
> point, abbreviation point etc.. This reflects the underlying
> situation, to wit, that this is a case of the *same* symbol
> being used in different conventions.
>
> The downside is that it is thus not possible to use plain
> text to capture which convention is intended (but nothing
> prevents anyone from providing rich-text markup). The upside
> is that data can't exhibit "random alternation" between
> identical looking symbols; experience has shown that this is
> a most likely outcome if "the same" item is encoded several
> times, based merely on convention.

Period, decimal point, abbreviation point: three different
names and three different concepts commonly sharing the same
symbol though not necessarily the same left and right
spacing.

As a point of argument (but not a request) they *should* be
three different characters.

Absent that, the typesetter with a proportional font must
use various conventions, not completely reliable, to guess
the spacing.  Of course, commonly the user will be oblivious
of these differences and the user's keyboard will have only
one of these.  But the astute user may want to be able to
make distinctions.  The distinctions can be made available,
for example, in rich text, as you observe, in SGML, or in
LaTeX.

With a given oblivious user and a given typesetting suite
random alternation will not occur.

Other than for searching I fail to see why random
alternation should be a problem.  Are there other problems
associated with random alternation?

As to mathematical searching, searching for mathematical
symbols is an order of magnitude more complicated than
searching for text, e.g., multi-character math symbols,
things like phi vs varphi, ..., so the small number of
possible alternations (at most 256, the size of the U+21xx
block, actually quite a few less than that) should not add
much complexity to code for mathematical symbol searching.

                         -- Bill

Received on Thursday, 2 July 2015 18:13:34 UTC