Re: What about this grammar?

> I think -- very possibly wrongly! -- that ^S, ^M, etc. are a notational
> convention to represent characters with no associated glyph.

Whether or not U+0013 has a glyph depends on the font you’re
using and what your display engine thinks it should do with it.

> This reminds me of when there were issues with the Microsoft "high
> ascii" characters, where (for example) cp1252 0097 was the em-dash
> character but the code point wasn't legal in XML until the fifth
> edition.

That’s a slightly different case. That’s about character encodings (and
tangentially about some…interesting…choices made by the folks at
Microsoft about what to use for the default encoding).

I could invent an encoding NDW-1 that put “A” at position 1, “B” at
position 2, etc., so that “S” lands at position 19 (0x13). If I then
wrote this XML file (assuming the same conventions for ^S):

   <?xml version="1.0" encoding="NDW-1"?>
   <doc>^S</doc>

And if my XML parser understood the encoding NDW-1, an identity
transformation on that document could produce the following completely
equivalent document:

   <?xml version="1.0" encoding="UTF-8"?>
   <doc>S</doc>
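
To make that concrete, here is a minimal Python sketch of the idea. NDW-1
is of course not a real codec; the function name `ndw1_decode` and the
exact byte layout are assumptions made for illustration, chosen so that
the byte ^S stands for (0x13) decodes to "S". The same bytes decoded as
UTF-8 give the control character instead.

```python
# Toy decoder for the hypothetical NDW-1 encoding (not a real codec).
# Assumption: bytes 0x01..0x1A hold "A".."Z", so 0x13 (what ^S denotes)
# is "S"; every other byte is passed through unchanged as ASCII.

def ndw1_decode(data: bytes) -> str:
    out = []
    for b in data:
        if 0x01 <= b <= 0x1A:
            out.append(chr(b + 0x40))  # 0x13 + 0x40 == 0x53 == "S"
        else:
            out.append(chr(b))
    return "".join(out)

raw = b"<doc>\x13</doc>"        # the same bytes on disk in both cases

as_ndw1 = ndw1_decode(raw)      # read under the invented encoding
as_utf8 = raw.decode("utf-8")   # read as UTF-8

print(repr(as_ndw1))
print(repr(as_utf8))
```

The bytes are identical; only the declared encoding decides whether the
parser sees the letter or the control character.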

The discussion here is about U+0013 in a UTF-8 (or similarly encoded,
such as US-ASCII) document, which, I admit, I did not make clear.
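
For that case, a quick check with Python's standard library (just a
sketch, nothing specific to any XML parser) shows the single byte 0x13
decoding to the same control character, U+0013, under both UTF-8 and
US-ASCII:

```python
import unicodedata

raw = b"\x13"  # the single byte that ^S stands for

ch_utf8 = raw.decode("utf-8")
ch_ascii = raw.decode("ascii")

# Both encodings map byte 0x13 to the same code point.
code_point = f"U+{ord(ch_utf8):04X}"
same = ch_utf8 == ch_ascii

# "Cc" is the Unicode general category for control characters.
category = unicodedata.category(ch_utf8)

print(code_point, same, category)
```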

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Monday, 12 September 2022 09:01:09 UTC