- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Mon, 12 Sep 2022 09:55:07 +0100
- To: graydonish@gmail.com
- Cc: public-ixml@w3.org
- Message-ID: <m2y1upq85q.fsf@saxonica.com>
> I think -- very possibly wrongly! -- that ^S, ^M, etc. are a notational
> convention to represent characters with no associated glyph.
Whether U+0013 has a glyph depends on the font you’re using and on
what your display engine thinks it should do with it.
> This reminds me of when there were issues with the Microsoft "high
> ascii" characters, where (for example) cp1252 0097 was the em-dash
> character but the code point wasn't legal in XML until the fifth
> edition.
That’s a slightly different case. That’s about character encodings (and
tangentially about some…interesting…choices made by the folks at
Microsoft about what to use for the default encoding).
I could invent an encoding NDW-1 that put “A” at position 1, “B” at
position 2, etc., so that “S” lands at position 19 (0x13). If I then
wrote this XML file (assuming the same conventions for ^S):
<?xml version="1.0" encoding="NDW-1"?>
<doc>^S</doc>
And if my XML parser understood the encoding NDW-1, an identity
transformation on that document could produce the following completely
equivalent document:
<?xml version="1.0" encoding="UTF-8"?>
<doc>S</doc>
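A rough sketch of that point in Python, in case it helps. NDW-1 does not
exist, so the decoder below is made up for illustration, and the name
ndw1_decode and its a_position parameter are assumptions of mine. Note that
the byte written as ^S is 0x13 (decimal 19), so it only comes out as “S”
when “A” sits at position 1; with “A” at position 0 the same byte would
decode to “T”.

```python
import string


def ndw1_decode(data: bytes, a_position: int = 1) -> str:
    """Decode bytes under a hypothetical NDW-1 encoding.

    The byte value `a_position` maps to "A", the next value to "B", and so
    on through "Z". Everything else is unmapped in this sketch.
    """
    letters = string.ascii_uppercase
    out = []
    for b in data:
        index = b - a_position
        if not 0 <= index < len(letters):
            raise ValueError(f"byte 0x{b:02X} is not mapped in this NDW-1 sketch")
        out.append(letters[index])
    return "".join(out)


ctrl_s = b"\x13"                      # the single byte written as ^S above

as_ndw1 = ndw1_decode(ctrl_s)         # interpreted as NDW-1: a letter
as_utf8 = ctrl_s.decode("utf-8")      # interpreted as UTF-8: the control character U+0013

print(repr(as_ndw1), repr(as_utf8))
```

The same byte gives a perfectly ordinary letter under one encoding and a
control character under the other, which is why the encoding question is
separate from the question of which code points are allowed.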
The discussion here is about U+0013 in a UTF-8 (or similarly encoded
US-ASCII) document, which, I admit, I did not make clear.
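For the UTF-8 and US-ASCII case, a small Python check (a sketch only, using
nothing beyond the standard library) shows that the byte 0x13 decodes to the
same code point in both, and that Unicode classifies that code point as a
control character:

```python
import unicodedata

raw = b"\x13"  # the byte written as ^S

from_utf8 = raw.decode("utf-8")
from_ascii = raw.decode("ascii")

# Both encodings map byte 0x13 to the code point U+0013.
same_code_point = from_utf8 == from_ascii == "\u0013"

# General category "Cc" means "Other, control".
category = unicodedata.category(from_utf8)

print(same_code_point, f"U+{ord(from_utf8):04X}", category)
```

Whether anything is drawn for it is then up to the font and the display
engine, not to the encoding.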
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Received on Monday, 12 September 2022 09:01:09 UTC