- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Mon, 12 Sep 2022 09:55:07 +0100
- To: graydonish@gmail.com
- Cc: public-ixml@w3.org
- Message-ID: <m2y1upq85q.fsf@saxonica.com>
> I think -- very possibly wrongly! -- that ^S, ^M, etc. are a notational
> convention to represent characters with no associated glyph.

Whether U+0013 has a glyph depends on the font you’re using and what your display engine thinks it should do with it.

> This reminds me of when there were issues with the Microsoft "high
> ascii" characters, where (for example) cp1252 0097 was the em-dash
> character but the code point wasn't legal in XML until the fifth
> edition.

That’s a slightly different case. That’s about character encodings (and tangentially about some…interesting…choices made by the folks at Microsoft about what to use for the default encoding).

I could invent an encoding NDW-1 that put “A” at position 1, “B” at position 2, etc., so that byte 0x13 is “S”. If I then wrote this XML file (assuming the same conventions for ^S):

  <?xml version="1.0" encoding="NDW-1"?>
  <doc>^S</doc>

and if my XML parser understood the encoding NDW-1, an identity transformation on that document could produce the following completely equivalent document:

  <?xml version="1.0" encoding="UTF-8"?>
  <doc>S</doc>

The discussion here is about U+0013 in a UTF-8 (or similarly encoded US-ASCII) document, which, I admit, I did not make clear.

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica
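To make the encoding-versus-code-point distinction above concrete, here is a minimal Python sketch. NDW-1 is the made-up encoding from the message, not a real one. The table below, which places "A" at byte 0x01 so that byte 0x13 lands on "S", and the rule that every other byte passes through unchanged, are assumptions of this sketch only.

```python
# Hypothetical single-byte encoding "NDW-1" (invented for illustration):
# byte 0x01 -> "A", 0x02 -> "B", ..., 0x1A -> "Z".
NDW1_TABLE = {i: chr(ord("A") + i - 1) for i in range(1, 27)}


def decode_ndw1(data: bytes) -> str:
    """Decode bytes under the made-up NDW-1 table.

    Bytes outside 0x01..0x1A are passed through as the code point with
    the same number (an assumption, so that the markup survives).
    """
    return "".join(NDW1_TABLE.get(b, chr(b)) for b in data)


# The same bytes on disk: "<doc>", then the single byte 0x13, then "</doc>".
raw = b"<doc>\x13</doc>"

# Read as NDW-1, byte 0x13 is the letter S.
as_ndw1 = decode_ndw1(raw)
print(repr(as_ndw1))   # '<doc>S</doc>'

# Read as UTF-8, the same byte is the control character U+0013.
as_utf8 = raw.decode("utf-8")
print(repr(as_utf8))   # '<doc>\x13</doc>'

# An "identity transformation" that reads NDW-1 and writes UTF-8
# therefore emits an ordinary letter S.
print(as_ndw1.encode("utf-8"))   # b'<doc>S</doc>'
```

The byte is identical in both readings; only the declared encoding decides whether the parser sees the letter S or the control character U+0013, which is why the cp1252 case is a question about encodings and not about which code points a document may contain.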
Received on Monday, 12 September 2022 09:01:09 UTC