- From: John Dziurlaj <john@turnout.rocks>
- Date: Tue, 24 Jun 2025 10:48:27 +0000
- To: Steven Pemberton <steven.pemberton@cwi.nl>, "public-ixml@w3.org" <public-ixml@w3.org>
Received on Tuesday, 24 June 2025 10:48:36 UTC
    comment-line: "%", -char+, eol.
    char: ~[#a; #d].
    eol: [#a; #d].

This works only because the Unicode replacement character is now included in the iXML character class. As a result, the actual byte content of the original comment-line is lost; the XML output does not preserve the original characters, only their substituted form.

    <?xml version="1.0" encoding="utf-8"?><start><comment-line>%PDF-1.7<eol> </eol></comment-line><comment-line>%����<eol> </eol></comment-line></start>

(Markup Blitz 1.8)

Bytes in the range 128-255 can appear in normal comments too, and would need to be mapped to XML somehow. E.g.:

    % Author: Leandra Yésica

In such cases, I would expect those bytes to be surfaced as valid Unicode code points in the corresponding XML representation (e.g., é for U+00E9, the character 'é'), rather than being silently replaced.

John
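To make the difference concrete, here is a minimal Python sketch contrasting the two behaviours described above. It is an illustration only, not how Markup Blitz or any iXML processor works. The function names are hypothetical, the four marker bytes are an assumed sample (the real bytes in the file are exactly what the output above no longer shows), and the author comment is assumed to be stored as ISO-8859-1.

```python
def decode_lossy(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for every invalid byte.

    The original byte values cannot be recovered from the result.
    """
    return data.decode("utf-8", errors="replace")


def decode_bytes_as_code_points(data: bytes) -> str:
    """Map each byte 0-255 to the code point with the same number.

    This is ISO-8859-1 decoding: byte 0xE9 becomes U+00E9.
    """
    return data.decode("latin-1")


def encode_back(text: str) -> bytes:
    """Inverse of decode_bytes_as_code_points, recovering the exact bytes."""
    return text.encode("latin-1")


# Assumed sample bytes for a binary marker line; not taken from the real file.
binary_marker = b"%\xe2\xe3\xcf\xd3"

# Assumed ISO-8859-1 encoding of the author comment (0xE9 is 'é').
author_comment = b"% Author: Leandra Y\xe9sica"

lossy = decode_lossy(binary_marker)
mapped = decode_bytes_as_code_points(binary_marker)

print([hex(ord(c)) for c in lossy])    # every high byte has become 0xfffd
print([hex(ord(c)) for c in mapped])   # every byte keeps its own value
print(decode_bytes_as_code_points(author_comment))
print(encode_back(mapped) == binary_marker)
```

With the one-to-one mapping, every character in the XML output is a valid Unicode code point and the input bytes can still be recovered. With replacement, all four high bytes collapse into the same character.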