RE: Inaccessibility of [#80-#FF]


comment-line: "%", -char+, eol.
char: ~[#a; #d].
eol: [#a; #d].

This works only because the Unicode replacement character is now included in the iXML character class. As a result, the actual byte content of the original comment-line is lost; the XML output does not preserve the original characters, only their substituted form.

<?xml version="1.0" encoding="utf-8"?><start><comment-line>%PDF-1.7<eol>
</eol></comment-line><comment-line>%����<eol>
</eol></comment-line></start>
(Markup Blitz 1.8)
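
To illustrate why the substituted form cannot be mapped back to the original bytes, here is a minimal Python sketch. It assumes the input is decoded as UTF-8 with invalid bytes replaced by U+FFFD, which is my guess at what happens and not something I have verified for Markup Blitz. The byte values and the name decode_lossy are made up for the example.

```python
def decode_lossy(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for every invalid byte."""
    return raw.decode("utf-8", errors="replace")


# Two different (hypothetical) comment lines containing high bytes.
header = b"%\xe2\xe3\xcf\xd3\n"
other = b"%\x80\x81\x82\x83\n"

# Both collapse to the same text, so the original bytes are unrecoverable.
print(decode_lossy(header) == decode_lossy(other))

# Re-encoding the decoded text does not give the input back either.
print(decode_lossy(header).encode("utf-8") == header)
```

The first comparison is true and the second is false: once the bytes are replaced, any two comment lines with the same number of invalid bytes look identical in the XML.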

Bytes in the range 128-255 (#80-#FF) can appear in normal comments too, and would need to be mapped to XML somehow.

E.g.:

% Author:  Leandra Yésica

In such cases, I would expect those bytes to be surfaced as valid Unicode code points in the corresponding XML representation (e.g., &#233; for U+00E9, the character 'é'), rather than being silently replaced.
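
A minimal sketch of the mapping I have in mind, in Python. It assumes each byte is taken as the code point with the same number (i.e. ISO-8859-1), which is what the &#233; example implies. The function name bytes_to_xml_text is hypothetical, and this is not a claim about how any iXML processor works.

```python
from xml.sax.saxutils import escape


def bytes_to_xml_text(raw: bytes) -> str:
    """Map each byte to the code point of the same value, then serialize
    non-ASCII characters as numeric character references."""
    text = raw.decode("latin-1")  # byte 0xE9 -> U+00E9, never fails
    text = escape(text)           # escape &, < and > for XML content
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


line = b"% Author:  Leandra Y\xe9sica"
print(bytes_to_xml_text(line))

# Unlike substitution with U+FFFD, this mapping is reversible.
high = bytes(range(0x80, 0x100))
print(high.decode("latin-1").encode("latin-1") == high)
```

With this mapping the byte #E9 comes out as &#233;, and the full range #80-#FF round-trips without loss.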

John

Received on Tuesday, 24 June 2025 10:48:36 UTC