Re: Reserved characters in input?

On Friday 14 February 2025 10:34:08 (+01:00), John Lumley wrote:


There is, I think, a double confusion. We use the term ‘serialization’ in the iXML spec to cover the translation of the parse tree into an XML (tree) structure. But these issues of representing ‘entities’ are a consequence of the next stage of normal processing: ‘serializing’ an XML tree into a textual representation, which is where the iXML spec has not ventured. As far as I can see, this is an implementation/output-format issue, not an iXML one.


We could, for interoperability's sake, require that serialization produce entity references (such as &amp;) and not use CDATA sections.
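As an illustration of what such a requirement would mean at the textual serialization stage, here is a minimal Python sketch, assuming an element that contains only character data: reserved characters are always written as the predefined entity references &amp;, &lt; and &gt;, and no CDATA section is ever emitted. The helper name `serialize_text_element` is hypothetical and attributes are ignored; `xml.sax.saxutils.escape` is the Python standard-library escaper.

```python
from xml.sax.saxutils import escape


def serialize_text_element(name: str, text: str) -> str:
    """Serialize an element whose content is only character data.

    Hypothetical helper: reserved characters in the text are written as
    entity references (&amp;, &lt;, &gt;) and no CDATA section is emitted,
    whatever the text contains. The element name is not validated.
    """
    # escape() replaces &, > and < by default; quote characters are left alone.
    return f"<{name}>{escape(text)}</{name}>"


if __name__ == "__main__":
    print(serialize_text_element("title", '"Wynken, Blynken & Nod"'))
    print(serialize_text_element("S", "&"))  # prints <S>&amp;</S>
```

The first call prints the form David asks for, `<title>"Wynken, Blynken &amp; Nod"</title>`.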


Steven




John Lumley



On 13 Feb 2025, at 16:39, David Birnbaum <djbpitt@gmail.com> wrote:



Thanks, Fredrik, and John, for the quick responses. Getting rid of the CDATA marked section (in favor of &amp;) downstream isn't a problem, but I was wondering whether it was possible within ixml, and I understand why ixml might reasonably consider that type of control out of scope. Perhaps a candidate for a pragma, should an ixml processor opt to put that decision under user control?
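Because a CDATA section and an entity reference describe the same character data, the downstream step can be a parse-and-reserialize round trip. A minimal sketch using Python's standard-library `xml.etree.ElementTree`, whose serializer writes text content with &amp;, &lt; and &gt; and never emits CDATA sections. The helper name `normalize_cdata` is hypothetical, and this simple version does not preserve incidental details such as comments or the XML declaration.

```python
import xml.etree.ElementTree as ET


def normalize_cdata(xml_text: str) -> str:
    """Re-serialize an XML document so CDATA sections become escaped text.

    Hypothetical helper: the parser folds CDATA content into ordinary text
    nodes, and ElementTree's serializer writes that text with entity
    references instead of CDATA markup.
    """
    root = ET.fromstring(xml_text)
    # encoding="unicode" returns a str (without an XML declaration).
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    cdata_output = '<title><![CDATA["Wynken, Blynken & Nod"]]></title>'
    print(normalize_cdata(cdata_output))
```

Run on the output David reports, this prints `<title>"Wynken, Blynken &amp; Nod"</title>`.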


On Thu, Feb 13, 2025 at 11:33 AM John Lumley <john@saxonica.com> wrote:

My processor (https://johnlumley.github.io/jwiXML.xhtml) uses fn:serialize() in SaxonJS to serialize the XML parse result, so the grammar

S: ~[].

with & as input produces

<S>&amp;</S>


John Lumley



On 13 Feb 2025, at 15:57, David Birnbaum <djbpitt@gmail.com> wrote:



Dear public-ixml,


Is there an ixml idiom for ingesting reserved characters (ampersand, angle brackets) and replacing them with XML entities? When I parse a plain-text input document that contains an ampersand using Markup Blitz or xmq, the output wraps the entire content of the element in a CDATA marked section, so that, for example, when:


"Wynken, Blynken & Nod"


matches the production for a <title> element, it emerges as


<title><![CDATA["Wynken, Blynken & Nod"]]></title>


What I'd prefer is:


<title>"Wynken, Blynken &amp; Nod"</title>
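As John points out, the two outputs differ only in their textual serialization, not in the XML tree. A quick check with Python's `xml.etree.ElementTree` (an illustration only, not anything the ixml processors themselves run) shows that parsing either form yields the same element with identical text content. The helper name `same_tag_and_text` is hypothetical and deliberately compares only the tag and the text.

```python
import xml.etree.ElementTree as ET

# The output reported from Markup Blitz / xmq, and the preferred form.
CDATA_FORM = '<title><![CDATA["Wynken, Blynken & Nod"]]></title>'
ENTITY_FORM = '<title>"Wynken, Blynken &amp; Nod"</title>'


def same_tag_and_text(a: str, b: str) -> bool:
    """Return True if two single-element documents share tag and text.

    Hypothetical helper: attributes and child elements are not compared.
    """
    ea, eb = ET.fromstring(a), ET.fromstring(b)
    return ea.tag == eb.tag and ea.text == eb.text


if __name__ == "__main__":
    print(ET.fromstring(CDATA_FORM).text)
    print(same_tag_and_text(CDATA_FORM, ENTITY_FORM))  # prints True
```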


Thanks in advance for any advice!


Sincerely,


David

Received on Monday, 17 February 2025 14:58:35 UTC