Re: Reserved characters in input? from Fredrik Öhrström on 2025-02-13 (public-ixml@w3.org from February 2025)

From: Fredrik Öhrström <oehrstroem@gmail.com>
Date: Thu, 13 Feb 2025 17:26:00 +0100
To: David Birnbaum <djbpitt@gmail.com>
Cc: ixml <public-ixml@w3.org>
Message-ID: <CALZT+jAk3GSWtgmNL0uN_hu1rESHVuBkvroQ2zp_ydqPei_c+w@mail.gmail.com>

I think this is tool-dependent, since both forms are valid XML. xmq does not
generate CDATA nodes when it produces new XML from an ixml grammar. But you
can also reformat the XML to drop the CDATA using xmq or xmlstarlet, like
this:

ixmltool | xmq to-xml --omit-decl

Test:
echo '<title><![CDATA["Wynken, Blynken & Nod"]]></title>' | xmq to-xml --omit-decl
outputs:
<title>"Wynken, Blynken &amp; Nod"</title>

(xmq always converts CDATA sections to ordinary text when reading XML.)

Or you can explicitly remove them using xmlstarlet:

ixmltool | xmlstarlet fo --omit-decl --nocdata

echo '<title><![CDATA["Wynken, Blynken & Nod"]]></title>' | xmlstarlet fo --omit-decl --nocdata
outputs again:
<title>"Wynken, Blynken &amp; Nod"</title>
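
If neither tool is at hand, the same normalization can probably be done with
a generic XML parser, since a parser reports CDATA content as plain character
data and a serializer then escapes it. Below is a minimal sketch using
Python's standard library; it is only an illustration, not how xmq or
xmlstarlet work internally, and the function name drop_cdata is made up for
this example.

```python
import xml.etree.ElementTree as ET


def drop_cdata(xml_text: str) -> str:
    """Re-serialize XML so CDATA sections become escaped character data.

    drop_cdata is a hypothetical helper name, not part of any tool
    mentioned in this thread.
    """
    # The parser hands CDATA content over as ordinary element text.
    root = ET.fromstring(xml_text)
    # The serializer escapes &, < and > in text; encoding="unicode"
    # returns a str and omits the XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    src = '<title><![CDATA["Wynken, Blynken & Nod"]]></title>'
    print(drop_cdata(src))
```

Note that ElementTree drops comments and processing instructions by default
and may rename namespace prefixes, so for anything beyond simple output the
xmq or xmlstarlet commands above are the safer choice.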

//Fredrik


On Thu, 13 Feb 2025 at 16:57, David Birnbaum <djbpitt@gmail.com> wrote:

> Dear public-ixml,
>
> Is there an ixml idiom for ingesting reserved characters (ampersand, angle
> brackets) and replacing them with XML entities? When I parse a plain-text
> input document that contains an ampersand using Markup Blitz or xmq, the
> output element creates a CDATA marked section for the entire content, so
> that, for example, when:
>
> "Wynken, Blynken & Nod"
>
> matches the production for a <title> element, it emerges as
>
> <title><![CDATA["Wynken, Blynken & Nod"]]></title>
>
> What I'd prefer is:
>
> <title>"Wynken, Blynken &amp; Nod"</title>
>
> Thanks in advance for any advice!
>
> Sincerely,
>
> David
>
>

Received on Thursday, 13 February 2025 16:26:31 UTC