Re: Reserved characters in input? from Bethan Tovey-Walsh on 2025-02-13 (public-ixml@w3.org from February 2025)

From: Bethan Tovey-Walsh <bytheway@linguacelta.com>
Date: Thu, 13 Feb 2025 17:53:43 +0000
To: David Birnbaum <djbpitt@gmail.com>
Cc: ixml <public-ixml@w3.org>
Message-Id: <B0AD7C61-730C-40C6-869A-2801611B7DF1@linguacelta.com>

Might it be worth asking whether Gunther would be willing to add the option to produce escaped characters instead of CDATA?

I don't think there's any iXML-internal way of fixing this. Even if you use insertions to replace reserved characters with their entity references, Markup Blitz would presumably still treat the inserted string "&amp;" as a string starting with a reserved character, and wrap it in a CDATA section. The only thing you could do would be to fudge it: insert some placeholder instead of the reserved character, and post-process to replace the placeholder with an entity reference. In which case you might just as well save yourself the trouble and post-process the original CDATA output with XSLT or similar.
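
For what it's worth, here is a minimal sketch of that post-processing step, in Python rather than XSLT. It assumes the processor's output is available as a string and is well-formed XML; the names cdata_to_escaped and parser_output are made up for illustration and are not part of any ixml processor. It relies only on the standard library's ElementTree: the parser hands back CDATA content as ordinary text, and the serializer then escapes reserved characters on output.

```python
import xml.etree.ElementTree as ET


def cdata_to_escaped(xml_text: str) -> str:
    """Re-serialize XML so that CDATA sections become escaped character data.

    Hypothetical helper for illustration only.
    """
    # The parser reports CDATA content as ordinary text, so the CDATA
    # boundaries no longer exist once the tree has been built.
    root = ET.fromstring(xml_text)
    # The serializer escapes &, < and > in text nodes on the way out.
    return ET.tostring(root, encoding="unicode")


# Sample output of the kind David describes (assumed, not captured from a run).
parser_output = '<title><![CDATA["Wynken, Blynken & Nod"]]></title>'
print(cdata_to_escaped(parser_output))
# prints: <title>"Wynken, Blynken &amp; Nod"</title>
```

One caveat: ET.fromstring discards comments and processing instructions by default, so this is only a safe round trip when the parse result contains nothing but elements, attributes and text.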

BTW

***

Dr. Bethan Tovey-Walsh 

linguacelta.com <http://linguacelta.com/> 

Golygydd | Editor http://geirfan.cymru <http://geirfan.cymru/> 

Croeso i chi ysgrifennu ataf yn y Gymraeg. [You are welcome to write to me in Welsh.]

> On 13 Feb 2025, at 16:38, David Birnbaum <djbpitt@gmail.com> wrote:
> 
> Thanks, Fredrik, and John, for the quick responses. Getting rid of the CDATA marked section (in favor of &amp;) downstream isn't a problem, but I was wondering whether it was possible within ixml, and I understand why ixml might reasonably consider that type of control out of scope. Perhaps a candidate for a pragma, should an ixml processor opt to put that decision under user control?
> 
> On Thu, Feb 13, 2025 at 11:33 AM John Lumley <john@saxonica.com> wrote:
> My processor (https://johnlumley.github.io/jwiXML.xhtml) uses fn:serialize() in SaxonJS as the serializer of the XML parse result, so
> S: ~[].
> with & as input, produces
> <S>&amp;</S>
> 
> John Lumley
> Sent from my iPad
> 
>> On 13 Feb 2025, at 15:57, David Birnbaum <djbpitt@gmail.com> wrote:
>> 
>> Dear public-ixml,
>> 
>> Is there an ixml idiom for ingesting reserved characters (ampersand, angle brackets) and replacing them with XML entities? When I parse a plain-text input document that contains an ampersand using Markup Blitz or xmq, the output element creates a CDATA marked section for the entire content, so that, for example, when:
>> 
>> "Wynken, Blynken & Nod"
>> 
>> matches the production for a <title> element, it emerges as
>> 
>> <title><![CDATA["Wynken, Blynken & Nod"]]></title>
>> 
>> What I'd prefer is:
>> 
>> <title>"Wynken, Blynken &amp; Nod"</title>
>> 
>> Thanks in advance for any advice!
>> 
>> Sincerely,
>> 
>> David
>>

Received on Thursday, 13 February 2025 17:54:06 UTC