Re: using ixml with mixed content - a design problem from C. M. Sperberg-McQueen on 2023-06-19 (public-ixml@w3.org from June 2023)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Sun, 18 Jun 2023 22:00:38 -0600
To: Norm Tovey-Walsh <norm@saxonica.com>
Cc: public-ixml@w3.org
Message-ID: <87mt0whz1f.fsf@blackmesatech.com>

Norm Tovey-Walsh <norm@saxonica.com> writes:

> "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> writes:
>> and a pretty-printer for WEB could parse the embedded @<...@> sequences
>> as cross-references, in my XML-based LP system this code scrap would
>> look something like this:
>>
>>   <scrap file="primes.pas"
>>          n="Program to print the first thousand prime numbers">
>>   program print_primes(output);
>>     const m=1000;
>>           <ptr target="constants"/>;
>>     var <ptr target="vars"/>;
>>   begin
>>       <ptr target="print-m-primes"/>
>>   end.
>>   </scrap>
>> ...
>
> Could you get any mileage out of parsing this text?
>
>    program print_primes(output);
>      const m=1000;
>            &lt;ptr target="constants"/&gt;;
>      var &lt;ptr target="vars"/&gt;;
>    begin
>        &lt;ptr target="print-m-primes"/&gt;
>    end.
>
> If so, might you parse the serialization of the content of scrap?

Yes, and I think so.  That is one of the ideas that occurred to me; the
idea suggested by Liam Quin's reply is another.  Both of them look
better to me now, knowing that other people thought them plausible.

It seems a bit indirect to serialize something so that I have to
re-parse something that has already been parsed once.  But it's easy to
understand.

Part of me thinks the right thing to do (or just the most interesting?)
would be to re-think my ixml parser so that instead of a string it
accepts a sequence of items and progresses through them.  I see two
complications: first, keeping track of the current location seems likely
to be complicated when it could be an offset in a string or a position
in a text node deeply nested in an element node.  And second, I would
need a way to write terms which match elements, comments, and processing
instructions in the input.
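
To make the first complication concrete, here is a minimal Python sketch of a cursor over a sequence of items. It is an illustration only, not code from any existing ixml parser: the names Position and advance are hypothetical, and it treats each element as a single atomic item rather than descending into nested text nodes, which is the harder case described above.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass(frozen=True)
class Position:              # hypothetical name, for illustration
    item: int                # index into the item sequence
    offset: int = 0          # character offset, used only inside a string item

def advance(items, pos):
    """Return (token, next_position).

    A token is one character of a string item, or a whole non-string item.
    Returns (None, pos) at end of input.
    """
    while pos.item < len(items):
        current = items[pos.item]
        if isinstance(current, str):
            if pos.offset < len(current):
                return current[pos.offset], Position(pos.item, pos.offset + 1)
            pos = Position(pos.item + 1)      # string exhausted, move to next item
        else:
            return current, Position(pos.item + 1)
    return None, pos

# The line 'var <ptr target="vars"/>;' as a sequence of items.
items = ["var ", ET.Element("ptr", target="vars"), ";"]
tokens = []
pos = Position(0)
while True:
    tok, pos = advance(items, pos)
    if tok is None:
        break
    tokens.append(tok)
```

Even in this flattened form a location is a pair rather than a single offset, and a grammar would still need some notation for a term that matches the element token.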

In the short run, serializing and reparsing is going to be simpler and
quicker to implement.
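
A minimal sketch of the serializing step, assuming Python's standard xml.etree.ElementTree rather than any particular ixml toolchain. The helper name serialize_content is hypothetical, and the scrap is abbreviated from the example quoted above.

```python
import xml.etree.ElementTree as ET

SCRAP = """<scrap file="primes.pas">
program print_primes(output);
  const m=1000;
        <ptr target="constants"/>;
  var <ptr target="vars"/>;
begin
    <ptr target="print-m-primes"/>
end.
</scrap>"""

def serialize_content(elem):
    """Return the mixed content of elem (text plus child markup) as one string."""
    parts = [elem.text or ""]
    for child in elem:
        # ET.tostring also emits the child's tail text, so the text
        # between child elements is preserved in order.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)

scrap = ET.fromstring(SCRAP)
text = serialize_content(scrap)   # this string would be the ixml parser's input
```

The resulting string contains the ptr elements as literal markup, so the ixml grammar has to include rules that recognize that markup.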

For the literate programming case, the XML that can occur within a code
scrap is restricted enough that serializing the XML and re-parsing it
would not be too hard.  For the Roman history case, and other cases in
general, where the XML might be arbitrarily complex, I really like
Liam's idea of an easy-to-parse placeholder, which can easily be
replaced with the original element (or other item).
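
A minimal sketch of the placeholder approach as described in this message, again assuming Python's xml.etree.ElementTree. The delimiter characters, the function names flatten and restore, and the sample sentence are all invented for illustration; the details of Liam's actual proposal are not reproduced here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical delimiters: private-use characters unlikely to occur in real text.
OPEN, CLOSE = "\uE000", "\uE001"

def flatten(elem):
    """Replace each child element with a numbered placeholder.

    Returns the flattened string and the list of elements set aside.
    """
    parts, held = [elem.text or ""], []
    for child in elem:
        parts.append(f"{OPEN}{len(held)}{CLOSE}")
        held.append(child)
        parts.append(child.tail or "")
    return "".join(parts), held

def restore(text, held):
    """Split text at placeholders and put the original elements back."""
    items = []
    pieces = re.split(f"{OPEN}(\\d+){CLOSE}", text)
    for i, piece in enumerate(pieces):
        if i % 2:                 # odd positions hold the captured placeholder number
            items.append(held[int(piece)])
        elif piece:
            items.append(piece)
    return items

# Invented sample in the spirit of the Roman history case.
doc = ET.fromstring('<p>In 44 BC <name ref="caesar">Caesar</name> was killed.</p>')
flat, held = flatten(doc)
items = restore(flat, held)
```

The grammar then needs only one simple rule for the placeholder, however complex the element it stands for, and restore would be applied to the text of the parse result.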

Thank you!
-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Monday, 19 June 2023 04:11:23 UTC