Re: using ixml with mixed content - a design problem

For both problems, I would add a set of rules that capture any text in
angle brackets as a kind of node (hypothetical, untested):

xmlelem: -paired; -selfclosed .
-paired: "<" , token , -attrs* , ">" , content* , "</" , token , ">" .
-content: -text; -xmlelem .
-selfclosed: "<", token , -attrs*, "/>" .

Allow xmlelem to appear in any suitable place, and then do a tree walk
over the parse result to restore the XML inside each xmlelem node.
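
The tree walk could look roughly like the Python sketch below. It is
untested against any real ixml processor and rests on one assumption:
that the literals matched inside xmlelem ("<", ">", "</", "/>") are kept
as text in the output, so the string value of an xmlelem node is exactly
the original markup. The function name restore() and the sample parse
result are made up for illustration.

```python
import xml.etree.ElementTree as ET


def restore(parent):
    """Replace each xmlelem child with the XML its text content spells out."""
    for i, child in enumerate(list(parent)):
        if child.tag == "xmlelem":
            # Assumption: the string value of the node is the original markup.
            markup = "".join(child.itertext())
            restored = ET.fromstring(markup)
            restored.tail = child.tail  # keep the text that followed the node
            parent[i] = restored
        else:
            restore(child)
    return parent


# Hypothetical parse result; a real processor's output may differ in detail.
parsed = ET.fromstring(
    "<defendant><nomen>Hortensius</nomen>"
    "<xmlelem>&lt;<token>note</token>&gt;See Swan."
    "&lt;/<token>note</token>&gt;</xmlelem> 108</defendant>"
)
result = ET.tostring(restore(parsed), encoding="unicode")
print(result)
```

Because the whole xmlelem is re-parsed from its string value, nested
elements come back without further work. An xmlelem at the root of the
result is not handled here, and ET.fromstring() will raise an error if
the captured text is not well-formed.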

ldbeth


>>>>> In <87zg4wix1w.fsf@blackmesatech.com> 
>>>>>	"C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> wrote:
CMS> For example, I once wrote code to recognize names of people in a
CMS> database of information about Roman legal disputes (Trials in the Late
CMS> Roman Republic 149-50 BC).  It's easy enough to write a grammar to
CMS> recognize strings like

CMS>   Q. Lutatius Catulus (7) cos. 102
CMS>   Sex. Lucilius (15) tr. pl. 87

CMS> and parse them into praenomen, nomen, cognomen, Realenzyklopädie-number
CMS> (the Quintus Lutatius Catulus mentioned here is the one described by the
CMS> seventh article under that name in Pauly and Wissowa's
CMS> Realenzyklopädie), and highest office attained plus date of that office.
CMS> And it's possible to recognize a series of such names and parse each of
CMS> them.

CMS> But since our information about Roman history is sometimes complicated
CMS> and requires clarification, sometimes what I had to parse was a sequence
CMS> of names with footnotes interspersed, like a 'defendant' element reading:

CMS>   <defendant>
CMS>     (L. or Q.?) Hortensius (2) cos. des.?<note><p>Since a magistrate in
CMS>     office could not be prosecuted, it seems likely that he was convicted
CMS>     before taking office.  See Atkinson (1960) 462 n. 108; Swan (1966)
CMS>     239-40; and Weinrib (1971) 145 n. 1.</p></note> 108
CMS>   </defendant>

... ...

CMS> and a pretty-printer for WEB could parse the embedded @<...@> sequences
CMS> as cross-references, in my XML-based LP system this code scrap would
CMS> look something like this:

CMS>   <scrap file="primes.pas"
CMS>          n="Program to print the first thousand prime numbers">
CMS>   program print_primes(output);
CMS>     const m=1000;
CMS>           <ptr target="constants"/>;
CMS>     var <ptr target="vars"/>;
CMS>   begin
CMS>       <ptr target="print-m-primes"/>
CMS>   end.
CMS>   </scrap>

CMS> Does anyone have ideas about the best way of using ixml to allow the
CMS> enrichment of material that is already partially marked up?

CMS> Michael

Received on Monday, 19 June 2023 04:37:57 UTC