- From: LdBeth <andpuke@foxmail.com>
- Date: Sun, 18 Jun 2023 23:37:29 -0500
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: ixml <public-ixml@w3.org>
For both problems, I would add a set of rules to capture any text in angle brackets as a type of node (hypothetical, untested):

    xmlelem: -paired; -selfclosed .
    -paired: "<" , token , -attrs* , ">" , content* , "</" , token , ">" .
    -content: -text; -xmlelem .
    -selfclosed: "<", token , -attrs*, "/>" .

Allow xmlelem to appear in any suitable place, and do a tree walk to restore the XML inside each xmlelem node.

ldbeth

>>>>> In <87zg4wix1w.fsf@blackmesatech.com>
>>>>> "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com> wrote:

CMS> For example, I once wrote code to recognize names of people in a
CMS> database of information about Roman legal disputes (Trials in the Late
CMS> Roman Republic 149-50 BC). It's easy enough to write a grammar to
CMS> recognize strings like
CMS>
CMS>     Q. Lutatius Catulus (7) cos. 102
CMS>     Sex. Lucilius (15) tr. pl. 87
CMS>
CMS> and parse them into praenomen, nomen, cognomen, Realenzyklopädie-number
CMS> (the Quintus Lutatius Catulus mentioned here is the one described by the
CMS> seventh article under that name in Pauly and Wissowa's
CMS> Realenzyklopädie), and highest office attained plus date of that office.
CMS> And it's possible to recognize a series of such names and parse each of
CMS> them.
CMS>
CMS> But since our information about Roman history is sometimes complicated
CMS> and requires clarification, sometimes what I had to parse was a sequence
CMS> of names with footnotes interspersed, like a 'defendant' element reading:
CMS>
CMS>     <defendant>
CMS>     (L. or Q.?) Hortensius (2) cos. des.?<note><p>Since a magistrate in
CMS>     office could not be prosecuted, it seems likely that he was convicted
CMS>     before taking office. See Atkinson (1960) 462 n. 108; Swan (1966)
CMS>     239-40; and Weinrib (1971) 145 n. 1.</p></note> 108
CMS>     </defendant>

... ...
CMS> and a pretty-printer for WEB could parse the embedded @<...@> sequences
CMS> as cross-references, in my XML-based LP system this code scrap would
CMS> look something like this:
CMS>
CMS>     <scrap file="primes.pas"
CMS>            n="Program to print the first thousand prime numbers">
CMS>     program print_primes(output);
CMS>     const m=1000;
CMS>     <ptr target="constants"/>;
CMS>     var <ptr target="vars"/>;
CMS>     begin
CMS>     <ptr target="print-m-primes"/>
CMS>     end.
CMS>     </scrap>
CMS>
CMS> Does anyone have ideas about the best way of using ixml to allow the
CMS> enrichment of material that is already partially marked up?
CMS>
CMS> Michael
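The last step ldbeth proposes, a tree walk that turns each xmlelem node back into real XML, might look roughly like the Python sketch below. It rests on assumptions that are not in the thread: that `token` and `attrs` are defined so the matched characters survive as text, and that the ixml processor serializes the quoted terminals ("<", ">", "</", "/>") as ordinary character data, so the string value of each `<xmlelem>` is exactly the markup it matched. Under those assumptions the walk re-parses that string with Python's standard `xml.etree.ElementTree` and splices the result back in place. The function name `restore_xmlelem` is hypothetical.

```python
import xml.etree.ElementTree as ET


def restore_xmlelem(root: ET.Element) -> ET.Element:
    """Replace every <xmlelem> child with the element its text spells out.

    Hypothetical helper. Assumes the ixml output keeps the captured markup
    as character data, e.g. <xmlelem>&lt;<token>ptr</token>/&gt;</xmlelem>,
    so that joining all its text yields the original "<ptr/>".
    """
    # Snapshot all nodes first, because the loop mutates the tree.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "xmlelem":
                continue
            # Concatenate the text of the node and all its descendants;
            # under the assumption above this is the original markup string.
            markup = "".join(child.itertext())
            # Re-parse it; raises ET.ParseError if it is not well-formed,
            # e.g. when start and end tags do not match.
            restored = ET.fromstring(markup)
            # Keep whatever text followed the captured element.
            restored.tail = child.tail
            # Replace one-for-one, so later indices in the snapshot stay valid.
            parent.remove(child)
            parent.insert(index, restored)
    return root
```

One consequence of this design: the grammar as sketched does not check that the start and end `token` of a paired element agree, so the re-parse step is what rejects mismatched tags.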
Received on Monday, 19 June 2023 04:37:57 UTC