Re: Words that are not this word

> With the two deletes lumped together.
>
> My reaction is to look for a way to make the matching non-greedy (I
> haven't found one) or to define "word" as "anything but this specific
> string". (Fairly sure that's impossible in ixml.)

I think it should be possible to rework this so that a priority (a
CoffeeFilter extension to iXML) can be assigned to the shorter match,
but the priority stuff is still kind of experimental and I wasn’t able
to make that work in my first couple of tries.

> Is there a way to disambiguate this and guarantee that each delete or
> insert will start a block?

In principle, you could create a rule that matches any sequence of
characters other than ‘d’, ‘e’, ‘l’, ‘e’, ‘t’, ‘e’ or ‘i’, ‘n’, ‘s’,
‘e’, ‘r’, ‘t’, but in practice I think that’d be much too (too!) large
a combinatorial explosion.

Another way is to preprocess the input so that the keywords (“delete”
and “insert”) can be made unambiguously different from ordinary words.

I picked the Line Separator character (U+2028), which is neither a space
nor part of a word in your grammar, put it in front of each keyword in
the input, and changed the rules to match. (In what follows, I’ve used
“?” instead of the actual line separator because the actual line
separator is probably going to get mangled by email transmission.) The
grammar:

whole = (delBlock|insBlock)+,last,NL.

delBlock = -'?', 'delete',space,(word,space)+.
insBlock = -'?', 'insert',space,(word,space)+.
last = word.

-space = [Zs]+.
word = [L;P;Nd;Sc]+.

-NL = -#A.

and the input:

?delete the rest of the line and ?delete line 6 and ?insert “this; and that; and the other thing”.

That produces this, unambiguously:

<whole>
   <delBlock>delete
      <word>the</word>
      <word>rest</word>
      <word>of</word>
      <word>the</word>
      <word>line</word>
      <word>and</word> </delBlock>
   <delBlock>delete
      <word>line</word>
      <word>6</word>
      <word>and</word> </delBlock>
   <insBlock>insert
      <word>“this;</word>
      <word>and</word>
      <word>that;</word>
      <word>and</word>
      <word>the</word>
      <word>other</word> </insBlock>
   <last>
      <word>thing”.</word>
   </last>
</whole>
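
For what it’s worth, the preprocessing step could be sketched roughly
like this in Python. This is only an illustration under my own
assumptions: the names (mark_keywords, KEYWORDS) are made up, and it
treats a keyword as a match only when it stands alone between
whitespace, so it will also mark a literal “delete” or “insert” that
happens to appear inside the text being inserted.

```python
import re

# U+2028 LINE SEPARATOR: category Zl, so it is neither [Zs] nor part of "word".
LINE_SEPARATOR = "\u2028"

# Hypothetical keyword list; extend as needed.
KEYWORDS = ("delete", "insert")

# Match a keyword only when it is a whole whitespace-delimited token,
# so "deleted" or "reinsert" are left alone.
_KEYWORD_RE = re.compile(
    r"(?<!\S)(" + "|".join(re.escape(k) for k in KEYWORDS) + r")(?!\S)"
)


def mark_keywords(text, marker=LINE_SEPARATOR):
    """Return text with marker placed in front of every standalone keyword."""
    return _KEYWORD_RE.sub(lambda m: marker + m.group(1), text)


if __name__ == "__main__":
    line = "delete the rest of the line and delete line 6 and insert this"
    # Use "?" as a visible stand-in for U+2028, as in the example above.
    print(mark_keywords(line, marker="?"))
```

The marked-up string is then what gets handed to the iXML processor.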

Hope that helps.

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Thursday, 8 September 2022 07:48:12 UTC