- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 08 Sep 2022 17:33:49 -0600
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: Graydon Saunders <graydonish@gmail.com>, public-ixml@w3.org
Norm Tovey-Walsh <norm@saxonica.com> writes:

>> Is there a way to disambiguate this and guarantee that each delete or
>> insert will start a block?
>
> In principle, you could create a rule that matches sequences of
> characters that are neither ‘d’, ‘e’, ‘l’, ‘e’, ‘t’, ‘e’ or ‘i’, ‘n’,
> ‘s’, ‘e’, ‘r’, ‘t’ but in practice I think that’d be much too (too!)
> large a combinatorial explosion.

For two keywords, I think it's doable. What is required is that
'word' be any non-empty string of acceptable characters that is not
'delete' or 'insert', right?

I'd suggest something like this:

    word = ~['di'; Zs], [L;P;Nd;Sc]*
         ; 'd', ~['e'; Zs], [L;P;Nd;Sc]*
         ; 'de', ~['l'; Zs], [L;P;Nd;Sc]*
         ; 'del', ~['e'; Zs], [L;P;Nd;Sc]*
         ; 'dele', ~['t'; Zs], [L;P;Nd;Sc]*
         ; 'delet', ~['e'; Zs], [L;P;Nd;Sc]*
         ; 'delete', [L;P;Nd;Sc]+
         ; 'i', ~['n'; Zs], [L;P;Nd;Sc]*
         ; 'in', ~['s'; Zs], [L;P;Nd;Sc]*
         ; 'ins', ~['e'; Zs], [L;P;Nd;Sc]*
         ; 'inse', ~['r'; Zs], [L;P;Nd;Sc]*
         ; 'inser', ~['t'; Zs], [L;P;Nd;Sc]*
         ; 'insert', [L;P;Nd;Sc]+
         .

If there is a real likelihood that the exclusions will match
characters that should not be part of a word, then each 'word'
element in the output can be rescanned to make sure it's OK;
otherwise, you may be able to spare yourself the re-scanning.

On another note, I would make quoted strings a grammatical unit, to
avoid the risk of recognizing keywords within them.

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
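For anyone who wants to experiment with the 'word' rule in the message above before running it through an ixml processor, here is a rough Python sketch of the same alternatives, plus the optional re-scan. It is only an emulation under one reading of the rule: the names `matches_word_rule`, `rescan_ok`, and `is_word_char` are made up for illustration, and `unicodedata.category` stands in for the Unicode classes L, P, Nd, Sc, and Zs.

```python
import unicodedata

KEYWORDS = ("delete", "insert")


def is_word_char(ch: str) -> bool:
    """True if ch falls in [L; P; Nd; Sc] (assumed mapping to Unicode categories)."""
    cat = unicodedata.category(ch)
    return cat[0] in "LP" or cat in ("Nd", "Sc")


def is_space_sep(ch: str) -> bool:
    """True if ch is in Unicode category Zs."""
    return unicodedata.category(ch) == "Zs"


def matches_word_rule(s: str) -> bool:
    """Emulate the alternatives of 'word', one by one."""
    if not s:
        return False
    # First alternative: ~['di'; Zs], [L;P;Nd;Sc]*
    first_letters = {kw[0] for kw in KEYWORDS}
    if (s[0] not in first_letters and not is_space_sep(s[0])
            and all(is_word_char(c) for c in s[1:])):
        return True
    for kw in KEYWORDS:
        # Alternatives like 'del', ~['e'; Zs], [L;P;Nd;Sc]*
        for n in range(1, len(kw)):
            if s.startswith(kw[:n]) and len(s) > n:
                nxt = s[n]
                if (nxt != kw[n] and not is_space_sep(nxt)
                        and all(is_word_char(c) for c in s[n + 1:])):
                    return True
        # Last alternative per keyword: 'delete', [L;P;Nd;Sc]+
        if (s.startswith(kw) and len(s) > len(kw)
                and all(is_word_char(c) for c in s[len(kw):])):
            return True
    return False


def rescan_ok(s: str) -> bool:
    """Re-scan a matched word: only acceptable characters, and not a keyword."""
    return bool(s) and s not in KEYWORDS and all(is_word_char(c) for c in s)


if __name__ == "__main__":
    for sample in ("deleted", "deletion", "delete", "insert", "d\tx", "in"):
        print(repr(sample), matches_word_rule(sample), rescan_ok(sample))
```

In this emulation a string such as "d", TAB, "x" is accepted by the rule, because the exclusion lets any character through that is neither the next keyword letter nor a space separator; the re-scan rejects it. Under the same reading, a proper prefix of a keyword on its own (for example "in") is not matched by any alternative, so that case may need separate handling depending on the input.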
Received on Thursday, 8 September 2022 23:47:13 UTC