- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 08 Sep 2022 17:33:49 -0600
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: Graydon Saunders <graydonish@gmail.com>, public-ixml@w3.org
Norm Tovey-Walsh <norm@saxonica.com> writes:
>> Is there a way to disambiguate this and guarantee that each delete or
>> insert will start a block?
>
> In principle, you could create a rule that matches sequences of
> characters that are neither ‘d’, ‘e’, ‘l’, ‘e’, ‘t’, ‘e’ or ‘i’, ‘n’,
> ‘s’, ‘e’, ‘r’, ‘t’ but in practice I think that’d be much too (too!)
> large a combinatorial explosion.
For two keywords, I think it's doable. What is required is that 'word'
be any non-empty string of acceptable characters that is not 'delete' or
'insert', right? I'd suggest something like this:

  { a bare keyword prefix such as 'd', 'de' or 'in' also counts as a word }
  word = ~['di'; Zs], [L;P;Nd;Sc]*
       ; 'd', (~['e'; Zs], [L;P;Nd;Sc]*)?
       ; 'de', (~['l'; Zs], [L;P;Nd;Sc]*)?
       ; 'del', (~['e'; Zs], [L;P;Nd;Sc]*)?
       ; 'dele', (~['t'; Zs], [L;P;Nd;Sc]*)?
       ; 'delet', (~['e'; Zs], [L;P;Nd;Sc]*)?
       ; 'delete', [L;P;Nd;Sc]+
       ; 'i', (~['n'; Zs], [L;P;Nd;Sc]*)?
       ; 'in', (~['s'; Zs], [L;P;Nd;Sc]*)?
       ; 'ins', (~['e'; Zs], [L;P;Nd;Sc]*)?
       ; 'inse', (~['r'; Zs], [L;P;Nd;Sc]*)?
       ; 'inser', (~['t'; Zs], [L;P;Nd;Sc]*)?
       ; 'insert', [L;P;Nd;Sc]+
       .

If there is a real likelihood that the exclusions will match characters
that should not be part of a word, then each 'word' element in the
output can be rescanned to make sure it's OK; otherwise, you may be able
to spare yourself the re-scanning.
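
A minimal sketch of such a re-scan, written in Python rather than ixml so it can run as a post-processing step over the parser output. The function names are illustrative, and mapping the grammar's L, P, Nd and Sc classes onto the categories reported by Python's unicodedata module is an assumption about what the grammar intends:

```python
import unicodedata

# Keywords that must never be accepted as an ordinary word.
KEYWORDS = {"delete", "insert"}


def is_word_char(ch):
    """True if ch is in one of the classes the grammar uses: L, P, Nd, Sc."""
    cat = unicodedata.category(ch)
    # L and P stand for every category whose code starts with that letter.
    return cat[0] in "LP" or cat in ("Nd", "Sc")


def is_acceptable_word(text):
    """Re-check one 'word' element taken from the parser output."""
    return (
        bool(text)
        and text not in KEYWORDS
        and all(is_word_char(ch) for ch in text)
    )
```

A character such as a tab or '+' passes an exclusion like ~['e'; Zs], because it is neither the excluded letter nor a space separator, but it is not in any of the four classes, so this check would flag the word that contains it.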
On another note, I would make quoted strings a grammatical unit, to
avoid the risk of recognizing keywords within them.
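
To illustrate the idea outside ixml, here is a rough Python sketch in which a quoted string is always consumed as a single token, so a keyword inside it is never seen on its own. The quoting convention (double quotes, no escaped quotes inside) and the name tokenize are assumptions made for the example, not something taken from the input format under discussion:

```python
import re

KEYWORDS = {"delete", "insert"}

# Assumption: strings are delimited by double quotes and contain no
# escaped quotes. The quoted-string alternative is tried first, so a
# keyword inside quotes is swallowed as part of the string.
TOKEN = re.compile(r'"[^"]*"|\S+')


def tokenize(text):
    """Return (kind, token) pairs, with kind 'string', 'keyword' or 'word'."""
    tokens = []
    for match in TOKEN.finditer(text):
        tok = match.group()
        if len(tok) >= 2 and tok.startswith('"') and tok.endswith('"'):
            kind = "string"
        elif tok in KEYWORDS:
            kind = "keyword"
        else:
            kind = "word"
        tokens.append((kind, tok))
    return tokens
```

With this, only a 'delete' or 'insert' that appears outside quotes is reported as a keyword and can start a block.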
Michael
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
Received on Thursday, 8 September 2022 23:47:13 UTC