- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Thu, 26 May 2022 12:27:51 +0000
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "Norm Tovey-Walsh" <norm@saxonica.com>
- Cc: public-ixml@w3.org
On Tuesday 24 May 2022 16:36:46 (+02:00), C. M. Sperberg-McQueen wrote:

>
> Norm Tovey-Walsh writes:
>
> >> By the way, note that the following is now legal ixml:
> >>
> >>   values: value+++",".
>
> > What does that mean?
>
> For what it's worth, I take it to mean
>
>   <repeat1>
>     <nonterminal name="value"/>
>     <sep>
>       <insertion string=","/>
>     </
>   </

This is exactly right. An insertion matches zero characters on input, but produces characters on output. For instance:

    data: ~[]+++" ".

would insert a space between each character.

    abc => <data>a b c</data>

    values: [Nd]+++", ".

Would insert a comma and a space between each digit.

    123 => <values>1, 2, 3</values>

Steven

> If it is, then there is no separator, and the grammar of which this
> fragment is part will work best for values with fixed length or values
> which somehow can be parsed without delimiters. If every value must
> begin with a letter and end with a digit, then a1bc23def456 can be
> uniquely parsed without delimiters, right?
>
> For things like integers or decimal numbers, this grammar would make
> sense perhaps in a stress test checking how well the processor deals
> with finite, but fast-growing, ambiguity. Given
>
>   value = ['0'-'9']+.
>
> and the input '12345', I'll get one parse with five values, one parse
> with one value, four each with two or four values, and seven with three
> values. So seventeen overall.
>
> (Hmm. This is not Pascal's Triangle. But I'm sure there is a
> formula for how many ways there are to partition a sequence of length n
> into k contiguous subsequences. I just can't remember.)
>
>
> Michael
> --
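
The two insertion examples in the reply above (data and values) can be mimicked with a rough Python sketch. It is not an ixml processor and models only the output side: each repetition consumes one input character, and the insertion text consumes nothing but appears between the matched items in the result. The helper names (serialize_repeat1, data, values) are hypothetical and chosen only for illustration.

```python
import unicodedata
from xml.sax.saxutils import escape


def serialize_repeat1(name, matched, inserted):
    """Hypothetical helper: model `item+++"inserted"` on the output side only.

    `matched` holds the strings consumed from the input by each repetition;
    `inserted` is the insertion text, which consumes no input at all.
    """
    if not matched:
        raise ValueError("repeat1 needs at least one match")
    return f"<{name}>{escape(inserted.join(matched))}</{name}>"


def data(text):
    # data: ~[]+++" ".   (~[] matches any single character)
    return serialize_repeat1("data", list(text), " ")


def values(text):
    # values: [Nd]+++", ".   ([Nd] is the Unicode decimal-digit class)
    if not text or any(unicodedata.category(ch) != "Nd" for ch in text):
        raise ValueError("input is not a non-empty run of Nd characters")
    return serialize_repeat1("values", list(text), ", ")


print(data("abc"))     # <data>a b c</data>
print(values("123"))   # <values>1, 2, 3</values>
```

Since the inserted text never has to be present in the input, the whole input is consumed by the repeated item alone, and the separator shows up only in the serialized result.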
Received on Thursday, 26 May 2022 12:28:21 UTC