To the owner of the industry_codes grammar

To the owner of the industry_codes grammar:

 label: (word+, " "?)+.

should be 

 label: word++" ".

otherwise it is very ambiguous, since a word is any string of letters.


So aaa can be parsed as

 <word>a</word><word>a</word><word>a</word>
 <word>aa</word><word>a</word>
 <word>a</word><word>aa</word>
 <word>aaa</word>

but then the optional space means it can also be parsed as any of

 <word>a</word>?<word>a</word><word>a</word>
 <word>a</word><word>a</word>?<word>a</word>
 <word>a</word><word>a</word><word>a</word>?
 <word>a</word>?<word>a</word>?<word>a</word>
 <word>a</word>?<word>a</word><word>a</word>?
 <word>a</word><word>a</word>?<word>a</word>?
 <word>a</word>?<word>a</word>?<word>a</word>?

 <word>aa</word>?<word>a</word>
 <word>aa</word><word>a</word>?
 <word>aa</word>?<word>a</word>?

 <word>a</word>?<word>aa</word>
 <word>a</word><word>aa</word>?
 <word>a</word>?<word>aa</word>?

 <word>aaa</word>?

where ? represents the absent optional space.

And it only gets worse as the words get longer.
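
To get a feel for the growth, here is a rough Python sketch that counts derivations under both rules. It assumes word is one or more letters (str.isalpha stands in for that), and the function names count_original and count_fixed are made up for this illustration. It counts strictly by the rule, so the optional space after the last group is always there, and the totals will not equal a count of the lines listed above.

```python
from functools import lru_cache


def count_original(s: str) -> int:
    """Count derivations of s under  label: (word+, " "?)+.

    Assumption: word is one or more letters.
    """
    n = len(s)

    @lru_cache(maxsize=None)
    def in_group(i: int, has_word: bool) -> int:
        total = 0
        # Read one more word s[i:j]; every letter-only prefix is a candidate.
        j = i
        while j < n and s[j].isalpha():
            j += 1
            total += in_group(j, True)
        # Close the group (it needs at least one word), with or without a space.
        if has_word:
            total += after_group(i)  # optional space absent
            if i < n and s[i] == " ":
                total += after_group(i + 1)  # optional space present
        return total

    def after_group(i: int) -> int:
        # Either the input ends here or another group starts.
        return 1 if i == n else in_group(i, False)

    return in_group(0, False)


def count_fixed(s: str) -> int:
    """Count derivations of s under  label: word++" ".

    Words are separated by exactly one space, so a parse is unique if it exists.
    """
    return 1 if all(part.isalpha() for part in s.split(" ")) else 0


for text in ("aaa", "aaaa", "aaaaa"):
    print(text, count_original(text), count_fixed(text))
```

Under these assumptions the count for the original rule triples with every extra letter in a word, while the rule with ++ never gives more than one parse.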

Furthermore, the optional space at the end of the group can also consume
the first space of your "separator".

One other point:

 industry_codes: code+.
 code: blablabla, newline*.

is better expressed as

 industry_codes: code++newline, newline?.
 code: blablabla.

since I believe that is the intention, and this allows the last newline to 
be optional.
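
As a rough illustration of the difference, the two formulations can be approximated with Python regular expressions. This is only a sketch: the digit pattern stands in for blablabla, newline is taken to be a single "\n", and the names accepts_original and accepts_proposed are made up for the example.

```python
import re

# Hypothetical stand-in for the body of a code; the real rule is not shown here.
CODE = r"[0-9]+"

# industry_codes: code+.  with  code: blablabla, newline*.
ORIGINAL = re.compile(rf"(?:{CODE}\n*)+")

# industry_codes: code++newline, newline?.  with  code: blablabla.
PROPOSED = re.compile(rf"{CODE}(?:\n{CODE})*\n?")


def accepts_original(text: str) -> bool:
    """True if the whole text matches the original formulation."""
    return ORIGINAL.fullmatch(text) is not None


def accepts_proposed(text: str) -> bool:
    """True if the whole text matches the proposed formulation."""
    return PROPOSED.fullmatch(text) is not None


for sample in ("12\n34\n", "12\n34", "12\n\n34"):
    print(repr(sample), accepts_original(sample), accepts_proposed(sample))
```

Both accept input with or without the final newline. The proposed version wants exactly one newline between codes, so if blank lines between codes are meant to be allowed, the separator would need to be newline+ instead.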


Hope this helps.

Steven

Received on Friday, 21 February 2025 13:45:06 UTC