Implementer chat - trailing optional whitespace on root

Talking through a technical issue for an audience of fellow implementers. 
Others may find this thread exceedingly dull. :)

Hopefully writing this out should be enough for me to figure out the 
problem. (But if you're reading this, that means I hit 'Send' and thus 
could benefit from your perspective. Alan Kay is alleged to have said 
that a change of perspective is worth 80 IQ points...)

In this example, the top rule I'm using is:

ixml: s, rule++RS, s.

Where s is optional whitespace. And the rest of the grammar is a 
reasonably close facsimile of the actual grammar; any minor exceptions 
shouldn't matter for what follows.

The input is:

doc = "a", "b".

Including a newline at the end, that's 16 Unicode characters presented 
as input.
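
A quick check of that count (every character, repeats included), as a 
small Python sketch. The variable name is only for illustration and has 
nothing to do with the parser itself:

```python
# The input from this message, with its trailing newline.
text = 'doc = "a", "b".\n'

# Python strings are sequences of Unicode code points,
# so len() counts characters, not bytes.
print(len(text))        # 16

# The newline is the last character, at index 15.
print(repr(text[15]))   # '\n'
```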

Here are the last few lines of the trace of completed items. (The 
numbers after @ are the position after parsing that item: one past the 
end, a la C++ iterators.)

430) 0:15👉 rule=( ---f-option78@0, name@3, s@4, -'='@5, s@6, -alts@14, 
-'.'@15 •  )
433) 15:15👉 --f-option72=(  •  )
438) 15:16👉 whitespace=( -'
     '@16 •  )
439) 15:15👉 --f-star71=( ---f-option72@15 •  )
443) 16:16👉 --f-option77=(  •  )
441) 0:15👉 --f-plus-sep70=( rule@15, ---f-star71@15 •  )
446) 16:16👉 --f-star76=( ---f-option77@16 •  )
451) 15:15👉 --f-option74=(  •  )
448) 15:16👉 --f-plus75=( whitespace@16, ---f-star76@16 •  )
453) 15:15👉 --f-star73=( ---f-option74@15 •  )
454) 15:16👉 RS=( ---f-plus75@16 •  )
455) 15:15👉 s=( ---f-star73@15 •  )
459) 16:16👉 --f-option78=(  •  )
457) 0:15👉 ixml=( s@0, ---f-plus-sep70@15, s@15 •  )
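
For anyone reading the spans in that trace: Python slices happen to be 
half-open in the same way, so a span start:end corresponds to 
text[start:end]. A minimal sketch, with a helper name (span) that is 
made up for this example:

```python
text = 'doc = "a", "b".\n'

def span(start, end):
    """Return the input covered by a half-open span start:end."""
    return text[start:end]

print(repr(span(0, 15)))    # the 0:15 'rule' match: everything but the newline
print(repr(span(15, 16)))   # the 15:16 'whitespace' match: just the newline
print(repr(span(15, 15)))   # a 15:15 match is empty
```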

From the bottom up, the final trace 457) looks like the parse is in 
good shape, I think. From character positions 0:15 we have a complete 
match on the top ixml rule, which aligns with the 'rule' match at 
430), also at character positions 0:15. But to call it a success, the 
code that processes this trace is looking for a complete 0:16 match on 
the root rule 'ixml'.

There _is_ a 'whitespace' match 438) on a newline spanning 15:16, but 
since the ixml rule ends with optional whitespace, the rule gets marked 
as complete at position 15 with an empty final s, as seen in 455) and 
457). Downstream code then errors out, since not all of the input was 
matched by the root rule.
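
To pin down what "success" means here, this is a rough sketch of the 
kind of acceptance check described above. It is not the actual 
downstream code; the Completed type and the accepts function are 
hypothetical names, and the spans are a subset copied from the trace:

```python
from typing import NamedTuple

class Completed(NamedTuple):
    name: str
    start: int
    end: int   # one past the end

def accepts(items, root, input_length):
    """True if some completed root item spans the whole input."""
    return any(
        item.name == root and item.start == 0 and item.end == input_length
        for item in items
    )

# A subset of the completed items from the trace.
trace = [
    Completed("rule", 0, 15),
    Completed("whitespace", 15, 16),
    Completed("RS", 15, 16),
    Completed("s", 15, 15),
    Completed("ixml", 0, 15),
]

# Only ixml 0:15 is present, so a check for 0:16 fails.
print(accepts(trace, "ixml", 16))   # False

# Hypothetical: if an ixml 0:16 item were also completed, it would pass.
print(accepts(trace + [Completed("ixml", 0, 16)], "ixml", 16))   # True
```

With a check like this, the 15:16 whitespace match does not help on its 
own; only a completed root item ending at 16 would count.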

And 454) is interesting. There's still a leftover task on 'ixml' trying 
to see if it can keep going.

The solution is at the tip of my cortex, but I'm just not seeing it.

Thoughts?

j

Received on Sunday, 18 September 2022 03:57:05 UTC