RelaxNG derivatives vs Earley parsing from M Joel Dubinko on 2022-07-14 (public-ixml@w3.org from July 2022)

From: M Joel Dubinko <micah@dubinko.info>
Date: Thu, 14 Jul 2022 00:12:25 -0400
To: ixml <public-ixml@w3.org>
Message-Id: <B3630669-9442-4FA1-AF64-CF8D4D88C912@dubinko.info>

Noodling around with the different informal descriptions of Earley parsers reminded me of something that impressed me about 20 years ago. Namely, [1]

There’s a lot in common between James Clark’s algorithm and Earley parsing. In particular, the “dot” notation (a la Wikipedia) 
 (X → α • β, i), consisting of

the production currently being matched (X → α β)
the current position in that production (represented by the dot)
the position i in the input at which the matching of this production began: the origin position
Sounds an awful lot like the description of a derivative:
"The derivative of a pattern p with respect to a node x is a pattern for what's left of p after matching x;"

The portion after the dot IS a derivative, right? Are there subtle differences I’m missing? I’m thinking that Rust, with it’s nice Haskell-like “data class” enums, might be able to adapt some of the logic from [1] to make a very clean, readable implementation.

Welcome any thoughts from those more technically competent, which amounts to pretty much everyone on this list. 
-Joel



[1] https://relaxng.org/jclark/derivative.html <https://relaxng.org/jclark/derivative.html>

Received on Thursday, 14 July 2022 04:12:43 UTC