- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Thu, 17 Mar 2022 16:27:48 -0600
- To: Dimitre Novatchev <dnovatchev@gmail.com>
- Cc: Norm Tovey-Walsh <norm@saxonica.com>, public-ixml@w3.org
Dimitre Novatchev writes: > Sorry to interrupt your group's discussions and probably show lack of > understanding. That would, I think, be very difficult for you to do. Thank you for your intervention. > But 30 years ago I had a post-doc scholarship with Masaru Tomita at CMU and > at that time implemented in C his algorithm for NL parsing -- GLR > (Generalized LR Parsing). > This parsing algorithm can handle any NL grammar ambiguities, it even > produced all possible parses for "Time flies like an arrow", and one of > these interpreted "time-flies" as the subject. So, "time-flies" like (love) > an arrow :) > Maybe you could consider implementing and using Tomita's GLR? That may be a good idea. As Tom has mentioned, the spec does refer to GLR as a possible implementation strategy (which is a change from the earliest descriptions of invisible XML, which always mentioned Earley parsing), but as far as I know none of the implementations now being worked on uses Tomita / GLR parsing. In considering your mail, I discover that I seem to feel a little guilty about this, so I have spent a little time just now asking myself why this is so. It's not that I haven't heard of GLR parsing. On the contrary: somewhere in one of the stacks of partly-read and partly-understood papers gradually taking over my study there are some recent papers on variations and refinements of Tomita's parser. The day may come when I finish them, understand the algorithm better, and try to implement it. So far, though, I have not done so. Maybe it's just that I started by studying Earley's algorithm and have had my hands full with other things. Maybe it's that GLR seems still not to be widely understood. For what it's worth, though, I have the impression that GLR is harder to understand than Earley -- although I won't claim that Earley is easy to understand. To me it felt like very hard work when I was trying to understand why I should trust the Earley algorithm and how to express it in declarative terms. When reading the papers on GLR, I lost heart at the point where I saw that the key to the generality of the GLR method involves something that looked at first glance like simulating a parallel processor in the parser and maintaining multiple parsing stacks in parallel. Apart from the conceptual complexity, it did not look at first glance like a good match for a purely functional language like XQuery or XSLT. Like GLR, Earley parsing can handle ambiguity, so if (as I believe) both Norm's program and Steven's program are imlementations of the Earley algorithm, then in principle neither of them was doomed to crash on this input. Without having seen the internals of either, I will risk a guess that the problems they encountered lie not in the recognition of the sentence (or: in the construction of the chart) but in the extraction of the parse tree from the chart. It is at that point that any failure to detect a loop is likely to lead to a runaway process. And that is also the point at which Earley waves his hands and essentially leaves the extraction of the parse tree form as an exercise for the reader ... One reason for being interested in GLR parsing is the possibility that on many grammars it may be faster than Earley parsing. Earley parsing is known to be cubic in the worst case, but is said to be linear for any grammar for which a deterministic parsing method would work. Grune and Jacobs say, however, in their book Parsing Techniques, that if any of the general parsers [i.e. Unger's, Earley's, GLR] performs in linear time, it may still be a factor of ten or so slower than a deterministic method, due to the much heavier administration they need. They also observe that GLR is harder to implement than Earley. The crucial point, is probably one Tom made already: new implementations of ixml are always welcome! Michael -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Thursday, 17 March 2022 22:28:10 UTC