- From: Michel Fortin <michel.fortin@michelf.com>
- Date: Sun, 3 Dec 2006 20:10:34 -0500
On 3 Dec 2006, at 17:04, J. King wrote:

> I am. It's not anywhere near finished yet, but the parser so far
> goes through the whole document and spits out the appropriate
> tokens; I just haven't done anything with said tokens yet, mainly
> because I was discouraged by PHP's DOM implementation.
>
> My parser is also slow as molasses, unfortunately.

My experience optimizing PHP Markdown, and building the custom mixed Markdown/HTML-block pseudo-tokenizer for PHP Markdown Extra, tells me that it'll probably stay very slow as long as the implementation is made of PHP code.

Assuming you've implemented the algorithm in the spec as PHP code, you could probably make it faster by using regular expressions in the tokenization steps instead of iterating character by character. For instance, you could implement many of the tokenizer states by matching from the start of the string with an anchored regex, and maybe it'll then also be possible to combine a couple of states within the same regex (a rough sketch follows at the end of this message).

The more we replace PHP code with regular expressions, the faster it'll go, but the further we deviate from the processing algorithm described in the spec. I wonder how far we could go while keeping exactly the same behaviour.

The really good solution would be to have a parser implemented in C and available in every standard installation of PHP. It could be used by other languages too.

Michel Fortin
michel.fortin at michelf.com
http://www.michelf.com/
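
P.S. Just to illustrate the idea, here's a rough, untested sketch of what one such regex-driven tokenizer state could look like. The next_token() helper, the token names and the overall shape are made up for the example and don't come from PHP Markdown or from the spec; the point is only that one anchored regex can consume the next token (and fold a few simple states together) instead of looping character by character:

    <?php
    // Untested sketch: consume the next token from $text with a single
    // anchored regex instead of advancing one character at a time.
    // Token names and structure are placeholders, not PHP Markdown code.
    function next_token(&$text) {
        // One alternation covers several simple "states" at once:
        // emphasis markers, code-span delimiters, and plain text runs.
        if (preg_match('{^(?:
                (?P<em>\*{1,2}|_{1,2})     # emphasis / strong markers
              | (?P<code>`+)               # code span delimiter
              | (?P<text>[^*_`]+)          # run of ordinary characters
            )}x', $text, $m))
        {
            $text = (string) substr($text, strlen($m[0]));
            if ($m['em'] !== '')   return array('em-marker',  $m['em']);
            if ($m['code'] !== '') return array('code-delim', $m['code']);
            return array('text', $m['text']);
        }
        return null; // end of input
    }

    // Example:
    $input = "Some *emphasized* text with `code`.";
    while (($tok = next_token($input)) !== null) {
        echo $tok[0], ": ", $tok[1], "\n";
    }
    ?>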
Received on Sunday, 3 December 2006 17:10:34 UTC