- From: Edward Z. Yang <edwardzyang@thewritingpot.com>
- Date: Sat, 20 Dec 2008 23:32:01 -0500
I am currently working on a PHP5 implementation of the HTML5 specification. PHP has abysmal Unicode support, and implementing Unicode streams in userspace may be unacceptably slow. Thus, my questions:

1. Given an input stream that is known to be valid UTF-8, is it possible to implement the tokenization algorithm with byte-wise operations only? I think it's possible, since all of the character-matching parts of the algorithm map to characters in the ASCII range.

2. Would such an implementation be conforming?

Cheers,
Edward
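To illustrate the reasoning behind question 1: in valid UTF-8, every byte of a multi-byte sequence has its high bit set (0x80 or above), so a byte in the ASCII range can only ever be a complete ASCII character. The sketch below is in Python rather than PHP, purely for illustration. The function names and the delimiter set `<>&` are hypothetical and stand in for the much larger set of characters the real tokenizer matches. It is not the HTML5 tokenization algorithm, only a check that byte-wise and character-wise splitting agree on ASCII delimiters.

```python
from typing import List


def split_on_ascii_bytes(data: bytes, delimiters: bytes = b"<>&") -> List[bytes]:
    """Split a valid UTF-8 byte string at ASCII delimiter bytes.

    Assumes `data` is valid UTF-8. Because continuation and lead bytes of
    multi-byte sequences are all >= 0x80, a delimiter byte found here is
    always a real ASCII character, never part of another character.
    """
    tokens = []
    start = 0
    for i, byte in enumerate(data):  # iterating bytes yields ints
        if byte in delimiters:
            if start < i:
                tokens.append(data[start:i])  # run of non-delimiter bytes
            tokens.append(data[i:i + 1])      # the delimiter itself
            start = i + 1
    if start < len(data):
        tokens.append(data[start:])
    return tokens


def split_on_ascii_chars(text: str, delimiters: str = "<>&") -> List[str]:
    """Reference version: the same split, done on decoded characters."""
    tokens = []
    start = 0
    for i, ch in enumerate(text):
        if ch in delimiters:
            if start < i:
                tokens.append(text[start:i])
            tokens.append(ch)
            start = i + 1
    if start < len(text):
        tokens.append(text[start:])
    return tokens


if __name__ == "__main__":
    # Sample mixing ASCII markup with 2-, 3-, and 4-byte UTF-8 characters.
    sample = "<p>caf\u00e9 \u2603 &amp; \U0001F600</p>"
    byte_tokens = split_on_ascii_bytes(sample.encode("utf-8"))
    char_tokens = split_on_ascii_chars(sample)
    # Every byte-level token decodes cleanly and matches the character-level split.
    print([t.decode("utf-8") for t in byte_tokens] == char_tokens)
```

Note that this only covers matching on ASCII characters. Anything that depends on code point values, such as validating the stream or reporting character offsets, would still need decoding.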
Received on Saturday, 20 December 2008 20:32:01 UTC