- From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
- Date: Mon, 31 May 1999 18:27:28 +0800
- To: <html-tidy@w3.org>
- Message-ID: <007501beab50$2f25eaa0$dd066d8c@sinica.edu.tw>
A version of tidy with some enhancements for Chinese is at
http://www.ascc.net/xml/en/utf-8/resource_index.html#source
(Sorry, I set the file permissions wrong last time!)

The enhancements are:

* support for Big5 and ShiftJIS (faking it a little)
* support for different numeric character reference behaviour
* a language parameter added to the config file
* support for "Chinese" line breaks (also useful for other languages which are written with no line breaks)
* documentation adjusted

Rick Jelliffe

P.S. People might be interested in some experimental code we have here called XXX (eXperimental Xml leXer). I have made a generic lexical routine that is driven by parameters: each call of the routine corresponds to a state, and the parameters correspond to transitions (and handlers). Potentially, the same lexical routine could parse JavaScript and CSS, with the addition of the correct tables. There are also a couple of accompanying articles. See http://www.ascc.net/xml/en/utf-8/xxxmodel.html

This might be a good approach in the future for keeping application sizes down: with (X)HTML we have multiple notations that all need parsing into a parse tree. It might be worthwhile to have a slim, combined routine rather than completely separate parsers for everything.
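
To illustrate the table-driven idea from the P.S., here is a minimal sketch in Python. It is not the XXX code: the function `lex`, the helper `rules`, the two tables and all state and token names are hypothetical, and a plain token kind stands in for the handlers mentioned above. The point is only that a single routine can serve two notations when everything notation-specific lives in the tables.

```python
import re

def lex(text, table, state):
    """Generic lexical routine; all notation-specific knowledge is in `table`.

    `table` maps a state name to an ordered list of transitions, each a
    (compiled pattern, token kind or None, next state) triple.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        for pattern, kind, next_state in table[state]:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                if kind is not None:        # None means "skip silently"
                    tokens.append((kind, m.group()))
                pos = m.end()
                state = next_state
                break
        else:
            raise ValueError(f"no transition from state {state!r} at offset {pos}")
    return tokens

def rules(*specs):
    """Compile (regex, kind, next_state) triples into one state's transitions."""
    return [(re.compile(p), kind, nxt) for p, kind, nxt in specs]

# Hypothetical, heavily simplified table for markup.
MARKUP_TABLE = {
    "text": rules(
        (r"<", "TAG_OPEN", "tag"),
        (r"[^<]+", "TEXT", "text"),
    ),
    "tag": rules(
        (r">", "TAG_CLOSE", "text"),
        (r"\s+", None, "tag"),
        (r"[^\s>]+", "NAME", "tag"),
    ),
}

# Hypothetical, heavily simplified table for CSS rules.
CSS_TABLE = {
    "selector": rules(
        (r"\s+", None, "selector"),
        (r"\{", "LBRACE", "block"),
        (r"[^\s{]+", "SELECTOR", "selector"),
    ),
    "block": rules(
        (r"\s+", None, "block"),
        (r"\}", "RBRACE", "selector"),
        (r"[:;]", "PUNCT", "block"),
        (r"[^\s:;}]+", "WORD", "block"),
    ),
}

if __name__ == "__main__":
    print(lex("<p>hi</p>", MARKUP_TABLE, "text"))
    print(lex("p { color: red; }", CSS_TABLE, "selector"))
```

Supporting another notation in this sketch means writing another table; `lex` itself does not change, which is where the size saving would come from.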
Received on Monday, 31 May 1999 06:38:48 UTC