Tidy in Chinese

A version of tidy with some enhancements for Chinese is at
http://www.ascc.net/xml/en/utf-8/resource_index.html#source

(Sorry, I set the file permissions wrong last time!)

The enhancements are:

* support Big5 and ShiftJIS (faking it a little)
* support different numeric character reference behaviour
* added a language parameter to the config file
* support "Chinese" line breaks (also useful for other
   languages that are written without spaces between words).
* documentation adjusted.
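The post does not say how Tidy implements the "Chinese" line-break option. Purely as an illustration, here is a sketch of a greedy line wrapper that breaks at whitespace and also allows a break next to any CJK ideograph, since such text has no spaces to break at. The names `is_cjk`, `units` and `wrap_cjk` are hypothetical. The CJK test covers only the U+4E00 to U+9FFF block, and width is counted in characters rather than display columns.

```python
def is_cjk(ch: str) -> bool:
    """Rough test: CJK Unified Ideographs block only (U+4E00 to U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def units(text: str) -> list[tuple[str, bool]]:
    """Split text into (unit, preceded_by_whitespace) pairs.

    Each CJK ideograph is its own unit, so a break is allowed on either
    side of it; runs of other non-space characters form one unit (a word).
    """
    out: list[tuple[str, bool]] = []
    word, spaced = "", False
    for ch in text:
        if ch.isspace():
            if word:
                out.append((word, spaced))
                word = ""
            spaced = True
        elif is_cjk(ch):
            if word:
                out.append((word, spaced))
                word, spaced = "", False
            out.append((ch, spaced))
            spaced = False
        else:
            word += ch
    if word:
        out.append((word, spaced))
    return out


def wrap_cjk(text: str, width: int) -> list[str]:
    """Greedy wrap that may break at whitespace or next to an ideograph."""
    lines: list[str] = []
    cur = ""
    for unit, spaced in units(text):
        # Re-insert a space only where the source had whitespace.
        sep = " " if spaced and cur else ""
        if cur and len(cur) + len(sep) + len(unit) > width:
            lines.append(cur)  # break here; the separating space is dropped
            cur = unit
        else:
            cur += sep + unit
    if cur:
        lines.append(cur)
    return lines
```

For example, `wrap_cjk("中文没有空格", 4)` gives `["中文没有", "空格"]`, while ordinary Latin text still wraps only at spaces. A fuller treatment would also respect rules such as not starting a line with closing punctuation, which Unicode UAX #14 describes.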


Rick Jelliffe

P.S. People might be interested in some experimental code we have here called XXX (eXperimental Xml leXer). I have made a generic lexical routine that is driven by parameters: the routine corresponds to a state, and the parameters correspond to transitions (and their handlers). Potentially, the same lexical routine could parse JavaScript and CSS, given the appropriate tables.
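The post does not include XXX's code, so the following is only a sketch of the general idea, under our own assumptions. A single lexing routine gets all of its behaviour from a table that maps each state to transition rows of (character test, handler, next state). All names here (`lex`, `MARKUP`, `CSS` and the handlers) are hypothetical, and the two toy tables are far simpler than real XML or CSS grammars.

```python
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]                                # (kind, text)
Handler = Callable[[List[str], str, List[Token]], None]
Row = Tuple[Callable[[str], bool], Handler, str]       # (test, handler, next state)
Table = Dict[str, List[Row]]


def lex(table: Table, start: str, text: str) -> List[Token]:
    """One generic routine; all notation-specific knowledge lives in `table`."""
    state = start
    buf: List[str] = []
    tokens: List[Token] = []
    for ch in text:
        for test, handler, next_state in table[state]:
            if test(ch):                   # first matching row wins
                handler(buf, ch, tokens)
                state = next_state
                break
        else:
            raise ValueError(f"no transition from {state!r} on {ch!r}")
    if buf:                                # flush; states are named after token kinds
        tokens.append((state, "".join(buf)))
    return tokens


# Handlers shared by every table.
def keep(buf: List[str], ch: str, tokens: List[Token]) -> None:
    buf.append(ch)


def keep_trimmed(buf: List[str], ch: str, tokens: List[Token]) -> None:
    if buf or not ch.isspace():            # drop leading whitespace only
        buf.append(ch)


def emit_as(kind: str) -> Handler:
    def handler(buf: List[str], ch: str, tokens: List[Token]) -> None:
        if buf:                            # finish the current token, if any
            tokens.append((kind, "".join(buf)))
            buf.clear()
    return handler


def is_(c: str) -> Callable[[str], bool]:
    return lambda ch: ch == c


def any_char(ch: str) -> bool:
    return True


# Two notations, one routine: only the tables differ.
MARKUP: Table = {
    "text": [(is_("<"), emit_as("text"), "tag"), (any_char, keep, "text")],
    "tag": [(is_(">"), emit_as("tag"), "text"), (any_char, keep, "tag")],
}

CSS: Table = {
    "prop": [(is_(":"), emit_as("prop"), "value"), (any_char, keep_trimmed, "prop")],
    "value": [(is_(";"), emit_as("value"), "prop"), (any_char, keep_trimmed, "value")],
}
```

With these tables, `lex(MARKUP, "text", "a<b>c</b>")` yields `[("text", "a"), ("tag", "b"), ("text", "c"), ("tag", "/b")]`, and `lex(CSS, "prop", "color: red; margin: 0")` yields `[("prop", "color"), ("value", "red"), ("prop", "margin"), ("value", "0")]`. Supporting another notation means adding a table, not another parser.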
 
There are also a couple of accompanying articles. See http://www.ascc.net/xml/en/utf-8/xxxmodel.html 
 
This might be a good approach in the future for keeping application sizes down: with (X)HTML we have multiple notations that all need parsing into a parse tree. It might be worthwhile to have a slim, combined routine rather than completely separate parsers for everything.

Received on Monday, 31 May 1999 06:38:48 UTC