
Tidy in Chinese

From: Rick Jelliffe <ricko@gate.sinica.edu.tw>
Date: Mon, 31 May 1999 18:27:28 +0800
Message-ID: <007501beab50$2f25eaa0$dd066d8c@sinica.edu.tw>
To: <html-tidy@w3.org>
A version of tidy with some enhancements for Chinese is at
http://www.ascc.net/xml/en/utf-8/resource_index.html#source

(Sorry, I set the file permissions wrong last time!)

The enhancements are:

* support Big5 and ShiftJIS (faking it a little)
* support different numeric character reference behaviour
* added a language parameter to the config file
* support "Chinese" line breaks (also useful for other
   languages that are written without spaces between words).
* documentation adjusted.
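For readers curious what "Chinese" line breaking involves, here is a minimal sketch. It is not taken from the modified Tidy. It assumes a line may be broken at a space or between any two CJK characters. The character ranges in `is_cjk` are a rough assumption (a full treatment would follow the Unicode line-breaking rules), and width is counted in characters, not display columns.

```python
from typing import List, Tuple


def is_cjk(ch: str) -> bool:
    """Rough test for characters that allow a break on either side.
    Assumption: only these three Unicode blocks are treated as CJK."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3000 <= cp <= 0x303F   # CJK Symbols and Punctuation
            or 0xFF00 <= cp <= 0xFFEF)  # Halfwidth and Fullwidth Forms


def tokens(text: str) -> List[Tuple[str, bool]]:
    """Split text into (token, preceded_by_space) pairs. Each CJK character
    is its own token; other characters group into space-delimited words."""
    result: List[Tuple[str, bool]] = []
    word = ""
    space = False
    for ch in text:
        if ch == " ":
            if word:
                result.append((word, space))
                word = ""
            space = True
        elif is_cjk(ch):
            if word:
                result.append((word, space))
                word = ""
                space = False
            result.append((ch, space))
            space = False
        else:
            word += ch
    if word:
        result.append((word, space))
    return result


def wrap(text: str, width: int) -> List[str]:
    """Greedy wrap. A space is re-inserted only where the source had one,
    so breaks between ideographs add nothing. A single token wider than
    `width` is placed on its own line and allowed to overflow."""
    lines: List[str] = []
    line = ""
    for tok, space in tokens(text):
        sep = " " if space and line else ""
        if line and len(line) + len(sep) + len(tok) > width:
            lines.append(line)
            line = tok
        else:
            line += sep + tok
    if line:
        lines.append(line)
    return lines


print(wrap("Tidy 可以處理中文", 8))
```

Breaking between ideographs is what lets text with no spaces wrap at all. A wrapper that breaks only at spaces would leave a whole run of Chinese on one line.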


Rick Jelliffe

P.S. People might be interested in some experimental code we have here called XXX (eXperimental Xml leXer). I have made a generic lexical routine that is driven by parameters: the routine corresponds to a state, and the parameters correspond to transitions (and handlers). Potentially, the same lexical routine could also parse JavaScript and CSS, given the appropriate tables.
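Here is a rough illustration of a parameter-driven lexer, as a minimal sketch. One generic routine, `lex`, does all the scanning, and everything notation-specific lives in a table of states and transitions. The names `lex` and `MARKUP` and the table layout are hypothetical. This is not the XXX code, whose design is described in the accompanying articles.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical table format (not the XXX design): each state maps to a list
# of transitions (test, next_state, emit). The first transition whose test
# accepts the current character fires. If `emit` is None the character is
# accumulated; otherwise the buffer (if non-empty) is emitted as a token of
# kind `emit`, and the delimiter character itself is dropped.
Transition = Tuple[Callable[[str], bool], str, Optional[str]]
Table = Dict[str, List[Transition]]


def lex(text: str, table: Table, state: str) -> List[Tuple[str, str]]:
    """One generic routine; all notation-specific knowledge is in `table`."""
    tokens: List[Tuple[str, str]] = []
    buf = ""
    for ch in text:
        for test, next_state, emit in table[state]:
            if test(ch):
                if emit is None:
                    buf += ch
                elif buf:
                    tokens.append((emit, buf))
                    buf = ""
                state = next_state
                break
        else:
            raise ValueError(f"no transition from {state!r} on {ch!r}")
    if buf:
        tokens.append((state, buf))  # flush input left over at end of text
    return tokens


def any_char(ch: str) -> bool:
    return True


# A toy table that splits markup into text runs and tag contents.
MARKUP: Table = {
    "text": [(lambda c: c == "<", "tag", "text"), (any_char, "text", None)],
    "tag": [(lambda c: c == ">", "text", "tag"), (any_char, "tag", None)],
}

print(lex("<p>Hello</p>", MARKUP, "text"))
```

In this sketch, supporting another notation such as CSS would mean writing another table, not another routine.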
 
There are also a couple of accompanying articles. See http://www.ascc.net/xml/en/utf-8/xxxmodel.html 
 
This might be a good approach in the future for keeping application sizes down: with (X)HTML we have multiple notations that all need parsing into a parse tree. It might be worthwhile to have a single slim, combined routine rather than completely separate parsers for everything.
Received on Monday, 31 May 1999 06:38:48 GMT
