- From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
- Date: Mon, 28 May 2001 12:22:28 +1200 (NZST)
- To: Valeri.Atamaniouk@nokia.com, html-tidy@w3.org, ok@atlas.otago.ac.nz
Valeri.Atamaniouk@nokia.com wrote: Tidy operates with 32-bit characters :). Yes, and the W3C DOM ***forbids*** that. And it actually consumes more memory but I believe this is the fastest solution (ANSI C requires int to be the fastest integral type of at least 16bit length). Since neither C89 nor C99 requires int to be 32 bits wide, this is irrelevant to a discussion of 32-bit characters. Unfortunately I think it is better not to use standard library functions: they may have support for Unicode (UTF-32/UCS4) but My point exactly, but misunderstood. They *may*, but they *might not*. My point was not that wchar_t is a 32-bit type (in many C systems it is only an 8-bit type) but simply that it is not guaranteed to be a 16-bit type, which is what the W3C DOM requires. Tidy operates in an enviroment with "something" on ASCII subset. It is better because allow to process a lot of encodings with a minimal effort. I wrote: > Once again, requiring Tidy-as-a-library to use some sort of > library-level > exception handling interface would make it of very little use to C > programmers trying to use it. It would make life *more* > complicated for > them, not less. (It's different in a language with > language-level exceptions.) > To do this just so one could conform to a deeply flawed > interface designed for > other purposes entirely strikes me as, um, perverse. Valeri.Atamaniouk@nokia.com wrote: Just an example: is you process something and in some_very_deep_function you encounter a problem, you'll have to return a error code, check it in the higher-level function, return error code etc. At the end you'll have to release all the memory allocated (supposing that structure's integrity is not destroyed). It's not clear what the point of this reply is. I think we can reasonably take in that most people reading this list know what exception handling is for. I'm sure Dave Raggett does. The thing is, how is this relevant to HTML Tidy? The whole point of Tidy is to *be* the error handling code. Well, I exaggerate. But its job is to take sows' ears and turn them into silk purses, from messy input to create a clean data structure. Another approach: you create a memory allocator and start processing. All functions get their memory from this allocator. If there is a error, an exception is emulated (longjmp would be fine) to the starting point. Having it we just release allocator (and all the memory wich was allocated from it in one operation). This makes unnessesary numerous error checks. Besides mechanism with preallocation of a chunk of a memory and then distributing it as needed works faster, than standard malloc/realloc/free. You will be suprised how much faster. Not really. I have been allocating memory in great big blocks and running a simple suballocator for hmm, the last 20 years. Anyone who is interested will find the technique expounded in Donald Knuth's book "The Stanford GraphBase". For what it's worth, that is a collection of graph algorithms with data structure doing very complex things, with no general exception- handling machinery in sight. The fact remains that the W3C DOM is framed in terms of exception handling (although strangely, some obvious erroneous operations, like creating a comment with an embedded "--", are *forbidden* to raise an exception) and there is no "standard model" or consensus for how to translate that to C. Converting Tidy to use the W3C DOM would not provide any portability benefits to anyone whatsoever. The story is very different for JTidy, except that Java programmers like the W3C DOM so much that they are flocking to an unfinished rival called JDOM.
Received on Sunday, 27 May 2001 20:22:56 UTC