- From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
- Date: Tue, 29 May 2001 11:57:18 -0400
- To: "'Richard A. O'Keefe'" <ok@atlas.otago.ac.nz>, Valeri.Atamaniouk@nokia.com, html-tidy@w3.org
Hey guys, why don't we take this over to SourceForge, where it will do some good? I think you are both capturing some important points for the library to address. If things go well this summer, we may have the opportunity to confront these issues in the fall.

My digs:

Many C compilers (all PC implementations) implement a decent suballocator within malloc/free. Some Unix compilers may not, yet. I'm not sure. Tidy already uses a wrapper function for memory allocation. On platforms that need it, there is no reason why we can't call a sub-allocator.

I think setjmp/longjmp are a last resort. I'm glad to see agreement about _not_ exposing these to the public interface. If need be, this can be implemented later w/out breaking application code. Valeri, I think you identified the thread-safety problem exactly.

I would rather see a clean, exception-safe coding style used. I.e., we shouldn't leave data structures in a munged state if an error is encountered. Thus, cleanup will always be possible. This goes hand-in-hand with a "no globals" design. Thus, freeing a Tidy document object will release all related resources.

For the time being, I'd rather see the discussion focus on what the public interface for error handling should look like. I think the application should control the lifetime of the document objects. For example, it may be desirable to keep the object around to report the error, including error message(s) and locator(s) into the input document.

More on SF.

take it easy,
Charlie

-----Original Message-----
From: Richard A. O'Keefe [mailto:ok@atlas.otago.ac.nz]
Sent: Sunday, May 27, 2001 8:22 PM
To: Valeri.Atamaniouk@nokia.com; html-tidy@w3.org; ok@atlas.otago.ac.nz
Subject: RE: SourceForge Project Approved

Valeri.Atamaniouk@nokia.com wrote:
> Tidy operates with 32-bit characters :).

Yes, and the W3C DOM ***forbids*** that.
> And it actually consumes more memory but I believe this is the fastest solution (ANSI C requires int to be the fastest integral type of at least 16bit length).

Since neither C89 nor C99 requires int to be 32 bits wide, this is irrelevant to a discussion of 32-bit characters.

> Unfortunately I think it is better not to use standard library functions: they may have support for Unicode (UTF-32/UCS4) but

My point exactly, but misunderstood. They *may*, but they *might not*. My point was not that wchar_t is a 32-bit type (in many C systems it is only an 8-bit type) but simply that it is not guaranteed to be a 16-bit type, which is what the W3C DOM requires.

> Tidy operates in an environment with "something" on ASCII subset. It is better because it allows to process a lot of encodings with a minimal effort.

I wrote:
> Once again, requiring Tidy-as-a-library to use some sort of library-level
> exception handling interface would make it of very little use to C
> programmers trying to use it. It would make life *more* complicated for
> them, not less. (It's different in a language with language-level exceptions.)
> To do this just so one could conform to a deeply flawed interface designed for
> other purposes entirely strikes me as, um, perverse.

Valeri.Atamaniouk@nokia.com wrote:
> Just an example: if you process something and in some_very_deep_function you encounter a problem, you'll have to return an error code, check it in the higher-level function, return error code etc. At the end you'll have to release all the memory allocated (supposing that structure's integrity is not destroyed).

It's not clear what the point of this reply is. I think we can reasonably take it that most people reading this list know what exception handling is for. I'm sure Dave Raggett does. The thing is, how is this relevant to HTML Tidy? The whole point of Tidy is to *be* the error handling code. Well, I exaggerate.
But its job is to take sows' ears and turn them into silk purses, from messy input to create a clean data structure.

> Another approach: you create a memory allocator and start processing. All functions get their memory from this allocator. If there is an error, an exception is emulated (longjmp would be fine) to the starting point. Having it we just release allocator (and all the memory which was allocated from it in one operation). This makes unnecessary numerous error checks. Besides mechanism with preallocation of a chunk of a memory and then distributing it as needed works faster, than standard malloc/realloc/free. You will be surprised how much faster.

Not really. I have been allocating memory in great big blocks and running a simple suballocator for hmm, the last 20 years. Anyone who is interested will find the technique expounded in Donald Knuth's book "The Stanford GraphBase". For what it's worth, that is a collection of graph algorithms with data structures doing very complex things, with no general exception-handling machinery in sight.

The fact remains that the W3C DOM is framed in terms of exception handling (although strangely, some obviously erroneous operations, like creating a comment with an embedded "--", are *forbidden* to raise an exception) and there is no "standard model" or consensus for how to translate that to C. Converting Tidy to use the W3C DOM would not provide any portability benefits to anyone whatsoever. The story is very different for JTidy, except that Java programmers like the W3C DOM so much that they are flocking to an unfinished rival called JDOM.
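
For readers following the thread, the arena-plus-longjmp scheme quoted above ("Another approach") could be sketched roughly as below. This is only an illustration under stated assumptions, not Tidy's code: the names Arena, arena_alloc, build_list and run_with_arena are all hypothetical, the 16-byte alignment is assumed to be sufficient for the types stored, and the setjmp/longjmp pair stays entirely behind an ordinary error-code interface, in line with the point about not exposing it publicly.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN 16u   /* assumption: enough alignment for the types used here */

/* Hypothetical arena: one big block, handed out by bumping an offset. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
    jmp_buf on_error;     /* jump target when the arena is exhausted */
} Arena;

static int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

/* Releases every allocation made from the arena in one operation. */
static void arena_release(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}

/* Never returns NULL: on exhaustion it longjmps to a->on_error instead. */
static void *arena_alloc(Arena *a, size_t n)
{
    void *p;
    if (n > (size_t)-1 - (ARENA_ALIGN - 1))
        longjmp(a->on_error, 1);
    n = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (n > a->size - a->used)
        longjmp(a->on_error, 1);
    p = a->base + a->used;
    a->used += n;
    return p;
}

typedef struct Node { struct Node *next; int value; } Node;

/* Stand-in for "some_very_deep_function": no error checks on allocation. */
static Node *build_list(Arena *a, int count)
{
    Node *head = NULL;
    int i;
    for (i = 0; i < count; i++) {
        Node *n = arena_alloc(a, sizeof *n);
        n->value = i;
        n->next = head;
        head = n;
    }
    return head;
}

/* Returns 0 and stores the list length on success, -1 on failure.
   The Arena object is owned by the caller, so it is not a local of the
   function that calls setjmp. */
static int run_with_arena(Arena *a, size_t arena_size, int count, int *out_len)
{
    Node *n;
    int len = 0;
    if (!arena_init(a, arena_size))
        return -1;
    if (setjmp(a->on_error) != 0) {
        arena_release(a);          /* single cleanup point after a failure */
        return -1;
    }
    for (n = build_list(a, count); n != NULL; n = n->next)
        len++;
    *out_len = len;
    arena_release(a);
    return 0;
}
```

A caller would declare an Arena, call run_with_arena(&arena, 65536, 100, &len), and check the return value like any other C error code. Note that this sketch says nothing about thread safety, and it only covers memory: any other resource acquired along the way would still need its own cleanup, which is part of why the thread treats longjmp as a last resort.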
Received on Tuesday, 29 May 2001 11:57:43 UTC