W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2001

RE: SourceForge Project Approved

From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
Date: Mon, 28 May 2001 12:22:28 +1200 (NZST)
Message-Id: <200105280022.MAA447467@atlas.otago.ac.nz>
To: Valeri.Atamaniouk@nokia.com, html-tidy@w3.org, ok@atlas.otago.ac.nz
Valeri.Atamaniouk@nokia.com wrote:

	Tidy operates with 32-bit characters :).

Yes, and the W3C DOM ***forbids*** that.

	And it actually consumes more memory but I believe this is the
	fastest solution (ANSI C requires int to be the fastest integral
	type of at least 16bit length).

Since neither C89 nor C99 requires int to be 32 bits wide, this is
irrelevant to a discussion of 32-bit characters.

	Unfortunately I think it is better not to use standard library
	functions:  they may have support for Unicode (UTF-32/UCS4) but

My point exactly, but misunderstood.  They *may*, but they *might not*.
My point was not that wchar_t is a 32-bit type (in many C systems it is
only an 8-bit type) but simply that it is not guaranteed to be a 16-bit
type, which is what the W3C DOM requires.

Tidy operates in an enviroment with "something" on ASCII subset.  It is
better because allow to process a lot of encodings with a minimal
effort.

I wrote:
	> Once again, requiring Tidy-as-a-library to use some sort of 
	> library-level
	> exception handling interface would make it of very little use to C
	> programmers trying to use it.  It would make life *more* 
	> complicated for
	> them, not less.  (It's different in a language with 
	> language-level exceptions.)
	> To do this just so one could conform to a deeply flawed 
	> interface designed for
	> other purposes entirely strikes me as, um, perverse.

Valeri.Atamaniouk@nokia.com wrote:

	Just an example:  is you process something and in
	some_very_deep_function you encounter a problem, you'll have to
	return a error code, check it in the higher-level function,
	return error code etc.  At the end you'll have to release all
	the memory allocated (supposing that structure's integrity is not
	destroyed).

It's not clear what the point of this reply is.  I think we can reasonably
take in that most people reading this list know what exception handling is
for.  I'm sure Dave Raggett does.

The thing is, how is this relevant to HTML Tidy?  The whole point of Tidy
is to *be* the error handling code.  Well, I exaggerate.  But its job is
to take sows' ears and turn them into silk purses, from messy input to
create a clean data structure.

	Another approach: you create a memory allocator and start processing. All
	functions get their memory from this allocator. If there is a error, an
	exception is emulated (longjmp would be fine) to the starting point. Having
	it we just release allocator (and all the memory wich was allocated from it
	in one operation). This makes unnessesary numerous error checks. Besides
	mechanism with preallocation of a chunk of a memory and then distributing it
	as needed works faster, than standard malloc/realloc/free. You will be
	suprised how much faster.
	
Not really.  I have been allocating memory in great big blocks and running
a simple suballocator for hmm, the last 20 years.  Anyone who is interested
will find the technique expounded in Donald Knuth's book "The Stanford
GraphBase".  For what it's worth, that is a collection of graph algorithms
with data structure doing very complex things, with no general exception-
handling machinery in sight.

The fact remains that the W3C DOM is framed in terms of exception handling
(although strangely, some obvious erroneous operations, like creating a
comment with an embedded "--", are *forbidden* to raise an exception) and
there is no "standard model" or consensus for how to translate that to C.
Converting Tidy to use the W3C DOM would not provide any portability
benefits to anyone whatsoever.

The story is very different for JTidy, except that Java programmers like
the W3C DOM so much that they are flocking to an unfinished rival called JDOM.
Received on Sunday, 27 May 2001 20:22:56 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:45 GMT