RE: SourceForge Project Approved

From: <Valeri.Atamaniouk@nokia.com>
Date: Fri, 25 May 2001 09:33:48 +0300
Message-ID: <DFC7E257BE53D4118A5400508B691A0001A8FFAE@eseis14nok>
To: ok@atlas.otago.ac.nz, html-tidy@w3.org
Hello Richard

> -----Original Message-----
> From: ext Richard A. O'Keefe [mailto:ok@atlas.otago.ac.nz]
> Sent: 24 May 2001 02:25
> To: Valeri.Atamaniouk@nokia.com; html-tidy@w3.org
> Subject: RE: SourceForge Project Approved
 
> 	and it is possible to implement strings, storage reclamation &
> 	exceptions.  As far as I understand the last two would be really
> 	useful, as they definitely improve performance (exit(2) is not an
> 	appropriate solution for a library function :)).
> 
> Strings?  Yes, but the DOM explicitly requires immutable *UNICODE* strings.
> Even if using UTF-8 internally would save you nearly a factor of two in
> space, the DOM does not allow you to do that.  (Note that the wchar_t type
> and wcs* strings in standard C don't help, because they are commonly 32-bit
> characters, not 16-bit characters, which is what the DOM absolutely demands.)

Tidy operates with 32-bit characters :). This does consume more memory, but
I believe it is the fastest solution (in ANSI C, int is meant to be the
machine's natural integer type, at least 16 bits wide, and is usually the
fastest). Unfortunately, I think it is better not to use the standard library
functions: they may support Unicode (UTF-32/UCS-4), but Tidy works in an
environment where the input is "something" built on the ASCII subset. Avoiding
them is better because it lets Tidy process many encodings with minimal effort.

> 
> Why should someone implement a 16-bit string library that Tidy doesn't
> need and wouldn't particularly benefit from, just because a data structure
> that was designed for Javascript demands them?
> 
> The Boehm conservative garbage collector for C exists, is freely available,
> has seen a lot of use, and is generally a Fine Thing.  However, it is a
> *conservative* garbage collector, that being pretty much the best that is
> possible in C (where you can take a pointer, convert it to an integer,
> mangle the integer, and then days later demangle the integer, convert it
> back to a pointer, and expect to be able to use the pointer as if nothing
> had happened), and will on occasion leak space that could have been reclaimed.
> It was a *major* piece of work.
> Tidy-as-a-program doesn't *need* garbage collection.

I am not talking about a garbage collector :). See below.

> Tidy-as-a-library *will* need careful storage management design, which is
> one reason why I'd like to see tidy-as-a-library wait until known bug-fixes
> are installed and tested.  But if Tidy-as-a-library is to be usable in
> other people's C code, it had better not demand that *they* write for
> garbage collection too.
> 
> Exceptions:  there are a couple of versions around for C, one of them comes
> as an example with FunnelWeb.  I've done my own, too.  However, no-one in
> their right mind would say that library-level exception-handling in C could
> be expected to improve performance.
> 
> Once again, requiring Tidy-as-a-library to use some sort of library-level
> exception handling interface would make it of very little use to C
> programmers trying to use it.  It would make life *more* complicated for
> them, not less.  (It's different in a language with language-level exceptions.)
> To do this just so one could conform to a deeply flawed interface designed for
> other purposes entirely strikes me as, um, perverse.
> 

Just an example: if you are processing something and encounter a problem in
some_very_deep_function, you have to return an error code, check it in the
higher-level function, return an error code from there, and so on. At the end
you have to release all the memory that was allocated (supposing that the
structure's integrity has not been destroyed).

Another approach: you create a memory allocator and start processing. All
functions get their memory from this allocator. If there is an error, an
exception is emulated (longjmp would be fine) back to the starting point.
There we just release the allocator (and with it all the memory that was
allocated from it, in one operation). This makes numerous error checks
unnecessary. Besides, a mechanism that preallocates a chunk of memory and then
distributes it as needed works faster than standard malloc/realloc/free. You
will be surprised how much faster.
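
A minimal sketch of the allocator-plus-longjmp approach described above, in
C. All names here (Arena, arena_alloc, arena_fail, arena_release, process,
deep, CHUNK_SIZE) are hypothetical and are not part of Tidy. It assumes a
C11 compiler (for max_align_t) and that the caller owns the Arena object.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE ((size_t)4096)   /* assumed chunk size, tune as needed */

/* Chunk header; the union forces the payload that follows to be aligned. */
typedef union Chunk {
    struct { union Chunk *next; size_t used, cap; } h;
    max_align_t align_;
} Chunk;

typedef struct {
    Chunk  *chunks;      /* list of chunks, newest first */
    int     error_code;  /* set by arena_fail before jumping */
    jmp_buf on_error;    /* the "starting point" to return to */
} Arena;

/* Emulated exception: record the code and jump back to the starting point. */
static void arena_fail(Arena *a, int code)
{
    a->error_code = code;
    longjmp(a->on_error, 1);
}

/* Hand out n bytes from the current chunk; never returns NULL. */
static void *arena_alloc(Arena *a, size_t n)
{
    size_t align = sizeof(max_align_t);
    Chunk *c = a->chunks;
    void *p;

    n = (n + align - 1) / align * align;          /* keep payload aligned */
    if (c == NULL || c->h.cap - c->h.used < n) {
        size_t cap = n > CHUNK_SIZE ? n : CHUNK_SIZE;
        c = malloc(sizeof *c + cap);
        if (c == NULL)
            arena_fail(a, 1);                     /* out of memory */
        c->h.next = a->chunks;
        c->h.used = 0;
        c->h.cap = cap;
        a->chunks = c;
    }
    p = (char *)(c + 1) + c->h.used;
    c->h.used += n;
    return p;
}

/* Release everything that was allocated from the arena in one pass. */
static void arena_release(Arena *a)
{
    while (a->chunks != NULL) {
        Chunk *next = a->chunks->h.next;
        free(a->chunks);
        a->chunks = next;
    }
}

/* Stand-in for some_very_deep_function: no error codes are passed upward. */
static void deep(Arena *a, int depth, int fail_at)
{
    char *buf = arena_alloc(a, 1000);
    buf[0] = '\0';
    if (depth == fail_at)
        arena_fail(a, 2);
    if (depth < 10)
        deep(a, depth + 1, fail_at);
}

/* Returns 0 on success, or the code given to arena_fail. */
int process(Arena *a, int fail_at)
{
    a->chunks = NULL;
    a->error_code = 0;
    if (setjmp(a->on_error) == 0)
        deep(a, 0, fail_at);
    arena_release(a);    /* runs on both the normal and the error path */
    return a->error_code;
}
```

The Arena is passed in by the caller rather than declared inside process,
because non-volatile locals of the function that calls setjmp have
indeterminate values after a longjmp if they were modified in between.
Whether this beats plain malloc/free in practice depends on the platform's
allocator, so it is worth measuring.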

BR
VA
Received on Friday, 25 May 2001 02:34:00 GMT
