Re: Auto-Generated HTML and Authoring Tools

Ingo Macherius (Ingo.Macherius@tu-clausthal.de)
Tue, 4 Feb 1997 18:51:26 +0100 (MET)


Message-Id: <199702041751.SAA19152@verleihnix.rz.tu-clausthal.de>
Subject: Re: Auto-Generated HTML and Authoring Tools
To: linas@fc.net
Date: Tue, 4 Feb 1997 18:51:26 +0100 (MET)
Cc: www-html@w3.org
In-Reply-To: <199701310729.IAA10965@majestix.rz.tu-clausthal.de> from "Ingo Macherius" at Jan 31, 97 08:27:20 am
From: Ingo Macherius <Ingo.Macherius@tu-clausthal.de>

Linas,

> Maybe I'm an idiot, but I don't understand how <DIV><SPAN><OBJECT>
> will allow me to embed non-html text in such a way that it won't get
> mangled by the wysiwyg's.  It furthermore doesn't address the question
> of "what should sort of html preporcessor should be standardized on?"

Non-HTML text can be formatted without <PRE> and friends using the CSS1
clause:
5.6.3 white-space
     normal | pre | nowrap
     Applies to: block-level elements
     Initial: according to HTML
so assigning an ID or CLASS to the DIV and then setting white-space: pre
will do. XML offers similar controls for collapsing or preserving white
space, which is the main issue with preformatted text.
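
A minimal sketch of the white-space approach, written in Python only to
generate the markup. The class name "verbatim" and the function name
wrap_preformatted are made up for illustration, and whether a given browser
honours white-space: pre on a DIV is an assumption, not something tested here.

```python
from html import escape

def wrap_preformatted(text, css_class="verbatim"):
    """Return (css_rule, html_fragment) for text whose white space must survive.

    css_class is a hypothetical class name chosen for this example.
    """
    # The CSS1 white-space property applies to block-level elements such as DIV.
    css_rule = "DIV.%s { white-space: pre }" % css_class
    # Escape markup characters so the text travels as data, not as tags.
    fragment = '<DIV CLASS="%s">%s</DIV>' % (css_class, escape(text, quote=False))
    return css_rule, fragment

if __name__ == "__main__":
    rule, frag = wrap_preformatted("if (a < b)\n    swap(&a, &b);")
    print('<STYLE TYPE="text/css">%s</STYLE>' % rule)
    print(frag)
```

The text keeps its line breaks and indentation inside the DIV, and all of the
formatting decision sits in the style rule rather than in the element name.
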
As I said before, I am just looking at HTML as a display language.
My thesis contains a comparison of HTML and 'real' SGML Hypermedia languages
like DocBook, HP OpenBook or IBMIDDoc. These are some results:
1) HTML has far fewer elements than the other languages (about 100 in
   Cougar compared to 300 in DocBook 2.4.1).
   For comparison: LaTeX also has about 300 macros.
2) "Cougar" contains more attributes than tags, while the other languages
   have about twice as many tags as attributes. Blech.
3) The philosophy of SGML is to structure documents through a containment
   relation. HTML's <H1>..<H6> are not block-building, so you have to group
   blocks with <P> or the generic DIV container. It is more practical to assign
   different CLASS attributes to DIV sections and make up your own semantics
   for what they stand for than to use the "official" tags.
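
Point 3 can be illustrated with a small sketch that turns a flat heading and
paragraph sequence into nested DIV containers. The CLASS values (section1,
title, para) and the function name nest_sections are invented for this
example; they are exactly the kind of private semantics meant above, not part
of any standard.

```python
from html import escape

def nest_sections(items):
    """Turn a flat list of (level, text) pairs into nested DIV containers.

    level 0 means a paragraph; levels 1..6 mean a heading of that depth.
    The CLASS names used here are hypothetical.
    """
    out = []
    open_levels = []  # stack of heading levels whose DIV is still open
    for level, text in items:
        if level == 0:
            out.append('<DIV CLASS="para">%s</DIV>' % escape(text, quote=False))
            continue
        # A heading closes every open section at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            open_levels.pop()
            out.append("</DIV>")
        out.append('<DIV CLASS="section%d">' % level)
        out.append('<DIV CLASS="title">%s</DIV>' % escape(text, quote=False))
        open_levels.append(level)
    # Close whatever is still open at the end of the input.
    out.extend("</DIV>" for _ in open_levels)
    return "\n".join(out)

if __name__ == "__main__":
    flat = [(1, "Intro"), (0, "Some text."), (2, "Detail"), (0, "More text.")]
    print(nest_sections(flat))
```

With <H1>..<H6> the extent of a section is only implied by the next heading;
here every section is an explicit container that a tree-oriented tool can
select, move or restyle as a unit.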

Furthermore we are looking at HTML in different ways: You tend to see a
text file, I only work with HTML after the document has been converted to
ISO 8879 ESIS events or ISO 10179 groves. HTML text files are a mere transport
encoding for tree-like structures, not the real thing.
I concentrated on HTML as an output format for synthesizing processes, like
the ISO 10179 tree formatting process. IMHO the problems with WYSIWYG editors
are quite similar, although this was not my main topic.
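
To show what "events instead of a text file" means, here is a rough sketch
that turns markup into an event list whose notation is modelled on the output
of nsgmls from SP. It uses Python's html.parser, which is not a conforming
ISO 8879 parser: it reads no DTD, infers no omitted tags, and reports every
attribute as CDATA. The names EsisLikeEvents and to_events are made up.

```python
from html.parser import HTMLParser

class EsisLikeEvents(HTMLParser):
    """Collect events in a notation loosely modelled on nsgmls output."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Attribute lines are emitted before the start-tag line.
        for name, value in attrs:
            # Simplification: no DTD is read, so every attribute is CDATA;
            # an attribute given without a value is reported as empty.
            self.events.append("A%s CDATA %s" % (name.upper(), value or ""))
        self.events.append("(" + tag.upper())

    def handle_endtag(self, tag):
        self.events.append(")" + tag.upper())

    def handle_data(self, data):
        # Simplification: white-space-only data between tags is dropped.
        if data.strip():
            escaped = data.replace("\\", "\\\\").replace("\n", "\\n")
            self.events.append("-" + escaped)

def to_events(markup):
    """Parse markup and return the list of collected events."""
    parser = EsisLikeEvents()
    parser.feed(markup)
    parser.close()
    return parser.events

if __name__ == "__main__":
    for event in to_events('<DIV CLASS="note">See <A HREF="x.html">this</A></DIV>'):
        print(event)
```

Everything downstream works on this event stream or on the tree built from
it; the angle brackets in the text file are only one way to transport it.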

I described five possible formatting models:
1) Formatting with tags and attributes (the way it's done today)
2) Formatting with CSS1 and generic containers
3) Formatting with frames
4) Formatting with Netscape's new <LAYER> tag
5) Formatting with DSSSL

DSSSL is by far the best, but it is not supported by today's browsers. To
produce documents today and reuse them once a WYSIWYG-able formatting model
has been standardized (which WD-positioning and WD-CSS1, together with frames
and layers, could move toward), one has to keep the number of tags small.
So I decided to use only these four: DIV, SPAN, A and OBJECT. Of course they
are post-processed to produce documents for NS-Navigator, but those are
synthetic and not what an author produces.

> We are busy inventing a preprocessor, which is already 5 or 10kloc of
> C++ code.  It strikes me that anyone doing sophisticated cgi-bin
> scripts which generate html on-the-fly have to invent thier own
> preprocessor which does more-or-less the same thing.  Wouldn't it 
> be nice to standardize on a powerful, generic pre-processor, in the 
> same way everyone agrees that cpp and its markup (#define, #ifdef, etc)
> is the pre-processor for C code?

You mentioned cpp, I would add m4. These are existing standards for simple
string-oriented manipulation tools. I don't think we need anything else.
The problem with manipulating HTML with string processors is that, when
inserting text fragments that themselves contain markup, a conforming
ISO 8879 parser/entity manager has to do a rewind. That is, it has to
reorganize the element structure of document sections it has already parsed.
The ability to rewind is missing from any tool that works at the string level
instead of being structure-controlled. A macro processor that implements
rewinds is contained in SP (http://www.jclark.com/sp/) and has a standardized
syntax (ISO 8879 entities), so it does not have to be invented or agreed on. It
already exists and is doing very well.
What is missing is an Apache (or some other HTTPD) module that offers
conforming ISO 8879 processing. This could be done by writing mod_SP,
reusing the free code by James Clark. I personally have no time to do
something like this right now, but iff I manage to get a job which allows
me to do my Ph.D., such a thing will be high on my TODO list.
Electronic Books Technologies (http://www.ebt.com) and others are working
on on-the-fly formatters for SGML embedded in HTTPDs. IMHO this is the right
direction; CGI is an obsolete technique.
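
The string-level problem can be made visible with a small sketch. It does
not implement ISO 8879 entity management or anything SP does; it only shows
that a string-level expansion (what cpp or m4 would do) can silently change
the element structure of the text that follows the insertion point. The
macro name "frag" and the helpers expand and paths are invented for this
example, and html.parser is again used as a stand-in for a real parser.

```python
import re
from html.parser import HTMLParser

MACRO = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def expand(text, macros):
    """String-level expansion: replaces &name; with no knowledge of elements."""
    return MACRO.sub(lambda m: macros.get(m.group(1), m.group(0)), text)

class PathRecorder(HTMLParser):
    """Record, for each run of text, the path of elements open around it."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag.upper())

    def handle_endtag(self, tag):
        tag = tag.upper()
        # An end tag closes the nearest matching element and all inside it.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        self.chunks.append(("/".join(self.stack), data))

def paths(markup):
    """Return (element path, text) pairs for the given markup."""
    recorder = PathRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.chunks

if __name__ == "__main__":
    doc = "<DIV>before &frag; after</DIV>"
    print(paths(expand(doc, {"frag": "<SPAN>x</SPAN>"})))
    print(paths(expand(doc, {"frag": "<SPAN>x"})))
```

In the first call the trailing text stays inside DIV; in the second it ends up
inside DIV/SPAN. The string tool produced both results without noticing the
difference, which is why the expansion has to happen under control of the
parser.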

Virtually yours,
Ingo

BTW: I offered copies of my thesis on this list. My professor stopped me
because we are trying to get it published first. Siemens AG is
checking the commercial use of the results. So I have to withdraw this offer :-(
--
http://www.tu-clausthal.de/~inim/