- From: Ingo Macherius <Ingo.Macherius@tu-clausthal.de>
- Date: Tue, 4 Feb 1997 18:51:26 +0100 (MET)
- To: linas@fc.net
- Cc: www-html@w3.org
Linas,

> Maybe I'm an idiot, but I don't understand how <DIV><SPAN><OBJECT>
> will allow me to embed non-html text in such a way that it won't get
> mangled by the wysiwyg's. It furthermore doesn't address the question
> of "what should sort of html preporcessor should be standardized on?"

Non-HTML text can be formatted without <PRE> and friends using the CSS1 clause:

    5.6.3 white-space

    normal | pre | nowrap
    Applies to: block-level elements
    Initial: according to HTML

So assigning an ID or CLASS to the DIV and then setting white-space: pre will do. XML offers similar controls for collapsing or not collapsing white-space, which is the main thing with preformatted text.

As I said before, I am just looking at HTML as a display language. My thesis contains a comparison of HTML and 'real' SGML hypermedia languages like DocBook, HP OpenBook or IBMIDDoc. These are some results:

1) The number of elements in HTML is much lower than in the other languages (about 100 in Cougar compared to 300 in DocBook 2.4.1). Compare: LaTeX also has about 300 macro calls.

2) "Cougar" contains more attributes than tags, while other languages have about 2 times more tags than attributes. Blech.

3) The philosophy of SGML is to structure documents through a containment relation. HTML's <H1>..<H6> are not block-building, so you have to group blocks with <P> or the generic DIV container. It is more practical to assign different CLASS attributes to DIV sections and make up your own semantics for what they stand for than to use the "official" tags.

Furthermore, we are looking at HTML in different ways: you tend to see a text file; I only work with HTML after the document has been converted to ISO 8879 ESIS events or ISO 10179 groves. HTML text files are a mere transport encoding for tree-like structures, not the real thing. I concentrated on HTML as an output format for synthesizing processes, like the ISO 10179 tree formatting process.
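To make the "text file versus tree" distinction concrete, here is a minimal sketch that turns a markup string into a flat list of ESIS-like events. It is only an illustration: Python's html.parser is a lenient HTML tokenizer, not a conforming ISO 8879 parser, and the event notation (attribute lines, "(" for element start, ")" for element end, "-" for data) only approximates the output of an SGML parser. The names EsisLikeEvents and to_events are hypothetical and not part of any tool mentioned in this message.

```python
from html.parser import HTMLParser


class EsisLikeEvents(HTMLParser):
    """Collects ESIS-like events; an approximation, not a conforming parser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Attribute events come before the element-start event.
        for name, value in attrs:
            self.events.append(f"A{name.upper()} CDATA {value or ''}")
        self.events.append(f"({tag.upper()}")

    def handle_endtag(self, tag):
        self.events.append(f"){tag.upper()}")

    def handle_data(self, data):
        # Escape backslashes and newlines so each event stays on one line.
        escaped = data.replace("\\", "\\\\").replace("\n", "\\n")
        self.events.append("-" + escaped)


def to_events(markup):
    """Return the list of ESIS-like events for a markup string."""
    parser = EsisLikeEvents()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    doc = '<DIV CLASS="verbatim">if (a) <SPAN>b</SPAN></DIV>'
    for event in to_events(doc):
        print(event)
    # ACLASS CDATA verbatim
    # (DIV
    # -if (a) 
    # (SPAN
    # -b
    # )SPAN
    # )DIV
```

Anything working on this event stream sees elements, attributes and data, never angle brackets. A CLASS such as "verbatim" (an assumed name) is then just an attribute event that a formatter can map to white-space: pre.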
IMHO the problems with WYSIWYG editors are quite similar, although this was not my main topic. I described five possible formatting models:

1) Formatting with tags and attributes (the way it's done today)
2) Formatting with CSS1 and generic containers
3) Formatting with frames
4) Formatting with Netscape's new <LAYER> tag
5) Formatting with DSSSL

DSSSL is by far the best, but it is not supported by today's browsers. To produce documents today and reuse them once a WYSIWYG-able formatting model has been standardized (which WD-positioning and WD-CSS1, together with frames and layers, could move toward), one has to keep the number of tags small. So I decided to use only these four: DIV, SPAN, A and OBJECT. Of course they are post-processed to produce documents for NS-Navigator, but these are synthetic and not the thing an author produces.

> We are busy inventing a preprocessor, which is already 5 or 10kloc of
> C++ code. It strikes me that anyone doing sophisticated cgi-bin
> scripts which generate html on-the-fly have to invent thier own
> preprocessor which does more-or-less the same thing. Wouldn't it
> be nice to standardize on a powerful, generic pre-processor, in the
> same way everyone agrees that cpp and its markup (#define, #ifdef, etc)
> is the pre-processor for C code?

You mentioned cpp; I would add m4. These are existing standards for simple string-oriented manipulation tools. I don't think we need anything else.

The problem with manipulating HTML with string processors is that when inserting text fragments that contain markup themselves, a conforming ISO 8879 parser/entity manager has to do a rewind. This means it has to reorganize the element structure of already parsed document sections. The ability to rewind is missing from any tool that works at the string level instead of being structure-controlled.
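The following sketch illustrates why string-level expansion is blind to structure. It is not how SP or any ISO 8879 entity manager implements rewinds. It only contrasts a cpp/m4-style textual substitution with a simple structural check on the inserted fragment. The function names expand and disturbs_structure and the &name; placeholder convention used here are assumptions made for this illustration. The check also assumes every element has an explicit end tag, which holds for DIV, SPAN, A and OBJECT.

```python
from html.parser import HTMLParser


class _TagBalance(HTMLParser):
    """Tracks which tags a fragment opens and closes."""

    def __init__(self):
        super().__init__()
        self.open_stack = []        # elements opened but not yet closed
        self.unmatched_closes = []  # end tags with no matching start tag

    def handle_starttag(self, tag, attrs):
        self.open_stack.append(tag)

    def handle_endtag(self, tag):
        if self.open_stack and self.open_stack[-1] == tag:
            self.open_stack.pop()
        else:
            self.unmatched_closes.append(tag)


def expand(template, macros):
    """String-level expansion: replaces &name; with its body, nothing more."""
    out = template
    for name, body in macros.items():
        out = out.replace(f"&{name};", body)
    return out


def disturbs_structure(fragment):
    """True if the fragment closes or leaves open elements it does not own."""
    checker = _TagBalance()
    checker.feed(fragment)
    checker.close()
    return bool(checker.open_stack or checker.unmatched_closes)


if __name__ == "__main__":
    macros = {"safe": "<span>ok</span>", "split": "</div><div>"}
    print(expand("<div>&split;</div>", macros))  # <div></div><div></div>
    print(disturbs_structure(macros["safe"]))    # False
    print(disturbs_structure(macros["split"]))   # True
```

The string-level expand happily turns one DIV into two, because the fragment closes an element that was opened outside of it. A structure-controlled tool has to notice this and reorganize the part of the tree it has already built; a pure string tool never sees a tree at all.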
A macro processor that implements rewinds is contained in SP (http://www.jclark.com/sp/) and has a standardized syntax (the ISO 8879 entity), so it does not have to be invented or agreed on. It already exists and is doing very well.

What is missing is an Apache (or some other HTTPD) module which offers conforming ISO 8879 processing. This could be done by writing mod_SP, reusing the free code by James Clark. I personally have no time to do something like this right now, but iff I manage to get a job which allows me to do my Ph.D., such a thing is high priority on my TODO list.

Electronic Books Technologies (http://www.ebt.com) and others are working on on-the-fly formatters for SGML embedded in HTTPDs. IMHO this is the right direction; CGI is an obsolete technique.

Virtually yours,
Ingo

BTW: I offered copies of my thesis on this list. My professor stopped me because we are trying to have a publication first. Siemens AG is checking the commercial use of the results. So I have to withdraw this offer :-(

--
http://www.tu-clausthal.de/~inim/
Received on Tuesday, 4 February 1997 12:51:52 UTC