Re: Inline text/html

Ingo Macherius (ingo.macherius@mwe.hvr.scn.de)
Thu, 12 Sep 1996 11:32:07 +0200 (MDT)


From: Ingo Macherius <ingo.macherius@mwe.hvr.scn.de>
Message-Id: <199609120932.LAA03493@ESAMX6.mwe.hvr.scn.de>
Subject: Re: Inline text/html
To: ehood@isogen.com (Earl Hood)
Date: Thu, 12 Sep 1996 11:32:07 +0200 (MDT)
Cc: www-html@w3.org
In-Reply-To: <199609101332.IAA06440@bonk.isogen.com> from "Earl Hood" at Sep 10, 96 08:32:53 am


Earl Hood <ehood@isogen.com> said:

> > > I don't think many people realize inlining HTML documents can be done
> > > by <OBJECT>, or even <IMG>. That is, complete documents, no server-side
> > > look-a-likes.  If one can put a gif/jpg/movie/java-thingy somewhere,
> > > why not an HTML document?  Of course, no browser has implemented it
> > > yet, but considering frames are implemented, inline HTML documents
> > > with <OBJECT> shouldn't be too difficult.
> >
> > This is why I suggested this. When I found there was no effective
> > 'include' function in HTML, well...

> There is an "include" function if browser implementors would
> eventually decide to implement some of the more useful aspects of
> SGML.  As suggested years ago, the SUBDOC entity construct provides
> what you require:
> 
> <!DOCTYPE HTML [
>     <!ENTITY otherdoc SYSTEM "http://foo.org/doc.html" SUBDOC>
> ]>
> <html>
> <head><title>Title</title>
> </head>
> <body>
> <H1>Heading</H1>
> <p>blah blah blah
> </p>
> &otherdoc;
> ...
>

This would include a full HTML document, including the <HTML>, <HEAD>
and <BODY> sections, which are clearly illegal at this point.
Even if you leave them out, a conforming SGML application would insert
them by applying the OMITTAG rules. To my mind SUBDOC is a great idea
to apply to HTML, but at the moment I don't see a way to do it legally.

Using non-SUBDOC entities is a solution, but it raises the problem
of unbalanced entities: they may open tags they do not close, or close
tags they did not open themselves. I think this is a very common case,
as most includes are headers/footers. SUBDOC would forbid this, as it
requires the included document to parse correctly against its own
DTD.
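
To make the balance problem concrete, here is a minimal sketch. The
file names header.html and footer.html and their contents are
hypothetical, invented only for illustration; the comment lines merely
label the three files:

```sgml
<!-- host document: plain (non-SUBDOC) text entities -->
<!DOCTYPE HTML [
    <!ENTITY header SYSTEM "header.html">
    <!ENTITY footer SYSTEM "footer.html">
]>
<html>
<head><title>Title</title>
</head>
<body>
&header;
<p>page content</p>
&footer;
</body></html>

<!-- header.html (hypothetical): opens a DIV it never closes -->
<div>
<h1>Site name</h1>

<!-- footer.html (hypothetical): closes a DIV it never opened -->
<address>Maintained by the webmaster</address>
</div>
```

After entity expansion the host document is balanced, but neither
included file could be parsed on its own, so neither could be declared
SUBDOC.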

The W3C HTML DTDs already use parameter entities to simplify the
notation of content models. Why not assign those models
a document type and describe each in its own DTD? I remember there was
an effort to modularize the HTML DTD (by Murray Altheim), which is
suspended now. Why?

Good starting points are the containers SPAN, P and DIV. This is what
the Cougar DTD says about their content models:
DIV : %body.content

<!ENTITY % block
     "P | %list | %preformatted | DL | DIV | CENTER |
      BLOCKQUOTE | FORM | ISINDEX | HR | TABLE | FIELDSET">  
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % text "#PCDATA | %font | %phrase | %special | %form">
<!ENTITY % special
   "A|IMG|APPLET|OBJECT|FONT|BASEFONT|BR|SCRIPT|STYLE|MAP|SPAN|BDO">
<!ENTITY % body.content "(%heading | %text | %block | ADDRESS)*">      
 
<!ELEMENT DIV - - %body.content>         
<!ELEMENT SPAN - - (%text)*     -- generic language/style container -->  
<!ELEMENT P     - O (%text)*>

So DIV is equivalent to BODY. SPAN and P are just about the same,
except for different rendering semantics and OMITTAG behaviour.
So I drop P as a special case of SPAN.

What remains are two models: 

1) DIV
Here structural ('blocking') tags like H1..H6, BLOCKQUOTE and TABLE are 
allowed. DIVs can be nested and may contain SPANs.
2) SPAN
This is more limited than DIV, mostly to phrase and font level tags.
[Meta-Question: Why are forms allowed? They appear in %block, so why
are they considered to be non-blocking elements here?]
SPAN can also be nested but can't contain DIVs.

Assigning something like
<!DOCTYPE SPAN PUBLIC "-//W3C//DTD HTML 3.x SPAN//EN">
<!DOCTYPE DIV PUBLIC "-//W3C//DTD HTML 3.x DIV//EN">
to them (and of course keeping the DTDs external :-) would yield two nice
inclusion models:

1) DIV
For composing structured documents 
2) SPAN
For including marked-up 'payload' text. Notice that these includes
can't break the document structure, as the structuring elements H1..H6,
BLOCKQUOTE etc. are not allowed.

With those at hand, it's easy to write HTML docs which are separated
into several files using SUBDOC as suggested:

<!DOCTYPE HTML [
    <!ENTITY section1 SYSTEM "http://foo.org/section1.html"
        SUBDOC -- of type DIV -->
    <!ENTITY section2 SYSTEM "http://foo.org/section2.html"
        SUBDOC -- of type DIV -->
    <!ENTITY payload SYSTEM "http://foo.org/payload.html"
        SUBDOC -- of type SPAN -->
]>
<html>
<head><title>Title</title>
</head>
<body>
&section1;
&section2;
<h1>something</h1>
<div>&payload;</div>
</body></html>
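
For completeness, the included files under this scheme might look as
follows. This is only a sketch: the two public identifiers are the
hypothetical ones proposed above (no such DTDs exist), and the file
contents are invented for illustration:

```sgml
<!-- section1.html: a subdocument of the hypothetical type DIV -->
<!DOCTYPE DIV PUBLIC "-//W3C//DTD HTML 3.x DIV//EN">
<div>
<h2>Section one</h2>
<p>Headings, paragraphs, lists and tables are fine here.</p>
</div>

<!-- payload.html: a subdocument of the hypothetical type SPAN -->
<!DOCTYPE SPAN PUBLIC "-//W3C//DTD HTML 3.x SPAN//EN">
<span>Some <em>marked-up</em> payload text.</span>
```

Each file would parse against its own DTD with DIV or SPAN as the
document element, so no <HTML>, <HEAD> or <BODY> could be implied
inside it.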

It may be necessary to move some elements, e.g. exclude FORM from SPAN
and exclude SPAN from its own content model. But after all, this would
yield a reasonable solution to the HTML include FAQ.
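
In SGML such restrictions can be written as exclusions on the element
declaration. A sketch of one possible reading, assuming the %text
parameter entity is declared as in the Cougar excerpt above and that
the declaration lives in the hypothetical SPAN DTD:

```sgml
<!-- hypothetical SPAN DTD fragment; %text must be declared before this line -->
<!ELEMENT SPAN - - (%text)* -(SPAN | FORM)>
```

The exclusion -(SPAN | FORM) bars both elements at any depth inside
SPAN, which is stricter than merely dropping them from %text; whether
that is wanted is an open question.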

Side note: I consider 'browsers don't support entities' a killer argument
that really hinders HTML development. Why always orient ourselves to the
weakest part of HTML systems? IMHO a clear distinction between server-side
and client-side HTML is necessary (sgml-lex aims in that direction).

Virtually yours,
Ingo
-- 
Campus:  Ingo.Macherius@tu-clausthal.de      http://www.tu-clausthal.de/~inim
Siemens: Ingo.Macherius@mwe.hvr.scn.de       http://www.scn.de/~inim
 information != knowledge != wisdom != truth != beauty != music == best (FZ)