Re: HTML Streaming

Benjamin Franz (snowhare@netimages.com)
Mon, 1 Sep 1997 10:47:46 -0700 (PDT)


Date: Mon, 1 Sep 1997 10:47:46 -0700 (PDT)
From: Benjamin Franz <snowhare@netimages.com>
To: Albertfine@aol.com
cc: www-html@w3.org, crism@ora.com
In-Reply-To: <970901094452_-667551093@emout13.mail.aol.com>
Message-ID: <Pine.LNX.3.96.970901101720.13400B-100000@ns.viet.net>
Subject: Re: HTML Streaming

On Mon, 1 Sep 1997 Albertfine@aol.com wrote:
> <html>
> <head>
> <event p=200 table=75,25>
> </head>
> <body>
> <p>
> Imagine 200 character and spaces here
> </p>
> <table>
> Imagine a table that is 75 by 25
> </table>
> </body>
> </html>
> 
> The browser would first display a pre rendered page for a paragraph with 200 
> character and spaces and then a table that is 75 by 25. The browser would 
> then stream the character of the paragraph and cells of the table. Currently,
> the entire paragraph would have to be downloaded first. Then the entire
> table before it could be displayed.

Ok. It is *VERY* clear at this point that you have somehow gotten the idea
that everyone in the world uses mono-spaced fonts to render HTML normally
(i.e., that 'i' takes the same room as 'W' on everyone's screen). You may
have gotten this misconception from a browser that you use that is
configured to use mono-spaced fonts for everything, such as Lynx on a
text-oriented terminal. It doesn't really matter where you got it from.

What is important is that this fundamental assumption is _completely and
utterly wrong_. No more than 1 or 2 percent of browsers render in
mono-spaced fonts normally (I'm going to ignore the issue of the CJK
people for this - the point is still valid when you consider the mixing of
CJK with non-CJK text).

ON YOUR SCREEN:
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW

is the same length as
.........................................................

ON MY SCREEN:
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW

may be the same length as
...................

or
.............................................

or
..................................................................
..................................................................
..................................................................
..................................................................

But it is highly probable that it *WILL NOT* be:
.........................................................
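
To put rough numbers on the comparison above, here is a minimal sketch in
Python. The pixel widths are made-up assumptions for illustration, not
measurements of any real font, and the count of 57 characters is arbitrary.
It only shows how the number of '.' characters that match a run of 'W'
characters depends on the font metrics in use.

```python
# Hypothetical advance widths in pixels. Real values depend on the font,
# size, and rendering engine on each reader's machine.
PROPORTIONAL = {"W": 15, ".": 4}
MONOSPACED = {"W": 8, ".": 8}


def text_width(text, metrics, default=8):
    """Sum the advance width of every character in text."""
    return sum(metrics.get(ch, default) for ch in text)


def dots_matching(text, metrics):
    """Return how many '.' characters span about the same width as text."""
    return round(text_width(text, metrics) / metrics["."])


line = "W" * 57  # arbitrary run of wide characters
mono_dots = dots_matching(line, MONOSPACED)
prop_dots = dots_matching(line, PROPORTIONAL)
print("monospaced:", mono_dots, "dots")
print("proportional:", prop_dots, "dots")
```

Under the monospaced table the two runs have equal length; under the assumed
proportional table the same 57 characters need several times as many dots.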

This means that saying a paragraph is '200 characters' tells me *NOTHING*
about how much screen territory it will take. It could consist of nothing
but '.' characters and fit into *one* line. It could consist of 'W'
characters and take 4 lines. It could have variable font metrics created
by stylesheet considerations. It could use FONT in ways that make it vary
even from the font face initially declared *with no way to know in
advance until the FONT tag is parsed*. There could be embedded objects with
their own completely unrelated metrics. You could have a single string
with no whitespace that cannot be easily broken into multiple lines. You
may have every single character separated by white space, allowing line
breaking nearly anywhere.
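
The effect on line count can be sketched the same way. This is a toy greedy
line breaker in Python, not how any particular browser lays out text. The
width table, the 800-pixel line width, and the rule of breaking only at
spaces are all assumptions made for the example.

```python
# Hypothetical advance widths in pixels for a proportional font.
WIDTHS = {"W": 15, ".": 4, " ": 4}
DEFAULT_WIDTH = 8  # assumed width for any character not listed above


def width(text):
    """Sum the advance width of every character in text."""
    return sum(WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)


def count_lines(text, line_width):
    """Greedy wrap: break only at spaces; an unbreakable word overflows."""
    lines = 0
    used = 0  # pixels used on the current line; 0 means no line started
    for word in text.split():
        w = width(word)
        if used == 0:
            lines += 1          # first word starts the first line
            used = w
        elif used + WIDTHS[" "] + w <= line_width:
            used += WIDTHS[" "] + w  # word fits after a space
        else:
            lines += 1          # word moves to a new line
            used = w
    return lines


# Three texts of exactly 200 characters each.
dots = ("." * 9 + " ") * 20
wides = ("W" * 9 + " ") * 20
unbroken = "W" * 200

print(len(dots), count_lines(dots, 800))
print(len(wides), count_lines(wides, 800))
print(len(unbroken), count_lines(unbroken, 800), width(unbroken))
```

With these assumed metrics the dotted text fits on one line, the 'W' text
takes four, and the unbroken string stays on a single line that is far wider
than the 800 pixels available. All three are '200 characters'.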

You can't know *before the content comes down* whether some, all or none
of these problems apply to the text in question. The author in
particular has no way to know these in advance. Attempts to improve
layout time by providing hints of how many characters are in the text are
completely futile in nearly all cases.

-- 
Benjamin Franz

Hmmm... A thought just occurred to me: The general problem of predicting web
page layout with HTML in advance is probably isomorphic to the Turing
Halting Problem. You can't know in general what it is going to do without
actually doing it. ;-)