Re: HTML Streaming

At 01:42 PM 9/5/97 -0400, wrote:
> (Dan Sugalski) wrote:
>>If you're going to maintain any sort of backward compatibility with current
>>HTML, I think you'll find that any error-free description of the data on a
>>page won't be much, if any, smaller than the actual data itself. Also,
>A text description could be compressed. It would probably be a series of
>numbers. If, for example, a pattern repeats, such as a row of three ones, you
>can represent it with a single number. There are other ways to compress the
>description. Remember, this is being done by the HTML editor.

You're not going to get significant enough compression to make it
worthwhile. Adding even 30% overhead to a web page to make it render faster
is not worth it, and I think you'll find it difficult to get much smaller
than that and still stay within the limits of a less-than-seven-bit
character set. (Honestly, since you're looking for accurate rendering,
you're going to have to have an intimate knowledge of the font metrics used
to render the page, and you just can't have that at page-generation time. A
truly accurate mockup of the page is an impossibility, alas.)

I think the assumption that this is going to be done exclusively by HTML
editors is something that's going to have to get jettisoned. If you work on
that assumption, then you might as well pack it in now, since you'll never
get a large enough user base to get any of the major browser makers to
bother with it.

Also, from what I've seen, you're counting on tools and info that aren't
easily available to the majority of the engines generating web pages, i.e.
CGI scripts and database-driven pages.

>>since you're going to need to use font metrics to accurately make those
>>descriptions, you're going to end up with an inaccurate description for a
>>significant segment of the clients viewing your page.
>Font degradability is a big problem. It is almost a printing problem. This is
>really not my field. I think PANOSE is the best solution. A base font could
>solve other problems. A database with a basic description of fonts would also
>help. These solutions would not be complete but not as inaccurate as you say.

Actually, it *is* a printing problem, and it would be as inaccurate as I
say. Worse, really. When printing, you can make some very valid page size
assumptions (8.5x11 or A4 paper) which you can't make for browsers. A web
page in 12-point Times is going to have a significantly different layout in
a 400x400 window and a 1200x900 one.
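
To illustrate the window-size point, here is a minimal sketch. It assumes a
made-up average glyph width of 6 pixels (real metrics vary by font and
platform) and uses Python's textwrap as a stand-in for a browser's line
breaker. The names wrap_for_window and ASSUMED_GLYPH_WIDTH_PX are
hypothetical.

```python
import textwrap

# Hypothetical average glyph width; real font metrics differ per font and platform.
ASSUMED_GLYPH_WIDTH_PX = 6


def wrap_for_window(text: str, window_width_px: int) -> list:
    """Greedy line breaking for a window of the given pixel width."""
    chars_per_line = max(1, window_width_px // ASSUMED_GLYPH_WIDTH_PX)
    return textwrap.wrap(text, width=chars_per_line)


sample = (
    "A web page in 12-point Times is going to have a significantly "
    "different layout in a small window and a large one. "
) * 4

narrow = wrap_for_window(sample, 400)
wide = wrap_for_window(sample, 1200)
print(len(narrow), "lines at 400 px,", len(wide), "lines at 1200 px")
```

The same text breaks into a different number of lines at each width, so any
precomputed set of line or word positions only matches one of them.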

That pretty much leaves you describing individual words and the rectangle
they take. English has an average word size of less than six characters. Do
you really think you can do it in an average of less than two? (Don't forget
you'll need both width and height, because you can't assume that font
metrics won't change from word to word, or even character to character.)
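
A rough back-of-the-envelope sketch of that arithmetic follows. It assumes
six characters per word as an upper bound, and an encoding restricted to the
94 printable non-space ASCII characters. The function names are hypothetical
and no real proposal's encoding is implied.

```python
def overhead_ratio(avg_word_chars: float, chars_per_box: float) -> float:
    """Extra characters per word, as a fraction of the text itself."""
    return chars_per_box / avg_word_chars


def chars_needed(max_value: int, alphabet_size: int = 94) -> int:
    """Characters needed to encode any integer in 0..max_value."""
    n = 1
    capacity = alphabet_size
    while capacity <= max_value:
        capacity *= alphabet_size
        n += 1
    return n


# One character each for width and height: two characters per word.
two_char_overhead = overhead_ratio(6.0, 2.0)

# A width anywhere in a 1200-pixel window no longer fits in one character.
width_chars = chars_needed(1200)

print(f"overhead: {two_char_overhead:.0%}, chars per width: {width_chars}")
```

Even the two-character case works out to about a third on top of the text,
and a single printable character can only distinguish 94 values, so pixel
widths for a large window already need two characters on their own.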

And you did specify you were shooting for lossless, and you just can't have
that. Now, if you shoot for 'mostly accurate' and take a good guess at
standard Times metrics, your assumption will probably be valid, or close
enough, for several varieties of Times on different platforms, for most
(60-80%) of the people viewing the page. OTOH, 20-40% of the people *won't*
be able to use your assumptions, thus incurring the overhead of downloading
a data description they can't use.
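
As a minimal sketch of that cost, assume a 10,000-byte page and the 30%
overhead figure mentioned earlier. Both numbers are illustrative only, and
wasted_bytes_per_view is a hypothetical helper.

```python
def wasted_bytes_per_view(page_bytes: int, overhead_pct: int, unusable_pct: int) -> float:
    """Average description bytes per page view that go to clients who can't use them."""
    description_bytes = page_bytes * overhead_pct / 100
    return description_bytes * unusable_pct / 100


# Illustrative inputs: 10,000-byte page, 30% overhead, 20% or 40% of clients unable to use it.
low = wasted_bytes_per_view(10_000, 30, 20)
high = wasted_bytes_per_view(10_000, 30, 40)
print(low, high)
```

Every client still pays for the full description. These figures only count
the share that is downloaded and then thrown away.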

>Of course, this only describes regular text. What about weird things like 
>HTML math? I am working on a global description of a text. The main problem 
>seems to be that it breaks at a variety of points. It really becomes a 
>description of a series of squares and rectangles. The description is not
>meant to describe only text. So I am considering other ways of describing a
>variety of data. I am sorry if it seems incomplete but I am working on it 
>right now.

By all means, work it out, it's a good exercise. I think, unfortunately,
you'll find that the increase in size your additions cause will
end up slowing down the ultimate display of the page enough to make it


----------------------------------------"it's like this"-------------------
Dan Sugalski   (541) 917-4364           even samurai
Programmer/SysAdmin                     have teddy bears
Linn-Benton Community College           and even the teddy bears
                                        get drunk

Received on Friday, 5 September 1997 14:53:55 UTC