Re: SGML entities

Paul Prescod (papresco@itrc.uwaterloo.ca)
Wed, 20 Mar 1996 05:36:47 -0500 (EST)


Date: Wed, 20 Mar 1996 05:36:47 -0500 (EST)
From: Paul Prescod <papresco@itrc.uwaterloo.ca>
To: "C. M. Sperberg-McQueen" <cmsmcq@uic.edu>
Cc: www-html@w3.org
Subject: Re: SGML entities
In-Reply-To: <9603192231.AA18080@www10.w3.org>
Message-Id: <Pine.SUN.3.91.960320043155.8686A-100000@itrc.uwaterloo.ca>

On Tue, 19 Mar 1996, C. M. Sperberg-McQueen wrote:

> I thought I made it clear that expansion of the entity reference would
> be handled by the server, not by the client.  In an ideal world, it
> might be nice to have it be done sometimes by the server, sometimes
> by the client.  But that seems hard to work into http now.

I'll reinforce a point someone else made. In the world of 
HTML, many documents are being served by ancient servers. Perhaps if we 
required server-side entity references that situation would change, but 
history shows us that the browser side advances much more quickly than 
the server side. If we could get people to change their servers...geez, 
we could revolutionize the whole damn web. =) Anyhow, there is nothing 
that precludes you from developing a server that expands text entities 
appropriately. The question that started the thread was about client-side 
mechanisms.

> But then, if we do ship the DTD around, what happens?  Browsers which
> don't know what to do with it may not do the right thing with it.
> 
> This sounds rather similar to what happens if we use an <INSERT>
> element or any other element:  browsers which don't know what to do
> with them may not do the right thing with them.  

Theoretically this is the case. But <INSERT> has already 
got vendor support (though I don't know if any <INSERT>-enabled browsers 
are shipping), and I am convinced that part of the reason for that support 
is that <INSERT> is basically son-of-<IMG>.

> SGML entities are by no means restricted to SGML content.  (See example
> just given.)  They can as easily contain images or video.  This does not
> seem to be a reason to prefer one notation over the other.

I agree with you that ENTITYs are flexible enough for multimedia content 
(I certainly use them for that). But we seem to be combining discussion of 
three different kinds of entities, and I think we should discuss them 
separately.

The first is the inclusion of arbitrary HTML markup inside another 
document. My personal feeling is that if we can put off this discussion 
for a while, vendors will be closer to implementing full SGML-smart 
browsers, servers and editors and many of today's problems around them 
will go away. That's why _my_ answer to the question that started this 
thread was "EMBED HTML as you would any other data type." All they wanted 
was a toolbar, after all. SGML text entities would be a big sledgehammer 
for a small fly.

The second kind of entity includes an arbitrary "other object". HTML did 
this through IMG first, and now through EMBED. Unfortunately, HTML's 
inclusion paradigm is now a few years old, and there is substantial 
author and tool support for it. Furthermore (this is where I get 
heretical), I don't know if HTML authors have anything to gain by moving 
to the SGML paradigm this late in the game. When browsers are "entity 
smart", HTML authors will have the _option_ of using entities for object 
inclusions, or using the short-and-sweet URL without a redirection 
through an entity. 

Most of the HTML authors I know simply would not understand why they 
should scroll to the top of the document to declare an entity before 
they can refer to it. And "HotDog", "NotePad", "WebEdit" and "SimpleText" are not going to help 
them. Since so many links and embeds are "one shot", and 
there are already several levels of redirection available, many authors 
will wonder why they need another.

The third kind of entity we are discussing is a SUBDOCument entity, which 
I think is just a special case of "multimedia object." I don't know why 
we should treat it any differently.

> Yes.  That might be a reason to prefer client-side expansion of
> entity references.  On the other hand, any method one chooses of
> organizing data is apt to pessimize some caching scheme or other,
> under the right circumstances.  If the client does the expansion
> of the entity reference or INSERT, and the material inserted changes
> frequently, the copy in the cache is apt to be out of date.
> My copy of Netscape does not detect this:  it just happily shows
> me the outdated cached copy of changed documents until I force a
> reload, manually.

I have observed this behaviour too, but either your Netscape or your 
server is broken: this is clearly not how HTTP is supposed to work.
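
For the record, HTTP/1.0 (RFC 1945) gives a client what it needs to 
notice a changed document: a cached copy keeps the server's Expires and 
Last-Modified headers, and once it is stale the client sends a 
conditional GET with If-Modified-Since, getting back either 304 Not 
Modified or the new version. Below is a minimal Python sketch of that 
client-side decision; CachedCopy, revalidation_headers and apply_response 
are hypothetical names, not any browser's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


@dataclass
class CachedCopy:
    """Hypothetical cache entry: the body plus the validators the server sent."""
    body: bytes
    last_modified: Optional[str]  # Last-Modified header, verbatim
    expires: Optional[str]        # Expires header, verbatim


def revalidation_headers(copy: CachedCopy, now: datetime) -> Optional[Dict[str, str]]:
    """Return None if the copy may be shown as-is, else headers for the GET to send."""
    if copy.expires is not None:
        try:
            if now < parsedate_to_datetime(copy.expires):
                return None  # still fresh according to Expires: no request at all
        except (TypeError, ValueError):
            pass  # unparseable or timezone-less Expires: treat as already expired
    if copy.last_modified is not None:
        # Conditional GET: the server answers 304 if nothing has changed.
        return {"If-Modified-Since": copy.last_modified}
    return {}  # no validator: fall back to an unconditional GET


def apply_response(copy: CachedCopy, status: int, body: bytes,
                   headers: Dict[str, str]) -> CachedCopy:
    """Keep the cached copy on 304 Not Modified, otherwise replace it."""
    if status == 304:
        return copy
    return CachedCopy(body, headers.get("Last-Modified"), headers.get("Expires"))
```

The sketch revalidates any copy that lacks an Expires header. A real 
client may instead guess a freshness period for such copies, which is 
one plausible way a browser ends up showing an outdated page until a 
manual reload.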

>   But either way, this is an argument for doing
> expansion on the server or the client side, not an argument for
> inventing a new notation for existing SGML functionality.
> Or am I wrong?

You are right. But from a browser vendor's point of view, the INSERT 
notation is not very new. It is just a cut and paste of IMG/EMBED/APPLET/FIG 
code with extensions.

> Not necessarily:  a server does not have to be fully SGML compliant (or
> even fully SGML aware) to recognize and act appropriately on entity
> declarations and entity references.

Is that really the case? Aren't there RE/RS issues? Recursive entity 
replacement issues? Element recognition issues? Entity recognition 
issues? Can entity replacement and element recognition be done by two 
separate tools without any knowledge of each other?

I'm really asking...I always use an SGML-smart editor and parser before 
working with SGML documents. A newline in the wrong place can be too much of 
a headache. I ordered the SGML handbook a dog's age ago and am still 
waiting...
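
To make those questions concrete, here is a deliberately naive sketch, 
in Python, of the kind of server-side expansion being discussed. It 
assumes only internal text entities declared as <!ENTITY name "text">, 
and the names collect_entities and expand are hypothetical. It handles 
nested references and rejects recursive ones, but it knows nothing about 
elements, marked sections, or RE/RS handling.

```python
import re
from typing import Dict, Tuple

# Hypothetical and deliberately naive: only internal text entities of the form
#   <!ENTITY name "replacement text">
# are recognized. Parameter entities, external (SYSTEM/PUBLIC) entities,
# marked sections, comments and RE/RS rules are all ignored.
DECL = re.compile(r'<!ENTITY\s+([A-Za-z][A-Za-z0-9.-]*)\s+"([^"]*)"\s*>')
# A general entity reference; SGML allows the ";" reference close to be omitted.
REF = re.compile(r'&([A-Za-z][A-Za-z0-9.-]*);?')


def collect_entities(doc: str) -> Tuple[Dict[str, str], str]:
    """Return the declared entities and the text with the declarations cut out."""
    entities: Dict[str, str] = {}
    for name, text in DECL.findall(doc):
        entities.setdefault(name, text)  # SGML: the first declaration of a name wins
    return entities, DECL.sub("", doc)


def expand(text: str, entities: Dict[str, str], active: Tuple[str, ...] = ()) -> str:
    """Replace references recursively, rejecting an entity that refers to itself."""
    def replace(match):
        name = match.group(1)
        if name not in entities:
            return match.group(0)  # undeclared (e.g. &amp;): pass through untouched
        if name in active:
            raise ValueError(f"recursive entity reference: &{name};")
        return expand(entities[name], entities, active + (name,))
    return REF.sub(replace, text)
```

For example, a document declaring toolbar as "Home | Search | &ver;" and 
ver as "v1.0" has <P>&toolbar;</P> expanded to <P>Home | Search | v1.0</P>, 
but the newlines that ended the two declaration lines are left behind in 
the output. Whether such record ends are significant is exactly the kind 
of RE/RS question a tool like this cannot answer without a real parser.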

> (Although full SGML support would be a damn good thing for the Web:  for
> further discussion, see the paper Bob Goldstein and I wrote, at
> http://www.uic.edu/~cmsmcq/htmlmax.html.)

I don't think anyone here will argue with that. It's the unwashed hordes 
outside that we have to try to convince. =) 

> > The second is that HTML authors do not like "naming" things that
> > already have names. In other words, they do not like giving an SGML
> > entity name for something that already has a URL. Part of the
> > difference between the HTML community and other SGML DTD user groups
> > is that most HTML authors do not use "smart" authoring tools, and
> > SGML-smart authoring tools are especially rare.
> 
> This seems to me rather a large generalization, but even taken at face
> value I'm not sure it's an argument for reinventing yet another wheel.

It's only a reinvention from our point of view. From a typical HTML 
author's point of view, SGML entities are a reinvention. They understand 
IMG, and they understand INSERT as IMG-on-steroids. I think we lost this 
battle years ago. Yes, if we could go back and change IMG, I would do it.

> Unless, of course, you count the time it takes to persuade people that
> (a) the wheel exists, (b) it meets the specifications, and (c) it
> doesn't have to be rejected just because it was invented somewhere else.

_EXACTLY_! And if you think there is resistance to the idea _here_, wait 
until you take it to the hundreds of thousands of HTML authors who have 
never heard of SGML and who are absolutely resistant to even keeping 
_formatting_ at the top of their document, much less content information.

SGML entities are part of HTML as an SGML application. They have been since 
HTML 2.0 at least. Nobody has implemented support for them. Meanwhile 4 
or 5 different object embedding tags have come into existence. INSERT is 
a consolidation of them. It does not preclude or even duplicate the 
behaviour of SGML text entities. It does preclude non-SGML entities, but 
could be extended to support them, I suppose.

 Paul Prescod