Re: Processing instructions for style tweaks?

Murray Maloney (murray@sco.com)
Thu, 1 Dec 1994 16:32:15 -0500 (EST)


Subject: Re: Processing instructions for style tweaks?
To: Paul Grosso <pbg@texcel.no>
Date: Thu, 1 Dec 1994 16:32:15 -0500 (EST)
From: Murray Maloney <murray@sco.com>
Cc: jjc@jclark.com, murray@sco.com, connolly@hal.com, www-html@www0.cern.ch,
In-Reply-To: <9411301646.AA20227@texcel.no.texcel.no> from "Paul Grosso" at Nov 30, 94 04:46:25 pm
Message-Id:  <9412011632.aa23849@dali.scocan.sco.COM>

> 
> > Subject: Re: Processing instructions for style tweaks?
> > Date: Wed, 30 Nov 1994 10:39:36 -0500 (EST)
> > From: Murray Maloney <murray@sco.com> 
> > 
> > I am dead set against PIs.  Sure we could develop conventions,
> > but they could never be verified as conforming by an SGML parser.
> > No, PIs are bad!  PIs are worse even than format-specific
> > SGML elements like <I> and <B> which can readily be mapped
> > to any formatting desired at the reader's end.
> > 
> > . . .
> > 
> 
> 
> I don't want to come out as if I'm championing PIs.  I believe in
> "clean SGML" [Sharon Adler used to talk of "polluting" the SGML
> with format information] as much as anyone.
> 
> But, as Murray elegantly pointed out in the rest of his post (that
> I elided), we must allow for other people with other viewpoints.
> In particular, there are (at least sometimes for some people) good
> reasons for wanting more control over style than can be achieved
> via, say, DSSSL Lite location/query mechanisms.

Thanks for the compliment.

I was going to let this go, but the more that I thought
about it, the more I felt that I had to pursue it.
I don't mean to denigrate Dan's idea, or suggest
that Paul is wrong for supporting it.  However,
I have to argue against PIs as our solution to
the often expressed need to have local control
over formatting.

So, please forgive me for what I am about to say.
I really think that it needs to be said.

> 
> However, I do disagree with "PIs are worse even than format-specific
> SGML elements."  I think you're wrong, here, Murray.  Having formatting
> markup *indistinguishable* from structural markup (i.e., having it all
> be DTD elements--some with "good semantics" and some with "bad semantics")
> is the worst way to go.  

Perhaps I spoke too strongly here. But I remain convinced that
PIs are not a happy solution and one that we would all regret 
in the end.  Read on...
> 
> The advantage of using PIs for formatting-specific markup is that it's
> easy to strip/ignore them when one wants to slough off the "pollution"
> of embedded format-specific information.

But that is also true of attributes.  The advantage of attributes
is that they can be parsed and verified as conforming by 
an SGML parser and by an application, but the application
can choose to ignore them.

In the HTML 2.0 spec, we have been very careful to only make 
formatting suggestions for HTML elements, by using the 
wording "typical rendering" as opposed to specifying
the rendering with a hard and fast rule.

> 
> For example, a PI might be used to force a page break or twiddle a line
> break for certain esthetic reasons during final production (this
> example may be more relevant to hardcopy, high-quality composition),
> but as soon as the publication has gone to press and it's time to
> database the information for reuse or subsequent revision, you want to
> strip such markup that is not part of the base information per se but
> only an artifact of a particular presentation situation that is now a
> thing of the past.  If I had a <newpage> element in there instead of a
> <?DL newpage> processing instruction, I would need to have a more
> sophisticated filter--that I would need to change with every new
> format-specific element I added--to strip them all.

Perhaps I have been misunderstood -- ya, that must be it -- so
please allow me to explain my position.

I am not in favor of a proliferation of formatting tags.
I am in favor of using attributes to associate formatting
with an HTML/SGML element.  While I am quite content to
leave the <BR> and <HR> tags alone, I am not proposing
a <SPACE size=20pts> element.  Neither am I writing
in support of a variety of elements proposed by Netscape
that could have been handled with entities.

So, what I was hoping is that we could define a few sets
of attributes (attribute architectural forms) that could 
be attached to elements to provide for formatting control
at the element level.  I imagine that there might be several
classes of elements (I haven't thought this all the way through)
including INLINES, HEADINGS, and BLOCKS.

INLINES would have attributes that could affect the presentation
in terms of typeface, type size, and perhaps other characteristics
like kerning, character spacing, word spacing, reverse, etc.
I am not advocating anything specifically (except typeface and size),
but rather suggesting some potential characteristics that could be
adjusted by an author and possibly respected by a browser.

BLOCKS (paragraphs, address blocks, etc) would have attributes 
that could affect the presentation in terms of line filling,
hyphenation, justification, line length, left/right/centre 
adjustment of lines, line spacing, etc.  Again, I am not 
advocating anything, only offering potential candidates.
(Possibly, the attributes associated with INLINES would also
be available to BLOCKS.)

HEADINGS would have attributes similar to BLOCKS, but might 
have other attributes.

And so on.
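To make the attribute-class idea a little more concrete, here is a minimal Python sketch of how a browser might record the three classes and keep only the formatting attributes an element's class allows. Every attribute name and the element-to-class mapping below are hypothetical illustrations, not anything defined in the HTML 2.0 draft; the INLINES attributes are folded into BLOCKS to reflect the "possibly" above.

```python
# Hypothetical attribute classes; none of these names come from any HTML spec.
INLINES = frozenset({"typeface", "size", "kerning", "charspacing",
                     "wordspacing", "reverse"})
# BLOCKS get their own attributes plus (possibly) the INLINES attributes.
BLOCKS = frozenset({"fill", "hyphenate", "justify", "linelength",
                    "adjust", "linespacing"}) | INLINES
# HEADINGS are "similar to BLOCKS"; heading-only extras are omitted here.
HEADINGS = BLOCKS

# Hypothetical assignment of a few HTML elements to each class.
ELEMENT_CLASS = {
    "em": INLINES, "strong": INLINES, "b": INLINES, "i": INLINES,
    "p": BLOCKS, "address": BLOCKS,
    "h1": HEADINGS, "h2": HEADINGS,
}

def formatting_hints(tag, attrs):
    """Return only the formatting attributes permitted for this element's class."""
    allowed = ELEMENT_CLASS.get(tag.lower(), frozenset())
    return {name: value for name, value in attrs.items() if name in allowed}
```

For example, `formatting_hints("EM", {"typeface": "Times", "justify": "both"})` keeps only `typeface`, since justification is a BLOCKS attribute. In a real DTD the same restriction would be expressed in the attribute declarations, so an SGML parser could reject the misplaced attribute before any application saw it.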
> 
> With PIs, I can just strip everything of the form <?DL...>, or if my
> software handles it, just say "write -nopi" and get a depolluted
> version of the SGML.  And, if I send the SGML--PIs and all--to another
> conforming SGML system that hasn't been programmed to do anything
> special with <?DL...> PIs, 'no harm, no foul,' it just works and the
> PIs are ignored.

Right.  And with attributes you can simply ignore them.
Or you can ignore them selectively according to the 
user's wishes -- as specified via a dialog.  No harm, no foul.
The big difference is that you don't have to use a special
filter or parser to ignore attributes, and you do have a 
syntax that is verifiable by an SGML parser.
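The two ways of discarding formatting information can be contrasted in a short Python sketch. Stripping `<?DL ...>` PIs is a text-level filter applied before or outside parsing; ignoring attributes happens on attributes the parser has already read and checked, and can follow the reader's choices. The function names and the user-preference set are hypothetical, and the regular expression assumes the reference concrete syntax (a PI opens with `<?` and closes at the first `>`); it does not know about comments or marked sections, where a real filter would have to be more careful.

```python
import re

# <?DL ...> processing instructions, assuming the reference concrete syntax.
DL_PI = re.compile(r"<\?DL\b[^>]*>")

def strip_dl_pis(sgml_text):
    """Remove every <?DL ...> processing instruction from raw SGML text."""
    return DL_PI.sub("", sgml_text)

def honour_hints(attrs, ignored_by_user):
    """Drop the (already parsed) formatting attributes the reader chose to ignore."""
    return {name: value for name, value in attrs.items()
            if name not in ignored_by_user}
```

For instance, `strip_dl_pis("<P>Text<?DL newpage> more</P>")` yields `"<P>Text more</P>"`, while `honour_hints({"typeface": "Times", "size": "12pt"}, {"size"})` keeps only the typeface hint.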
> 
> Finally, using formatting elements doesn't solve many of the problems
> because they either can't be used everywhere one might want, or their
> content models have to be so lax as to destroy the structure of the
> original DTD.  PIs don't have to drastically change the ESIS tree of
> the document.

Here is where I may have been misunderstood -- and it is my own
fault for saying that I prefer formatting elements over PIs.

I am not in favor of a proliferation of formatting tags.
I am in favor of using attributes to associate formatting
with an HTML/SGML element.

Having said that, I am still more willing to accept some
tags that are intended strictly for formatting than PIs.
As examples, I point you to <I> and <B>.  Yes, I have heard
all of the arguments.  But I fail to see how <STRONG> is 
intrinsically better than <B>.  Perhaps that is because 
I do not necessarily believe that something that is coded 
as a <B> needs to be represented in a bold typeface.
My position is that the "typical rendering" of <B> is bold,
but <B> is simply a container.  Given two phrases coded as
<B> and <STRONG>, I defy anyone to tell me that they are 
at once able to describe the semantics of one and not the other.

> 
> I do think there are better and worse ways of using PIs to implement
> the kind of format-override control that's being discussed.  My earlier
> posting described in more detail how I would use PIs to allow for
> instance-specific location mechanisms whose specific formatting effects
> would still be specified in the style sheet.

Finally, I have some practical issues with PIs.

-- We cannot define a DTD for PIs in an HTML document.
   So, we'll have the same mess we had with HTML before
   Dan started his effort to write a spec.  Nobody will
   know for sure and there won't be any way to verify
   it except for the "Mosaic test".  Heaven forfend!

-- Placing PIs before and/or after elements will force 
   applications to save formatting instructions until
   the next element is encountered or to look ahead
   in case there are formatting instructions coming.

   I am not a browser implementor, but I don't think 
   that that is clean.  In fact, I think that it is 
   unnecessarily complex and will discourage browser
   developers from implementing this functionality.

-- Forcing authors of HTML documents to learn another
   language syntax to include their formatting hints
   will discourage them from doing so.
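The buffering described in the second point can be sketched in Python. The event stream below is a hypothetical stand-in for a parser's output (tuples such as `("pi", ...)` and `("start", ...)`), not the API of any real SGML parser. It handles only PIs placed *before* an element; PIs placed *after* an element would additionally force the browser to delay rendering until it had looked ahead. With attributes, by contrast, the hints arrive with the start tag itself and no buffer is needed.

```python
def attach_pending_pis(events):
    """Attach each buffered processing instruction to the next start tag.

    `events` is a hypothetical event stream of (kind, value) tuples, e.g.
    ("pi", "DL newpage"), ("start", "P"), ("data", "text"), ("end", "P").
    """
    pending = []
    for kind, value in events:
        if kind == "pi":
            pending.append(value)            # cannot act yet: hold the PI
        elif kind == "start":
            yield ("start", value, pending)  # element plus the PIs before it
            pending = []                     # fresh buffer for the next element
        else:
            yield (kind, value, [])
    # PIs after the last start tag have no element to apply to and are dropped.
```

Running it over `[("pi", "DL newpage"), ("start", "P"), ("data", "Hello"), ("end", "P")]` pairs the `DL newpage` instruction with the `P` start tag, but only after the browser has carried it across an event boundary.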

All in all, the way that I would read this if I were a cynic is:

	OK, they agreed that authors' formatting hints
	were a feature that the WWW community was demanding,
	so they set out to design something that nobody
	would want to use in the hope that demand would
	taper off and the greater wisdom of pure SGML and
	DSSSL style sheets would win the day.

Fortunately, I am not a cynic.  But judging from the articles
that I see posted in the comp.infosystems.www.* newsgroups,
there are plenty of them out there waiting.

> 
> paul
> 
> 
> Paul Grosso
> VP Research                      Chief Technical Officer
> ArborText, Inc.                  SGML Open
> 
> Email: paul@arbortext.com
>   or   pbg@texcel.no

===========================================================================
---------------------------------------------------------------------------
Murray C. Maloney			Internet:  murray@sco.com
Technical Publications Writer/Architect	Uucp:	   ...uunet!sco!murray
SCO Canada, Inc.			My Phone:  (416) 960-4031
130 Bloor Street West, 10th Floor	Fax:	   (416) 922-2704
Toronto, Ontario, Canada  M5S 1N5	SCO Phone: (416) 922-1937
===========================================================================
---------------------------------------------------------------------------
Sponsor member of Davenport Group (ftp://ftp.ora.com/pub/davenport/)
Member of IETF HTML Working Group (http://www.hal.com/%7Econnolly/html-spec/)
Member of SGML Open Internet and WWW Technical Committee
===========================================================================