
Re: Trying to sum up a bit

From: Murray Altheim <murray@spyglass.com>
Date: Tue, 17 Dec 1996 17:24:15 -0400
Message-Id: <v02140b09aedcb269fdac@[208.203.149.72]>
To: Tim Bray <tbray@textuality.com>
Cc: w3c-sgml-wg@w3.org
Tim Bray <tbray@textuality.com> writes:
[...]
>3. All non-markup bytes are significant, whitespace or not (Durand)
>
>Pro: Everyone can understand the rules, it's easy to implement
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>Con: You lose certain HyTime addressing facilities, and the application
>     gets no help from the XML processor in ignoring WS that to the user
>     is "obviously" irrelevant.
>
>4. Use a mechanism *in the instance* to signal a DTD-less application
>   what's going on.
>
> 4.1 The PI-based DTD summary (Sperberg-McQueen)
> 4.2 Explicit quoting of significant character data (Goldfarb)
> 4.3 -XML-SPACE
[...]
>Face facts, folk.  There is just not a solution that is going to solve
>this problem and be free of some cost.  And please remember the cost
>of explanation and education is very real.  For myself, my preference
>would be (in descending order) #3, #4.3.3 (a3/b2), 4.3.2, 4.3.1.

I've on several occasions written up responses to this thread, only to be
pre-answered and re-confused by further discussions/questions. When you
talk of the cost of explanation and education, you must also mention the
cost of messy solutions (from an end-user perspective).

I would hope we would avoid messy solutions AT ALL COSTS. The public view
of XML will have a lot to do with how many messy solutions we create in the
specification. Every instance of things like

  <?XML-SPACE-PROGRESSIVE-SHIFT VARIANCE::8879-97//LPN SPIN-RIGHT="XML"/>

as a learning requirement on document authors is going to hurt the
perception of XML as a solid specification, free of hacks, and relatively
simple to learn and use. I don't even like the opening PI much, to be
frank. If option #3 only causes problems with 'certain HyTime addressing
facilities', then it seems that someone should concentrate on coming up
with a solution for HyTime, not XML. I wouldn't assume too many users
coming up from HTML or learning XML as an 'onramp to SGML' are going to be
impacted much by difficulties with HyTime addressing facilities.

If Jon is willing to admit to ignorance, I'll surely admit to plenty o'
ignorance. I don't even know what a HyTime addressing facility is, and I
would hope I wouldn't need to know one from a Semantic Specific Result
Instance, a Conceptual Output Instance, a Generic Language Translation
Process Specification, a Formatting Output Specification Instance, or any
other string longer than thirty characters. You won't find XML on any
shelves if so. If someone needs this level of complexity then, as we
mention in the XML Q&A sheet, they should not use XML but full-bore SGML.

Based on my abhorrence of custom strings and excessive complexity, isn't
there some way (if the cons of #3 are accurately stated) that we simply use
some simple rules:

>3. All non-markup bytes are significant, whitespace or not (Durand)

This is true only for the 'pGrove' (a la Kimber below)? In processing the
pGrove, we make some assumptions about authors' intentions. Further
processing is based on these assumptions:

   a. whitespace within an element is significant to that element*
   b. whitespace between elements is not significant
   c. whitespace after a start tag is eliminated (i.e., not significant)
   d. whitespace preceding the end tag is normalized to a single space

*Associated with item (a) is the fact that sans DTD, no document author can
expect all whitespace to be significant in content, e.g., one couldn't
expect an XML UA to know about <PRE> in HTXML without either a DTD or a
stylesheet to provide that information, preferably a stylesheet. Then,
maybe we could require

   SDAFORM  CDATA  #FIXED "Lit"

on all element content with significant whitespace and be done with it. :-)
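
To make rules (a) through (d) above concrete, here is a rough sketch of one
possible reading of them, applied to a tree that a parser has already built
with every byte preserved. The function names `normalize_whitespace` and
`_trim` are hypothetical, and treating any whitespace-only run between two
tags as "between elements" (rule b) is an assumption on my part; the rules
as stated leave that case open.

```python
import re
import xml.etree.ElementTree as ET

_TRAILING_WS = re.compile(r"\s+\Z")


def _trim(text, after_start_tag, before_end_tag):
    """Apply rules (b), (c) and (d) to one run of character data."""
    if text is None or text.strip() == "":
        # Rule (b), as assumed here: a whitespace-only run between two
        # tags is not significant and is dropped.
        return None
    if after_start_tag:
        # Rule (c): whitespace right after a start tag is eliminated.
        text = text.lstrip()
    if before_end_tag:
        # Rule (d): whitespace right before the end tag becomes one space.
        text = _TRAILING_WS.sub(" ", text)
    # Rule (a): everything else inside the element is left untouched.
    return text


def normalize_whitespace(elem):
    """Recursively rewrite elem in place and return it."""
    children = list(elem)
    # elem.text is the character data between the start tag and the
    # first child (or the end tag, if there are no children).
    elem.text = _trim(elem.text, True, not children)
    for index, child in enumerate(children):
        normalize_whitespace(child)
        # child.tail is the character data following the child's end tag;
        # for the last child it runs up to the parent's end tag.
        is_last = index == len(children) - 1
        child.tail = _trim(child.tail, False, is_last)
    return elem


doc = ET.fromstring("<p>\n  Hello   <b> big </b>\n  <i>wide</i> world\n</p>")
result = ET.tostring(normalize_whitespace(doc), encoding="unicode")
print(result)  # <p>Hello   <b>big </b><i>wide</i> world </p>
```

Note that the three spaces after "Hello" survive under rule (a), which is
exactly where a <PRE>-like element and an ordinary paragraph would want
different treatment.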

I must reinforce Jon's assertion that when discussing child nodes of a
parse tree, most of us ignorant folks aren't going to be thinking of a
linefeed as the third element of an ancestor.
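
As an illustration of what that looks like in practice, here is a small
sketch using Python's standard xml.dom.minidom, which keeps all character
data as child nodes when no DTD is involved. The sample document and the
`describe` helper are made up for the example.

```python
from xml.dom import minidom

source = "<list>\n<item>one</item>\n<item>two</item>\n</list>"
children = minidom.parseString(source).documentElement.childNodes


def describe(node):
    """Return a (kind, value) pair for a text or element node."""
    if node.nodeType == node.TEXT_NODE:
        return ("text", node.data)
    return ("element", node.tagName)


summary = [describe(node) for node in children]
for position, entry in enumerate(summary, start=1):
    print(position, entry)
```

The author of that document would probably say <list> has two children; the
tree has five, and the third one is a linefeed.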

I'm with you, Tim, on #3 as a first choice. The other solutions seem to
clutter up the requirements on what is intended to be a simple spec.

Murray

```````````````````````````````````````````````````````````````````````````````
    Murray Altheim, Program Manager
    Spyglass, Inc., Cambridge, Massachusetts
    email: <mailto:murray@spyglass.com>
    http:  <http://www.cm.spyglass.com/murray/murray.html>
           "Give a monkey the tools and he'll eventually build a typewriter."
Received on Tuesday, 17 December 1996 17:21:23 EST
