Re: Recent ERB votes from Paul Grosso on 1996-11-07 (w3c-sgml-wg@w3.org from November 1996)

From: Paul Grosso <paul@arbortext.com>
Date: Thu, 7 Nov 96 16:26:54 CST
To: W3C-SGML-WG@w3.org
Message-Id: <9611072226.AA09169@atiaus.arbortext.com>
> From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
> 
> [Paul Grosso:]
> 
> | > From: bosak@atlantic-83.Eng.Sun.COM (Jon Bosak)
> | > 
> | > Note also that this strategy does not discriminate against the
> | > existing SGML document base.  There are probably as many existing SGML
> | > documents that will work unchanged in an XML environment as there are
> | > HTML documents.  My Shakespeare and Religious Works collections are
> | > valid XML just as they stand
> | 
> | How is that?  How are empty elements represented in your existing SGML
> | document base for your Shakespeare and Religious Works collections?  
> 
> There are none.
> 
> | Irrespective of the answer to the above, what SGML authoring tools 
> | would produce valid XML given document instances in your Shakespeare 
> | and Religious Works collections?
> 
> The SGML authoring tools that I actually did use: emacs and perl.  The
> fact that these are probably not examples of what you have in mind
> when you say "SGML authoring tools" says something important about the
> preconceptions you may have about XML.

You're wrong.  The answer you give is exactly what I expected.

> 
> | Assuming neither Author/Editor nor Adept Editor (whose output, I
> | believe, are very similar) are answers to the above question, what
> | would be the list of modifications necessary to convert what they would
> | produce given document instances in your Shakespeare and Religious
> | Works collections into valid XML?
> 
> I think that Lee has addressed this.  However, the question misses the
> point.  The primary use of XML is to convey structured information
> from SGML databases to Web applications.  The batch processes that I
> used to generate the Shakespeare and Religion collections are actually
> much closer in spirit to the problem domain that XML is primarily
> designed to address than anything having to do with native authoring.

Again, your answer is not a surprise to me, but I think it is useful
to see it in print.  I have sensed a lot of people trying to give
lip service to the claim that XML is a subset of SGML--at least in
spirit if not in fact--and it's refreshing to have you admit that
this is not your intent.  It's not your intention that the base of 
existing SGML authoring tools will handle XML. 

> The point I was trying to make (but complicated by using the word
> "valid") is that if you strip off the current doctype header from the
> Shakepeare and Religion files, you have well-formed XML that could (if
> such things existed) be fed directly to XML browsers.  The fact that I
> was writing well-formed XML in 1992 says to me that XML is "natural
> SGML" -- it is what I intuitively thought that SGML was when I
> started.  And consequently I believe that a lot of existing "monastic
> SGML" can be made into XML much more easily than the great majority of
> existing HTML documents.

"Natural SGML" may well be some subset that omits a lot of esoterica.
However, XML as being defined now will require new tools and will force 
new decisions upon potential users that will cause extra confusion as 
they try to figure out what to do with their data.  Whether this ends 
up being good for the users is still an open question. 

> 
> I will repeat the point that I want to make sure doesn't get lost
> here: the XML spec does not favor HTML legacy data over SGML legacy
> data; quite the contrary.
> 
> Jon

I'm not sure a comparison to HTML is of as much interest for me as the
comparison between XML and SGML legacy, both in terms of legacy
instances as well as legacy DTDs, information modeling, and processes.

There is a huge and important customer base that has a whole lot
invested in "legacy" SGML.  I suspect that only a very small portion of
the total investment is, for example, based on the various minimization
techniques of 8879, so I'm happy to see them tossed.  But I suspect,
for example, about 99% of the investment is built on assumptions that 
there are EMPTY elements.

I'm not afraid from a tool producer point of view.  Let's face it,
for any organization that's developed an SGML aware authoring tool,
it's not going to be hard to add an "XML-mode" switch to it.  What
I'm a bit more concerned about is the confusion among potential
adopters we'll be creating just at the time I thought SGML was finally
getting over the "it's too new to try out yet" hump.  People who
just jumped on the SGML bandwagon will now all say "shoot, I jumped
too soon, now it looks like XML is the latest thing--geez I knew it
was still too early to commit to SGML."  This WG has been concentrating
on the technical issues, but it's the effect the technical definition
of XML has on the market place reaction that has me concerned.

Insofar as XML is just "a subset of SGML without the cruft" then I
don't have as much of the marketing problem I describe above.  Maybe 
if all the vendors hop to it and add "XML-mode" and "Convert to
XML/convert from XML" options in their tools, then we can minimize 
the effect of the market confusion.  

Meanwhile, it'll be interesting to see how all this plays in Boston
when it gets more into the hands of community at large.  
 
paul
Received on Thursday, 7 November 1996 17:31:46 UTC