Re: XML Conformance Levels [Was: ERB Decisions of March 26th] from Peter Murray-Rust on 1997-03-27 (w3c-sgml-wg@w3.org from March 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Thu, 27 Mar 1997 18:34:36 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <5165@ursus.demon.co.uk>
I suspect that I am in danger of using the WG as a learning experience :-), but
the last two mails [Paul Prescod and Alex Milowski] have clarified things a
lot.  I accept (partly on faith) that groves + DSSSL + Java is probably the
'right' way to do what I want to do.
What the WG should take from my postings is the sort of way that a newcomer
may react (and if XML is to be successful then we want a lot of newcomers).

There are (at least) these obvious arguments for different levels in the 
XML spec:
	- a given application doesn't need the whole power.  Thus it is a 
		legitimate part of XML simply to transport and minimally 
		render well-formed documents.
	- the implementors cannot manage to get the whole lot working at the 
		start (or don't feel it's cost effective).
	- the 'users' (authors and readers) need to be educated in stages.

It's the last of these that I am concerned about at present.  I am impressed
by the amount of software that is tracking the likely spec, and I suspect that
some impressive tools will be available.  But it *does* take time to learn
new tricks.  It's taken me at least two years to learn SGML and the power
of structured documents.  I remember hearing that it takes 6 months to learn
C++ if you have a friend and 12 if you haven't. It took me longer than that :-)

As I said I don't want to be unkind to my colleagues, but I think they are
typical of many disciplines that *could* make use of XML.  It's clear that
the advanced SGML community is already capable of leading - as shown by the
members of the WG.  However, my experience further away is:
	- people like HTML; they're not frightened of it.  IMO the biggest
		selling point for XML is that it looks visually like HTML :-)
	- SGML is a much underestimated part of the publishing process.  I
		sit on an e-journal committee and the SGML 'markup' is 
		*done by the printers* (and there are frequent complaints
		that it's incorrect).  The concept of a structured document
		doesn't really exist.
	- progress is normally slow, in small linked steps.  So, for example,
		the move from FORTRAN to C is still taking place.  People
		'write in C++' but there is no discipline-specific reusable
		software and many are still writing FORTRAN-like programs.
	- document standards are non-existent.  Even when there is published
		documentation many people simply guess from examples, without
		reading the manuals.

I therefore think that it is going to take a good deal of time for people to
realise that they need to learn XML (the easiest bit), XML-LINK (harder),
XML-STYLE (presumably harder), DSSSL and groves.  And I'm talking about the
potential implementors (of which there aren't a lot in my area).

So my request is that whatever emerges must be assimilable in small steps,
rather than having to buy everything at once.  

Having said that, you've convinced me that later in the year DSSSL is worth
looking at.  I'm delighted to hear that the objects that I have developed
in Java can be bolted onto it, so I'll bash away at those for the moment :-)

> > Probably.  Stylesheets are seen as publishers' tools or beautifiers for 
> > display.  They are not seen as things which transform documents in a 
> > discipline-relevant manner.  
> 
> Transformations are part of DSSSL as well.  ...but they are a different part of
> the standard.  What does "discipline-relevant" mean?

The key words were 'seen as'.  Our transformations would be things like 
inputting a molecule into a simulation and getting an array of eigenvalues 
out.  This probably can be done under DSSSL, but no one would think of doing 
it that way.  Some legacy systems have several million lines of code.

[....]
> 
> No.  This is a common misconception about DSSSL.  The result of applying a
> DSSSL stylesheet is a flowobject tree (or event stream as in Jade--an 
> equivalent construct).   A flowobject is a "thing" that has properties.  Thus,
> if your application flowobject has a "mouse-up" property, you can specify
> what to do with that property.  

This sounds promising.  I look forward to some examples and maybe progress
will be rapid.

> 
> In DSSSL you need to separate the description of the data of the behavior
> (e.g. what value my "mouse-up" property has) from the use of that
> data (e.g. on a mouse up, take the image specified in the property value and
> display it).  Formatting is just one kind of style semantic.  Browser
> display semantics is another--and this is what you seem to be interested in. 

It's one thing :-).  My main crusade (which I suspect will fail) is to persuade
the community that structured documents are vastly more powerful than what they
are used to.  Browsing is only one aspect.  Structured documents can be searched
even if they have widely varying architectures.  Some examples are:
	- find all the molecules in this publication and calculate their 
		molecular weights.
	- find all the reactions in this month's reports and transmit them
		(in XML) to a robot.  If the starting materials aren't in the
		stores, search the suppliers' catalogs.
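
As a minimal sketch of the first example, assuming a hypothetical markup in
which each molecule is a <molecule> element holding <atom> children with
'element' and 'count' attributes (these names are invented for illustration
and are not taken from any existing DTD), the search could look like this.
The point is that the molecules are found wherever they sit in the document:

```python
import xml.etree.ElementTree as ET

# Standard atomic weights for a few elements, rounded to three decimals.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Hypothetical publication: the element and attribute names are assumptions
# made for this sketch. The molecules sit at different depths, inside
# unrelated parent structures.
PUBLICATION = """
<article>
  <section>
    <para>We dissolved <molecule name="ethanol">
      <atom element="C" count="2"/>
      <atom element="H" count="6"/>
      <atom element="O" count="1"/>
    </molecule> in water.</para>
  </section>
  <appendix>
    <table><row><cell>
      <molecule name="water">
        <atom element="H" count="2"/>
        <atom element="O" count="1"/>
      </molecule>
    </cell></row></table>
  </appendix>
</article>
"""


def molecular_weights(xml_text):
    """Return {molecule name: molecular weight} for every <molecule> found."""
    root = ET.fromstring(xml_text)
    weights = {}
    # iter() walks the whole tree, so the surrounding structure is irrelevant.
    for molecule in root.iter("molecule"):
        total = sum(
            ATOMIC_WEIGHTS[atom.get("element")] * int(atom.get("count", "1"))
            for atom in molecule.iter("atom")
        )
        weights[molecule.get("name")] = round(total, 3)
    return weights


if __name__ == "__main__":
    for name, weight in molecular_weights(PUBLICATION).items():
        print(name, weight)
```

The same function works unchanged on any document that uses the assumed
<molecule> markup, whatever the rest of its architecture looks like.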

The browsing has these purposes:
	- to show people how rich their legacy information is, when structured
	- to show how their systems can be built out of information and 
		software components
	- to create a very gentle introduction to XML (they never have to 
		see any)

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Thursday, 27 March 1997 13:35:13 UTC