Re: XML Conformance Levels [Was: ERB Decisions of March 26th]
I suspect that I am in danger of using the WG as a learning experience :-), but
the last two mails [Paul Prescod and Alex Milowski] have clarified things a
lot. I accept (partly on faith)
that groves + DSSSL + Java is probably the 'right' way to do what I want to do.
What the WG should take from my postings is the sort of way that a newcomer
may react (and if XML is to be successful then we want a lot of newcomers).
There are (at least) these obvious arguments for different levels in the
- a given application doesn't need the whole power. Thus it is a
legitimate part of XML simply to transport and minimally
render well-formed documents.
- the implementors cannot manage to get the whole lot working at the
start (or don't feel it's cost effective).
- the 'users' (authors and readers) need to be educated in stages.
It's the last of these that I am concerned about at present. I am impressed
by the amount of software that is tracking the likely spec, and I suspect that
some impressive tools will be available. But it *does* take time to learn
new tricks. It's taken me at least two years to learn SGML and the power
of structured documents. I remember hearing that it takes 6 months to learn
C++ if you have a friend and 12 if you haven't. It took me longer than that :-)
As I said I don't want to be unkind to my colleagues, but I think they are
typical of many disciplines that *could* make use of XML. It's clear that
the advanced SGML community is already capable of leading - as shown by the
members of the WG. However, my experience further away is:
- people like HTML; they're not frightened of it. IMO the biggest
selling point for XML is that it looks visually like HTML :-)
- SGML is a much underestimated part of the publishing process. I
sit on an e-journal committee and the SGML 'markup' is
*done by the printers* (and there are frequent complaints
that it's incorrect). The concept of a structured document
doesn't really exist.
- progress is normally slow, in small linked steps. So, for example,
the move from FORTRAN to C is still taking place. People
'write in C++' but there is no discipline-specific reusable
software and many are still writing FORTRAN-like programs.
- document standards are non-existent. Even when there is published
documentation many people simply guess from examples, without
reading the manuals.
I therefore think that it is going to take a good deal of time for people to
realise that they need to learn XML (the easiest bit), XML-LINK (harder),
XML-STYLE (presumably harder), DSSSL and groves. And I'm talking about the
potential implementors (of whom there aren't a lot in my area).
So my request is that whatever emerges must be assimilatable in small steps,
rather than having to buy everything at once.
Having said that, you've convinced me that later in the year DSSSL is worth
looking at. I'm delighted to hear that the objects that I have developed
in Java can be bolted onto it, so I'll bash away at those for the moment :-)
> > Probably. Stylesheets are seen as publishers' tools or beautifiers for
> > display. They are not seen as things which transform documents in a
> > discipline-relevant manner.
> Transformations are part of DSSSL as well. ...but they are a different part of
> the standard. What does "discipline-relevant" mean?
The key words were 'seen as'. Our transformations would be things like
inputting a molecule into a simulation and getting an array of eigenvalues
out. This probably can be done under DSSSL, but no one would think of doing
it that way. Some legacy systems have several million lines of code.
> No. This is a common misconception about DSSSL. The result of applying a
> DSSSL stylesheet is a flowobject tree (or event stream as in Jade--an
> equivalent construct). A flowobject is a "thing" that has properties. Thus,
> if your application flowobject has a "mouse-up" property, you can specify
> what to do with that property.
This sounds promising. I look forward to some examples and maybe progress
will be rapid.
> In DSSSL you need to separate the description of the data of the behavior
> (e.g. what value my "mouse-up" property has) from the use of that
> data (e.g. on a mouse up, take the image specified in the property value and
> display it). Formatting is just one kind of style semantic. Browser
> display semantics is another--and this is what you seem to be interested in.
It's one thing :-). My main crusade (which I suspect will fail) is to persuade
the community that structured documents are vastly more powerful than what they
are used to. Browsing is only one aspect. Structured documents can be searched
even if they have widely varying architectures. Some examples are:
- find all the molecules in this publication and calculate their
- find all the reactions in this month's reports and transmit them
(in XML) to a robot. If the starting materials aren't in the
stores, search the suppliers' catalogs.
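As a rough illustration of the first kind of search (finding every molecule
whatever the architecture of the document around it), here is a minimal sketch
in Python. The element name "molecule", the attribute "id" and both sample
documents are hypothetical, invented purely for illustration; they are not
taken from any real DTD.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with quite different architectures.
# The element name "molecule" and the attribute "id" are illustrative only.
PAPER = """<paper>
  <section><para>We prepared <molecule id="m1"/> and
  <molecule id="m2"/>.</para></section>
</paper>"""

REPORT = """<report>
  <reaction>
    <reactant><molecule id="m3"/></reactant>
    <product><molecule id="m4"/></product>
  </reaction>
</report>"""


def find_molecules(xml_text):
    """Return the id of every <molecule> element, at any depth."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the surrounding structure is irrelevant.
    return [m.get("id") for m in root.iter("molecule")]


for doc in (PAPER, REPORT):
    print(find_molecules(doc))
```

The same function works on both documents because it asks only for the
element type and never mentions the path that leads to it.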
The browsing has these purposes:
- to show people how rich their legacy information is, when structured
- to show how their systems can be built out of information and
- to create a very gentle introduction to XML (they never have to
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences