Re: Error handling in XML

In this discussion many contributors have seen XML as a natural extension
of the way the SGML community works at present and I appreciate the 
concerns that they have.  It seems to be clear that a large number of
humans using SGML at present would be unhappy to be required to create 
WF documents - I accept this as a current fact.

I take the optimistic view that XML will become ubiquitous and therefore
most creation of documents will not be created by _humans in the current
SGML tradition_.  In developing Chemical Markup Language I see five 
sources (and sinks) of XML documents.  They are not all relevant to other
disciplines, but I expect there are further possibilities I have missed.
Assuming that CML becomes widely used, I will attempt to put percentages on 
them (correct to a few orders of magnitiude :-)

Instruments (50%);
Most data are now collected on instruments with digital outputs.
There is no technical reason why these cannot be CMLised.  (The current
output formats are often awful.  If you worry about legacy text data, legacy
numeric data (especially binary or FORTRAN) is far worse).

Calculations(5%);
Many scientists don't do lab work, they compute what reality ought to be.
This is good for those selling large numbercrunchers.  It would be a low
cost option to adapt most of the programs to output CML.

Databases(40%);
A large amount of data is transferred to and from databases.  (Most of it 
comes from instruments, of course).

Scientifc publications (journals) (1%)
The primary place where results are published, though this could change.
Most publishers know that SGML exists and many use it.  Normally the markup
is very large-grained.

Humans(1%);
They normally write the publications, but there are some disciplines 
(such as crystallography) where a short fully marked-up journal article is 
output by a machine.

Apart from the humans, none of the others has any incentive not to produce
WF-documents - why should they?  We agree that bandwidth is not a problem
(article of faith).  All these outputs are produced by a program of some
sort which can count enough to balance quotes and tags.  I'd hate to
think that we were giving a signal to authors of these programs that they
should do anything other than produce WF XML.  (>>99% of them wouldn't
dream that anything else was potentially acceptable - they are used to
having to work to given rules).

In 1993 I imagine most HTML was hacked by humans.  In 1997 I suspect only
a very small amount is (apart from mine :-). In 2 years at most that will 
certainly be true for XML.  I think we should try to look at 1999 and ask
what people will really want _then_, rather than assuming it's driven
by what we can see now.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

Received on Thursday, 24 April 1997 07:17:25 UTC