Re: Error handling in XML

In message <1.5.4.32.19970421065409.0069c20c@mail.u-net.com> Martin Bryan writes:
> At 10:41 20/4/97 GMT, Peter Murray-Rust wrote:
[...]
> 
> There is a difference between loss of markup and loss of data. Whilst both
> constitute information, no data should be discarded just because there is
> an error in a piece of markup. XML should at very least retain the following
> data as part of the last validly opened element.

Although I'm not an SGML expert, I take a different view, in that markup and
data are both essential parts of the document.  I am prepared to write the 
following:

<?XML VERSION="1.0"?>
<!DOCTYPE CML-LIKE [
<!ELEMENT CML-LIKE ANY>
<!ELEMENT MASS #PCDATA>
<!ATTLIST MASS
          UNITS CDATA "KILOGRAM">
]>
<CML-LIKE>
The mass of the reactant was
<MASS UNITS="grams">3</MASS>
which was clearly unsafe...
</CML-LIKE>

'grams' is as much a part of the document as '3'.  If (and I'm not saying 
it's recommended) UNITS defaults to the appropriate SI unit then an omitted
UNITS attribute will be automatically interpreted as kg.  The above document 
is WF.  Without the quotes round 'grams' it is broken.  It's quite conceivable
that a parser would simply say 'corrupt attribute omitted'.  Then the clever
application (which is used to working with DTD-less documents) inserts the
'kg' string.  The reader must at least see a little flag saying 'broken' since
it's as broken as if the 3 were replaced by 3000.  
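
The three cases above (quoted attribute, omitted attribute, unquoted
attribute) can be sketched roughly as follows.  This is only an illustration
under stated assumptions: the declarations are adapted to the syntax of the
final XML 1.0 Recommendation (lowercase "xml version", parenthesised
#PCDATA), which differs from the draft syntax in the example, and
document() and read_mass() are hypothetical helper names.  It uses Python's
standard xml.etree.ElementTree, whose expat-based parser applies attribute
defaults declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset from the message, adapted (assumption) to final
# XML 1.0 syntax so that a present-day parser accepts it.
PROLOG = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE CML-LIKE [\n'
    '<!ELEMENT CML-LIKE ANY>\n'
    '<!ELEMENT MASS (#PCDATA)>\n'
    '<!ATTLIST MASS UNITS CDATA "KILOGRAM">\n'
    ']>\n'
)


def document(mass_tag):
    """Hypothetical helper: wrap one MASS element in the example document."""
    return PROLOG + "<CML-LIKE>The mass was " + mass_tag + "</CML-LIKE>"


def read_mass(xml_text):
    """Return (value, units); raises ET.ParseError if the text is not WF."""
    root = ET.fromstring(xml_text)
    mass = root.find("MASS")
    return mass.text, mass.get("UNITS")


# Quoted attribute: the author's units survive.
quoted = read_mass(document('<MASS UNITS="grams">3</MASS>'))

# Omitted attribute: the declared default is supplied silently.
omitted = read_mass(document('<MASS>3</MASS>'))

# Unquoted attribute: not well-formed, so the parser refuses the document
# instead of dropping the attribute and letting the default take over.
try:
    read_mass(document('<MASS UNITS=grams>3</MASS>'))
    broken_rejected = False
except ET.ParseError:
    broken_rejected = True
```

If a lenient parser dropped the corrupt attribute instead of raising, the
third case would look exactly like the second one, and 3 grams would be
read as 3 kilograms with no flag for the reader.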

There has been a presumption in some of the discussion that it's up to the 
authors and readers not to do foolish things in XML.  The problem is that 
*if you don't have experience in SGML* it's incredibly easy to do foolish things
unless prevented.  Most people see the FPI on HTML documents and think it's
a ritual (they're right - it normally is).  But in XML that string 
_matters_.  It can change your document.
 
I understand the points of view that are being put for leniency in processing
documents.  However, if we are selling XML on the basis that it can control
rockets, we must be seen to care about precision.  No-one can
stop the rest of the world working with broken documents, but I think we have 
to promote the value of XML as _supporting_ precision.  In my mind:
	"HTML is great because you can send people broken documents"
	"XML is great because you can't send people broken documents"
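
To make the contrast concrete, here is a small sketch that feeds the same
broken fragment to a lenient HTML-style parser and to a strict XML parser.
It uses Python's standard html.parser and xml.etree.ElementTree purely as
stand-ins for the two attitudes; AttrCollector, lenient_parse() and
strict_parse() are hypothetical names, not anything proposed for XML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# The MASS element from the example, with the quotes left off.
BROKEN = "<MASS UNITS=grams>3</MASS>"


class AttrCollector(HTMLParser):
    """Records every start tag and its attributes as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


def lenient_parse(text):
    """HTML-style processing: accept the input and report what was found."""
    collector = AttrCollector()
    collector.feed(text)
    collector.close()
    return collector.seen


def strict_parse(text):
    """XML-style processing: a document is either well-formed or refused."""
    try:
        ET.fromstring(text)
        return "accepted"
    except ET.ParseError:
        return "rejected"
```

The lenient parser quietly accepts the unquoted value (and lowercases the
names along the way), while the strict parser rejects the fragment outright.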

YMMV
	P.



> ----
> Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
> Phone/Fax: +44 1452 714029   WWW home page: http://www.sgml.u-net.com/
> 
> 
> 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

Received on Monday, 21 April 1997 11:30:52 UTC