Re: Error handling in XML
In message <firstname.lastname@example.org> Martin Bryan writes:
> At 10:41 20/4/97 GMT, Peter Murray-Rust wrote:
> There is a difference between loss of markup and loss of data. Whilst both
> constitute information, no data should be discarded just because there is
> an error in a piece of markup. XML should at very least retain the following
> data as part of the last validly opened element.
Although I'm not an SGML expert, I take a different view, in that markup and
data are both essential parts of the document. I am prepared to write the
following:
<!DOCTYPE CML-LIKE [
<!ELEMENT CML-LIKE ANY>
<!ELEMENT MASS (#PCDATA)>
<!ATTLIST MASS
UNITS CDATA "KILOGRAM">
]>
<CML-LIKE>
The mass of the reactant was
<MASS UNITS="grams">3</MASS>
which was clearly unsafe...
</CML-LIKE>
'grams' is as much a part of the document as '3'. If (and I'm not saying
it's recommended) UNITS defaults to the appropriate SI unit then an omitted
UNITS attribute will be automatically interpreted as kg. The above document
is WF. Without the quotes round 'grams' it is broken. It's quite conceivable
that a parser would simply say 'corrupt attribute omitted'. Then the clever
application (which is used to working with DTD-less documents) inserts the
'kg' string. The reader must at least see a little flag saying 'broken' since
it's as broken as if the 3 were replaced by 3000.
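To make the failure mode concrete, here is a rough sketch in Python using the standard xml.etree.ElementTree module. The names DEFAULT_UNITS, strict_mass and lenient_mass are invented for illustration, and the default is modelled as a constant in the code rather than read from a DTD. The "lenient" path is a hypothetical recovery strategy, not the behaviour of any particular parser.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: stands in for the default declared for UNITS in the example.
DEFAULT_UNITS = "KILOGRAM"

GOOD = '<CML-LIKE>The mass was <MASS UNITS="grams">3</MASS> ...</CML-LIKE>'
BROKEN = GOOD.replace('"grams"', 'grams')  # same text, quotes lost in transit


def strict_mass(doc):
    """Parse strictly; raises ET.ParseError if doc is not well-formed."""
    mass = ET.fromstring(doc).find("MASS")
    return mass.text, mass.get("UNITS", DEFAULT_UNITS)


def lenient_mass(doc):
    """Hypothetical 'helpful' recovery: drop unquoted attributes, then parse."""
    repaired = re.sub(r'\s+\w+=[^\s">]+(?=[\s>])', "", doc)
    return strict_mass(repaired)


if __name__ == "__main__":
    print(strict_mass(GOOD))
    try:
        strict_mass(BROKEN)
    except ET.ParseError as err:
        print("rejected:", err)
    print(lenient_mass(BROKEN))  # the unit has silently changed
```

The strict path refuses the broken document outright. The lenient path returns a perfectly plausible value with the wrong unit and no flag to say anything went wrong, which is the case I am worried about.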
There has been a presumption in some of the discussion that it's up to the
authors and readers not to do foolish things in XML. The problem is that
*if you don't have experience in SGML* it's incredibly easy to do foolish things
unless prevented. Most people see the FPI on HTML documents and think it's
a ritual (they're right - it normally is). But in XML that string
_matters_. It can change your document.
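As a small illustration of declarations changing what the application sees, the sketch below feeds the same element to Python's xml.parsers.expat module with and without an internal DTD subset. It uses an internal subset rather than an FPI pointing at an external DTD, so it is an analogy for the point above, not a demonstration of FPI resolution. The helper name attrs_seen and the two sample documents are made up for this example.

```python
import xml.parsers.expat

WITH_DTD = (
    "<!DOCTYPE CML-LIKE [\n"
    "<!ELEMENT CML-LIKE ANY>\n"
    "<!ELEMENT MASS (#PCDATA)>\n"
    '<!ATTLIST MASS UNITS CDATA "KILOGRAM">\n'
    "]>\n"
    "<CML-LIKE><MASS>3</MASS></CML-LIKE>"
)
WITHOUT_DTD = "<CML-LIKE><MASS>3</MASS></CML-LIKE>"


def attrs_seen(doc):
    """Return {element name: attributes reported by the parser}."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # attrs includes defaulted attributes unless
        # parser.specified_attributes has been set.
        seen[name] = dict(attrs)

    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return seen


if __name__ == "__main__":
    print(attrs_seen(WITH_DTD)["MASS"])
    print(attrs_seen(WITHOUT_DTD)["MASS"])
```

The element markup is identical in both documents; only the declarations differ, yet the application is handed a UNITS value in one case and nothing in the other.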
I understand the points of view that are being put for leniency in processing
documents. However, if we are selling XML on the basis that it can control
rockets, we must appear to show that we care about precision. No-one can
stop the rest of the world working with broken documents, but I think we have
to promote the value of XML as _supporting_ precision. In my mind:
"HTML is great because you can send people broken documents"
"XML is great because you can't send people broken documents"
> Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK
> Phone/Fax: +44 1452 714029 WWW home page: http://www.sgml.u-net.com/
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences