Re: Error handling in XML

As I read through this excellent thread, a chicken-and-egg question arises:

"Which came first, the information or the document?"

If it's the latter, then I question the need for rigorous WFness. If an author
(or authoring application) simply wants to deliver a document to human
consumers on the Web, then the application probably ought to make nice and
forgive errors in the author's futzy XML code. The document's author may have
good reasons for wanting XML's richness vs. HTML's fixed tag set.

But if it's the former, and the author needs to ensure the information's
integrity so that it can do something more, such as post electronic salary
payments into your bank account or transfer medical information from your
family physician into a hospital prior to your surgery, then the application
needs to insist on WFness.

M and N anticipated that HTML authors would not want their documents held up
merely because of HTML errors, and if it's reasonable to suspect that M and N
may grant similar pardon to XML documents, shouldn't XML also anticipate this?

Shouldn't the author, the generator of the document, have a say in whether the
XML application should be kind or rigorous? Maybe this is crazy, but why not
let the author state an intention:

<?XML VERSION="1.0" WF="YES"?>

  to indicate that the information in this XML document must be rigorously
  parsed for WFness


<?XML VERSION="1.0" WF="NO"?>

  to indicate that the application may do its best in the event of
  WFness errors, perhaps using a model such as Satwinder's?

Moreover, XML applications might likewise state whether they require rigorous
parsing, and the author's and the application's settings could be combined as
in a table:

                        APP: WF="YES"          APP: WF="NO"

Author: WF="YES"        Rigorous parse         Rigorous parse

Author: WF="NO"         Rigorous parse         Loose parse

That is, if either the author or the application requires a rigorous XML
parse, then so be it; a loose parse happens only when both permit it.
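To make the negotiation concrete, here is a minimal Python sketch of the rule
in the table. It assumes the hypothetical WF pseudo-attribute proposed above
(it is only a proposal in this post), and the function names author_wf and
parse_mode are illustrative, not part of any real parser. It also assumes
that a declaration without WF means WF="YES".

```python
import re

# Hypothetical: matches the proposed WF pseudo-attribute in an XML
# declaration such as <?XML VERSION="1.0" WF="NO"?>.
_WF_PATTERN = re.compile(r'\bWF\s*=\s*"(YES|NO)"', re.IGNORECASE)


def author_wf(declaration: str) -> bool:
    """Return True if the author asks for a rigorous WFness check.

    Assumption: a declaration with no WF attribute is treated as
    WF="YES", i.e. rigorous by default.
    """
    match = _WF_PATTERN.search(declaration)
    if match is None:
        return True
    return match.group(1).upper() == "YES"


def parse_mode(author_requires_wf: bool, app_requires_wf: bool) -> str:
    """Combine both settings per the table: rigorous if either side asks."""
    if author_requires_wf or app_requires_wf:
        return "rigorous"
    return "loose"


decl = '<?XML VERSION="1.0" WF="NO"?>'
print(parse_mode(author_wf(decl), app_requires_wf=False))  # loose
print(parse_mode(author_wf(decl), app_requires_wf=True))   # rigorous
```

Because the rule is a plain logical OR, neither party can weaken the other's
demand for rigor; loose parsing is always an opt-in by both sides.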

If XML anticipates this and gives information providers, document creators,
and application developers the choice, then no one should be terribly
surprised when erroneous XML code is accepted or declined.

Is this a naive idea? If XML is meant to serve a large community of users,
it's probably sensible to anticipate different requirements. Document delivery
is one thing, and information processing is another. The word "document"
straddles this boundary and creates a great deal of confusion, as people who
support new users of SGML systems can attest.

If XML can anticipate these different requirements with a "do the right thing"
or "do your own thing" branch, doesn't this solve a big problem? And isn't
that problem (from HTML) a presumption that authors didn't care about the
WFness or validity of their information?


Todd Freter
Program Manager, Information Products
Solaris Products Group, SunSoft
Sun Microsystems, Inc.
