Re: Notes on the process from Matthew Fuchs on 1997-05-09 (w3c-sgml-wg@w3.org from May 1997)

From: Matthew Fuchs <matt@wdi.disney.com>
Date: Fri, 9 May 97 11:29:46 PDT
To: w3c-sgml-wg@w3.org
Message-Id: <199705091829.LAA11843@scrumpox.rd.wdi.disney.com>

Jon Bosak says:

>   The processor is required to behave in the manner we've described only
>   as long as it's calling itself an XML processor.  This is an
>   advertising issue.  In the scenario Bill describes, the processor sees
>   the broken message, and this being what he calls a "mission critical"
>   application, it has been programmed to respond to such a message by
>   saying to itself, "This is broken, so it must not be an XML message,
>   and therefore I'm free to stop being an XML processor and to do what I
>   need to do, which is to recover the message the best I can and get it
>   to the receiver in the best shape that I can manage."  So the parser
>   has to take off its figurative "XML approved" hat for a minute to save
>   your life.  Big deal.

I think it may be worthwhile to extend the "XML approved" hat; otherwise
you're saying to the XML vendors, "As soon as something goes wrong you're
free to behave in any slovenly manner you want."  This would make it
hard to write an app that would work on both MS and NS browsers unless
the Net never fails.

There is an important distinction between ill-formed documents (which
we want to discourage), on the one hand, and garbled or fragmentary
documents (which we want to help) on the other.  Bill Smith's concerns
are certainly with the latter.  I also wonder if "push" technology and
the results of the Document Object Model group won't make the notion
of a full document more and more tenuous.

Error recovery is always enabled by embedding redundant information.
For example, I can create a WF doc which can recover from lost tags by
sending tags with a format as follows:
<tagname-tagid-depth-startpos>...</tagname-tagid-depth-endpos>, where
tagid is incremented with each new tag, depth is depth in the tree,
startpos and endpos are the current positions in the document.

Of course, this document would have the unusual characteristic of
being well formed but invalid (unless the DTD is far larger than the
document), so the processor would need to understand the redundant
information and strip it out.  The processor (and possibly the app)
would also know what is missing.  (I may not have chosen the best
format for embedding this information, but the point is that it can
be done.)
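
To make the tag format above concrete, here is a minimal sketch in
Python.  It is only an illustration under assumptions of my own: the
function names (annotate, strip, repair) are hypothetical, element names
are assumed to contain no hyphens, and startpos/endpos are taken to be
the count of character-data characters emitted so far.  It handles lost
end tags and truncated streams; end tags whose start tag was lost are
simply dropped.

```python
import re

# Assumed layout: <name-id-depth-pos> ... </name-id-depth-pos>.
# Names are assumed to contain no hyphens so the fields split cleanly.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.]*)-(\d+)-(\d+)-(\d+)>")


def annotate(node, depth=0, state=None):
    """Serialize (name, children) with the redundant fields in every tag.

    Children are strings (character data) or nested (name, children) nodes.
    pos is assumed to count characters of character data emitted so far.
    """
    if state is None:
        state = {"id": 0, "pos": 0}
    name, children = node
    state["id"] += 1
    tagid = state["id"]
    out = [f"<{name}-{tagid}-{depth}-{state['pos']}>"]
    for child in children:
        if isinstance(child, str):
            out.append(child)
            state["pos"] += len(child)
        else:
            out.append(annotate(child, depth + 1, state))
    out.append(f"</{name}-{tagid}-{depth}-{state['pos']}>")
    return "".join(out)


def strip(text):
    """Remove the redundant fields, leaving ordinary tags."""
    return TAG.sub(lambda m: f"<{m.group(1)}{m.group(2)}>", text)


def repair(text):
    """Rebuild plain tags, re-inserting end tags that were lost in transit.

    Returns (repaired_text, ids_of_elements_whose_end_tag_was_missing).
    End tags whose start tag was lost are dropped; the pos fields are
    carried but not used by this sketch.
    """
    out, stack, missing, last = [], [], [], 0

    def close_top():
        name, tagid, _ = stack.pop()
        out.append(f"</{name}>")
        missing.append(tagid)

    for m in TAG.finditer(text):
        out.append(text[last:m.start()])
        last = m.end()
        closing, name = m.group(1), m.group(2)
        tagid, depth = int(m.group(3)), int(m.group(4))
        if not closing:
            # Anything still open at this depth or deeper lost its end tag.
            while stack and stack[-1][2] >= depth:
                close_top()
            stack.append((name, tagid, depth))
            out.append(f"<{name}>")
        elif any(tagid == open_id for _, open_id, _ in stack):
            # Close intervening elements until the matching one is on top.
            while stack[-1][1] != tagid:
                close_top()
            stack.pop()
            out.append(f"</{name}>")
    out.append(text[last:])
    while stack:  # truncated stream: close whatever is still open
        close_top()
    return "".join(out), missing


doc = ("doc", [("a", ["hello"]), ("b", ["world"])])
wire = annotate(doc)
damaged = wire.replace("</a-2-1-5>", "")  # simulate one lost end tag
print(repair(damaged))
```

The repair step here relies only on tagid and depth.  The position
fields would additionally let a receiver estimate how much character
data went missing between two surviving tags, which this sketch does
not attempt.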

On the Draconian side, it is pretty obvious that it takes less extra
info to recover a doc that started WF.  I would agree that only WF
docs should be _transmitted_, which I get the impression is what the
vendors really want.  On the Tolerant side, this shows there are
models which will allow applications to do error recovery without
opening the door to tag soup.

Finally, I think this supports Henry Thompson's "Radical
Simplification" suggestion.  We can build in decent error correction
if we handle it separately from the language itself.


Matthew Fuchs
matt@wdi.disney.com
http://cs.nyu.edu/phd_students/fuchs
Received on Friday, 9 May 1997 14:28:03 UTC