Re: error recovery

On Sun, 2012-02-19 at 12:56 -0500, Noah Mendelsohn wrote:
[...]
> In principle, the only thing an XML parser should say about this is that 
> it's not well formed, because the <a> does not match the </b>. In practice, 
> SAX streaming parsers regularly fudge on this, cheerfully reporting the <x> 
> elements before discovering at the end that it was all a mistake (and we 
> really don't know if those <x>'s were good data, or whether the document 
> was in fact mangled earlier).

The only requirement XML places is that the processor not report a
document as well-formed XML when it isn't.  It's this requirement I want
to see taken into account, and not forgotten.  I'd like the Web
browser to be able to tell the user, "there was an 'a' element on line
96 with no end tag, and an end-tag for a 'b' element on line 4015 with
no start tag" or something like that.
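
To make the kind of message I mean concrete, here is a rough sketch in
Python.  It is not a real XML parser: the regular expression, the name
diagnose() and the message wording are all mine, invented for
illustration, and it ignores comments, CDATA sections, processing
instructions and ">" inside attribute values.

```python
import re

# Simplified tag pattern (an assumption for this sketch, not a full XML
# tokenizer): optional "/", a name, attribute text, optional "/" before ">".
TAG = re.compile(r"<(/?)([A-Za-z_:][\w:.-]*)([^<>]*?)(/?)>")


def diagnose(text):
    """Return messages for start tags and end tags that have no partner."""
    open_elements = []  # stack of (name, line) for elements still open
    problems = []       # (line, message) pairs, sorted before returning

    for match in TAG.finditer(text):
        closing, name, _attrs, empty = match.groups()
        line = text.count("\n", 0, match.start()) + 1

        if not closing:
            if not empty:  # <x/> opens and closes in one go
                open_elements.append((name, line))
            continue

        if any(open_name == name for open_name, _ in open_elements):
            # Pop back to the matching start tag; whatever sat above it
            # on the stack was never closed.
            while True:
                top_name, top_line = open_elements.pop()
                if top_name == name:
                    break
                problems.append(
                    (top_line,
                     f"'{top_name}' element on line {top_line} with no end tag"))
        else:
            problems.append(
                (line,
                 f"end-tag for '{name}' element on line {line} with no start tag"))

    # Anything still open at the end of input never got an end tag.
    for name, line in open_elements:
        problems.append((line, f"'{name}' element on line {line} with no end tag"))

    return [message for _, message in sorted(problems)]


messages = diagnose("<a>\n<x/>\n</b>\n")
```

The point is only that both halves of the mismatch get reported with
their own line numbers, rather than a single "not well-formed" at the
place where the parser gave up.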

The SAX parsers that report an error at the </b> not matching the <a>
are not, I think, out of spec.
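
As a small illustration of the streaming behaviour being discussed, the
following sketch uses the xml.sax module from the Python standard
library.  The Recorder class, the stream() helper and the sample
document are made up for this example; the expectation is that the
handler sees the start tags before the parser raises its error at the
</b>.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Collects element names in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def stream(doc):
    """Parse doc, returning the reported element names and any fatal error."""
    handler = Recorder()
    error = None
    try:
        xml.sax.parseString(doc, handler)
    except xml.sax.SAXParseException as exc:
        # The default error handler raises on a well-formedness error,
        # so the document is never reported as well-formed.
        error = exc
    return handler.seen, error


seen, error = stream(b"<a><x/><x/></b>")
```

Here seen holds the "a" and both "x" elements even though error is set:
the application got the data first and the bad news afterwards, but it
did get the bad news.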

SGML parsers would typically have returned an </a> inserted before the
</b> by the parser, closing all elements up the stack and then
complaining there was more input after the end of the document,
behaviour most users found confusing!

> I presume our goal here is for XML-ER to be not much worse than XML in its 
> streaming characteristics.

Actually the worst case I've encountered in XML is
<a b:att1="v1" b:att2="v2" ... [a gigabyte of attributes followed by]
     b:attFFFF="vFFFF" xmlns:b="http://example.org/" />

You may have to buffer all the attributes until you get to the namespace
declaration. In practice this isn't really an issue for a Web browser,
or for anything else constructing a tree, because you have to keep them
anyway.
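
A scaled-down sketch of that case, again with Python's xml.sax and with
namespace processing switched on.  The AttributeRecorder class and the
resolve_attributes() helper are names invented for this example, and two
attributes stand in for the gigabyte.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class AttributeRecorder(ContentHandler):
    """Records attributes keyed by (namespace URI, local name)."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def startElementNS(self, name, qname, attrs):
        for (uri, local), value in attrs.items():
            self.attributes[(uri, local)] = value


def resolve_attributes(doc):
    """Parse doc with namespace processing and return its attributes."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = AttributeRecorder()
    parser.setContentHandler(handler)
    parser.feed(doc)
    parser.close()
    return handler.attributes


# The b: prefix is used before the xmlns:b declaration that defines it.
resolved = resolve_attributes(
    b'<a b:att1="v1" b:att2="v2" xmlns:b="http://example.org/" />'
)
```

Both attributes come back bound to http://example.org/, which the parser
can only know after reading to the end of the start tag, so it must have
held on to every attribute until then.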

But I think we're maybe wandering a bit.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/

Received on Sunday, 19 February 2012 18:15:51 UTC