Re: error recovery

On 2/18/2012 1:08 PM, David Lee wrote:
> Question: If this *can't* be done in a streaming processor what does that mean?
> Does it mean the input must be fully read in order to "fix" it?

It typically means that in order to parse later content you may have to
either go back and revisit earlier content (possibly quite far back in a
large document), or else hold onto large amounts of state retained from
earlier in the document as you proceed to parse the rest. Also, it tends to
mean that you can't report content to a consuming application more or less
as you go. Typically, in an XML parser, you need to retain things like the
stack of open element names, the in-scope prefixes, and entity definitions,
but not much else. So, one can argue that in that sense, XML tends to
stream pretty well.
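
To make the "not much state" point concrete, here is a rough sketch in
Python. It is a toy, not a real XML parser: the name check_nesting and the
TAG pattern are made up for illustration, and it assumes tags of the plain
form <name> and </name> only (no attributes, comments, CDATA, entities, or
namespaces). The only thing it carries forward between chunks is the stack
of open element names, plus any tag that was cut off at a chunk boundary.

```python
import re

# Hypothetical toy scanner: recognizes only plain <name> and </name> tags.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def check_nesting(chunks):
    """Consume text chunks in order and return the peak nesting depth.

    Raises ValueError at the first mismatched, unclosed, or truncated tag.
    """
    stack = []   # the only parse state carried forward: open element names
    peak = 0
    tail = ""    # carry-over for a tag split across a chunk boundary
    for chunk in chunks:
        text = tail + chunk
        cut = text.rfind("<")
        if cut != -1 and ">" not in text[cut:]:
            # The last tag is incomplete; hold it back for the next chunk.
            text, tail = text[:cut], text[cut:]
        else:
            tail = ""
        for closing, name in TAG.findall(text):
            if not closing:
                stack.append(name)
                peak = max(peak, len(stack))
            elif not stack or stack.pop() != name:
                raise ValueError(f"mismatched </{name}>")
    if stack or tail:
        raise ValueError("unclosed element or truncated tag")
    return peak
```

Memory use here grows with nesting depth, not with document length, which
is the sense in which XML streams reasonably well.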

There are other languages for which a correct parse involves revisiting or
retaining a lot more than that, and those languages might be viewed as
non-streaming.

Strictly speaking, XML breaks the second criterion for streaming above. 
Consider the following simple document:

<a>
   <x>some bytes</x>
   <x>some bytes</x>
   ...repeat the <x>'s 1 million times
</b>

In principle, the only thing an XML parser should say about this is that 
it's not well formed, because the <a> does not match the </b>. In practice, 
SAX streaming parsers regularly fudge on this, cheerfully reporting the <x> 
elements before discovering at the end that it was all a mistake (and we
really don't know if those <x>'s were good data, or whether the document
was in fact mangled earlier). So, in that sense, XML is not a streaming
format anyway.
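
For anyone who wants to see this behavior, the following is a small sketch
using Python's standard xml.sax module (which sits on top of expat). The
Recorder class and parse_and_record helper are names invented for this
example, and I've shortened the document to 1000 <x> elements. The handler
records each start tag as it is reported; the well-formedness error only
surfaces afterwards, when the parser reaches the </b>.

```python
import xml.sax

class Recorder(xml.sax.ContentHandler):
    """Hypothetical handler that remembers every start tag it is given."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)

def parse_and_record(data: bytes):
    """Parse data, returning (start tags reported, error or None)."""
    handler = Recorder()
    error = None
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException as exc:
        # The default error handler raises on fatal (well-formedness) errors.
        error = exc
    return handler.seen, error

# Same shape as the document above: <a> ... many <x> elements ... </b>
doc = b"<a>" + b"<x>some bytes</x>" * 1000 + b"</b>"
seen, error = parse_and_record(doc)
print(len(seen), "start tags reported before the error:", error)
```

By the time the exception arrives, the application has already been handed
every <x>, so it is up to the application to decide whether to throw that
work away.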

I presume our goal here is for XML-ER to be not much worse than XML in its 
streaming characteristics.

Noah

Received on Sunday, 19 February 2012 17:56:45 UTC