- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Sun, 19 Feb 2012 12:56:17 -0500
- To: David Lee <David.Lee@marklogic.com>
- CC: Norman Walsh <ndw@nwalsh.com>, W3C XML-ER Community Group <public-xml-er@w3.org>
On 2/18/2012 1:08 PM, David Lee wrote:

> Question: If this *can't* be done in a streaming processor what does that mean?
> Does it mean the input must be fully read in order to "fix" it?

It typically means that in order to parse later content you may have to either go back and revisit earlier content (possibly quite far back in a large document), or else hold onto large amounts of state retained from earlier in the document as you proceed to parse the rest. Also, it tends to mean that you cannot report content to a consuming application more or less as you go.

Typically, in an XML parser, you need to retain things like the stack of open element names, the in-scope prefixes, and entity definitions, but not much else. So, one can argue that in that sense, XML tends to stream pretty well. There are other languages for which a correct parse involves revisiting or retaining a lot more than that, and those languages might be viewed as non-streaming.

Strictly speaking, XML breaks the second criterion for streaming above. Consider the following simple document:

    <a>
      <x>some bytes</x>
      <x>some bytes</x>
      ...repeat the <x>'s 1 million times
    </b>

In principle, the only thing an XML parser should say about this is that it's not well formed, because the <a> does not match the </b>. In practice, SAX streaming parsers regularly fudge on this, cheerfully reporting the <x> elements before discovering at the end that it was all a mistake (and we really don't know if those <x>'s were good data, or whether the document was in fact mangled earlier).

So, in that sense, XML is not a streaming format anyway. I presume our goal here is for XML-ER to be not much worse than XML in its streaming characteristics.

Noah
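
A minimal sketch of the SAX behaviour described in the message above, using Python's standard xml.sax module. The handler class, the function name, and the reduced repeat count (1000 rather than 1 million) are assumptions made for illustration and are not part of the original message.

```python
import io
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Hypothetical handler that counts <x> elements as they are reported."""

    def __init__(self):
        super().__init__()
        self.x_seen = 0

    def startElement(self, name, attrs):
        # Called by the parser as soon as each start tag is recognised.
        if name == "x":
            self.x_seen += 1


def parse_mismatched(repeats=1000):
    # Build <a> ... </b>: many well-formed <x> children, then the wrong end tag.
    doc = "<a>" + "<x>some bytes</x>" * repeats + "</b>"
    handler = CountingHandler()
    error = None
    try:
        xml.sax.parse(io.BytesIO(doc.encode("utf-8")), handler)
    except xml.sax.SAXParseException as exc:
        # The default error handler raises on fatal (well-formedness) errors.
        error = exc
    return handler.x_seen, error


x_seen, error = parse_mismatched()
print(x_seen, "x elements reported before the error:", error)
```

The handler has already received every <x> start event by the time the mismatched </b> is detected, so a consumer that acts on events immediately has processed data from a document that turns out not to be well formed.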
Received on Sunday, 19 February 2012 17:56:45 UTC