Re: Draft - Fixup or Full XML Parser

On 21/02/2012 15:58, David Lee wrote:
> That way as an implementer I could *choose* to write a (presumably
> much simpler)

I have some sympathy with the notion that XML-ER could output an XML
document rather than a parse tree (although I don't think the browser
implementers would think much of such a spec :-) ), but currently I'm not
sure that it can really be simpler or avoid most of the complications of
parsing.

Given a typical XML document like

<!DOCTYPE foo SYSTEM "bad.dtd">
<foo>
&a;
</foo>

and a bad.dtd that could be

either

<!ENTITY % b "<x>">
<!ENTITY % c "a">
<!ENTITY %c; "hmm">

or

<!ENTITY % b "<x>">
<!ENTITY % c "<x>">
<!ENTITY %c; "hmm">

Then you have to go through something that looks very much like the full
complication of an XML parse before you can decide that using the first
DTD makes the document well-formed and using the second makes it not
well-formed, requiring fixup. Having gone that far, I'm not convinced that
it doesn't make sense to do as Anne's draft does and just output the
tree you made, rather than serialising that tree back to be re-parsed by
an XML parser.
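
To make the difference between the two DTDs concrete, here is a minimal
Python sketch of the parameter-entity expansion involved. It is only an
illustration under stated assumptions, not a DTD parser: it handles one
declaration per line, uses an ASCII-only approximation of XML names, and
the names general_entities, FIRST_DTD and SECOND_DTD are invented for this
example. In the first DTD, %c; expands to "a", so the last declaration
defines the general entity a. In the second, %c; expands to "<x>", so the
last declaration is no longer a valid entity declaration.

```python
import re

# Simplified, ASCII-only stand-in for the XML Name production (an assumption
# of this sketch; real XML names allow many more characters).
NAME = r"[A-Za-z_:][A-Za-z0-9_.:-]*"

PE_REF = re.compile(r"%(" + NAME + r");")
PE_DECL = re.compile(r'<!ENTITY\s+%\s+(' + NAME + r')\s+"([^"]*)"\s*>')
GE_DECL = re.compile(r'<!ENTITY\s+(' + NAME + r')\s+"([^"]*)"\s*>')


def general_entities(dtd_text):
    """Return {name: value} for the general entities a toy DTD declares.

    Raises ValueError if a declaration is malformed after parameter-entity
    expansion. Handles one declaration per line only.
    """
    params = {}
    general = {}
    for line in dtd_text.strip().splitlines():
        # A PE reference in the DTD is replaced by its value, padded with
        # one space on each side.
        expanded = PE_REF.sub(lambda m: " " + params[m.group(1)] + " ", line)
        decl = expanded.strip()
        m = PE_DECL.fullmatch(decl)
        if m:
            params.setdefault(m.group(1), m.group(2))  # first declaration wins
            continue
        m = GE_DECL.fullmatch(decl)
        if m:
            general.setdefault(m.group(1), m.group(2))
            continue
        raise ValueError("malformed declaration: " + decl)
    return general


FIRST_DTD = """
<!ENTITY % b "<x>">
<!ENTITY % c "a">
<!ENTITY %c; "hmm">
"""

SECOND_DTD = """
<!ENTITY % b "<x>">
<!ENTITY % c "<x>">
<!ENTITY %c; "hmm">
"""
```

With these definitions, general_entities(FIRST_DTD) returns {"a": "hmm"},
so &a; in the document resolves, while general_entities(SECOND_DTD) raises
ValueError. Even this toy version has to tokenise declarations and expand
references before it can tell the two cases apart.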



> it could avoid doing some things that XML parsers MUST do like
> external entity inclusion.

Well, that's an open issue. XML parsers do not have to fetch external
entities (and the ones in browsers other than IE do not fetch them), so it
is presumably still to be decided whether XML-ER systems must, must not, or
may fetch external entities. (If they don't fetch external entities, my
example above doesn't apply. You could do something similar in an internal
subset, but it wouldn't look quite so weird, as you can't use PE references
inside declarations there.)

David


Received on Wednesday, 22 February 2012 13:30:31 UTC