
RE: Draft - Fixup or Full XML Parser

From: David Lee <David.Lee@marklogic.com>
Date: Wed, 22 Feb 2012 05:31:33 -0800
To: David Carlisle <davidc@nag.co.uk>
CC: "public-xml-er@w3.org" <public-xml-er@w3.org>
Message-ID: <EB42045A1F00224E93B82E949EC6675E16ADC5FCD6@EXCHG-BE.marklogic.com>
I'm not suggesting that implementers cannot choose to make a full XML parser, including APIs and trees, out of this.
I am suggesting that the spec not *require* it.


-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 650-287-2531
Cell:  +1 812-630-7622
www.marklogic.com



-----Original Message-----
From: David Carlisle [mailto:davidc@nag.co.uk] 
Sent: Wednesday, February 22, 2012 8:30 AM
To: David Lee
Cc: public-xml-er@w3.org
Subject: Re: Draft - Fixup or Full XML Parser

On 21/02/2012 15:58, David Lee wrote:
> That way as an implementer I could *choose* to write a (presumably
> much simpler)

I have some sympathy with the notion that XML-ER could output an XML document rather than a parse tree (although I don't think the browser implementers would think much of such a spec :-), but currently I'm not sure that it can really be simpler or avoid most of the complications of parsing.

Given a typical XML document like

<!DOCTYPE foo SYSTEM "bad.dtd">
<foo>
&a;
</foo>

and a bad.dtd that could be

either

<!ENTITY % b "<x>">
<!ENTITY % c "a">
<!ENTITY %c; "hmm">

or

<!ENTITY % b "<x>">
<!ENTITY % c "<x>">
<!ENTITY %c; "hmm">

Then you have to go through something that looks very much like the full complexity of an XML parse before you can decide that the first DTD makes the document well-formed and the second makes it not well-formed, requiring fixup. With the first DTD, %c; expands to a, so the third declaration declares the general entity a and &a; resolves to "hmm"; with the second, %c; expands to <x>, which is not a legal entity name. Having gone that far, it may well make sense to do as Anne's draft does and just output the tree you built, rather than serialising that tree back to XML to be re-parsed.
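The following minimal Python sketch (not taken from any XML-ER draft) illustrates why the two DTDs have to be processed almost as a real parser would. The parameter entity %c; must be expanded, with the one-space padding XML 1.0 adds, before the third declaration can be seen either to declare a or to be malformed. The helpers read_dtd and resolve are hypothetical. The reader handles only the quoted <!ENTITY ...> forms used in the example and uses an ASCII-only approximation of an XML Name.

```python
from __future__ import annotations

import re

# ASCII-only approximation of the XML Name production (an assumption; real
# XML names allow many more characters).
NAME = r"[A-Za-z_:][-A-Za-z0-9._:]*"
# Only the quoted form <!ENTITY ... "value"> used in the example is handled.
DECL = re.compile(r'<!ENTITY(.*?)"([^"]*)"\s*>', re.S)
PEREF = re.compile("%(" + NAME + ");")
GEREF = re.compile("&(" + NAME + ");")


def read_dtd(dtd: str) -> dict[str, str]:
    """Return the general entities declared by a toy external DTD subset.

    Raises ValueError where a real parser would report a well-formedness error.
    """
    params: dict[str, str] = {}
    general: dict[str, str] = {}

    def expand_pe(m: re.Match) -> str:
        name = m.group(1)
        if name not in params:
            raise ValueError(f"undeclared parameter entity %{name};")
        # XML 1.0 section 4.4.8: an included PE is padded with one space each side.
        return " " + params[name] + " "

    for head, value in DECL.findall(dtd):
        parts = PEREF.sub(expand_pe, head).split()
        if len(parts) == 2 and parts[0] == "%" and re.fullmatch(NAME, parts[1]):
            params.setdefault(parts[1], value)    # first declaration is binding
        elif len(parts) == 1 and re.fullmatch(NAME, parts[0]):
            general.setdefault(parts[0], value)
        else:
            raise ValueError(f"malformed entity declaration: {' '.join(parts)!r}")
    return general


def resolve(content: str, dtd: str) -> str:
    """Expand general entity references in element content using the DTD.

    Simplification: an undeclared reference is reported as an error, although
    with an external DTD subset XML 1.0 makes that a validity issue, not a
    well-formedness one.
    """
    general = read_dtd(dtd)

    def expand_ge(m: re.Match) -> str:
        if m.group(1) not in general:
            raise ValueError(f"undeclared entity &{m.group(1)};")
        return general[m.group(1)]

    return GEREF.sub(expand_ge, content)


# The two candidate versions of bad.dtd from the message.
DTD1 = '<!ENTITY % b "<x>">\n<!ENTITY % c "a">\n<!ENTITY %c; "hmm">'
DTD2 = '<!ENTITY % b "<x>">\n<!ENTITY % c "<x>">\n<!ENTITY %c; "hmm">'

print(repr(resolve("\n&a;\n", DTD1)))   # content of <foo> with the first DTD
try:
    resolve("\n&a;\n", DTD2)
except ValueError as err:
    print("not well-formed:", err)
```

Nothing short of tracking parameter entity declarations in document order distinguishes the two files. A real processor also has to deal with external parameter entities, character references, conditional sections and so on.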



> it could avoid doing some things that XML parsers MUST do like 
> external entity inclusion.

Well, that's an open issue. XML parsers do not have to fetch external entities (and those in browsers other than IE do not fetch them), so it presumably remains to be decided whether XML-ER systems must, must not, or may fetch external entities. (If they don't fetch external entities, my example above doesn't apply. You could construct something similar in an internal subset, but it wouldn't look quite so weird, as parameter entity references can't be used inside markup declarations there.)
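The three options can be pictured as a processor setting. The Python sketch below is purely illustrative. FetchPolicy, load_external_subset and the prefer_fetch flag are hypothetical names, not taken from any XML-ER draft, and "may" is modelled simply as a choice left to the caller.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Optional


class FetchPolicy(Enum):
    """Hypothetical names for the three options still open for XML-ER."""
    MUST = "must"
    MUST_NOT = "must not"
    MAY = "may"


def load_external_subset(system_id: str, policy: FetchPolicy,
                         fetch: Callable[[str], str],
                         prefer_fetch: bool = False) -> Optional[str]:
    """Return the external DTD subset's text, or None if it is not read.

    Under MAY the decision belongs to the implementation; here that is
    modelled as the caller-supplied prefer_fetch flag (an assumption).
    """
    if policy is FetchPolicy.MUST or (policy is FetchPolicy.MAY and prefer_fetch):
        return fetch(system_id)
    # Not read: declarations in bad.dtd (such as the one for a) are never seen,
    # so a reference like &a; cannot be expanded.
    return None


# Usage with an in-memory stand-in for fetching that records each request.
requested: list[str] = []


def fake_fetch(system_id: str) -> str:
    requested.append(system_id)
    return '<!ENTITY % c "a">\n<!ENTITY %c; "hmm">'


for policy in FetchPolicy:
    text = load_external_subset("bad.dtd", policy, fake_fetch)
    print(f"{policy.value}: {'read' if text is not None else 'not read'}")
```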

David

Received on Wednesday, 22 February 2012 13:31:59 GMT
