- From: Jonathan Marsh <jmarsh@microsoft.com>
- Date: Wed, 9 Jan 2002 11:08:47 -0800
- To: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>
- Cc: <www-xml-xinclude-comments@w3.org>
The intent of XInclude is that the XInclude implementation be allowed to operate on the available subset of the Infoset - whatever is provided by a particular parser. Information items and properties outside this subset may be ignored. I believe that neither DOM nor JDOM provide sufficient support for external entities and notations to include them in the infoset subset they support. The same can be said for a streaming SAX processor, or possibly any streaming processor. Since XInclude is defined over the Infoset, we are stuck (or blessed?) with grey areas like this where the abstraction of the Infoset meets the reality of various APIs. We don't see anything we should do about it in XInclude though. Thanks, Jonathan Marsh > -----Original Message----- > From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu] > Sent: Sunday, September 02, 2001 7:57 AM > To: www-xml-xinclude-comments@w3.org > Cc: www-dom@w3.org > Subject: Implementation experience: Unparsed entity and notation > information items with XInclude, DOM, SAX, and JDOM > > As you know, I've written partial XInclude processors in DOM, JDOM, and > SAX. All three of these APIs have major problems handling unparsed entity > and notation information items that come in from included, parsed > documents. > > JDOM does not expose unparsed entities or notations at all. The > information is simply not available. > > DOM2 informs you of the notations and unparsed entities declared in a > given document. However, it provides no means for associating them with > particular attributes, processing instructions or elements. That is, it > does not tell you the type of any attribute. DOM3 may allow you to hack > this together, but only with the assistance of the abstract schemas > module. The DOM3 core still does not include attribute type information. > (I should probably suggest adding this to the DOM3 group too.) > > SAX is the only major API that does include enough information to > theoretically tell which attributes refer to notations and unparsed > entities. However, SAX is designed as a streaming API. Including notations > and unparsed entities requires modifying the DOCTYPE declaration and/or > DTD which would normally be done at the start of processing. However, the > complete list of notations and unparsed entities referenced is not > available until the end of the document. > > I noticed this first in the context of notations and unparsed entities in > the document information item. However, it really applies type information > item. > > I am not sure how to handle this. I would like to at least consider the > possibility of not requiring implementations to maintain notation and > unparsed entities. This may in fact be what section 5.3 is trying to say; > i.e. I can ignore anything that's not in 5.3. However, that's not totally > clear. For one thing, taking that interpretation would further imply that > the input infoset could ignore character information items, though the > output set could not. > > If this is what 5.3 is trying to say, then I think some discussion much > earlier in the spec of how an input XML document is converted to an XML > infoset is called for. In particular, an explicit statement that other > kinds of info items can be dropped out would be helpful. > -- > > +-----------------------+------------------------+-------------------+ > | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer | > +-----------------------+------------------------+-------------------+ > | The XML Bible, 2nd Edition (Hungry Minds, 2001) | > | http://www.ibiblio.org/xml/books/bible2/ | > | http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ | > +----------------------------------+---------------------------------+ > | Read Cafe au Lait for Java News: http://www.cafeaulait.org/ | > | Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/ | > +----------------------------------+---------------------------------+
Received on Wednesday, 9 January 2002 14:09:19 UTC