RE: Implementation experience: Unparsed entity and notation information items with XInclude, DOM, SAX, and JDOM

The intent of XInclude is that the XInclude implementation be allowed to
operate on the available subset of the Infoset - whatever is provided by
a particular parser.  Information items and properties outside this
subset may be ignored.  I believe that neither DOM nor JDOM provides
sufficient support for external entities and notations to include them
in the infoset subset they support.  The same can be said for a
streaming SAX processor, or possibly any streaming processor.

Since XInclude is defined over the Infoset, we are stuck (or blessed?)
with grey areas like this where the abstraction of the Infoset meets the
reality of various APIs.  We don't see anything we should change in
XInclude to address it, though.

Thanks,
Jonathan Marsh

> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
> Sent: Sunday, September 02, 2001 7:57 AM
> To: www-xml-xinclude-comments@w3.org
> Cc: www-dom@w3.org
> Subject: Implementation experience: Unparsed entity and notation
> information items with XInclude, DOM, SAX, and JDOM
> 
> As you know, I've written partial XInclude processors in DOM, JDOM, and
> SAX. All three of these APIs have major problems handling unparsed entity
> and notation information items that come in from included, parsed
> documents.
> 
> JDOM does not expose unparsed entities or notations at all. The
> information is simply not available.
> 
> DOM2 informs you of the notations and unparsed entities declared in a
> given document.  However, it provides no means for associating them with
> particular attributes, processing instructions or elements. That is, it
> does not tell you the type of any attribute. DOM3 may allow you to hack
> this together, but only with the assistance of the abstract schemas
> module. The DOM3 core still does not include attribute type information.
> (I should probably suggest adding this to the DOM3 group too.)
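
A minimal sketch of the DOM2 situation described above, assuming the JAXP DOM parser bundled with the JDK. The class name `DomUnparsedEntities`, its helper methods, and the sample document are hypothetical illustrations. It shows that `DocumentType.getEntities()` and `Entity.getNotationName()` reveal which unparsed entities are declared, while the DOM2 `Attr` interface offers nothing that says the `img` attribute is of type ENTITY.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Entity;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class DomUnparsedEntities {

    // Hypothetical sample: one notation, one unparsed entity, and an
    // attribute declared with type ENTITY that refers to it.
    static final String SAMPLE =
          "<!DOCTYPE doc [\n"
        + "  <!NOTATION gif SYSTEM 'image/gif'>\n"
        + "  <!ENTITY logo SYSTEM 'logo.gif' NDATA gif>\n"
        + "  <!ELEMENT doc EMPTY>\n"
        + "  <!ATTLIST doc img ENTITY #IMPLIED>\n"
        + "]>\n"
        + "<doc img='logo'/>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Entity name -> notation name for every unparsed entity exposed
    // through DocumentType.getEntities().
    static Map<String, String> unparsedEntities(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        DocumentType doctype = doc.getDoctype();
        if (doctype == null) {
            return result;
        }
        NamedNodeMap entities = doctype.getEntities();
        for (int i = 0; i < entities.getLength(); i++) {
            Entity e = (Entity) entities.item(i);
            if (e.getNotationName() != null) { // NDATA present => unparsed
                result.put(e.getNodeName(), e.getNotationName());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("Unparsed entities: " + unparsedEntities(doc));
        System.out.println("Notations declared: "
            + doc.getDoctype().getNotations().getLength());
        // The attribute value is visible, but DOM2's Attr has no method
        // reporting its declared type, so nothing ties "logo" to it.
        System.out.println("img = "
            + doc.getDocumentElement().getAttributeNode("img").getValue());
    }
}
```

An XInclude processor built this way can copy the declarations, but it has to guess which attributes actually use them.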
> 
> SAX is the only major API that does include enough information to
> theoretically tell which attributes refer to notations and unparsed
> entities. However, SAX is designed as a streaming API. Including notations
> and unparsed entities requires modifying the DOCTYPE declaration and/or
> DTD which would normally be done at the start of processing. However, the
> complete list of notations and unparsed entities referenced is not
> available until the end of the document.
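
A rough sketch of the SAX case, assuming the JDK's bundled SAX parser reports declared attribute types from the internal DTD subset (a parser that has not read the declaration reports `CDATA` instead). The class `SaxUnparsedEntityTracker`, its `requiredDeclarations()` helper, and the sample document are hypothetical. `Attributes.getType()` identifies ENTITY and ENTITIES attributes and the `DTDHandler` callbacks supply the declarations, but the set of referenced unparsed entities is only final at `endDocument()`, long after a streaming processor would have had to write the DOCTYPE.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxUnparsedEntityTracker extends DefaultHandler {

    // Hypothetical sample: "unused" is declared but never referenced.
    static final String SAMPLE =
          "<!DOCTYPE doc [\n"
        + "  <!NOTATION gif SYSTEM 'image/gif'>\n"
        + "  <!NOTATION png SYSTEM 'image/png'>\n"
        + "  <!ENTITY logo SYSTEM 'logo.gif' NDATA gif>\n"
        + "  <!ENTITY unused SYSTEM 'unused.png' NDATA png>\n"
        + "  <!ELEMENT doc (img*)>\n"
        + "  <!ELEMENT img EMPTY>\n"
        + "  <!ATTLIST img src ENTITY #REQUIRED>\n"
        + "]>\n"
        + "<doc><img src='logo'/></doc>";

    // Declarations, delivered through DTDHandler (DefaultHandler implements it).
    final Map<String, String> notationSystemId = new LinkedHashMap<>();
    final Map<String, String> entityNotation = new LinkedHashMap<>();
    final Map<String, String> entitySystemId = new LinkedHashMap<>();
    // Unparsed entities actually named by ENTITY/ENTITIES attributes.
    final Set<String> referenced = new LinkedHashSet<>();
    private boolean complete = false;

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notationSystemId.put(name, systemId);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        entityNotation.put(name, notationName);
        entitySystemId.put(name, systemId);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            String type = atts.getType(i); // declared type, else "CDATA"
            if ("ENTITY".equals(type)) {
                referenced.add(atts.getValue(i).trim());
            } else if ("ENTITIES".equals(type)) {
                for (String name : atts.getValue(i).trim().split("\\s+")) {
                    referenced.add(name);
                }
            }
        }
    }

    @Override
    public void endDocument() {
        complete = true; // only now is the referenced set final
    }

    // Declarations needed for just the referenced unparsed entities.
    // Sketch only: assumes every declaration carries a system identifier.
    String requiredDeclarations() {
        if (!complete) {
            throw new IllegalStateException("document not finished yet");
        }
        Set<String> notations = new LinkedHashSet<>();
        for (String e : referenced) {
            if (entityNotation.containsKey(e)) notations.add(entityNotation.get(e));
        }
        StringBuilder sb = new StringBuilder();
        for (String n : notations) {
            sb.append("<!NOTATION ").append(n).append(" SYSTEM \"")
              .append(notationSystemId.get(n)).append("\">\n");
        }
        for (String e : referenced) {
            if (!entityNotation.containsKey(e)) continue;
            sb.append("<!ENTITY ").append(e).append(" SYSTEM \"")
              .append(entitySystemId.get(e)).append("\" NDATA ")
              .append(entityNotation.get(e)).append(">\n");
        }
        return sb.toString();
    }

    static SaxUnparsedEntityTracker run(String xml) throws Exception {
        SaxUnparsedEntityTracker handler = new SaxUnparsedEntityTracker();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        SaxUnparsedEntityTracker t = run(SAMPLE);
        System.out.println("Referenced: " + t.referenced);
        System.out.print(t.requiredDeclarations());
    }
}
```

Because `requiredDeclarations()` cannot succeed before `endDocument()`, a processor that must emit a correct DOCTYPE up front would have to buffer the whole document or make two passes over it.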
> 
> I noticed this first in the context of notations and unparsed entities in
> the document information item. However, it really applies type information
> item.
> 
> I am not sure how to handle this. I would like to at least consider the
> possibility of not requiring implementations to maintain notations and
> unparsed entities. This may in fact be what section 5.3 is trying to say;
> i.e. I can ignore anything that's not in 5.3. However, that's not totally
> clear. For one thing, taking that interpretation would further imply that
> the input infoset could ignore character information items, though the
> output infoset could not.
> 
> If this is what 5.3 is trying to say, then I think some discussion much
> earlier in the spec of how an input XML document is converted to an XML
> infoset is called for. In particular, an explicit statement that other
> kinds of info items can be dropped out would be helpful.
> --
> 
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
> |              http://www.ibiblio.org/xml/books/bible2/              |
> |   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
> +----------------------------------+---------------------------------+

Received on Wednesday, 9 January 2002 14:09:19 UTC