RE: Implementation experience: Unparsed entity and notation information items with XInclude, DOM, SAX, and JDOM

From: Jonathan Marsh <jmarsh@microsoft.com>
Date: Wed, 9 Jan 2002 11:08:47 -0800
Message-ID: <330564469BFEC046B84E591EB3D4D59C049AC23B@red-msg-08.redmond.corp.microsoft.com>
To: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>
Cc: <www-xml-xinclude-comments@w3.org>

The intent of XInclude is that the XInclude implementation be allowed to
operate on the available subset of the Infoset - whatever is provided by
a particular parser.  Information items and properties outside this
subset may be ignored.  I believe that neither DOM nor JDOM provide
sufficient support for external entities and notations to include them
in the infoset subset they support.  The same can be said for a
streaming SAX processor, or possibly any streaming processor.

Since XInclude is defined over the Infoset, we are stuck (or blessed?)
with grey areas like this where the abstraction of the Infoset meets the
reality of various APIs.  We don't see anything we should do about it in
XInclude though.

Jonathan Marsh

> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo@metalab.unc.edu]
> Sent: Sunday, September 02, 2001 7:57 AM
> To: www-xml-xinclude-comments@w3.org
> Cc: www-dom@w3.org
> Subject: Implementation experience: Unparsed entity and notation
> information items with XInclude, DOM, SAX, and JDOM
>
> As you know, I've written partial XInclude processors in DOM, JDOM, and
> SAX. All three of these APIs have major problems handling unparsed entity
> and notation information items that come in from included, parsed
> documents.
>
> JDOM does not expose unparsed entities or notations at all. The
> information is simply not available.
>
> DOM2 informs you of the notations and unparsed entities declared in a
> given document.  However, it provides no means for associating them with
> particular attributes, processing instructions, or elements. That is, it
> does not tell you the type of any attribute. DOM3 may allow you to tie
> this together, but only with the assistance of the abstract schemas
> module. The DOM3 core still does not include attribute type information.
> (I should probably suggest adding this to the DOM3 group too.)
>
> SAX is the only major API that does include enough information to
> theoretically tell which attributes refer to notations and unparsed
> entities. However, SAX is designed as a streaming API. Including notations
> and unparsed entities requires modifying the DOCTYPE declaration and
> DTD, which would normally be done at the start of processing. However, the
> complete list of notations and unparsed entities referenced is not
> available until the end of the document.
>
> I noticed this first in the context of notations and unparsed entities in
> the document information item. However, it really applies type
> item.
>
> I am not sure how to handle this. I would like to at least consider the
> possibility of not requiring implementations to maintain notation and
> unparsed entities. This may in fact be what section 5.3 is trying to say,
> i.e. I can ignore anything that's not in 5.3. However, that's not
> clear. For one thing, taking that interpretation would further imply that
> the input infoset could ignore character information items, though the
> output set could not.
>
> If this is what 5.3 is trying to say, then I think some discussion
> earlier in the spec of how an input XML document is converted to an
> infoset is called for. In particular, an explicit statement that other
> kinds of info items can be dropped out would be helpful.
> --
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
> |              http://www.ibiblio.org/xml/books/bible2/              |
> |   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
> +----------------------------------+---------------------------------+
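
The SAX ordering problem described in the quoted message can be sketched
as follows. This is only an illustration, not code from either author: it
uses Python's xml.sax module as a stand-in for the Java SAX API under
discussion, and the sample document, the EventLog class, and the trace
function are all hypothetical names invented for the sketch. It records
the order in which a SAX parser reports notation and unparsed entity
declarations relative to element events.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler

# Hypothetical sample document: one notation, one unparsed entity, and an
# attribute declared with type ENTITY that refers to that entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ATTLIST img src ENTITY #REQUIRED>
]>
<doc><img src="logo"/></doc>
"""


class EventLog(ContentHandler, DTDHandler):
    """Record declarations and element starts in the order SAX reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def notationDecl(self, name, publicId, systemId):
        self.events.append(("notation", name))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.events.append(("unparsed-entity", name, ndata))

    def startElement(self, name, attrs):
        self.events.append(("element", name))


def trace(document):
    """Parse the given bytes and return the recorded event list."""
    log = EventLog()
    parser = xml.sax.make_parser()
    parser.setContentHandler(log)
    parser.setDTDHandler(log)
    parser.parse(io.BytesIO(document))
    return log.events


events = trace(DOC)
```

In this sketch every declaration is reported before the first element
event. A streaming XInclude processor that wanted to carry declarations
over from an included document would therefore learn of them only after
it had already passed the point in its own output where the DOCTYPE
declaration belongs.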
Received on Wednesday, 9 January 2002 14:09:19 UTC
