- From: Ray Whitmer <rayw@netscape.com>
- Date: Tue, 18 Dec 2001 05:23:31 -0800
- To: David Brownell <david-b@pacbell.net>
- CC: Elliotte Rusty Harold <elharo@metalab.unc.edu>, www-dom@w3.org
- Message-ID: <3C1F4353.6060503@netscape.com>
The infoset has an item for representing unexpanded general entity references. Following the infoset's lead, I believe that even parsers which always fully expand entity references should insert an empty EntityReference node wherever an undefined entity reference occurs (assuming that the mode of the parser does not cause the reference to fail). The XHTML specification requires, I believe, that such unresolved references not be silently dropped, but be displayed.

But I wish the infoset were clearer on this issue with respect to attribute values. Although the name of the infoset item appears quite general, covering unexpanded entity references of any kind, its description says it "serves as a placeholder by which an XML processor can indicate that it has not expanded an external parsed entity", with a clear reference to the definition of external entities. External entities are not permitted in attribute values in the first place (see the table in section 4.4 of the XML specification), so the theory goes that you should never have an unexpanded entity reference in an attribute value, because there is never a reason not to expand internal entities. And if you look at the infoset, there is in fact no way to insert one of these unexpanded entity references into an attribute value, because it does not represent attributes as hierarchies the way nodes are represented in the DOM.

There are a couple of reasons why I think the assumption that you never have unexpanded entity references in attributes is wrong:

1. The document may reference a DTD which a non-validating parser does not wish to process, and which declares entities that, although declared outside the document, are technically internal entities and so may appear in attribute values (or anywhere else). Although such a document is admittedly not a "standalone" XML document, it can easily happen that you process such a document with a parser that does not read external DTDs and produces an infoset.
It seems wrong in that case to just drop the unexpanded entity references out of the document.

2. I believe it is high time for a real alternative to DTDs. There are alternative schema representations, and XInclude allows inclusions. The only thing missing is the ability to support entity declarations in some other way. Take specifications such as SOAP, where it is frequently desirable to nest a document fragment, which may use entity references, into a document of a completely different sort. Namespaces allow us to handle the tag naming problems, but if the fragment is, for example, an XHTML fragment or even an XHTML document, there is no way to nest the character entity declarations and any other internal entity declarations the part may rely upon into the proper scope for the embedded fragment or document without disrupting the global scope of the document (which is one reason SOAP has outlawed entity declarations altogether, making it not a true XML application in my judgement).

The natural way to define alternative entity resolution would be via an infoset transform, as has been done for XInclude. If the infoset permitted these unexpanded entity reference items to represent internal entities as well as external entities, it would become a trivial exercise to write a specification for elements which make a scoped entity declaration, and which are then eliminated as part of a transformation that substitutes replacements for the unexpanded entity references. The only requirement is that the unexpanded entity references be in the hierarchy.

This would require the infoset to adopt a representation of unexpanded entity references in attribute values. I would suggest two arrays: one of the unexpanded entity references, and another giving the offsets at which they occur within the attribute value, which is represented as a string in the infoset.
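To make the two-array idea concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the names `AttributeValue`, `parse_attribute` and `resolve` are hypothetical and come from neither the Infoset nor the DOM, and the reference matching is deliberately simplified (character references such as `&#233;` are left alone, and replacement text is not rescanned for further references).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Simplified pattern for a general entity reference such as "&copy;".
# Character references ("&#...;") intentionally do not match.
_ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:-]*);")


@dataclass
class AttributeValue:
    """Hypothetical infoset-style attribute value with the two suggested arrays."""
    value: str = ""                                       # string with unexpanded refs removed
    unexpanded: List[str] = field(default_factory=list)   # entity names, in document order
    offsets: List[int] = field(default_factory=list)      # index into `value` for each name


def parse_attribute(raw: str, known: Dict[str, str]) -> AttributeValue:
    """Expand references declared in `known`; record all others in the arrays."""
    result = AttributeValue()
    pieces, length, pos = [], 0, 0
    for match in _ENTITY_REF.finditer(raw):
        literal = raw[pos:match.start()]
        pieces.append(literal)
        length += len(literal)
        name = match.group(1)
        if name in known:
            pieces.append(known[name])
            length += len(known[name])
        else:
            result.unexpanded.append(name)
            result.offsets.append(length)   # where the reference sat in the output string
        pos = match.end()
    pieces.append(raw[pos:])
    result.value = "".join(pieces)
    return result


def resolve(attr: AttributeValue, scoped: Dict[str, str]) -> AttributeValue:
    """Transform step: substitute replacements from a scoped declaration map.

    References not declared in `scoped` stay unexpanded, with offsets
    adjusted for the text inserted before them.
    """
    result = AttributeValue()
    pieces, length, prev = [], 0, 0
    for name, offset in zip(attr.unexpanded, attr.offsets):
        literal = attr.value[prev:offset]
        pieces.append(literal)
        length += len(literal)
        if name in scoped:
            pieces.append(scoped[name])
            length += len(scoped[name])
        else:
            result.unexpanded.append(name)
            result.offsets.append(length)
        prev = offset
    pieces.append(attr.value[prev:])
    result.value = "".join(pieces)
    return result
```

A parser that has not read the external DTD would call something like `parse_attribute(raw, {"amp": "&"})` and hand on the unexpanded names with their offsets; a later transform holding a scoped declaration for, say, `eacute` would call `resolve` to fill in what it can and leave the rest marked rather than dropped.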
The current DOM representation would adequately represent such a two-array structure, although indexed convenience accessors could be added to match the infoset exactly if anyone thought it important.

The only alternative seems to be to invent some different syntax and processing for entities and make all older documents incompatible, which I know no one has been willing to do.

Would anyone join me in making this suggestion formally to the XML WG (the former, not the alternative)?

Ray Whitmer
rayw@netscape.com

David Brownell wrote:

>>What should a DOM implementation do when faced with something like this
>>when the replacement text for the general entity is not available ... ?
>
>The Infoset would say that unexpanded (for any reason) entity refs
>ought to have a distinct representation. Some might be OK (not
>expanded because defined in a PE that was not read), some would
>be fatal errors (all PEs were read, it still wasn't defined) if the XML
>REC were more sensible about handling undeclared entities. (It
>defines a boatload of exception cases with ambiguous English.)
>
>>In particular what should getValue() return for the corresponding
>>Attr node?
>
>Considering that it's an error situation of some kind, and that both
>the DOM (L2 anyway) and Infoset punted on how such errors are
>to be reported, and the XML REC still has issues in specification
>of entity handling ... why not try returning a random haiku? :)
>
>- Dave
Received on Tuesday, 18 December 2001 08:24:38 UTC