- From: David Brownell <db@Eng.Sun.COM>
- Date: Mon, 25 Jan 1999 11:06:17 -0800
- To: xml-editor@w3.org
- CC: jjc@jclark.com
1 - There are places that PE processing must be disabled, which are not identified in the XML specification. James Clark suggested to me that these places are: "PIs, comments, SystemLiterals and PubidLiterals in the DTD" which seems appropriate. Without such a clarification:

* The XML spec can't be validated since the <!-- ... %foo; --> comments cause WFC violations when they can't be expanded.

* Public IDs (at least in the external subset) can only with much awkwardness hold the '%' character -- since it'd normally flag a PE ref, a public ID like "-//fooCorp//DTD 80% done//EN" would be reported as a fatal violation of PE reference syntax.

Probably the best fix to the spec is to modify the description for each of those constructs to say that PE expansion is not done within them, and modify the text in 2.8 (right before the first VC) to comment that some productions preclude internal PE expansion beyond that in the grammar.

2 - There is ambiguity with respect to treatment of PE ref syntax within the internal DTD subset in the context of an attribute or entity value. Section 4.4 doesn't cover this case, since (in particular) the text for the "PE, Occurs as Attribute Value" only talks about the "Outside the DTD" case.

Meanwhile, back at the "PEs in Internal Subset" WFC, it seems this case is quite explicitly covered, and not according to the way that might be implied by the "Not Recognized" label in 4.4 (whose description clearly does not cover the "within DTD" case). It says those references are not allowed (vs "not recognized").

To be concrete: I think it's most natural to report both these cases as fatal errors, rather than ignore either one. (The first might be valid in an external parameter entity, though the second would still be a fatal error.)

    <!DOCTYPE root [ <!ATTLIST root foo "%pe;" > ]><root/>
    <!DOCTYPE root [ <!ATTLIST root foo "%bad-pe" > ]><root/>

(Imagine an element decl for 'root' if that makes you happy; the fatal error is not an "Element Valid" validity error which a user chose to treat as fatal.)

3 - The messy one ... having both a VC and WFC for "Entity Declared". I hope it's not controversial that the text there is a bit opaque! While I read Tim's "Annotated XML Spec" it didn't answer my issues.

As a starting point, consider the following simplified language as the basis for textual improvements (separate statements for the "EntityRef" and "PEReference" case would be most clear):

    "The [parameter or general] entity name in the [parameter/general] entity reference must match that in a [parameter/parsed general] entity declaration which was previously processed."

Clearly that has none of the qualifiers that complicate the text now found in the spec ... but I'll propose that they all be removed, and moreover that only the WFC exist. (The more I look at the way this is all specified, the more confusing it gets -- often a sign of a need for some powerful simplification!)

(a) Consider this example:

    <!DOCTYPE root [
      <!ELEMENT root EMPTY>
      %undeclared-pe;
    ]>
    <root/>

One reading of the VC and WFC noted above is that neither one of them applies (so many qualifiers!) and that such a document is well formed and valid. I don't think that should be; I think that such a document should clearly not be well formed.

If it's intended that this violate either the VC or WFC, rewriting is needed to make this quite clear! Bulleted lists are used elsewhere in the spec for such complex cases, and would help here (surely one of the most complex sets of qualifiers in this spec).

(b) One "gotcha" in the simplification above is that the current WFC says that PEs must be declared before use ... even though the notation to the side of the construct implies that the WFC does not apply to PE references.
That seems like a copy/paste bug, in that it appears to turn all refs to undeclared PEs into WFC issues despite the existence of the VC. (That is, undeclared PEs become fatal WF-ness errors!!)

Related, the VC applies to the general entity reference syntax, and the qualifications to the WFC make the VC apply in the common case of an entity declared in an external PE ... refs to undeclared parsed general entities become (recoverable) VC errors!! (Unless an interaction with the standalone declaration kicks in; see next.)

Those results are counterintuitive, but are supported by the spec.

(c) Another "gotcha" in that simplification is interaction with the "Standalone Document Declaration" VC. In the case of a document which is invalid because it's declared as standalone, yet which still refers to an externally declared entity, the qualifications in the WFC say this should be upgraded to become a violation of this WFC.

I think that the intent of the standalone decl was to facilitate safe processing of documents when ignoring external PEs, but the same case (undeclared externally defined entity) is discussed variously as a WFC error, or a violation of either of two VCs.

Simpler would be to strike the clause of the "Standalone Document Declaration" VC that applies to entity references, perhaps noting that a WFC applies, and add a clause to the simplified "Entity Declared" WFC text above, something like:

    In the case of nonvalidating processors which do not read external parameter entities (the "external DTD subset"), and which are processing documents not marked "standalone='yes'", this WFC applies only to entity references preceding the first external PE that is not processed.

(Of course that explicitly acknowledges that there are at least two subcategories of nonvalidating parser, based on whether they read external PEs or not. That's evident, and I think it'd be good to call it out in the conformance section as well.)
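
The simplified "Entity Declared" rule and the extra clause proposed in (c) can be sketched for parameter entities roughly as follows, in Python. This is only an illustration under stated assumptions: the function name `undeclared_pe_refs`, the regular expression, and the `external` mapping (PE name to the text of the external entity, standing in for actually fetching it) are all hypothetical, and the scan is deliberately naive. It does not skip comments, PIs or literals, does not guard against recursive entities, and is no substitute for a real DTD parser.

```python
import re

# Matches, in document order, either a parameter entity declaration
# (<!ENTITY % name ...>, noting whether it is external) or a PE reference.
DECL_OR_REF = re.compile(
    r'<!ENTITY\s+%\s+(?P<decl>[\w.:-]+)\s+(?P<ext>SYSTEM|PUBLIC)?'
    r'|%(?P<ref>[\w.:-]+);'
)

def undeclared_pe_refs(subset, external=None, standalone=False):
    """Return names of PE references with no previously processed declaration.

    external=None models a nonvalidating processor that does not read
    external parameter entities; otherwise it maps PE names to their text.
    """
    declared = {}    # PE name -> True if declared with SYSTEM/PUBLIC
    violations = []

    def scan(text):
        # Returns False once checking has to stop (the proposed clause).
        for m in DECL_OR_REF.finditer(text):
            if m.group("decl"):
                # The first declaration of a name is the binding one.
                declared.setdefault(m.group("decl"), m.group("ext") is not None)
                continue
            name = m.group("ref")
            if name not in declared:
                violations.append(name)
            elif declared[name]:
                if external is not None:
                    # The processor reads external PEs: scan their text too.
                    if not scan(external.get(name, "")):
                        return False
                elif not standalone:
                    # First unprocessed external PE in a non-standalone
                    # document: later declarations may have been missed.
                    return False
        return True

    scan(subset)
    return violations

# The internal subset from example (a):
print(undeclared_pe_refs('<!ELEMENT root EMPTY> %undeclared-pe;'))
```

Under this sketch the reference in example (a) is always reported, while a reference that follows an unread external PE is reported only when the document is marked standalone or the external PE was actually read and did not declare it.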

That's just a few highlights of what, for me, is a notable problem area in the specification. As noted above, I think the best way through this (and related) issues is to just make entity declaration always be a WFC issue ... leaving nonvalidating parsers as they now stand, not guaranteed to report all such WF errors, although clearly stating when that may happen in a conformant manner.

Apologies in advance if parts of this are as unclear to you as those parts of the spec are to me; this note was written to summarize various issues we've queued up, and may not capture the discussion (leading in particular to the "only have the WFC" suggestion) perfectly.

- Dave
Received on Monday, 25 January 1999 14:11:17 UTC