- From: David Brownell <db@Eng.Sun.COM>
- Date: Mon, 25 Jan 1999 11:06:17 -0800
- To: xml-editor@w3.org
- CC: jjc@jclark.com
1 - There are places where PE processing must be disabled that are not
identified in the XML specification. James Clark suggested to me
that these places are: "PIs, comments, SystemLiterals and PubidLiterals
in the DTD" which seems appropriate. Without such a clarification:
* The XML spec can't be validated since the <!-- ... %foo; -->
comments are WFC violations when the references can't be expanded.
* Public IDs (at least in the external subset) can only with
much awkwardness hold the '%' character -- since it'd normally
flag a PE ref, a public ID like "-//fooCorp//DTD 80% done//EN"
would be reported as a fatal violation of PE reference syntax.
Probably the best fix to the spec is to modify the description for
each of those entities to say that PE expansion is not done within
those constructs, and modify the text in 2.8 (right before the first
VC) to comment that some productions preclude internal PE expansion
beyond that in the grammar.
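To make the proposed rule concrete, here is a minimal sketch of a scanner that looks for PE references in DTD text while skipping comments, PIs, SystemLiterals and PubidLiterals. It is an illustration only, not how any particular parser works: the function name find_pe_refs is hypothetical, the Name pattern is a simplified ASCII-only approximation, and literals are recognized just by following the PUBLIC or SYSTEM keyword.

```python
import re

# Simplified, ASCII-only approximation of the XML Name production (assumption).
NAME = r"[A-Za-z_:][A-Za-z0-9_.:\-]*"
# A single- or double-quoted literal.
QUOTED = r"(?:\"[^\"]*\"|'[^']*')"

# Alternatives are tried left to right at each position, so a comment, PI or
# external-ID literal is consumed whole before any '%' inside it is examined.
TOKEN = re.compile(
    "|".join([
        r"<!--.*?-->",                                         # comment
        r"<\?.*?\?>",                                          # processing instruction
        r"\bPUBLIC\s+" + QUOTED + r"(?:\s+" + QUOTED + r")?",  # PubidLiteral [SystemLiteral]
        r"\bSYSTEM\s+" + QUOTED,                               # SystemLiteral
        r"%(?P<pe>" + NAME + r");",                            # PE reference
    ]),
    re.DOTALL,
)

def find_pe_refs(dtd_text):
    """Return names of PE references found outside the protected constructs."""
    return [m.group("pe") for m in TOKEN.finditer(dtd_text) if m.group("pe")]
```

With this rule, a comment such as <!-- ... %foo; --> yields no references, and the '%' in the "80% done" public ID is left alone, while a reference following the declaration is still reported.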
2 - There is ambiguity with respect to treatment of PE ref syntax
within the internal DTD subset in the context of an attribute
or entity value. Section 4.4 doesn't cover this case, since
(in particular) the text for the "PE, Occurs as Attribute Value"
only talks about the "Outside the DTD" case.
Meanwhile, back at the "PEs in Internal Subset" WFC, it seems
this case is quite explicitly covered, and not according to the
way that might be implied by the "Not Recognized" label in 4.4
(whose description clearly does not cover the "within DTD" case).
It says those references are not allowed (vs. "not recognized").
To be concrete: I think it's most natural to report both these
cases as fatal errors, rather than ignore either one. (The
first might be valid in an external parameter entity, though
the second would still be a fatal error.)
<!DOCTYPE root [ <!ATTLIST root foo CDATA "%pe;" > ]><root/>
<!DOCTYPE root [ <!ATTLIST root foo CDATA "%bad-pe" > ]><root/>
(Imagine an element decl for 'root' if that makes you happy; the
fatal error is not an "Element Valid" validity error which a user
chose to treat as fatal.)
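The reading preferred above (both cases are fatal) can be sketched as a check over the text of a quoted literal found in a markup declaration in the internal subset. This is only an illustration of that reading, not of what the spec currently requires; the function name and messages are hypothetical, and the Name pattern is a simplified ASCII-only approximation.

```python
import re

# Simplified, ASCII-only approximation of the XML Name production (assumption).
NAME = r"[A-Za-z_:][A-Za-z0-9_.:\-]*"
PE_REF = re.compile(r"%(" + NAME + r");")

def check_literal_in_internal_subset(value):
    """Return one error message per '%' found in the literal's text.

    Under the reading sketched here, a syntactically complete PE reference
    is reported as "not allowed" and anything else starting with '%' is
    reported as malformed; both would be treated as fatal.
    """
    errors = []
    pos = 0
    while True:
        i = value.find("%", pos)
        if i < 0:
            break
        m = PE_REF.match(value, i)
        if m:
            errors.append(f"PE reference %{m.group(1)}; not allowed here")
            pos = m.end()
        else:
            errors.append(f"malformed PE reference at offset {i}")
            pos = i + 1
    return errors
```

Applied to the two literals in the examples, "%pe;" produces a "not allowed" error and "%bad-pe" produces a "malformed" error; a literal without '%' produces none.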
3 - The messy one ... having both a VC and WFC for "Entity Declared".
I hope it's not controversial that the text there is a bit opaque!
Although I read Tim's "Annotated XML Spec", it didn't resolve my issues.
As a starting point, consider the following simplified language
as the basis for textual improvements (separate statements for
the "EntityRef" and "PEReference" case would be most clear):
"The [parameter or general] entity name in the [parameter/general]
entity reference must match that in a [parameter/parsed general]
entity declaration which was previously processed."
Clearly that has none of the qualifiers that complicate the text
now found in the spec ... but I'll propose that they all be removed,
and moreover that only the WFC exist. (The more I look at the way
this is all specified, the more confusing it gets -- often a sign
of a need for some powerful simplification!)
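The simplified rule can be sketched as a single pass over declarations and references in document order. This is a sketch under stated assumptions: the event tuples and the function name are hypothetical, parameter and general entities are tracked in separate name spaces, and the predefined entities (lt, amp and so on) are not modeled.

```python
def check_entity_declared(events):
    """Apply the simplified "Entity Declared" rule to a stream of events.

    Each event is (action, kind, name), where action is "declare" or
    "reference" and kind is "parameter" or "general". A reference is an
    error unless a declaration of the same kind and name came earlier.
    """
    declared = {"parameter": set(), "general": set()}
    errors = []
    for action, kind, name in events:
        if action == "declare":
            declared[kind].add(name)
        elif action == "reference" and name not in declared[kind]:
            errors.append((kind, name))
    return errors
```

With no qualifiers, a reference to a PE that was never declared is always reported, as is a reference that precedes its declaration.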
(a) Consider this example:
<!DOCTYPE root [ <!ELEMENT root EMPTY> %undeclared-pe; ]>
<root/>
One reading of the VC and WFC noted above is that neither one of
them applies (so many qualifiers!) and that such a document is
well formed and valid. I don't think that should be; I think
that such a document should clearly not be well formed.
If it's intended that this violate either the VC or WFC, rewriting
is needed to make this quite clear! Bulleted lists are used
elsewhere in the spec for such complex cases, and would help here
(surely one of the most complex sets of qualifiers in this spec).
(b) One "gotcha" in the simplification above is that the current
WFC says that PEs must be declared before use ... even though the
notation to the side of the construct implies that the WFC does
not apply to PE references. That seems like a copy/paste bug,
in that it appears to turn all refs to undeclared PEs into
WFC issues despite the existence of the VC. (That is, undeclared
PEs become fatal WF-ness errors!!)
Related, the VC applies to the general entity reference syntax,
and the qualifications to the WFC make the VC apply in the common
case of an entity declared in an external PE ... refs to undeclared
parsed general entities become (recoverable) VC errors!! (Unless
an interaction with the standalone declaration kicks in; see next.)
Those results are counterintuitive, but are supported by the spec.
(c) Another "gotcha" in that simplification is interaction with
the "Standalone Document Declaration" VC. In the case of a
document which is invalid because it's declared as standalone,
yet which still refers to an externally declared entity, the
qualifications in the WFC say this should be upgraded to become
a violation of this WFC.
I think that the intent of the standalone decl was to facilitate
safe processing of documents when ignoring external PEs, but the
same case (undeclared externally defined entity) is discussed
variously as a WFC error, or a violation of either of two VCs.
Simpler would be to strike the clause of the "Standalone Document
Declaration" VC that applies to entity references, perhaps noting
that a WFC applies, and add a clause to the simplified "Entity
Declared" WFC text above, something like:
In the case of nonvalidating processors which do not read
external parameter entities (the "external DTD subset"), and
which are processing documents not marked "standalone='yes'",
this WFC applies only to entity references preceding the first
external PE that is not processed.
(Of course that explicitly acknowledges that there are at least
two subcategories of nonvalidating parser, based on whether they
read external PEs or not. That's evident, and I think it'd be
good to call it out in the conformance section as well.)
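The proposed clause can be sketched as follows. The event names, flags and function name are hypothetical, and the sketch assumes that a processor which does read an external PE feeds that entity's declarations into the same event stream.

```python
def undeclared_refs(events, standalone=False, reads_external_pes=False):
    """Return the undeclared entity names that the WFC would report.

    Each event is (kind, name) with kind one of "declare", "reference",
    or "external-pe" (a reference to an external parameter entity).
    A processor that skips an external PE in a document not marked
    standalone='yes' stops enforcing the constraint from that point on.
    """
    declared = set()
    errors = []
    enforcing = True
    for kind, name in events:
        if kind == "declare":
            declared.add(name)
        elif kind == "external-pe":
            if not reads_external_pes and not standalone:
                enforcing = False
        elif kind == "reference":
            if enforcing and name not in declared:
                errors.append(name)
    return errors
```

For the stream reference a, external-pe ext, reference b, a processor that skips external PEs reports only a; one that reads them, or one handling a standalone='yes' document, reports both a and b.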
That's just a few highlights of what, for me, is a notable problem
area in the specification. As noted above, I think the best way
through this (and related) issues is to just make entity declaration
always be a WFC issue ... leaving nonvalidating parsers as they now
stand, not guaranteed to report all such WF errors, although clearly
stating when that may happen in a conformant manner.
Apologies in advance if parts of this are as unclear to you as those
parts of the spec are to me; this note was written to summarize various
issues we've queued up, and may not capture the discussion (leading
in particular to the "only have the WFC" suggestion) perfectly.
- Dave
Received on Monday, 25 January 1999 14:11:17 UTC