- From: Dieter Köhler <d.k@philo.de>
- Date: Tue, 21 Sep 2004 06:33:17 +0000
- To: xml-editor@w3.org
Prod. [39] "VC: Element Valid" seems to have a somewhat surprising implication with respect to the validity of documents that use character references expanding to white space. For example, the following document, which comes from the XML test suite (ID = 'rmt-e2e-15g'), is invalid:

  <!DOCTYPE foo [
  <!ELEMENT foo (foo*)>
  ]>
  <foo><foo/>&#32;<foo/></foo>

while this document seems to be valid:

  <!DOCTYPE foo [
  <!ELEMENT foo (foo*)>
  <!ENTITY bar "<foo/> <foo/>">
  ]>
  <foo>&bar;</foo>

The implications of this VC for implementing an XML parser are considerable. It requires character reference expansion to be performed during validation rather than before it, because if character references are expanded in an earlier, separate step, the information whether a given white-space character was written as a character reference or literally is lost. Entity reference expansion, in contrast, must take place before validation.

However, there is one exception to this rule. For an element declared EMPTY, validation has to take place before entity reference expansion, whereas in all other cases it must take place after entity reference expansion. This is demonstrated by the following (invalid) test case (ID = 'rmt-e2e-15a'):

  <!DOCTYPE foo [
  <!ELEMENT foo EMPTY>
  <!ENTITY empty "">
  ]>
  <foo>&empty;</foo>

The element <foo> cannot be validated after expanding &empty;, because <foo></foo> would be valid, whereas in the previous example <foo>&bar;</foo> must be expanded first in order to be validated.

I would like to advocate a change to prod. [39] "VC: Element Valid" that requires validation to take place after all character and entity references have been expanded.
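To make the difference concrete, here is a minimal Python sketch (not taken from any existing parser) contrasting the current reading of the VC with the proposed one. It assumes a hypothetical token model in which the parser records, for each run of character data, whether it was written as a character reference, and it reduces content models to EMPTY and (name*). All class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical token model for the raw (unexpanded) content of one element.
@dataclass
class Child:
    name: str  # a child element such as <foo/>

@dataclass
class Text:
    data: str
    from_char_ref: bool = False  # True if written as a character reference, e.g. &#32;

@dataclass
class EntityRef:
    name: str  # a general entity reference such as &bar;

Token = Union[Child, Text, EntityRef]
Entities = Dict[str, List[Token]]  # entity name -> tokenized replacement text
WHITESPACE = set(" \t\r\n")


def expand(tokens: List[Token], entities: Entities) -> List[Token]:
    """Inline entity replacement text recursively; Text tokens keep their flag."""
    out: List[Token] = []
    for t in tokens:
        if isinstance(t, EntityRef):
            out.extend(expand(entities[t.name], entities))
        else:
            out.append(t)
    return out


def children_ok(tokens: List[Token], name: str, ignore_provenance: bool) -> bool:
    """Check expanded tokens against the content model (name*)."""
    for t in tokens:
        if isinstance(t, Text):
            if not set(t.data) <= WHITESPACE:
                return False  # non-white-space character data
            if t.from_char_ref and not ignore_provenance:
                return False  # white space written as a character reference
        elif isinstance(t, Child) and t.name != name:
            return False
    return True


def element_valid_current(raw: List[Token], model: str, entities: Entities) -> bool:
    """Current VC (as read above): model is "EMPTY" or a name n standing for (n*)."""
    if model == "EMPTY":
        return len(raw) == 0  # checked before expansion, so &empty; already fails
    return children_ok(expand(raw, entities), model, ignore_provenance=False)


def element_valid_proposed(raw: List[Token], model: str, entities: Entities) -> bool:
    """Proposed VC: expand every reference first, then validate."""
    content = expand(raw, entities)
    if model == "EMPTY":
        return len(content) == 0
    return children_ok(content, model, ignore_provenance=True)
```

For instance, the content of 'rmt-e2e-15g' can be modelled as [Child("foo"), Text(" ", True), Child("foo")]: element_valid_current rejects it and element_valid_proposed accepts it. The current check needs both the provenance flag and the unexpanded token list, while the proposed check only ever looks at expanded content, so it could run as a separate pass after reference expansion.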
The consequences of this change would be very limited: no previously valid document would become invalid, and only entities expanding to white space or to empty replacement text might be affected, such as in the following (currently invalid) document from the XML test suite (ID = 'rmt-e2e-15h'):

  <!DOCTYPE foo [
  <!ELEMENT foo (foo*)>
  <!ENTITY space "&#32;">
  ]>
  <foo><foo/>&space;<foo/></foo>

On the other hand, the benefits of a changed VC for the design of XML parsers are substantial. The whole parsing process could then be split into two clearly distinct stages:

  1. well-formedness checking and reference expansion,
  2. validation.

Of course, this proposed modification would temporarily (until a fix is released) break existing XML processor implementations (although I suspect that not all prominent XML processors actually implement this VC correctly). In return, the modification would bring XML closer to its design goal no. 4: "It shall be easy to write programs which process XML documents."

Dieter Köhler
Received on Wednesday, 22 September 2004 15:27:00 UTC