Errata in section 2.4 of Extensible Markup Language (XML) 1.0 (Fifth Edition)

ERROR #1: Ambiguous grammar

These rules make the grammar ambiguous:

[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[43] content ::= CharData? ((element | Reference | CDSect | PI | 
Comment) CharData?)*

CharData can match the empty string because it uses "*". However,
content refers to it as CharData?, which makes this possibly empty
match optional as well. Therefore, if content is empty, it is
ambiguous whether CharData matched the empty string or was omitted
altogether. The same applies wherever two markup items are adjacent
with nothing between them.
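
As a minimal sketch of this ambiguity, the Python below enumerates the
derivations of content for input that contains no markup items, where
rule [43] reduces to CharData?. The function names are hypothetical,
and the regex is only an approximation of rule [14], with the ']]>'
exclusion handled by a substring test.

```python
import re

# Rule [14] as published: zero or more characters other than '<' and '&',
# minus any string that contains the CDATA terminator ']]>'.
CHAR_DATA = re.compile(r"[^<&]*")


def is_char_data(text):
    return CHAR_DATA.fullmatch(text) is not None and "]]>" not in text


def content_derivations(text):
    """List the derivations of `content` for input with no markup items,
    where rule [43] reduces to `CharData?`."""
    derivations = []
    if text == "":
        # The optional CharData is simply absent.
        derivations.append("CharData? omitted")
    if is_char_data(text):
        # CharData is present and consumes the whole input.
        derivations.append(f"CharData matched {text!r}")
    return derivations


print(content_derivations(""))     # two derivations, hence ambiguous
print(content_derivations("abc"))  # one derivation
```

Empty input yields two derivations, while non-empty character data
yields one.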

Functionally, this is low severity. However, grammar-driven parsers
such as my own find both interpretations and report an error because
the grammar is ambiguous.

The fix is simple. Change:
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
to:
[14] CharData ::= [^<&]+ - ([^<&]* ']]>' [^<&]*)
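
The effect of the change can be checked with a rough regex comparison.
This is a sketch, not a conformance test: ORIGINAL, PROPOSED, and
is_char_data are hypothetical names, and the ']]>' exclusion is
modelled as a substring test. It suggests the two versions of rule
[14] differ only on the empty string.

```python
import re

ORIGINAL = re.compile(r"[^<&]*")  # rule [14] as published
PROPOSED = re.compile(r"[^<&]+")  # rule [14] with "*" changed to "+"


def is_char_data(pattern, text):
    # Apply the "- ([^<&]* ']]>' [^<&]*)" exclusion on top of the base pattern.
    return pattern.fullmatch(text) is not None and "]]>" not in text


samples = ["", "hello", "a > b", "x ]]> y", "a & b"]
for sample in samples:
    print(repr(sample),
          is_char_data(ORIGINAL, sample),
          is_char_data(PROPOSED, sample))
```

Only the empty string is accepted by the published rule and rejected
by the proposed one, so content would have a single way to derive
"no character data": omitting CharData.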


ERROR #2: CharData supports, and doesn't support, character references

Section 2.4 seems to suggest that character data may contain
references such as &amp;. At the same time, grammar rule [14] for
CharData excludes the ampersand, so it does not appear to be able to
match any such reference at all:

[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
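
A quick check of rule [14] in isolation illustrates the point. This is
a sketch under the same assumptions as the rule text quoted here:
is_char_data is a hypothetical helper, and it tests only CharData, not
the content production as a whole.

```python
import re

# Rule [14]: characters other than '<' and '&', minus strings containing ']]>'.
CHAR_DATA = re.compile(r"[^<&]*")


def is_char_data(text):
    return CHAR_DATA.fullmatch(text) is not None and "]]>" not in text


print(is_char_data("Tom and Jerry"))    # True
print(is_char_data("&amp;"))            # False
print(is_char_data("Tom &amp; Jerry"))  # False
print(is_char_data("&#38;"))            # False
```

Note that rule [43], quoted under ERROR #1, lists Reference as a
separate alternative alongside CharData, so this check says nothing
about whether content as a whole accepts &amp;.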


Regards,

Daniel van Vugt

Received on Thursday, 20 October 2011 09:05:03 UTC