- From: Grosso, Paul <pgrosso@ptc.com>
- Date: Thu, 3 Nov 2011 16:49:20 -0400
- To: "Daniel van Vugt" <vanvugt@gmail.com>, <xml-editor@w3.org>
Daniel,

Thank you for your interest in the XML spec and your comments [1,2,3] on the XML 1.0 5th edition. The XML Core Working Group discussed them and came to the following conclusion:

Regarding the several ambiguous grammar reports
-----------------------------------------------

You are correct that the productions as written do not themselves specify a non-ambiguous grammar, and the alterations you are suggesting are exactly the kind that a parser writer should be making if a non-ambiguous grammar is needed or desired.

However, the technical ambiguities in the productions in the XML specification have been there since the first edition in 1998, and it was never the intention to imply that the productions in the document can be used without change as a non-ambiguous grammar. The original authors of the specification felt that logical clarity was better served by the productions as written, and parser writers are free to translate them into an equivalent non-ambiguous grammar.

Perhaps that sentiment should have been spelled out explicitly in the document, but it does not seem necessary or prudent to do that or to alter the productions at this late date.

Regarding the CharData construct
--------------------------------

CharData does not include character references.

The discussion in section 2.4 starts with "_Text_ consists of intermingled character data and markup." The discussion in the next few paragraphs about character references is talking about character references in _Text_.

The CharData term that, as you note, does not allow the < or & character, is only referenced from production [43] for "content" which is the production for _text_, and that production defines "content" as being CharData interspersed with various markup constructs including Reference (which includes entity and character references).
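As an illustration of the point above, the following is a minimal sketch, not taken from the specification, of how a scanner might split element content into CharData and Reference pieces. The function name `split_content` is hypothetical, Name is narrowed to ASCII as a simplifying assumption, and the other alternatives of production [43] (element, CDSect, PI, Comment) are left out.

```python
import re

# Simplified stand-ins for XML 1.0 productions (assumptions, not the full rules).
CHAR_REF = r"&#[0-9]+;|&#x[0-9a-fA-F]+;"        # [66] CharRef
ENTITY_REF = r"&[A-Za-z_:][A-Za-z0-9_:.\-]*;"    # [68] EntityRef, ASCII-only Name
CHAR_DATA = r"[^<&]+"                             # [14] CharData, as a non-empty run

# [67] Reference ::= EntityRef | CharRef; tried before CharData at each position.
TOKEN = re.compile(
    rf"(?P<Reference>{CHAR_REF}|{ENTITY_REF})|(?P<CharData>{CHAR_DATA})"
)


def split_content(text):
    """Split content that holds only text and references into (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            # A bare '&' or any '<' lands here, since this sketch has no element rule.
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        if m.lastgroup == "CharData" and "]]>" in m.group():
            raise ValueError("CharData must not contain ']]>'")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in split_content("AT&amp;T &#38; co"):
        print(kind, repr(lexeme))
```

In this sketch the references never appear inside a CharData token; they are siblings of CharData within content, which is the arrangement production [43] describes.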
Paul Grosso, co-chair of the XML Core WG

[1] http://lists.w3.org/Archives/Public/xml-editor/2011OctDec/0000
[2] http://lists.w3.org/Archives/Public/xml-editor/2011OctDec/0001
[3] http://lists.w3.org/Archives/Public/xml-editor/2011OctDec/0002

> -----Original Message-----
> From: xml-editor-request@w3.org [mailto:xml-editor-request@w3.org] On
> Behalf Of Daniel van Vugt
> Sent: Thursday, 2011 October 20 0:20
> To: xml-editor@w3.org
> Subject: Errata in section 2.4 of Extensible Markup Language (XML) 1.0
> (Fifth Edition)
>
> ERROR #1: Ambiguous grammar
>
> These rules make the grammar ambiguous:
>
> [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
> [43] content ::= CharData? ((element | Reference | CDSect | PI |
> Comment) CharData?)*
>
> CharData is allowed to match an empty string due to its use of "*".
> However CharData is referenced as CharData? meaning this potentially
> empty string is optional. Therefore, if content is blank, it is
> ambiguous as to whether CharData is matched as the empty string or if
> CharData is omitted completely.
>
> Functionally this is low severity. However grammar parsers such as my
> own will find both interpretations and treat it as an error because the
> grammar is ambiguous.
>
> The fix is simple. Change:
> [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
> to:
> [14] CharData ::= [^<&]+ - ([^<&]* ']]>' [^<&]*)
>
>
> ERROR #2: CharData supports, and doesn't support, character references
>
> Section 2.4 seems to suggest that Character Data may contain character
> references such as &amp;. However at the same time, the grammar rule
> [14] for CharData does not appear to be able to match ampersand
> character references at all:
>
> [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
>
>
> Regards,
>
> Daniel van Vugt
>
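The ambiguity reported as ERROR #1 in the quoted message can be made concrete by counting derivations. The sketch below is a toy model under stated assumptions, not a real XML parser: the hypothetical function `count_parses` treats the letter "M" as a stand-in for any single markup item (element, Reference, CDSect, PI or Comment), treats every other character as character data, and ignores the ']]>' exclusion in production [14].

```python
from functools import lru_cache


def count_parses(s, min_len):
    """Count derivations of s under content ::= CharData? (M CharData?)*.

    CharData is modelled as a run of at least min_len non-'M' characters:
    min_len=0 mirrors the '*' form of production [14], min_len=1 the '+' form.
    """
    n = len(s)

    def chardata_opt_ends(i):
        # Yield every end position of CharData? starting at i, once per derivation.
        yield i                      # CharData? omitted entirely
        if min_len == 0:
            yield i                  # CharData present but matching the empty string
        j = i
        while j < n and s[j] != "M":
            j += 1
            yield j                  # CharData matching s[i:j], length >= 1

    @lru_cache(maxsize=None)
    def tail(i):
        # Ways to derive s[i:] from (M CharData?)*
        total = 1 if i == n else 0   # zero further repetitions
        if i < n and s[i] == "M":
            for j in chardata_opt_ends(i + 1):
                total += tail(j)
        return total

    return sum(tail(j) for j in chardata_opt_ends(0))


if __name__ == "__main__":
    print(count_parses("", 0), count_parses("", 1))
```

For empty content the model finds two derivations with the "*" form and one with the "+" form, which matches the report: the extra derivation comes from CharData matching the empty string where CharData? could also be omitted.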
Received on Thursday, 3 November 2011 22:22:40 UTC