- From: Daniel van Vugt <vanvugt@gmail.com>
- Date: Fri, 04 Nov 2011 11:19:35 +0800
- To: "Grosso, Paul" <pgrosso@ptc.com>, xml-editor@w3.org
I am very surprised you are not accepting corrections to the standard, for mistakes that you acknowledge do exist. Especially a correction such as this which only requires changing a single character. However, this is not the first time I have encountered an official language specification with BNF grammar where the authors have stated they don't guarantee the grammar to be technically accurate...

For the benefit of the wider community, I think it would be helpful to still publish the errata, even indefinitely, and even if you have no intention of ever resolving the problems in the main document.

- Daniel

On 04/11/11 04:49, Grosso, Paul wrote:
> Daniel,
>
> Thank you for your interest in the XML spec and your
> comments [1,2,3] on the XML 1.0 5th edition.
>
> The XML Core Working Group discussed them and came to the
> following conclusion:
>
> Regarding the several ambiguous grammar reports
> -----------------------------------------------
> You are correct that the productions as written do not themselves
> specify a non-ambiguous grammar, and the alterations you are
> suggesting are exactly the kind that a parser writer should
> be making if a non-ambiguous grammar is needed or desired.
>
> However, the technical ambiguities in the productions in the XML
> specification have been there since the first edition in 1998,
> and it was never the intention to imply that the productions
> in the document can be used without change as a non-ambiguous
> grammar. The original authors of the specification felt that
> logical clarity was better served by the productions as written,
> and parser writers are free to translate them into an equivalent
> non-ambiguous grammar.
>
> Perhaps that sentiment should have been spelled out explicitly
> in the document, but it does not seem necessary or prudent to
> do that or to alter the productions at this late date.
>
> Regarding the CharData construct
> --------------------------------
> CharData does not include character references.
>
> The discussion in section 2.4 starts with "_Text_ consists of
> intermingled character data and markup." The discussion in
> the next few paragraphs about character references is talking
> about character references in _Text_. The CharData term that,
> as you note, does not allow the < or & character, is only
> referenced from production [43] for "content" which is the
> production for _text_, and that production defines "content"
> as being CharData interspersed with various markup constructs
> including Reference (which includes entity and character
> references).
>
>
> Paul Grosso, co-chair of the XML Core WG
>
> [1] http://lists.w3.org/Archives/Public/xml-editor/2011OctDec/0000
> [2] http://lists.w3.org/Archives/Public/xml-editor/2011OctDec/0001
> [3] http://lists.w3.org/Archives/Public/xml-editor/2011OctDec/0002
>
>> -----Original Message-----
>> From: xml-editor-request@w3.org [mailto:xml-editor-request@w3.org] On
>> Behalf Of Daniel van Vugt
>> Sent: Thursday, 2011 October 20 0:20
>> To: xml-editor@w3.org
>> Subject: Errata in section 2.4 of Extensible Markup Language (XML) 1.0
>> (Fifth Edition)
>>
>> ERROR #1: Ambiguous grammar
>>
>> These rules make the grammar ambiguous:
>>
>> [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
>> [43] content ::= CharData? ((element | Reference | CDSect | PI |
>> Comment) CharData?)*
>>
>> CharData is allowed to match an empty string due to its use of "*".
>> However CharData is referenced as CharData? meaning this potentially
>> empty string is optional. Therefore, if content is blank, it is
>> ambiguous as to whether CharData is matched as the empty string or if
>> CharData is omitted completely.
>>
>> Functionally this is low severity. However grammar parsers such as my
>> own will find both interpretations and treat it as an error because the
>> grammar is ambiguous.
>>
>> The fix is simple.
>> Change:
>> [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
>> to:
>> [14] CharData ::= [^<&]+ - ([^<&]* ']]>' [^<&]*)
>>
>>
>> ERROR #2: CharData supports, and doesn't support, character references
>>
>> Section 2.4 seems to suggest that Character Data may contain character
>> references such as &. However at the same time, the grammar rule
>> [14] for CharData does not appear to be able to match ampersand
>> character references at all:
>>
>> [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
>>
>>
>> Regards,
>>
>> Daniel van Vugt
>>
>
>
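The ambiguity reported under ERROR #1 can be illustrated by counting derivations. The following is only a rough sketch under simplifying assumptions, not a parser for the XML grammar: the letter "M" is a hypothetical stand-in for one whole markup item (element, Reference, CDSect, PI or Comment), the ']]>' exclusion in production [14] is ignored, and the function names are invented for illustration.

```python
# Count the distinct ways a string can be derived from a simplified
#   content ::= CharData? (M CharData?)*
# where "M" is a hypothetical placeholder for one markup item.

def count_chardata(s: str, allow_empty: bool) -> int:
    """Ways CharData matches exactly s: [^<&]* if allow_empty, else [^<&]+."""
    if any(c in "<&M" for c in s):
        return 0
    if s == "" and not allow_empty:
        return 0
    return 1

def count_opt_chardata(s: str, allow_empty: bool) -> int:
    """Ways 'CharData?' matches s: either omitted (only if s is empty) or present."""
    omitted = 1 if s == "" else 0
    return omitted + count_chardata(s, allow_empty)

def count_tail(t: str, allow_empty: bool) -> int:
    """Ways '(M CharData?)*' matches exactly t."""
    if t == "":
        return 1
    if t[0] != "M":
        return 0
    total = 0
    for j in range(1, len(t) + 1):
        total += count_opt_chardata(t[1:j], allow_empty) * count_tail(t[j:], allow_empty)
    return total

def count_content(s: str, allow_empty: bool) -> int:
    """Ways 'content' matches exactly s; more than 1 means ambiguous."""
    total = 0
    for i in range(len(s) + 1):
        total += count_opt_chardata(s[:i], allow_empty) * count_tail(s[i:], allow_empty)
    return total

if __name__ == "__main__":
    print(count_content("", allow_empty=True))   # production [14] as published ("*")
    print(count_content("", allow_empty=False))  # production [14] with the proposed "+"
```

Under these assumptions, empty content has two derivations with the published "*" form (CharData omitted, or CharData matching the empty string) and exactly one with the proposed "+" form, which matches the argument made in the original report.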
Received on Friday, 4 November 2011 03:21:55 UTC