- From: Magnus Danielson <cfmd@swipnet.se>
- Date: Wed, 1 Jan 2003 00:33:29 -0500 (EST)
- To: xml-editor@w3.org
Hi!

Reading up on the XML spec, I found these three rules:

    [75] ExternalID   ::= 'SYSTEM' S SystemLiteral
                        | 'PUBLIC' S PubidLiteral S SystemLiteral
    [82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
    [83] PublicID     ::= 'PUBLIC' S PubidLiteral

There is a parser conflict, at least for LL(k) parsers with low values of k, in deciding whether to go into ExternalID or PublicID when the first token is 'PUBLIC'. Combining ExternalID and PublicID gives a better construct:

    NotationDecl       ::= '<!NOTATION' S Name S ExternalOrPublicID S? '>'
    ExternalOrPublicID ::= ExternalID | PublicID

By expanding ExternalID and PublicID into this new production, you get

    ExternalOrPublicID ::= 'SYSTEM' S SystemLiteral
                         | 'PUBLIC' S PubidLiteral S SystemLiteral
                         | 'PUBLIC' S PubidLiteral

We can then rewrite this further by observing that 'PUBLIC' S PubidLiteral is common to the last two alternatives, which makes S SystemLiteral optional. This results in

    ExternalOrPublicID ::= 'SYSTEM' S SystemLiteral
                         | 'PUBLIC' S PubidLiteral (S SystemLiteral)?

leaving the alternative grammar as

    [75] ExternalID         ::= 'SYSTEM' S SystemLiteral
                              | 'PUBLIC' S PubidLiteral S SystemLiteral
    [82] NotationDecl       ::= '<!NOTATION' S Name S ExternalOrPublicID S? '>'
    [83] ExternalOrPublicID ::= 'SYSTEM' S SystemLiteral
                              | 'PUBLIC' S PubidLiteral (S SystemLiteral)?

That is, production [75] is untouched, production [82] is modified to use the compound production, and production [83] is replaced with the compound production. Note that the PublicID production is used only in the NotationDecl production, so it might be clearer to amend the grammar accordingly. This email could also serve as advice for anyone who runs into this problem and wants to avoid further trouble. I think the proposed change could also be of use for LALR grammars. Alternatively, it could be discussed whether the compound production and ExternalID should be one and the same, i.e. whether the SystemLiteral should be present at all times.
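To illustrate why the factored form helps, here is a minimal sketch of a scannerless recursive-descent parser for the rewritten [82] and [83]. It is a simplification under stated assumptions: the class `Parser` and its methods are hypothetical names, Name is approximated by a regular expression, and PubidLiteral is treated like any quoted string without checking PubidChar. The shared prefix 'PUBLIC' S PubidLiteral is parsed exactly once. Because S also begins the S? '>' that follows, the parser consumes the whitespace first and then inspects a single character: a quote means a SystemLiteral follows, anything else ends the identifier.

```python
import re

# Simplified character classes (assumptions): real XML Name rules are broader,
# and real PubidLiteral content is restricted to PubidChar.
WS = " \t\r\n"
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
QUOTES = ("'", '"')


class Parser:
    """Hypothetical scannerless recursive-descent parser for NotationDecl."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, literal):
        if not self.text.startswith(literal, self.pos):
            raise SyntaxError(f"expected {literal!r} at {self.pos}")
        self.pos += len(literal)

    def skip_s(self):
        # S? : consume any whitespace; return True if at least one char was seen
        start = self.pos
        while self.peek() and self.peek() in WS:
            self.pos += 1
        return self.pos > start

    def require_s(self):
        # S : at least one whitespace character is mandatory
        if not self.skip_s():
            raise SyntaxError(f"expected whitespace at {self.pos}")

    def name(self):
        m = NAME_RE.match(self.text, self.pos)
        if not m:
            raise SyntaxError(f"expected Name at {self.pos}")
        self.pos = m.end()
        return m.group()

    def literal(self):
        # SystemLiteral / PubidLiteral: text between matching ' or " quotes
        quote = self.peek()
        if quote not in QUOTES:
            raise SyntaxError(f"expected quoted literal at {self.pos}")
        end = self.text.find(quote, self.pos + 1)
        if end < 0:
            raise SyntaxError(f"unterminated literal at {self.pos}")
        value = self.text[self.pos + 1:end]
        self.pos = end + 1
        return value

    def external_or_public_id(self):
        # ExternalOrPublicID ::= 'SYSTEM' S SystemLiteral
        #                      | 'PUBLIC' S PubidLiteral (S SystemLiteral)?
        if self.text.startswith("SYSTEM", self.pos):
            self.expect("SYSTEM")
            self.require_s()
            return ("SYSTEM", None, self.literal())
        self.expect("PUBLIC")
        self.require_s()
        pubid = self.literal()  # shared prefix, parsed exactly once
        # S also starts the S? '>' that follows, so consume the whitespace
        # first; a quote after it means the optional SystemLiteral is present.
        if self.skip_s() and self.peek() in QUOTES:
            return ("PUBLIC", pubid, self.literal())
        return ("PUBLIC", pubid, None)

    def notation_decl(self):
        # NotationDecl ::= '<!NOTATION' S Name S ExternalOrPublicID S? '>'
        self.expect("<!NOTATION")
        self.require_s()
        name = self.name()
        self.require_s()
        ident = self.external_or_public_id()
        self.skip_s()
        self.expect(">")
        return (name,) + ident
```

For example, `Parser('<!NOTATION gif PUBLIC "-//GIF//EN">').notation_decl()` returns `('gif', 'PUBLIC', '-//GIF//EN', None)`, and adding `'gif.exe'` after the public identifier fills in the last element instead. No backtracking is needed at any point.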
The archives didn't really give me much of a clue. I should also point out that a few other constructs require rewrites to reduce the lookahead that LL parsers actually need, and the children production is probably the most confusing one. Perhaps the placement of S inside and outside the occurrence constructs in children and cp could be changed to help reduce the confusion and the need for lookahead.

Cheers,
Magnus
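The same left-factoring idea can be applied to the children production. In XML 1.0 the relevant rules are children ::= (choice | seq) ('?' | '*' | '+')?, cp ::= (Name | choice | seq) ('?' | '*' | '+')?, choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')' and seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'. Because choice and seq share the prefix '(' S? cp, a parser cannot tell them apart until it has read the first cp and the whitespace after it. The sketch below is a hypothetical rewrite, not part of the spec: it merges both into a single group rule, consumes whitespace before inspecting the separator (sidestepping the S placement issue), and lets the first separator decide between choice and seq. `parse_children` is an illustrative name, Name is again approximated by a regular expression, and the Mixed content model (#PCDATA ...) is not covered.

```python
import re

NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")  # simplified Name (assumption)
WS = " \t\r\n"


def parse_children(text):
    """Parse a DTD element content model such as '(a, (b | c)*, d?)'."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def skip_s():
        nonlocal pos
        while peek() and peek() in WS:
            pos += 1

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at {pos}")
        pos += 1

    def modifier(node):
        # ('?' | '*' | '+')? directly after a Name or a closing ')'
        nonlocal pos
        if peek() and peek() in "?*+":
            node = (peek(), node)
            pos += 1
        return node

    def cp():
        # cp ::= (Name | group) ('?' | '*' | '+')?
        nonlocal pos
        if peek() == "(":
            return modifier(group())
        m = NAME_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"expected Name or '(' at {pos}")
        pos = m.end()
        return modifier(m.group())

    def group():
        # Hypothetical factored rule covering both choice and seq:
        # group ::= '(' S? cp S? (('|' S? cp S?)+ | (',' S? cp S?)*) ')'
        expect("(")
        skip_s()
        items = [cp()]
        skip_s()  # whitespace is consumed before looking at the separator
        sep = peek() if peek() in ("|", ",") else None
        while sep is not None and peek() == sep:
            expect(sep)
            skip_s()
            items.append(cp())
            skip_s()
        expect(")")  # mixing '|' and ',' in one group fails here
        return ("choice" if sep == "|" else "seq", items)

    result = modifier(group())  # children ::= (choice | seq) ('?' | '*' | '+')?
    if pos != len(text):
        raise SyntaxError(f"trailing input at {pos}")
    return result
```

For example, `parse_children("(a, (b | c)*, d?)")` returns `('seq', ['a', ('*', ('choice', ['b', 'c'])), ('?', 'd')])`, while `"(a | b, c)"` is rejected because it mixes separators within one group. A single-item group such as `(a)` comes out as a seq, matching production [50], which allows zero repetitions.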
Received on Thursday, 2 January 2003 00:38:03 UTC