Grammar conflict in NotationDecl on PUBLIC

Hi!

Reading up on the XML spec I found these three rules:

[75]    ExternalID	::=    'SYSTEM' S SystemLiteral
			|      'PUBLIC' S PubidLiteral S SystemLiteral 
[82]    NotationDecl    ::=    '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
[83]    PublicID	::=    'PUBLIC' S PubidLiteral

LL(k) parsers, at least for low values of k, hit a conflict here: when the
first token is 'PUBLIC' they cannot tell whether to go into ExternalID or
PublicID.
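
To make the conflict concrete, here is a rough sketch that counts how much
lookahead is needed to separate the two 'PUBLIC' alternatives inside
NotationDecl. It assumes that S, PubidLiteral and SystemLiteral each arrive
as a single token, which is my simplification and not something the spec
prescribes. The names prefixes and min_lookahead are made up for this
illustration.

```python
PUBLIC_PREFIX = ("PUBLIC", "S", "PubidLiteral")

# Token sequences from 'PUBLIC' up to the closing '>' of NotationDecl,
# with the optional S? before '>' expanded into both possibilities.
# Assumption: S and the two literals are each one token.
EXTERNAL_ID = [
    PUBLIC_PREFIX + ("S", "SystemLiteral", ">"),
    PUBLIC_PREFIX + ("S", "SystemLiteral", "S", ">"),
]
PUBLIC_ID = [
    PUBLIC_PREFIX + (">",),
    PUBLIC_PREFIX + ("S", ">"),
]


def prefixes(sequences, k):
    """Return the set of k-token prefixes of the given sequences."""
    return {seq[:k] for seq in sequences}


def min_lookahead(a, b):
    """Smallest k at which the k-token prefixes of a and b are disjoint."""
    longest = max(len(seq) for seq in a + b)
    for k in range(1, longest + 1):
        if not (prefixes(a, k) & prefixes(b, k)):
            return k
    return None  # the alternatives cannot be told apart at all


print(min_lookahead(EXTERNAL_ID, PUBLIC_ID))
```

Under these assumptions the two alternatives only diverge at the fifth
token, where one has a SystemLiteral and the other has '>'.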

Combining ExternalID and PublicID into one compound production forms a better
construct:

NotationDecl		::=    '<!NOTATION' S Name S ExternalOrPublicID S? '>'

ExternalOrPublicID	::=    ExternalID | PublicID

By expanding ExternalID and PublicID inside this new production, you get

ExternalOrPublicID	::= 'SYSTEM' S SystemLiteral
			 |  'PUBLIC' S PubidLiteral S SystemLiteral
			 |  'PUBLIC' S PubidLiteral

We can then rewrite this further by observing that 'PUBLIC' S PubidLiteral is
common to the last two alternatives, so they can be merged into one with
S SystemLiteral made optional. This results in

ExternalOrPublicID	::= 'SYSTEM' S SystemLiteral
			 |  'PUBLIC' S PubidLiteral (S SystemLiteral)?

This leaves the alternative grammar as

[75]    ExternalID	::=    'SYSTEM' S SystemLiteral
			|      'PUBLIC' S PubidLiteral S SystemLiteral 
[82]    NotationDecl    ::=    '<!NOTATION' S Name S ExternalOrPublicID S? '>'
[83]    ExternalOrPublicID	::= 'SYSTEM' S SystemLiteral
			 |  'PUBLIC' S PubidLiteral (S SystemLiteral)?

I.e. production [75] is untouched, production [82] is modified to use the
compound production, and production [83] is replaced by the compound
production. Note that the PublicID production is only used in the
NotationDecl production.
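
As an illustration, a recursive-descent function for the factored
ExternalOrPublicID rule could look roughly like the sketch below. It assumes
a hypothetical tokenizer that delivers (kind, value) pairs with the kinds
"SYSTEM", "PUBLIC", "S" and "LITERAL" (one kind for both PubidLiteral and
SystemLiteral); none of these names come from the spec. The choice on
'PUBLIC' is now made with a single token. The optional tail still peeks one
token past S, because NotationDecl allows S? before '>'.

```python
def parse_external_or_public_id(tokens, pos=0):
    """Parse ExternalOrPublicID at tokens[pos].

    tokens is a list of (kind, value) pairs from a hypothetical tokenizer.
    Returns ((public_id, system_id), next_pos); a missing id is None.
    """
    def kind_at(i):
        return tokens[i][0] if i < len(tokens) else None

    def expect(kind, i):
        if kind_at(i) != kind:
            raise SyntaxError(
                f"expected {kind} at token {i}, found {kind_at(i)}")
        return tokens[i][1], i + 1

    # 'SYSTEM' S SystemLiteral
    if kind_at(pos) == "SYSTEM":
        _, i = expect("S", pos + 1)
        system_id, i = expect("LITERAL", i)
        return (None, system_id), i

    # 'PUBLIC' S PubidLiteral (S SystemLiteral)?
    _, i = expect("PUBLIC", pos)
    _, i = expect("S", i)
    public_id, i = expect("LITERAL", i)
    system_id = None
    # Take the optional tail only if S is followed by a literal; otherwise
    # the S belongs to the trailing S? of NotationDecl and is left alone.
    if kind_at(i) == "S" and kind_at(i + 1) == "LITERAL":
        system_id, i = expect("LITERAL", i + 1)
    return (public_id, system_id), i
```

For a token list such as PUBLIC, S, LITERAL, S, '>' the function stops
right after the public identifier and leaves the trailing S and '>' for the
NotationDecl rule to consume.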

It might be clearer to amend the grammar accordingly. Also, this email could
act as advice for anyone experiencing this trouble and wanting to avoid
further problems.

I think the proposed change could also be of use for LALR grammars.

Alternatively, it could be discussed whether the constructed compound and
ExternalID should be the same, i.e. whether the SystemLiteral should be
present at all times.

The archives didn't really give me much of a clue.

It should be clear that a few other constructs require rewrites to lower the
lookahead actually needed by LL parsers, and the children production is
probably the most confusing one. Perhaps the placement of S inside and
outside of the occurrence constructs in children and cp should be different,
to help reduce the confusion and the need for lookahead.

Cheers,
Magnus

Received on Thursday, 2 January 2003 00:38:03 UTC