- From: Kent M Pitman <kmp@harlequin.com>
- Date: Fri, 17 Apr 98 05:13:03 EDT
- To: xml-editor@w3.org
- Cc: kmp@harlequin.com
XML, following in SGML's footsteps, seems to me to overcomplicate the parse phase by trying to force errors to be detected during parsing, when really the input ought to parse more simply and be flagged as erroneous later if necessary. An example is:

  [24] VersionInfo ::= S 'version' Eq ( "'" VersionNum "'" | '"' VersionNum '"' )
  [26] VersionNum ::= ( [a-zA-Z0-9_.:] | '-' )+

Is there a really good reason this has to have a special-case rule for parsing each and every string? Really, the only thing the parser should need is:

  [24] VersionInfo ::= S 'version' Eq datastring
       datastring ::= ( "'" [^']* "'" ) | ( '"' [^"]* '"' )

and then everything else that needs data can use the same thing, e.g.,

  [32] SDDecl ::= S 'standalone' Eq datastring

It should be a validity constraint that the value is either 'yes' or 'no' in [32]; it should not affect the parsing.

The problem might be that you need a way of talking about what's in the quotes, not the thing including the quotes, but that can be solved by introducing new terminology and/or syntax. For example, you could invent a notation such that you would write:

  [24] VersionInfo ::= S 'version' Eq @VersionNum

where @VersionNum means that the parser parses a quoted datum and the spec refers to the quoted object as VersionNum. Or you could give a name to the datastring including its quotes, and then have some way of saying that the data content has a name of its own:

  [24] VersionInfo ::= S 'version' Eq VersionNumStr
       VersionNumStr ::= datastring
       VersionNum = the data content of VersionNumStr

Right now, if you're not using YACC and you're instead hand-parsing this stuff, you end up with separate parsers for each of these things, which I think oughtn't be separate. (Actually, YACC probably has separate parsers too, but doesn't tell you.)
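
To make the proposal concrete, here is a minimal Python sketch of one shared routine for "S name Eq datastring", with the value checks applied as a separate step afterwards. It is only an illustration: the names parse_pseudo_attr, check_version_num, check_standalone, WellFormednessError and ValidityError are hypothetical, whitespace is simplified to space, tab, CR and LF, and the quoted content is assumed to be any run of characters other than the closing quote.

```python
import re


class WellFormednessError(Exception):
    """The input could not be parsed at all."""


class ValidityError(Exception):
    """The input parsed, but the value is not an acceptable one."""


_S = re.compile(r"[ \t\r\n]+")
_EQ = re.compile(r"[ \t\r\n]*=[ \t\r\n]*")
# One rule for every quoted datum: '...' or "..."
_DATASTRING = re.compile(r"'([^']*)'|\"([^\"]*)\"")


def parse_pseudo_attr(text, pos, name):
    """Parse  S name Eq datastring  starting at pos.

    Returns (value, new_pos), where value is the content between the quotes.
    """
    m = _S.match(text, pos)
    if not m:
        raise WellFormednessError("expected whitespace")
    pos = m.end()
    if not text.startswith(name, pos):
        raise WellFormednessError(f"expected {name!r}")
    pos += len(name)
    m = _EQ.match(text, pos)
    if not m:
        raise WellFormednessError("expected '='")
    m = _DATASTRING.match(text, m.end())
    if not m:
        raise WellFormednessError("expected a quoted string")
    value = m.group(1) if m.group(1) is not None else m.group(2)
    return value, m.end()


# Value checks run after parsing and never affect it.
_VERSION_NUM = re.compile(r"[a-zA-Z0-9_.:-]+")


def check_version_num(value):
    if not _VERSION_NUM.fullmatch(value):
        raise ValidityError(f"bad version number: {value!r}")
    return value


def check_standalone(value):
    if value not in ("yes", "no"):
        raise ValidityError(f"standalone must be 'yes' or 'no', got {value!r}")
    return value
```

With this split, standalone="maybe" parses without complaint and is rejected only by check_standalone, whereas standalone=maybe (no quotes) fails in the parser itself.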

Anyway, I just think there's no good excuse for not doing it the way people think of it, which is "here's a thing that takes a string datum as an argument" and "oh, by the way, once we figure out what the string is, we can tell you if it's the right string". I don't think users expect a "that's not well-formed syntax" error for x="maybe"; they expect a "that's not a good value for that attribute" error, and you can't say that unless you can parse it in the first place.

In other words, I claim improper values ARE well-formed; just not valid. But the present syntax doesn't permit that view; it actively forces a more complex one.

I think the present situation is quite unfortunate because it disallows some very intuitive (and more modular) parser implementations, requiring them to be gratuitously larger and more convoluted, with more special cases that might break between versions.

-----------
DISCLAIMER: The above are my personal feelings and not necessarily Harlequin's official position.
Received on Friday, 17 April 1998 05:09:45 UTC