- From: Joe English <jenglish@crl.com>
- Date: Sat, 19 Oct 1996 09:51:31 -0700
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
David G. Durand <dgd@cs.bu.edu> wrote:
> At 11:46 10/18/96, Joe English wrote:
> >(The ambiguity restriction does not matter here: unambiguous content models
> >are a strict subset of regular expressions; any algorithm for matching
> >against general REs will also work for unambiguous ones.  In fact,
> >the ambiguity rule can make things *easier* for implementors.)
>
> Yeah, once the check is over. It's doing the check that is unpleasant and
> non-standard.

You don't have to perform the check though, even in
a validating parser:

    "4.329 validating SGML parser: A conforming SGML parser that can
    find and report a reportable markup error if (and only if)
    one exists.

    4.267 reportable markup error: A failure of a document
    to conform to this International Standard [...]
    other than a semantic error [...] or:

    a) an ambiguous content model
    b) [...]"

    [ 9.3, "Conforming Systems", p. 215]

In other words, 8879 (for better or worse) places the burden
of ensuring non-ambiguity on the DTD designer, not the parser.

Also note that checking for ambiguity is straightforward --
as long as there are no '&' groups.

> >> and & is easy when you just
> >> parse against the parse tree (which is what people will do).
> >
> >I don't see that '&' is _easy_, but as long as we keep the
> >ambiguity restriction it's at least tractable.
>
> You just keep a flag as to whether the & group is used up yet or not.
> Gross, but serviceable and easy....

That only works if the content model is unambiguous:
consider '( (a,(b|c)) & (a,(b|d)) )' after seeing 'ab...'

> [earlier]
> That means the whole ball of wax, to me. If I had to implement SGML's
> ambiguity I'd implement an ambiguity check and match against the parse
> tree for the model. If I'm parsing that way, what's a single bit of
> additional state per model token? As I say, I'm not emotional about keeping
> &, just don't see why not.

Since '&' groups are the only feature that makes the ambiguity
restriction difficult to test for, and they are also the only
feature (other than OMITTAG [1]) that makes the restriction
desirable from the implementor's point of view, the
logical conclusion would be to drop '&' groups.

(Not that I am advocating that position -- I think XML should
keep both '&' groups and the ambiguity restriction, but should not
require validating parsers to check the latter.)

[1] Final note: OMITTAG, in particular start-tag omission,
*cannot work* unless content models are unambiguous and
deterministic.  This is because of the way "contextually
required element" is defined in the standard.

--Joe English

  jenglish@crl.com
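
The '( (a,(b|c)) & (a,(b|d)) )' case from the message can be tried out
with a small sketch. This is only one reading of the "keep a flag"
approach, not code from anyone in the thread: the names MEMBERS,
greedy_flag_match and reference_match are made up for illustration, and
each member of the '&' group is spelled out as the finite set of token
sequences it accepts. The flag matcher commits to the first unused member
that can start with the next token and never backtracks; the reference
matcher tries every ordering of the members.

```python
from itertools import permutations

# Members of the '&' group, each written out as the token sequences it
# accepts: (a,(b|c)) and (a,(b|d)).  Illustrative encoding only.
MEMBERS = [
    {("a", "b"), ("a", "c")},
    {("a", "b"), ("a", "d")},
]


def greedy_flag_match(tokens):
    """One 'used' flag per member, no backtracking."""
    tokens = tuple(tokens)
    used = [False] * len(MEMBERS)
    pos = 0
    while pos < len(tokens):
        # Pick the first unused member that could start with this token.
        for i, member in enumerate(MEMBERS):
            if not used[i] and any(seq[0] == tokens[pos] for seq in member):
                break
        else:
            return False  # no unused member can start here
        # Committed to member i: it has to match from here on.
        for seq in member:
            if tokens[pos:pos + len(seq)] == seq:
                used[i] = True
                pos += len(seq)
                break
        else:
            return False  # wrong commitment, and no way to undo it
    return all(used)


def _match_in_order(tokens, order):
    """True if tokens split into one sequence per member, in this order."""
    if not order:
        return not tokens
    return any(
        tokens[:len(seq)] == seq and _match_in_order(tokens[len(seq):], order[1:])
        for seq in order[0]
    )


def reference_match(tokens):
    """Exhaustive check: try every ordering of the '&' members."""
    tokens = tuple(tokens)
    return any(_match_in_order(tokens, order) for order in permutations(MEMBERS))


if __name__ == "__main__":
    doc = "a b a c".split()
    print(reference_match(doc))    # True: 'a b' is member 2, 'a c' is member 1
    print(greedy_flag_match(doc))  # False: 'a b' was charged to member 1
```

After 'ab' the flag matcher has already marked the first member as used,
so 'ac' is rejected even though the document is valid. With an
unambiguous model the first token always identifies the member, and the
two matchers would agree.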
Received on Saturday, 19 October 1996 12:50:54 UTC