- From: Joe English <jenglish@crl.com>
- Date: Sat, 19 Oct 1996 09:51:31 -0700
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
David G. Durand <dgd@cs.bu.edu> wrote:
> At 11:46 10/18/96, Joe English wrote:
> >(The ambiguity restriction does not matter here: unambiguous content models
> >are a strict subset of regular expressions; any algorithm for matching
> >against general REs will also work for unambigous ones. In fact,
> >the ambiguity rule can make things *easier* for implementors.)
>
> Yeah, once the check is over. It's doing the check that is unpleasant and
> non-standard.
You don't have to perform the check though, even in
a validating parser:
"4.329 validating SGML parser: A conforming SGML parser
that can find and report a reportable markup error if
(and only if) one exists.
4.267 reportable markup error: A failure of a document
to conform to this International Standard [...] other
than a semantic error [...] or:
a) an ambiguous content model
b) [...]"
[ 9.3, "Conforming Systems", p. 215]
In other words, 8879 (for better or worse) places the
burden of ensuring non-ambiguity on the DTD designer,
not the parser.
Also note that checking for ambiguity is straightforward --
as long as there are no '&' groups.
> >> and & is easy when you just
> >> parse against the parse tree (which is what people will do).
> >
> >I don't see that '&' is _easy_, but as long as we keep the
> >ambiguity restriction it's at least tractable.
>
> You just keep a flag as to whether the & group is used up yet or not.
> Gross, but servicable and easy....
That only works if the content model is unambiguous:
consider '( (a,(b|c)) & (a,(b|d)) )' after seeing 'ab...'
> [earlier]
> That means the whole ball of wax, to me. If I had to implement SGML's
> ambiguity I'd implement and ambiguity check and match against the parse
> tree for the model. If I'm parsing that way, what's a single bit of
> additional state per moel token? As I say, I'm not emotional about kereping
> &, just don't see why not.
Since '&' groups are the only feature that makes the ambiguity
restriction difficult to test for, and it's also the only feature
(other than OMITTAG [1]) that makes the restriction desirable from
the implementor's point of view, the logical conclusion would
be to drop '&' groups. (Not that I am advocating that position --
I think XML should keep both '&' groups and the ambiguity restriction,
but should not require validating parsers to check the latter.)
[1] Final note: OMITTAG, in particular start-tag omission,
*cannot work* unless content models are unambiguous and deterministic.
This is because of the way "contextually require element" is defined in
the standard.
--Joe English
jenglish@crl.com
Received on Saturday, 19 October 1996 12:50:54 UTC