Re: C.9 Forbid & connector? from Joe English on 1996-10-19 (w3c-sgml-wg@w3.org from October 1996)

From: Joe English <jenglish@crl.com>
Date: Sat, 19 Oct 1996 09:51:31 -0700
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <199610191651.AA18323@mail.crl.com>

David G. Durand <dgd@cs.bu.edu> wrote:
> At 11:46 10/18/96, Joe English wrote:

> >(The ambiguity restriction does not matter here: unambiguous content models
> >are a strict subset of regular expressions; any algorithm for matching
> >against general REs will also work for unambiguous ones.  In fact,
> >the ambiguity rule can make things *easier* for implementors.)
>
> Yeah, once the check is over. It's doing the check that is unpleasant and
> non-standard.

You don't have to perform the check though, even in
a validating parser:

    "4.329 validating SGML parser: A conforming SGML parser
    that can find and report a reportable markup error if
    (and only if) one exists.

    4.267 reportable markup error:  A failure of a document
    to conform to this International Standard [...] other
    than a semantic error [...] or:
        a) an ambiguous content model
        b) [...]"

    [ 9.3, "Conforming Systems", p. 215]

In other words, 8879 (for better or worse) places the
burden of ensuring non-ambiguity on the DTD designer,
not the parser.

Also note that checking for ambiguity is straightforward --
as long as there are no '&' groups.
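
To make "straightforward" concrete, here is a minimal sketch of one common
way to run the check on models without '&': number every token occurrence,
compute the first/follow position sets, and reject the model if two distinct
occurrences of the same element name ever compete. The tuple encoding and the
name is_unambiguous are assumptions made for illustration, not anything taken
from 8879 or from an existing parser.

```python
from itertools import count

def is_unambiguous(model):
    """Model is an element name (str) or a tuple: ('seq', m1, m2, ...),
    ('alt', m1, m2, ...), ('opt', m), ('star', m) or ('plus', m)."""
    follow = {}      # token occurrence -> occurrences that may come next
    ids = count()

    def walk(node):
        # Returns (nullable, first, last) for this sub-model.
        if isinstance(node, str):
            pos = (next(ids), node)
            follow[pos] = set()
            return False, {pos}, {pos}
        op, *args = node
        if op in ("opt", "star", "plus"):
            nullable, first, last = walk(args[0])
            if op != "opt":              # repetition loops back to the start
                for p in last:
                    follow[p] |= first
            return nullable or op != "plus", first, last
        parts = [walk(a) for a in args]
        if op == "alt":
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(l for _, _, l in parts)))
        if op == "seq":
            nullable, first, last = True, set(), set()
            for n, f, l in parts:
                for p in last:           # end of the prefix leads into this part
                    follow[p] |= f
                if nullable:
                    first = first | f
                last = (last | l) if n else set(l)
                nullable = nullable and n
            return nullable, first, last
        raise ValueError("unknown connector: %r" % (op,))

    def clash(positions):
        # True if two different occurrences carry the same element name.
        names = [name for _, name in positions]
        return len(names) != len(set(names))

    _, first, _ = walk(model)
    return not clash(first) and not any(clash(s) for s in follow.values())
```

With this encoding, (a?, a) is ('seq', ('opt', 'a'), 'a') and is rejected,
while (a, b?) is ('seq', 'a', ('opt', 'b')) and passes. Extending the same
idea to '&' groups is where it stops being simple.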

> >> and & is easy when you just
> >> parse against the parse tree (which is what people will do).
> >
> >I don't see that '&' is _easy_, but as long as we keep the
> >ambiguity restriction it's at least tractable.
>
> You just keep a flag as to whether the & group is used up yet or not.
> Gross, but serviceable and easy....

That only works if the content model is unambiguous:
consider  '( (a,(b|c)) & (a,(b|d)) )' after seeing 'ab...'
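
A small sketch of the failure, under simplifying assumptions: each operand
of the '&' group is flattened to a fixed sequence of alternatives, and the
flag-based matcher commits to the first unused operand that can start with
the next token. The names MODEL, fits, match_with_flags and
match_with_backtracking are hypothetical. The input 'a b a c' is valid
(the 'a b' belongs to the second operand, the 'a c' to the first), but the
flag matcher has already given 'a b' to the first operand and rejects it.

```python
# Each operand of the '&' group is a fixed sequence of alternatives:
# (a,(b|c)) becomes [{"a"}, {"b", "c"}].
MODEL = [
    [{"a"}, {"b", "c"}],
    [{"a"}, {"b", "d"}],
]

def fits(operand, tokens):
    """True if the leading tokens satisfy the operand, position by position."""
    return len(tokens) >= len(operand) and all(
        tok in allowed for tok, allowed in zip(tokens, operand))

def match_with_flags(model, tokens):
    """One 'used' flag per operand; commit to the first operand that can start."""
    used = [False] * len(model)
    pos = 0
    while not all(used):
        for i, operand in enumerate(model):
            if not used[i] and pos < len(tokens) and tokens[pos] in operand[0]:
                break
        else:
            return False                 # no unused operand accepts this token
        if not fits(operand, tokens[pos:]):
            return False                 # committed to the wrong operand
        used[i] = True
        pos += len(operand)
    return pos == len(tokens)

def match_with_backtracking(model, tokens, used=frozenset()):
    """Try every unused operand at each step instead of committing."""
    if len(used) == len(model):
        return not tokens
    return any(
        fits(operand, tokens)
        and match_with_backtracking(model, tokens[len(operand):], used | {i})
        for i, operand in enumerate(model) if i not in used)

tokens = "a b a c".split()
print(match_with_flags(MODEL, tokens))         # False: valid input rejected
print(match_with_backtracking(MODEL, tokens))  # True
```

The flags alone are enough only when the first token of each operand
identifies it uniquely, which is what the ambiguity restriction buys you.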

> [earlier]
> That means the whole ball of wax, to me. If I had to implement SGML's
> ambiguity I'd implement an ambiguity check and match against the parse
> tree for the model. If I'm parsing that way, what's a single bit of
> additional state per model token? As I say, I'm not emotional about keeping
> &, just don't see why not.

Since '&' groups are the only feature that makes the ambiguity
restriction difficult to test for, and it's also the only feature
(other than OMITTAG [1]) that makes the restriction desirable from
the implementor's point of view, the logical conclusion would
be to drop '&' groups.  (Not that I am advocating that position --
I think XML should keep both '&' groups and the ambiguity restriction,
but should not require validating parsers to check the latter.)

[1] Final note: OMITTAG, in particular start-tag omission,
*cannot work* unless content models are unambiguous and deterministic.
This is because of the way "contextually required element" is defined in
the standard.

--Joe English

  jenglish@crl.com

Received on Saturday, 19 October 1996 12:50:54 UTC