
Re: C.9 Forbid & connector?

From: Joe English <jenglish@crl.com>
Date: Sun, 20 Oct 1996 05:42:18 -0700
Message-Id: <199610201242.AA03527@mail.crl.com>
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>

David G. Durand <dgd@cs.bu.edu> wrote:

> At 9:51 AM 10/19/96, Joe English wrote:
> I don't think we should emulate the policy of having errors that may break
> some parsers and documents, but are not reportable.

Agreed.  There is another option though: XML could enforce a
stricter -- and easier to implement -- ambiguity restriction
than 8879.  (Removing the restriction altogether would make 
XML incompatible with SGML, so that is not an option as I see it.)

The main difficulty in testing content models with AND
groups for ambiguity comes from (pathological?) cases
like:  ((foo & bar), foo), i.e., AND groups whose first-
and follow- sets are not disjoint.  If XML extends the notion
of "ambiguity" to include these cases as well, then AND groups 
can be treated just like repeatable OR groups for purposes of 
detecting violations of the rule.
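
A minimal sketch of that check, in Python (my choice of language for
illustration only). It computes the usual first/last/follow position sets
for a content model, treats an AND group as a repeatable OR group, and
rejects the model when two distinct positions with the same element name
compete at any point. The tuple encoding of content models and the names
`analyse` and `is_deterministic` are assumptions, not anything taken
from 8879.

```python
from itertools import count

def analyse(node, ids, sym, follow):
    """Return (nullable, first, last) for node; fill in sym and follow."""
    kind = node[0]
    if kind == "sym":
        p = next(ids)              # one fresh position per element token
        sym[p], follow[p] = node[1], set()
        return False, {p}, {p}
    if kind in ("opt", "star", "plus"):
        nullable, first, last = analyse(node[1], ids, sym, follow)
        if kind != "opt":          # repetition: last positions loop to first
            for p in last:
                follow[p] |= first
        return nullable or kind != "plus", first, last
    parts = [analyse(child, ids, sym, follow) for child in node[1]]
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for n, f, l in parts:
            for p in last:         # what ended so far may continue into f
                follow[p] |= f
            if nullable:
                first = first | f
            last = (last | l) if n else set(l)
            nullable = nullable and n
        return nullable, first, last
    first = set().union(*(f for _, f, _ in parts))
    last = set().union(*(l for _, _, l in parts))
    if kind == "and":
        # Assumption: for ambiguity detection only, an AND group loops
        # like a repeatable OR group; it is empty only if all members are.
        for p in last:
            follow[p] |= first
        return all(n for n, _, _ in parts), first, last
    return any(n for n, _, _ in parts), first, last   # plain "or"

def is_deterministic(model):
    """True if no two distinct positions with one name ever compete."""
    sym, follow = {}, {}
    _, first, _ = analyse(model, count(), sym, follow)
    for candidates in [first, *follow.values()]:
        names = [sym[p] for p in candidates]
        if len(names) != len(set(names)):
            return False
    return True

foo, bar, baz = ("sym", "foo"), ("sym", "bar"), ("sym", "baz")
pathological = ("seq", [("and", [foo, bar]), foo])   # ((foo & bar), foo)
harmless = ("seq", [("and", [foo, bar]), baz])       # ((foo & bar), baz)
```

Under this stricter reading `is_deterministic(pathological)` is false,
because after the first `foo` a second `foo` could belong either to the
AND group or to what follows it, while `is_deterministic(harmless)` is
true.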

(This might be a good idea in any case, based on the primary
rationale behind the ambiguity restriction in the first place:
"human beings must also perform context checking when creating
the document, and they are more likely to do this successfully
if the regular expressions are kept simple." [Annex H]
I realize that many people disagree with this rationale,
but keep in mind that those of us who understand the issue
well enough to disagree with it are not the sort of human 
beings that Annex H is talking about in the first place.)

> >Also note that checking for ambiguity is straightforward --
> >as long as there are no '&' groups.
> I agree. But it's not something you can look up in a book.

So we put it in the XML spec.  A formal definition of
what isn't allowed should suffice, as long as the language
is mathematical in nature (a la 10179) and not legalese
(a la 8879).

 * * *

> >That only works if the content model is unambiguous:
> >consider  '( (a,(b|c)) & (a,(b|d)) )' after seeing 'ab...'
> This seems to me to support removing the unambiguity distinction.

I think you misunderstood.  The algorithm you suggested
for parsing AND groups -- set a flag after matching each
subgroup -- only works for non-ambiguous content models.
Otherwise you can't tell "which" subgroup was matched
("which" in quotes since the question is meaningless for
general regular expressions).  To me that suggests that
the ambiguity restriction should be _retained_: it makes
AND groups easier to handle.
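
A rough sketch of the flag approach, to show where it breaks down. This
is a simplification under stated assumptions: each subgroup is written
out as a list of the token sequences it accepts, the matcher looks at
one token to pick a subgroup, and the name `match_and_group` is
hypothetical.

```python
def match_and_group(members, tokens):
    """Match tokens against an AND group, one 'done' flag per subgroup."""
    done = [False] * len(members)
    pos = 0
    while not all(done):
        if pos == len(tokens):
            return False
        # Subgroups not yet matched that could start with the next token.
        candidates = [i for i, alts in enumerate(members)
                      if not done[i]
                      and any(alt[0] == tokens[pos] for alt in alts)]
        if len(candidates) > 1:
            # Ambiguous model: no way to know whose flag to set.
            raise ValueError(f"cannot tell which subgroup {tokens[pos]!r} starts")
        if not candidates:
            return False
        i = candidates[0]
        for alt in members[i]:
            if tuple(tokens[pos:pos + len(alt)]) == alt:
                pos += len(alt)
                done[i] = True     # the flag for this subgroup
                break
        else:
            return False
    return pos == len(tokens)

# ( a & (b,c) ): unambiguous, so the flags work.
simple = [[("a",)], [("b", "c")]]
# ( (a,(b|c)) & (a,(b|d)) ): both subgroups can start with 'a'.
ambiguous = [[("a", "b"), ("a", "c")], [("a", "b"), ("a", "d")]]
```

With `simple`, both `["b", "c", "a"]` and `["a", "b", "c"]` match. With
`ambiguous`, the call on `["a", "b"]` raises at the very first token,
since either subgroup could claim it.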

 * * *

> >[1] Final note: OMITTAG, in particular start-tag omission,
> >*cannot work* unless content models are unambiguous and deterministic.
> >This is because of the way "contextually require element" is defined in
> >the standard.
> More reason to drop unambiguity, as that feature died an unlamented death
> practically at the inception of the project.
> You have convinced me though, that both features should be dropped, & and
> unambiguity.

I really must work on my exposition then, since that's
the exact opposite of what I had hoped to convince you :-)

I think XML should keep '&' (since it's useful, despite being
hard to implement); and it must keep the ambiguity restriction
(for compatibility with 8879).  OMITTAG means that 8879:199X 
will most likely not be able to allow ambiguity either, so XML can't
even remove the restriction in anticipation of a revised standard. 

--Joe English

Received on Sunday, 20 October 1996 08:41:31 UTC
