Re: C.10 Allow nondeterministic content models? from James Clark on 1996-10-20 (w3c-sgml-wg@w3.org from October 1996)

From: James Clark <jjc@jclark.com>
Date: Sun, 20 Oct 1996 16:13:56 +0000
To: Joe English <jenglish@crl.com>
Cc: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <2.2.32.19961020161356.009e6ea4@jclark.iserver.com>

At 06:50 20/10/96 -0700, Joe English wrote:
>
>Tim Bray <tbray@textuality.com> wrote:
>
>> At 11:46 AM 18/10/96 -0700, Joe English wrote:
>> >
>> >In fact, the ambiguity rule can make things *easier* for implementors.)
>> 
>> Say what?  The algorithms necessary to detect ambiguity are not typically
>> within the repertoire of the hypothetical CS bachelor's-degree type that
>> we'd like to be able to construct a validating parser.
>
>True: the ambiguity restriction makes it harder to validate DTDs.

If you don't have &-groups, detecting non-determinism (I think that's a much
less confusing term than ambiguity) is pretty trivial.  In fact it you use
one obvious algorithm for checking instances, non-determinism checking can
be done in, I would guess, less than 10 lines of code.  The obvious
algorithm I have in mind is a simplification of Algorithm 3.5 in the Dragon
Book (p140 in my copy), which is the algorithm for constructing a DFA from a
regexp.  The simplification is that instead of the complicated final step
where you do the subset construction, you simply read off the DFA directly
from the followpos() sets.  While you're doing that, all you have to do to
check non-determinism is to check each followpos set to see if it includes
two distinct positions labeled with the same symbol (ie generic identifier).

James

Received on Sunday, 20 October 1996 11:20:05 UTC