Re: MicroXML parser in JavaScript from James Fuller on 2012-09-24 (public-microxml@w3.org from September 2012)

From: James Fuller <jim@webcomposite.com>
Date: Mon, 24 Sep 2012 14:43:39 +0200
To: James Clark <jjc@jclark.com>
Cc: public-microxml@w3.org
Message-ID: <CAEaz5mtdeT000wea_5PxZKqqU1aU4viRpO-3OaGTm38N=w4a3A@mail.gmail.com>

On Mon, Sep 24, 2012 at 2:38 PM, James Clark <jjc@jclark.com> wrote:
> REx looks quite cool.  Did you have to modify the grammar in the spec at all
> to get REx to accept it?

Only slightly; I also rearranged things to fit what REx requires
(though I think I need to understand a bit more about how REx handles
the whitespace definition).

> There are very few requirements that aren't expressed in the syntax:
>
> - name in end-tag must match name in start-tag
> - no duplicate attributes
> - referent of a numeric character ref must match char production

good points!
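
Since the grammar alone cannot express these three requirements, they
would have to be checked in code around the generated parser. Below is
a rough sketch of what such checks might look like in JavaScript. All
function names are hypothetical and are not part of REx or its generated
output. The char production is passed in as a predicate (isChar)
rather than spelled out, and the reference is assumed to arrive as a
string of hexadecimal digits.

```javascript
// Hypothetical helpers; none of these names come from REx or from the spec.

// Requirement 1: the name in an end-tag must match the name in the start-tag.
// openNames is a stack of element names pushed when each start-tag is seen.
function checkEndTag(openNames, endName) {
  const expected = openNames.pop();
  if (expected !== endName) {
    throw new Error("end-tag </" + endName + "> does not match <" + expected + ">");
  }
}

// Requirement 2: no duplicate attributes within a single start-tag.
function checkAttributes(attributeNames) {
  const seen = new Set();
  for (const name of attributeNames) {
    if (seen.has(name)) {
      throw new Error("duplicate attribute: " + name);
    }
    seen.add(name);
  }
}

// Requirement 3: the referent of a numeric character reference must match
// the char production. isChar(codePoint) is supplied by the caller
// (assumption: it encodes whatever the spec's char production allows).
function resolveCharRef(hexDigits, isChar) {
  const codePoint = parseInt(hexDigits, 16);
  if (!(codePoint <= 0x10FFFF) || !isChar(codePoint)) {
    throw new Error("character reference &#x" + hexDigits + "; is not a valid char");
  }
  return String.fromCodePoint(codePoint);
}
```

The open-element stack would be pushed on each start-tag and checked on
each end-tag; the other two checks can run as soon as the relevant
tokens have been read.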

> What difference exactly in the behaviour of the parser does <?TOKENS?> make?

The part preceding <?TOKENS?> is the syntax (parser rules), which is
subject to LL(k) parser generation.

The part following <?TOKENS?> is the 'lexer definition', which goes
into a DFA construction.

The following constructs are allowed in the lexer definition:

  - character codes, e.g. #xFEFF
  - character sets, e.g. [0-9a-fA-F]
  - "subtraction", e.g. char - ('<'|'&'|'>')
  - lexical lookahead ("&" and "\\" operators)
  - token preference definitions ("<<" and ">>" operators)
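
To make the character-set and "subtraction" constructs concrete, here
is a small JavaScript sketch that models them as predicates over code
points, reading "-" as set difference on single characters. This is
only an illustration of the idea, not how REx builds its DFA; the
helper names are made up, and the `char` range used here is a
placeholder rather than the spec's char production.

```javascript
// Hypothetical predicate combinators over Unicode code points.
const inRange = (lo, hi) => (cp) => cp >= lo && cp <= hi;
const anyOf = (...preds) => (cp) => preds.some((p) => p(cp));
const oneOf = (chars) => {
  const set = new Set(Array.from(chars, (c) => c.codePointAt(0)));
  return (cp) => set.has(cp);
};
// "Subtraction": matches a but not b.
const minus = (a, b) => (cp) => a(cp) && !b(cp);

// Character set [0-9a-fA-F]
const hexDigit = anyOf(inRange(0x30, 0x39), inRange(0x61, 0x66), inRange(0x41, 0x46));

// Placeholder for the char production (assumption: printable ASCII only).
const char = inRange(0x20, 0x7e);

// char - ('<'|'&'|'>')
const textChar = minus(char, oneOf("<&>"));
```

With these definitions, textChar accepts "a" but rejects "<", "&" and
">", which is the intent of the subtraction example above.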

According to Gunther, the lexer definition does not support:

  - recursive rules (because of the DFA)
  - the "ordered choice" operator: "/"

Yes, REx is cool …

Jim Fuller

Received on Monday, 24 September 2012 12:44:12 UTC