- From: James Fuller <jim@webcomposite.com>
- Date: Mon, 24 Sep 2012 14:43:39 +0200
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml@w3.org
On Mon, Sep 24, 2012 at 2:38 PM, James Clark <jjc@jclark.com> wrote:
> REx looks quite cool. Did you have to modify the grammar in the spec at all
> to get REx to accept it?
Only slightly; I also rearranged things to fit what REx requires
(though I think I need to understand a bit more about how REx handles
the whitespace definition).
> There are very few requirements that aren't expressed in the syntax:
>
> - name in end-tag must match name in start-tag
> - no duplicate attributes
> - referent of a numeric character ref must match char production
good points!
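For illustration, those three checks could run as a pass over whatever the
generated parser emits. A minimal Python sketch, assuming a hypothetical event
stream of ("start", name, attribute_names), ("end", name) and
("charref", code_point) tuples; the event shapes, the function names and the
simplified default_is_char predicate are all assumptions for this example, not
part of REx or of the spec:

```python
def default_is_char(cp):
    # Assumption: a simplified placeholder, NOT the exact char production
    # from the spec. Swap in the real production when checking for real.
    if cp in (0x9, 0xA, 0xD):
        return True
    return 0x20 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def check_events(events, is_char=default_is_char):
    """Return error strings for the three constraints the grammar cannot express."""
    errors = []
    open_names = []  # stack of start-tag names still waiting for an end-tag
    for event in events:
        kind = event[0]
        if kind == "start":
            _, name, attr_names = event
            seen = set()
            for attr in attr_names:
                if attr in seen:
                    errors.append(f"duplicate attribute {attr!r} on <{name}>")
                seen.add(attr)
            open_names.append(name)
        elif kind == "end":
            name = event[1]
            expected = open_names.pop() if open_names else None
            if name != expected:
                errors.append(f"end-tag </{name}> does not match <{expected}>")
        elif kind == "charref":
            cp = event[1]
            if not is_char(cp):
                errors.append(f"character reference &#x{cp:X}; is not a valid char")
    return errors
```

Calling check_events on the events for a well-formed document should return
an empty list; each violation adds one message.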
> What difference exactly in the behaviour of the parser does <?TOKENS?> make?
The part preceding <?TOKENS?> is the syntax (parser rules), which is
subject to LL(k) parser generation.
The part following <?TOKENS?> is the 'lexer definition', which goes
into a DFA construction.
The following constructs are allowed in the lexer definition:
- character codes, e.g. #xFEFF
- character sets, e.g. [0-9a-fA-F]
- "subtraction", e.g. char - ('<'|'&'|'>')
- lexical lookahead ("&" and "\\" operators)
- token preference definitions ("<<" and ">>" operators)
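To make the "subtraction" construct concrete: it is just set difference on
character classes, so the result is still a plain set that a DFA can test one
character at a time. A rough Python sketch, where CHAR is a hypothetical
stand-in (printable ASCII only) for the grammar's char production, and
scan_data is a name made up for this example:

```python
# Assumption: printable ASCII stands in for the grammar's "char" production.
CHAR = {chr(c) for c in range(0x20, 0x7F)}

# char - ('<'|'&'|'>'): remove the three delimiters from the class.
DATA_CHAR = CHAR - {"<", "&", ">"}


def scan_data(text, pos=0):
    """Return the end index of the longest run of DATA_CHAR starting at pos."""
    end = pos
    while end < len(text) and text[end] in DATA_CHAR:
        end += 1
    return end
```

The scan needs no stack and no backtracking, which is the kind of rule a DFA
can handle.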
According to Gunther, the lexer definition does not support:
- recursive rules (because of the DFA)
- the "ordered choice" operator: "/"
Yes, REx is cool …
Jim Fuller
Received on Monday, 24 September 2012 12:44:12 UTC