Re: Beginners' errors

>
> Another tip that I think might be useful for grammar writing: if you’re
> treating spaces as separators between nonterminals, it’s a good idea to
> exclude spaces from appearing at the beginning and end of nonterminal
> definitions which are made up entirely of terminals.
>

Thank you Bethan! This is very useful!

Here is another useful instruction that I received from Steven.

When parsing a language where spaces may always be inserted between tokens,
but are sometimes optional and sometimes required, you can track the state
of the parse with intermediate rules/nodes that are removed from the output
using the - marker.

Let's take this input:
abc123"hello"again
It parses identically to
abc 123  "hello"    again

However, the string
"hello"   "world"
must keep the space in the middle, because
"hello""world" means the content 'hello"world',
i.e. a doubled " is an escape.
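
To make the rule concrete, here is a minimal sketch in Python (not ixml,
and not how an ixml processor works internally). The function name
tokenize and the (kind, value) tuples are hypothetical, chosen only for
illustration, and the scanner is more permissive about what may appear
inside a quote than the grammar in this message.

```python
import unicodedata


def tokenize(text):
    """Split text into (kind, value) tokens; whitespace only separates."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \n\r":
            # Space, #a and #d are separators and produce no token.
            i += 1
        elif ch == '"':
            j = i + 1
            while True:
                if j >= n:
                    raise ValueError("unterminated quote")
                if text[j] == '"':
                    if j + 1 < n and text[j + 1] == '"':
                        # A doubled " is an escape: stay inside the quote.
                        j += 2
                        continue
                    break  # a single " closes the quote
                j += 1
            tokens.append(("quote", text[i:j + 1]))
            i = j + 1
        else:
            # Unicode general category: L = letter, N = number.
            cat = unicodedata.category(ch)[0]
            if cat not in "LN":
                raise ValueError(f"unexpected character {ch!r}")
            j = i
            while j < n and unicodedata.category(text[j])[0] == cat:
                j += 1
            tokens.append(("name" if cat == "L" else "num", text[i:j]))
            i = j
    return tokens
```

With this sketch, tokenize('abc123"hello"again') and
tokenize('abc 123  "hello"    again') give the same four tokens, while
tokenize('"hello""world"') gives a single quote token.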

Here is such a grammar (stored in seq.ixml):

data = s?, nodestart, s?.
-nodestart = quotestart | namestart | numstart.
-quotestart = quote
            | quote, s, nodestart
            | quote, (namestart | numstart).
-namestart = name
           | name, s, nodestart
           | name, (numstart | quotestart).
-numstart = num
          | num, s, nodestart
          | num, (namestart | quotestart).
name  = [L]+.
num   = [N]+.
quote = '""'
      | '"', '""'+, '"'
      | '"', (~['"']+)++'""', '""'?, '"'
      | '"""', (~['"']+)++'""', '""'?, '"'.
-s    = -[' ';#a;#d]+.

coffeepot -g:seq.ixml 'abc123"hello" "hello"again' | xmllint --format -

prints:
<data>
  <name>abc</name>
  <num>123</num>
  <quote>"hello"</quote>
  <quote>"hello"</quote>
  <name>again</name>
</data>

You can see from the quotestart rule how nodestart follows after s,
whereas if no space follows a quote, then
only namestart or numstart may follow.
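
The same adjacency rules can be written down as a small table. The
following is a hedged sketch in Python, not part of the grammar: the names
MAY_FOLLOW_DIRECTLY and join_tokens are hypothetical, and the table is
simply my reading of the three start rules. It joins tokens using a space
only where the start rules would require one.

```python
# Which token kinds may follow a given kind with no space in between,
# read off the quotestart, namestart and numstart rules.
MAY_FOLLOW_DIRECTLY = {
    "quote": {"name", "num"},
    "name": {"num", "quote"},
    "num": {"name", "quote"},
}


def join_tokens(tokens):
    """Concatenate (kind, value) tokens, adding a space only when needed."""
    parts = []
    prev_kind = None
    for kind, value in tokens:
        if prev_kind is not None and kind not in MAY_FOLLOW_DIRECTLY[prev_kind]:
            parts.append(" ")  # e.g. quote after quote, name after name
        parts.append(value)
        prev_kind = kind
    return "".join(parts)


example = [
    ("name", "abc"),
    ("num", "123"),
    ("quote", '"hello"'),
    ("quote", '"hello"'),
    ("name", "again"),
]
print(join_tokens(example))
```

For the token list above, the only space inserted is the one between the
two quotes, which gives back the input used in the coffeepot command.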

Without these start rules, if you simply try to use s? between tokens, you
will get all sorts of ambiguities, where the name again can be read as, for
example,
a,gain
ag,ain
aga,in
agai,n
again
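
To get a feel for how quickly this grows, here is a small Python sketch
that enumerates the possible readings. It assumes a naive grammar
(hypothetical, not shown in this message) in which any number of names may
follow each other separated only by s?; the function name splits is also
just for illustration.

```python
from itertools import combinations


def splits(word):
    """All ways to cut word into one or more non-empty consecutive pieces."""
    n = len(word)
    result = []
    for k in range(n):  # k = number of cut points, 0 .. n-1
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            result.append([word[a:b] for a, b in zip(bounds, bounds[1:])])
    return result


readings = splits("again")
print(len(readings))
```

Under that assumption a five-letter name has 16 readings: the five listed
above plus those with more than one cut, such as a,g,ain.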

//Fredrik

Received on Saturday, 25 January 2025 12:39:40 UTC