- From: Fredrik Öhrström <oehrstroem@gmail.com>
- Date: Sat, 25 Jan 2025 13:39:09 +0100
- To: Bethan Tovey-Walsh <bytheway@linguacelta.com>
- Cc: ixml <public-ixml@w3.org>
- Message-ID: <CALZT+jC1VNQL=2uju+TWXL6c8R7c9YNaj6xVVFRoyTfsPn9g8g@mail.gmail.com>
> Another tip that I think might be useful for grammar writing: if you’re
> treating spaces as separators between nonterminals, it’s a good idea to
> exclude spaces from appearing at the beginning and end of nonterminal
> definitions which are made up entirely of terminals.

Thank you Bethan! This is very useful!

Here is another useful instruction that I received from Steven.

When parsing a language where spaces can always be inserted between tokens,
but are sometimes optional and sometimes required, you can store the state
of the parse in intermediate rules/nodes that are removed using the - marker.

Let's pick this sentence:

    abc123"hello"again

It parses identically to:

    abc 123 "hello" again

However, the string "hello" "world" must have the space in the middle,
because "hello""world" means the content 'hello"world', i.e. a doubled " is
an escape.

Here is such a grammar (store it in seq.ixml):

    data = s?, nodestart, s?.
    -nodestart = quotestart | namestart | numstart.
    -quotestart = quote | quote, s, nodestart | quote, (namestart | numstart).
    -namestart = name | name, s, nodestart | name, (numstart | quotestart).
    -numstart = num | num, s, nodestart | num, (namestart | quotestart).
    name = [L]+.
    num = [N]+.
    quote = '""' | '"', '""'+, '"' | '"', (~['"']+)++'""', '""'?, '"' | '"""', (~['"']+)++'""', '""'?, '"'.
    -s = -[' ';#a;#d]+.

Running

    coffeepot -g:seq.ixml 'abc123"hello" "hello"again' | xmllint --format -

prints:

    <data>
      <name>abc</name>
      <num>123</num>
      <quote>"hello"</quote>
      <quote>"hello"</quote>
      <name>again</name>
    </data>

You can see from the quotestart rule how nodestart follows after s, whereas
if no space follows a quote, only namestart or numstart may follow.

Without these start rules, if you try to use s? between tokens instead, you
will get all sorts of ambiguities, where the name again can be split as:

    a,gain
    ag,ain
    aga,in
    agai,n
    again

//Fredrik
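
To make the ambiguity concrete, here is a small Python sketch that counts how many ways an input can be split into tokens. It is not an ixml parser and is not part of the grammar above: count_parses, kind and WS are hypothetical names, quotes are left out for brevity, and str.isalpha()/str.isdigit() only approximate the [L] and [N] character classes. With stateful=False it models optional spaces between tokens; with stateful=True it models the start rules, where a token that directly touches the next one must be of a different kind.

```python
from functools import lru_cache

WS = " \n\r"  # same characters as the s rule: space, #a, #d


def kind(ch):
    """Classify a character: 'name' for letters, 'num' for digits, else None."""
    if ch.isalpha():
        return "name"
    if ch.isdigit():
        return "num"
    return None


def count_parses(text, stateful):
    """Count the ways to split text into name/num tokens.

    stateful=False: any two tokens may touch (spaces always optional).
    stateful=True:  two tokens that touch must be of different kinds,
                    which is what namestart/numstart encode.
    """
    n = len(text)

    @lru_cache(maxsize=None)
    def go(i, prev):
        # prev is the kind of the token that ended exactly at i,
        # or None at the start or right after whitespace.
        if i == n:
            return 1
        if text[i] in WS:
            j = i
            while j < n and text[j] in WS:
                j += 1  # a whitespace run is consumed as one separator
            return go(j, None)
        k = kind(text[i])
        if k is None:
            return 0  # character not covered by this simplified model
        if stateful and prev == k:
            return 0  # same kind directly adjacent: a space is required
        total = 0
        j = i
        while j < n and kind(text[j]) == k:
            j += 1
            total += go(j, k)  # try every possible end of this token
        return total

    return go(0, None)


print(count_parses("again", stateful=False))   # 16
print(count_parses("again", stateful=True))    # 1
print(count_parses("abc123", stateful=True))   # 1
```

The 16 splits in the first case include the single name again, the four two-part splits such as a,gain, and every split into three or more names. Once adjacent tokens of the same kind are forbidden, only one reading is left.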
Received on Saturday, 25 January 2025 12:39:40 UTC