- From: David Robillard <d@drobilla.net>
- Date: Thu, 15 Dec 2011 14:30:11 -0500
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-comments@w3.org
On Thu, 2011-12-15 at 18:44 +0000, Andy Seaborne wrote:
[...]
> You may be able to tokenize on single characters, and build a grammar
> for the language based on that, but madness may be the result. It
> certainly isn't the grammar for the language in the spec.

Actually, except for the few cases mentioned, such a parser is simple, fast, and quite elegant. You can just fly through input characters, always knowing precisely which rule you are working on. Another way of putting it: you can parse the language using only a peekchar() and a readchar() function and a 1-character (rather than 1-token) buffer. Such languages are luxurious to parse manually; unfortunately, Turtle isn't *quite* such a language.

I sought to write a high-performance, dependency-free C implementation[1], and for better or worse that's how it turned out. The diversions from the grammar as written are things like:

// Spec: [6] triples ::= subject predicateObjectList
// Impl: [6] triples ::= subject ws+ predicateObjectList

As mentioned in my other reply, though, this stems from having implemented an earlier revision of the spec, which was very ambiguous with respect to whitespace (it was a trial-and-error game to figure out where the magic ws+ and ws* had to go). I will try to rework it to take advantage of the new grammar and see what results.

-dr

[1] http://drobilla.net/software/serd/
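
Editorial illustration (not part of the original message, and not serd's actual code): the single-character lookahead style described above might be sketched roughly as follows. Everything here is an assumption made for the example: the `Reader` struct, the helper names `read_ws_plus`, `read_name`, `read_triples`, and `parse_toy_triple`, and the toy grammar, in which a "term" is just a run of ASCII letters and `predicateObjectList` is reduced to `name ws+ name`. Only the `peekchar()`/`readchar()` idea and the `subject ws+ predicateObjectList` rule shape come from the message.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reader: a cursor over a NUL-terminated string. The
   "1 character buffer" is simply the character under the cursor. */
typedef struct {
    const char *src;
    size_t      pos;
} Reader;

/* Look at the next character without consuming it ('\0' at end). */
static int peekchar(const Reader *r) {
    assert(r && r->src);
    return (unsigned char)r->src[r->pos];
}

/* Consume and return the next character; never advances past the end. */
static int readchar(Reader *r) {
    int c = peekchar(r);
    if (c != '\0') r->pos++;
    return c;
}

/* ws+ : one or more whitespace characters. */
static bool read_ws_plus(Reader *r) {
    if (!isspace(peekchar(r))) return false;
    while (isspace(peekchar(r))) readchar(r);
    return true;
}

/* Toy stand-in for a term: one or more ASCII letters (an assumption;
   real Turtle terms are far richer). */
static bool read_name(Reader *r) {
    if (!isalpha(peekchar(r))) return false;
    while (isalpha(peekchar(r))) readchar(r);
    return true;
}

/* Toy version of: triples ::= subject ws+ predicateObjectList
   with predicateObjectList reduced to: name ws+ name */
static bool read_triples(Reader *r) {
    return read_name(r) && read_ws_plus(r)
        && read_name(r) && read_ws_plus(r)
        && read_name(r);
}

/* Convenience wrapper: true if the whole string is exactly one toy triple. */
static bool parse_toy_triple(const char *s) {
    Reader r = { s, 0 };
    return read_triples(&r) && peekchar(&r) == '\0';
}
```

Each rule function decides what to do by peeking at a single character, so the parser always knows which rule it is in and never needs a token buffer. Calling `parse_toy_triple("a b c")` accepts the input, while `parse_toy_triple("ab c")` is rejected because the second `ws+` is missing.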
Received on Sunday, 18 December 2011 12:22:34 UTC