- From: Ivan Mikhailov <imikhailov@openlinksw.com>
- Date: Thu, 20 Jun 2013 03:57:09 +0700
- To: RDF-WG <public-rdf-wg@w3.org>
I've had three days of entertainment trying to write a parser such that:

1. The parser is written in Flex+Bison, with no "slow" Flex features, no use of the Flex stack and no "wrong side" recursive rules like "Z is either X or sequence X Y Z". (One can't load a teratriple with a slow parser or with a parser that puts incomplete sequences on the stack.)

2. The parser is able to read "Turtle or TriG or N3 or NTriples or some old Turtle/TriG dialect" without a strict indication of the type of the document.

3a. The parser produces meaningful error messages for all typical errors, so one has to write really unusual garbage to get a generic "syntax error at line N".

3b. The parser has configurable error recovery; if used for validation, it provides an accurate list of syntax errors (say, the first 300 errors) rather than reporting the very first one and stopping.

I've failed to do it in a reasonable time. My previous version worked quite reasonably with "draft" Turtle and "early" TriG, which is exactly "many Turtles in curly braces". Now I can achieve only properties 1+2 or 1+3 instead of the desired 1+2+3, while the .y file comes close to 200 rules and 250 states, so I will fall back to 2+3.

An optional dot at the end of a list is ok, two semicolons one after another are ok, @prefix x: <y> with a dot at the end and PREFIX x: <y> without a dot at the end are both ok, etc. It's quite ok to relax the grammar in any single one of these ways. If it becomes relaxed in _all_ of these ways at the same time, then Flex is not as flexible and Bison is not as given to butting as needed :) It's more practical to choose between good error diagnostics/recovery and the use of Flex+Bison.

I don't propose to tighten the nuts back to the early "strict" dialects. I don't propose to keep the spec in its current form. I don't care. My .ttl/.trig "printer" is very conservative and will be compatible with any reader. My parser will deal with any variant of the spec approved by W3C, and even with recovery. I just want to warn other implementors.
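For readers unfamiliar with the "wrong side" recursion constraint in point 1, here is a rough illustration, not code from the parser discussed above. It is a toy Python simulation of a shift-reduce (LR) parse of a flat sequence of items, comparing peak stack depth for a left-recursive rule (list: item | list item) and a right-recursive one (list: item | item list). The function names and the token representation are hypothetical and chosen only for this sketch.

```python
def max_stack_left_recursive(tokens):
    """Simulate an LR parse of  list: item | list item  and return peak stack depth."""
    stack, peak = [], 0
    for tok in tokens:
        stack.append(tok)                 # shift the next item
        peak = max(peak, len(stack))
        if len(stack) == 1:
            stack[-1:] = ["list"]         # reduce  list: item
        else:
            stack[-2:] = ["list"]         # reduce  list: list item
    return peak


def max_stack_right_recursive(tokens):
    """Simulate an LR parse of  list: item | item list  and return peak stack depth."""
    stack, peak = [], 0
    for tok in tokens:
        stack.append(tok)                 # shift; no reduction is possible yet
        peak = max(peak, len(stack))
    if stack:
        stack[-1:] = ["list"]             # end of input: reduce  list: item
    while len(stack) > 1:
        stack[-2:] = ["list"]             # unwind: reduce  list: item list
    return peak


if __name__ == "__main__":
    items = ["item"] * 1000
    print("left-recursive peak depth: ", max_stack_left_recursive(items))
    print("right-recursive peak depth:", max_stack_right_recursive(items))
```

With the left-recursive rule each item is reduced as soon as it is shifted, so the stack stays bounded; with the right-recursive rule every item stays on the stack until the end of the sequence is seen, which is the "incomplete sequences on the stack" problem for very large inputs.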
Best Regards,
Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

P.S. Re-definition of a namespace prefix with a different namespace IRI is the cherry on the cake.
Received on Wednesday, 19 June 2013 20:57:34 UTC