
TriG is about semicolons and dots, delimited by occasional RDF data.

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Thu, 20 Jun 2013 03:57:09 +0700
Message-ID: <1371675429.3442.1105.camel@octo.iv.dev.null>
To: RDF-WG <public-rdf-wg@w3.org>

I've had three days of entertainment trying to write a parser such
that:

1. The parser is written with Flex+Bison, with no "slow" Flex features,
no use of the Flex stack and no "wrong side" (right-)recursive rules
like "Z is either X or the sequence X Y Z". (You can't load a teratriple
with a slow parser, or with a parser that keeps incomplete sequences on
the stack.)
2. The parser is able to read "Turtle or TriG or N3 or NTriples or some
old Turtle/TriG dialect" without an explicit indication of the document
type.
3a. The parser produces meaningful error messages for all typical
errors, so one would have to write really unusual garbage to get a
generic "syntax error at line N".
3b. The parser has configurable error recovery; when used for
validation, it provides an accurate list of syntax errors (say, the
first 300 errors) rather than reporting the very first one and stopping.
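Why the "wrong side" recursion in item 1 matters: with a right-recursive
rule "Z : X | X Y Z", an LR parser such as Bison cannot reduce anything
until it has seen the last element, so the whole list sits on the parse
stack; the left-recursive form "Z : Z Y X" reduces after every element.
The toy simulation below (hand-coded reductions, not Bison's real LALR
automaton; all function names are illustrative) just measures the
maximum stack depth for a list of n items under each form.

```python
# Toy illustration (not Bison's actual automaton): maximum parse-stack
# depth for the list "X Y X Y ... X" under two equivalent grammars.

def list_tokens(n):
    """Token stream for a list of n >= 1 items separated by Y."""
    tokens = []
    for i in range(n):
        if i:
            tokens.append("Y")
        tokens.append("X")
    return tokens


def max_depth_left_recursive(tokens):
    """Z : X | Z Y X  -- reduce as soon as a handle is on top."""
    stack, deepest = [], 0
    for tok in tokens:
        stack.append(tok)                      # shift
        deepest = max(deepest, len(stack))
        if stack == ["X"]:
            stack = ["Z"]                      # reduce Z -> X
        elif stack[-3:] == ["Z", "Y", "X"]:
            stack[-3:] = ["Z"]                 # reduce Z -> Z Y X
    return deepest


def max_depth_right_recursive(tokens):
    """Z : X | X Y Z  -- nothing reduces until the last X is read."""
    stack, deepest = [], 0
    for tok in tokens:
        stack.append(tok)                      # shift every token
        deepest = max(deepest, len(stack))
    stack[-1] = "Z"                            # reduce Z -> X (last item)
    while stack[-3:] == ["X", "Y", "Z"]:
        stack[-3:] = ["Z"]                     # reduce Z -> X Y Z, unwinding
    assert stack == ["Z"]
    return deepest


for n in (10, 1000):
    toks = list_tokens(n)
    print(n, max_depth_left_recursive(toks), max_depth_right_recursive(toks))
# 10 3 19
# 1000 3 1999
```

The left-recursive depth stays constant while the right-recursive depth
grows with the list length, which is what rules out right recursion for
very large inputs; the Bison manual gives the same advice for sequences.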

I've failed to achieve this in reasonable time. My previous version
worked quite well with the "draft" Turtle and the "early" TriG, which
was exactly "many Turtles in curly braces". Now I can achieve only
properties 1+2 or 1+3 instead of the desired 1+2+3, while the .y file
is approaching 200 rules and 250 states, so I will fall back to 2+3.

An optional dot at the end of a list is ok, two semicolons in a row are
ok, @prefix x: <y> with a dot at the end and PREFIX x: <y> without a dot
at the end are both ok, etc. It's quite ok to relax the grammar in any
one of these ways. If it is relaxed in _all_ of these ways at the same
time, then Flex is not as flexible and Bison is not as given to butting
as needed :) It's more practical to choose between good error
diagnostics/recovery and the use of Flex+Bison.
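To make the two prefix-directive forms above concrete, here is a minimal
sketch (hypothetical, using regular expressions instead of Flex+Bison)
that accepts "@prefix x: <y> ." and "PREFIX x: <y>", and gives a specific
message, in the spirit of property 3a, when the dot is missing from one
form or added to the other. The patterns are simplified; the real
PN_PREFIX and IRIREF productions are stricter. The names
parse_prefix_directive and DirectiveError are illustrative only.

```python
import re

# Simplified patterns: real PN_PREFIX and IRIREF rules are stricter.
# "@prefix" is case-sensitive; SPARQL-style "PREFIX" is not.
_AT_PREFIX = re.compile(
    r"@prefix\s+([A-Za-z][\w.-]*)?:\s*<([^<>\s]*)>\s*(\.)?$")
_SPARQL_PREFIX = re.compile(
    r"PREFIX\s+([A-Za-z][\w.-]*)?:\s*<([^<>\s]*)>\s*(\.)?$", re.IGNORECASE)


class DirectiveError(ValueError):
    """Hypothetical error type carrying a specific, readable message."""


def parse_prefix_directive(text):
    """Return (prefix, iri) for either directive form, else raise."""
    text = text.strip()
    m = _AT_PREFIX.match(text)
    if m:
        if not m.group(3):
            raise DirectiveError("'@prefix' directive must be terminated by '.'")
        return m.group(1) or "", m.group(2)
    m = _SPARQL_PREFIX.match(text)
    if m:
        if m.group(3):
            raise DirectiveError("SPARQL-style 'PREFIX' must not be followed by '.'")
        return m.group(1) or "", m.group(2)
    raise DirectiveError("expected '@prefix p: <iri> .' or 'PREFIX p: <iri>'")


print(parse_prefix_directive("@prefix ex: <http://example.org/> ."))
print(parse_prefix_directive("PREFIX ex: <http://example.org/>"))
```

Each directive is easy to handle on its own like this; the trouble
described above starts when such relaxations are combined inside one
LALR grammar that must also recover from errors.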

I don't propose tightening the nuts back to the early "strict" dialects.
I don't propose keeping the spec in its current form either. I don't
care. My .ttl/.trig "printer" is very conservative and will be
compatible with any reader. My parser will deal with any variant of the
spec approved by the W3C, even with error recovery. I just want to warn
other implementors.

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

P.S. Redefining a namespace prefix with a different namespace IRI is
the cherry on the cake.
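Why prefix redefinition hurts: the same prefixed name can denote two
different IRIs in one document, so a parser cannot collect all prefixes
first and expand names later; each expansion must use the binding in
force at that point. A hypothetical sketch over a simplified event
stream (expand_stream and the event tuples are illustrative, not a real
parser API):

```python
def expand_stream(events):
    """events: ('prefix', name, iri) or ('pname', 'name:local'); yields IRIs."""
    bindings = {}
    for event in events:
        if event[0] == "prefix":
            _, name, iri = event
            bindings[name] = iri  # a rebinding silently replaces the old IRI
        else:
            name, _, local = event[1].partition(":")
            if name not in bindings:
                raise KeyError(f"undeclared prefix '{name}:'")
            yield bindings[name] + local


events = [("prefix", "ex", "http://a.example/"), ("pname", "ex:thing"),
          ("prefix", "ex", "http://b.example/"), ("pname", "ex:thing")]
print(list(expand_stream(events)))
# ['http://a.example/thing', 'http://b.example/thing']
```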
Received on Wednesday, 19 June 2013 20:57:34 UTC
