- From: Ruben Verborgh <ruben.verborgh@ugent.be>
- Date: Fri, 2 Jan 2015 09:58:08 +0100
- To: public-rdfjs@w3.org
Dear all,

Yesterday, I released version 0.4.0 of the N3.js library. In addition to Turtle, it now also parses and writes N-Triples, as well as the quad formats TriG and N-Quads. This message gives some insight into how it was built. The source code is at https://github.com/RubenVerborgh/N3.js.

RDF 1.1 SERIALIZATIONS INTRODUCTION

If you are not entirely familiar with the RDF 1.1 specifications, below is a preview of what the four syntaxes look like.

    # N-Triples
    <http://ex.org/#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/#Cat>.
    <http://ex.org/#Tom> <http://ex.org/#label> "Tom".

    # Turtle
    @prefix ex: <http://ex.org/#>.
    ex:Tom a ex:Cat;
           ex:label "Tom".

    # N-Quads
    <http://ex.org/#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/#Cat> <http://ex.org/cartoons>.
    <http://ex.org/#Tom> <http://ex.org/#label> "Tom" <http://ex.org/cartoons>.

    # TriG
    @prefix ex: <http://ex.org/#>.
    ex:cartoons {
        ex:Tom a ex:Cat;
               ex:label "Tom".
    }

As you can see, N-Triples and Turtle encode triples, whereas N-Quads and TriG encode quads, i.e., triples with a graph. Also, Turtle is a superset of N-Triples: anything that is valid N-Triples is valid Turtle. Likewise, TriG is a superset of Turtle, and N-Quads is a superset of N-Triples. Unfortunately, TriG is not a superset of N-Quads, because the two use different syntaxes to encode graphs: N-Quads appends the graph IRI as a fourth term to each statement, whereas TriG wraps triples in a graph block.

Each serialization format comes with positive and negative tests to verify whether a parser conforms to it: http://www.w3.org/TR/rdf11-testcases/

IMPLEMENTATION IN N3.JS

Version 0.3.0 of N3.js provided a streaming Turtle parser, consisting of a lexer component and an actual parser component. Both were written entirely by hand, in order to achieve streaming behavior at maximum performance. As a result, simply writing three new hand-optimized parsers was not a viable option. Note that, even though Turtle is a superset of N-Triples, the Turtle parser could not just be reused as is: a spec-compliant N-Triples parser must reject any document that is not valid N-Triples, even if it is valid Turtle.
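To make that distinction concrete, here is a minimal TypeScript sketch of a strict, line-based recognizer. It is not the N3.js API; the names Format, Quad, tokenize, parseLine and parseDocument are hypothetical. It only handles a small subset of the formats (absolute IRIs and plain string literals; no blank nodes, language tags, datatypes, or escape validation), but it shows that an N-Quads statement is an N-Triples statement with an optional fourth graph term, and that a strict N-Triples mode has to reject both that graph term and Turtle directives such as @prefix.

```typescript
// Hypothetical sketch, NOT the N3.js API: a strict, line-based recognizer
// for a small subset of N-Triples and N-Quads (absolute IRIs and plain
// string literals only; blank nodes, language tags and datatypes omitted).
type Format = "N-Triples" | "N-Quads";

interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string; // "" denotes the default graph
}

// Split one line into raw tokens: <IRI>, "literal", or the final '.'.
function tokenize(line: string, format: Format): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < line.length) {
    const ch = line[i];
    if (ch === " " || ch === "\t") {
      i++;
    } else if (ch === "#") {
      break; // a comment runs until the end of the line
    } else if (ch === "<") {
      const end = line.indexOf(">", i);
      if (end < 0) throw new Error(`unterminated IRI at column ${i + 1}`);
      tokens.push(line.slice(i, end + 1));
      i = end + 1;
    } else if (ch === '"') {
      let j = i + 1;
      // Skip escaped characters such as \" inside the literal.
      while (j < line.length && line[j] !== '"') j += line[j] === "\\" ? 2 : 1;
      if (j >= line.length) throw new Error(`unterminated literal at column ${i + 1}`);
      tokens.push(line.slice(i, j + 1));
      i = j + 1;
    } else if (ch === ".") {
      tokens.push(".");
      i++;
    } else if (ch === "@") {
      // Restricted mode: Turtle/TriG directives are not part of these formats.
      throw new Error(`directives such as @prefix are not allowed in ${format}`);
    } else {
      throw new Error(`unexpected character '${ch}' at column ${i + 1}`);
    }
  }
  return tokens;
}

const isIri = (t: string): boolean => t.startsWith("<");
const isLiteral = (t: string): boolean => t.startsWith('"');

// Parse one statement; returns null for blank and comment-only lines.
function parseLine(line: string, format: Format): Quad | null {
  const tokens = tokenize(line, format);
  if (tokens.length === 0) return null;
  if (tokens[tokens.length - 1] !== ".") throw new Error("statement must end with '.'");
  const terms = tokens.slice(0, -1);
  // N-Quads = N-Triples plus an optional fourth (graph) term.
  const maxTerms = format === "N-Quads" ? 4 : 3;
  if (terms.length < 3 || terms.length > maxTerms) {
    throw new Error(`${format} expects ${maxTerms === 4 ? "3 or 4" : "3"} terms, got ${terms.length}`);
  }
  const [subject, predicate, object, graph = ""] = terms;
  const validObject = isIri(object) || isLiteral(object);
  if (!isIri(subject) || !isIri(predicate) || !validObject || (graph !== "" && !isIri(graph))) {
    throw new Error("invalid term in statement");
  }
  return { subject, predicate, object, graph };
}

// Parse a whole document line by line, reporting the failing line number.
function parseDocument(text: string, format: Format): Quad[] {
  const quads: Quad[] = [];
  const lines = text.split(/\r?\n/);
  for (let n = 0; n < lines.length; n++) {
    let quad: Quad | null = null;
    try {
      quad = parseLine(lines[n], format);
    } catch (e) {
      throw new Error(`Line ${n + 1}: ${(e as Error).message}`);
    }
    if (quad) quads.push(quad);
  }
  return quads;
}

// Usage: the N-Quads example from this message.
const example = [
  "<http://ex.org/#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/#Cat> <http://ex.org/cartoons>.",
  '<http://ex.org/#Tom> <http://ex.org/#label> "Tom" <http://ex.org/cartoons>.',
].join("\n");

console.log(parseDocument(example, "N-Quads").length); // 2
try {
  parseDocument(example, "N-Triples");
} catch (e) {
  console.log((e as Error).message); // Line 1: N-Triples expects 3 terms, got 4
}
```

Such a dedicated parser stays simple because every statement is self-contained on one line; the trade-off is that N-Triples and N-Quads documents are larger than their Turtle and TriG equivalents.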
Instead of writing three separate parsers, I opted to write a lexer and parser that recognize an (artificial) superset of both TriG and N-Quads; that is, the parser can convert the entire example document in this message into quads, even though no standard encompasses it. In order to pass the specification tests, I then added a restricted mode to the parser for each subset. For instance, in both N-Triples and N-Quads mode, the lexer raises an error on '@prefix' tokens. That way, I can maintain one single, fast codebase.

One could of course argue that the N-Triples / N-Quads syntax can be lexed and parsed more easily (line by line), and thus faster than with such a hybrid parser. Yet in my experience, parsing an N-Triples document is slower than parsing its Turtle equivalent (if all Turtle features are used), simply because the Turtle document contains fewer characters. So the speed gained by a simple dedicated parser would probably be lost anyway due to the size difference.

WHAT THIS MEANS FOR RDF-JS DEVELOPERS

Until now, the only way to deal with multiple RDF graphs in JavaScript was to use the JSON-LD format, since Turtle has no notion of quads. N3.js 0.4.0 now gives direct access to the quad-based formats N-Quads and TriG.

Enjoy!

Ruben
Received on Friday, 2 January 2015 08:58:40 UTC