N3.js 0.4.0 with TriG / N-Triples / N-Quads compatibility

Dear all,

Yesterday, I released version 0.4.0 of the N3.js library,
which, in addition to Turtle, now also parses and writes
N-Triples, and the quad formats TriG and N-Quads.
This message gives some insight into how it was built.
Source code at https://github.com/RubenVerborgh/N3.js.


RDF 1.1 SERIALIZATIONS INTRODUCTION

If you are not entirely familiar with the RDF 1.1 specifications,
below is a preview of what the 4 syntaxes look like.

# N-Triples
  <http://ex.org/#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/#Cat>.
  <http://ex.org/#Tom> <http://ex.org/#label> "Tom".

# Turtle
  @prefix ex: <http://ex.org/#>.
  ex:Tom a ex:Cat;
           ex:label "Tom".

# N-Quads
  <http://ex.org/#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/#Cat> <http://ex.org/cartoons>.
  <http://ex.org/#Tom> <http://ex.org/#label> "Tom" <http://ex.org/cartoons>.

# TriG
  @prefix ex: <http://ex.org/#>.
  ex:cartoons {
    ex:Tom a ex:Cat;
           ex:label "Tom".
  }

As you can see, N-Triples and Turtle encode triples;
N-Quads and TriG encode quads, i.e., triples with a graph.
Also, Turtle is a superset of N-Triples: anything that is valid N-Triples is valid Turtle.
TriG is also a superset of Turtle, and N-Quads is a superset of N-Triples.
Unfortunately, TriG is not a superset of N-Quads,
as TriG and N-Quads use different syntaxes to encode graphs.
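To make the difference in graph syntax concrete, here is a minimal
TypeScript sketch that writes the same two quads both ways.
The Quad shape and the toNQuads / toTriG functions are hypothetical
illustrations, not the N3.js writer. Terms are assumed to be
pre-serialized strings, and every quad is assumed to have a named graph.

```typescript
// Hypothetical quad shape. Terms are already-serialized strings
// (IRIs in <...>, quoted literals); a real writer would escape and abbreviate.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string; // assumed to always be a named graph in this sketch
}

// N-Quads: the graph is a fourth term, repeated on every line.
function toNQuads(quads: Quad[]): string {
  return quads
    .map((q) => `${q.subject} ${q.predicate} ${q.object} ${q.graph} .`)
    .join("\n");
}

// TriG: triples are grouped in a block that names the graph only once.
function toTriG(quads: Quad[]): string {
  const byGraph: Record<string, Quad[]> = {};
  const order: string[] = []; // keep graphs in first-seen order
  quads.forEach((q) => {
    if (!byGraph[q.graph]) {
      byGraph[q.graph] = [];
      order.push(q.graph);
    }
    byGraph[q.graph].push(q);
  });
  return order
    .map((graph) => {
      const body = byGraph[graph]
        .map((q) => `  ${q.subject} ${q.predicate} ${q.object} .`)
        .join("\n");
      return `${graph} {\n${body}\n}`;
    })
    .join("\n");
}

const cartoons = "<http://ex.org/cartoons>";
const tom = "<http://ex.org/#Tom>";
const quads: Quad[] = [
  { subject: tom, predicate: "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", object: "<http://ex.org/#Cat>", graph: cartoons },
  { subject: tom, predicate: "<http://ex.org/#label>", object: '"Tom"', graph: cartoons },
];

console.log(toNQuads(quads));
console.log(toTriG(quads));
```

The same data thus carries its graph per statement in N-Quads,
but per block in TriG, which is why neither syntax contains the other.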

Every serialization format comes with positive and negative tests
to verify whether a parser is compatible: http://www.w3.org/TR/rdf11-testcases/


IMPLEMENTATION IN N3.JS

Version 0.3.0 of N3.js provided a streaming Turtle parser,
consisting of a lexer component and an actual parser component.
Lexer and parser have been written exclusively by hand,
in order to realize the streaming behavior at maximum performance.
As a result, simply writing 3 new parsers was not a viable option.
Note that, even though Turtle is a superset of N-Triples,
a spec-compatible N-Triples parser must reject non-N-Triples documents.

Instead, I opted to write a lexer and parser that recognize
the (artificial) superset of TriG and N-Quads;
that is, the entire example document in this message
can be converted into quads by the parser,
even though there is no standard that encompasses it.
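A rough idea of what lexing such a superset involves: the sketch below
tokenizes a document that mixes TriG prefixes and graph blocks with an
N-Quads statement. It is a simplification under stated assumptions:
it uses regular expressions for brevity (whereas the N3.js lexer is
written by hand), it covers only the token types used in the examples
above, and the names lex, rules and TokenType are hypothetical.

```typescript
// Simplified token types of the (artificial) TriG + N-Quads superset.
type TokenType = "prefix" | "iri" | "prefixedName" | "keyword" | "literal" | "punctuation";

interface Token {
  type: TokenType;
  value: string;
}

// Rules are tried in order at the current position; the first match wins.
// Regexes are a shortcut for this sketch only.
const rules: Array<[TokenType, RegExp]> = [
  ["prefix", /^@prefix\b/],
  ["iri", /^<[^<>"\s]*>/],
  ["literal", /^"(?:[^"\\\n]|\\.)*"/],
  ["prefixedName", /^[A-Za-z][\w-]*:[\w-]*/],
  ["keyword", /^a(?=\s)/], // the Turtle/TriG shorthand for rdf:type
  ["punctuation", /^[.;,{}]/],
];

function lex(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input.replace(/^\s+/, "");
  while (rest.length > 0) {
    let matched = false;
    for (const [type, pattern] of rules) {
      const m = pattern.exec(rest);
      if (m) {
        tokens.push({ type, value: m[0] });
        rest = rest.slice(m[0].length).replace(/^\s+/, "");
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(`Unexpected input near: ${rest.slice(0, 20)}`);
    }
  }
  return tokens;
}

// A document no single standard covers: TriG prefix and graph block,
// followed by an N-Quads statement.
const mixed = [
  "@prefix ex: <http://ex.org/#>.",
  "ex:cartoons { ex:Tom a ex:Cat. }",
  '<http://ex.org/#Tom> <http://ex.org/#label> "Tom" <http://ex.org/cartoons>.',
].join("\n");

console.log(lex(mixed).map((t) => `${t.type}:${t.value}`).join(" "));
```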

In order to pass the specification tests,
I then added restricted modes to the parser for each subset.
For instance, in both N-Triples and N-Quads mode,
the lexer raises an error on '@prefix' tokens.
That way, I can maintain one single, fast codebase.
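The restricted modes can be pictured as a table of token types that
each format rejects, applied on top of one shared lexer. In this
hypothetical TypeScript sketch, only the '@prefix' rule for N-Triples
and N-Quads comes from this message; the other entries are assumptions
derived from what each syntax lacks. Checks that need grammar context,
such as rejecting a fourth (graph) term in N-Triples, would belong to
the parser and are not shown.

```typescript
// Simplified token types, as a lexer for the superset might emit them.
type TokenType =
  | "prefix" | "iri" | "prefixedName" | "keyword"
  | "literal" | "punctuation" | "graphBrace";
type Format = "TriG" | "Turtle" | "N-Quads" | "N-Triples";

interface Token {
  type: TokenType;
  value: string;
}

// Token types each restricted mode rejects (mostly assumed, see above).
const forbidden: Record<Format, TokenType[]> = {
  "TriG": [],
  "Turtle": ["graphBrace"],
  "N-Quads": ["prefix", "prefixedName", "keyword", "graphBrace"],
  "N-Triples": ["prefix", "prefixedName", "keyword", "graphBrace"],
};

// The shared lexer output is checked against the active mode.
function checkTokens(format: Format, tokens: Token[]): void {
  for (const token of tokens) {
    if (forbidden[format].indexOf(token.type) >= 0) {
      throw new Error(`Unexpected "${token.value}" in ${format} document`);
    }
  }
}

const prefixLine: Token[] = [
  { type: "prefix", value: "@prefix" },
  { type: "prefixedName", value: "ex:" },
  { type: "iri", value: "<http://ex.org/#>" },
  { type: "punctuation", value: "." },
];

checkTokens("Turtle", prefixLine); // accepted in Turtle mode
try {
  checkTokens("N-Triples", prefixLine);
} catch (e) {
  console.log((e as Error).message); // Unexpected "@prefix" in N-Triples document
}
```

Keeping the restrictions in a table is one way to have a single
lexer serve four formats; adding a mode then means adding a row.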

One could of course argue that the N-Triples / N-Quads syntax
can be lexed and parsed more easily (line by line),
and thus faster than with such a hybrid parser.
Yet in my experience, parsing an N-Triples document
is slower than parsing its Turtle equivalent (if all Turtle features are used),
because Turtle documents simply contain fewer characters.
So the speed we could gain by a simple dedicated parser
is probably lost anyway due to the size difference.
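The size argument is easy to check for repetitive data. This sketch
generates the same n cats in N-Triples and in Turtle (using a prefix,
'a' and ';') and compares character counts. The generator functions
are hypothetical, and character counts are only a proxy for parsing
work, not a benchmark.

```typescript
// The same n cats, written as N-Triples (full IRIs on every line).
function nTriplesFor(n: number): string {
  let out = "";
  for (let i = 0; i < n; i++) {
    out += `<http://ex.org/#cat${i}> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/#Cat>.\n`;
    out += `<http://ex.org/#cat${i}> <http://ex.org/#label> "Cat ${i}".\n`;
  }
  return out;
}

// ...and as Turtle, using a prefix, 'a' and ';' to shorten each statement.
function turtleFor(n: number): string {
  let out = "@prefix ex: <http://ex.org/#>.\n";
  for (let i = 0; i < n; i++) {
    out += `ex:cat${i} a ex:Cat;\n  ex:label "Cat ${i}".\n`;
  }
  return out;
}

for (const n of [1, 1000]) {
  const nt = nTriplesFor(n).length;
  const ttl = turtleFor(n).length;
  console.log(`${n} cats: N-Triples ${nt} chars, Turtle ${ttl} chars (${(ttl / nt).toFixed(2)}x)`);
}
```

For 1000 cats, the Turtle version should come out at roughly a quarter
of the N-Triples size, so a line-based parser would have to be several
times faster per character just to break even.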


WHAT THIS MEANS FOR RDF-JS DEVELOPERS

Until now, the only way to deal with multiple RDF graphs in JavaScript
was using the JSON-LD format, since Turtle doesn't have quads.
N3.js 0.4.0 brings direct access to quad-centric formats.
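As a toy illustration of working with quads directly in JavaScript
(TypeScript here), the sketch below reads a restricted subset of
N-Quads line by line and filters by graph. It is hypothetical and
deliberately incomplete (no blank nodes, datatypes, language tags or
escaped characters in IRIs); it is not the N3.js parser or its API.

```typescript
// Hypothetical quad shape; an empty graph string stands for the default graph.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

// One statement per line: IRI subject and predicate, an IRI or simple
// literal as object, an optional IRI graph label, then a final dot.
const STATEMENT =
  /^\s*(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"(?:[^"\\]|\\.)*")\s*(<[^>]*>)?\s*\.\s*$/;

function parseNQuads(doc: string): Quad[] {
  const quads: Quad[] = [];
  doc.split("\n").forEach((line, index) => {
    if (/^\s*(#.*)?$/.test(line)) return; // skip blank lines and comment lines
    const m = STATEMENT.exec(line);
    if (!m) throw new Error(`Line ${index + 1}: unsupported syntax`);
    quads.push({ subject: m[1], predicate: m[2], object: m[3], graph: m[4] || "" });
  });
  return quads;
}

const doc = [
  "<http://ex.org/#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.org/#Cat> <http://ex.org/cartoons>.",
  '<http://ex.org/#Tom> <http://ex.org/#label> "Tom" <http://ex.org/cartoons>.',
  '<http://ex.org/#Tom> <http://ex.org/#label> "Tom".',
].join("\n");

const parsed = parseNQuads(doc);
const inCartoons = parsed.filter((q) => q.graph === "<http://ex.org/cartoons>");
console.log(`${parsed.length} quads, ${inCartoons.length} in <http://ex.org/cartoons>`);
// prints "3 quads, 2 in <http://ex.org/cartoons>"
```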


Enjoy!

Ruben

Received on Friday, 2 January 2015 08:58:40 UTC