non blocking RDF/XML and NTriples Parser in Scala (and Java)

From: Henry Story <henry.story@bblfish.net>
Date: Sat, 11 Feb 2012 00:05:58 +0100
Message-Id: <011AF8A2-F91B-4843-9492-B194D9356DBC@bblfish.net>
To: WebID XG <public-xg-webid@w3.org>, Linked Data community <public-lod@w3.org>
Hi,

	I have been working on getting non-blocking parsers to work. The point is that 
when you fetch RDF from the web you want to use as few resources as possible. Ideally one should 
only use a few KB of memory even for files that are 1GB long. Async parsing allows one to have 1000s
of open connections simultaneously on only a few threads, which also saves on thread costs (0.5-1MB per
thread). For more on what async parsing allows one to do, see the Jena bug report [1].
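
To make the idea concrete, here is a minimal sketch of a push-style parser in Scala. It is only an
illustration, not the code from the projects mentioned in this mail: the names Triple and
ChunkedNTriplesParser are made up, chunks are assumed to arrive already decoded as strings, and the
line handling covers only a simplified subset of NTriples (no escape handling or term validation).
The caller hands over each chunk as it arrives, and only the current incomplete line is kept in
memory between calls, so no thread ever blocks waiting for input.

```scala
// Hypothetical names; this is not the API of Jena, nomo or pimp-my-rdf.
final case class Triple(subject: String, predicate: String, obj: String)

// Push parser: the caller feeds chunks as they arrive from the network.
// Memory use is bounded by the longest single line, not by the file size.
final class ChunkedNTriplesParser(emit: Triple => Unit) {
  // Holds only the tail of the input that has not yet ended in a newline.
  private val pending = new StringBuilder

  // Called once per received chunk; returns immediately, never waits for more input.
  def feed(chunk: String): Unit = {
    pending.append(chunk)
    var nl = pending.indexOf("\n")
    while (nl >= 0) {
      parseLine(pending.substring(0, nl))
      pending.delete(0, nl + 1)
      nl = pending.indexOf("\n")
    }
  }

  // Called when the connection closes, to flush a last line without a newline.
  def finish(): Unit = {
    parseLine(pending.toString)
    pending.clear()
  }

  // Simplified: splits "subject predicate object ." on whitespace.
  // The object keeps its internal spaces, so plain literals survive.
  private def parseLine(raw: String): Unit = {
    val line = raw.trim
    if (line.nonEmpty && !line.startsWith("#") && line.endsWith(".")) {
      val parts = line.dropRight(1).trim.split("\\s+", 3)
      if (parts.length == 3) emit(Triple(parts(0), parts(1), parts(2)))
    }
  }
}
```

A single thread can hold many such parser instances, one per open connection, and call feed on
whichever one has data ready; that is where the savings in threads and memory come from.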

	I got an async RDF/XML parser going last week using Jena, and wrote a full NTriples one too, 
the latter using a powerful Scala library called nomo. Then this week Alex Bertails published a 
Scala library that should allow us to write code against both Jena and Sesame in Scala with very little 
overhead. It's called "pimp-my-rdf" [2].

So here are some pointers:

  - The RDF/XML parser uses the Jena parser, adapted to be non-blocking:
     https://dvcs.w3.org/hg/read-write-web/file/d9c1f87eee55/src/main/scala/cache/WebFetcher.scala
  - The NTriples parser, written from scratch, is here:
     https://github.com/betehess/pimp-my-rdf/blob/master/n-triples-parser/src/main/scala/Parser.scala


It should not be that difficult to write a Turtle parser next. So hopefully I should have that
working soon too.


Henry

[1] More on the Jena bug report
   https://issues.apache.org/jira/browse/JENA-203
[2] https://github.com/betehess/pimp-my-rdf 
    Btw. notice how simple the RDF model is when expressed in Scala
    https://github.com/betehess/pimp-my-rdf/blob/master/core/src/main/scala/RDF.scala


Social Web Architect
http://bblfish.net/
Received on Friday, 10 February 2012 23:06:28 UTC