- From: Henry Story <henry.story@bblfish.net>
- Date: Sat, 11 Feb 2012 00:05:58 +0100
- To: WebID XG <public-xg-webid@w3.org>, Linked Data community <public-lod@w3.org>
Hi,

I have been working on getting non-blocking parsers to work. The point is that when you fetch RDF from the web you want to use as few resources as possible: ideally one should need only a few KB of memory even for files that are 1 GB long. Async parsing also lets one keep thousands of connections open simultaneously on only a few threads, which saves on thread costs (0.5-1 MB per thread). For more on what async parsing allows one to do, see the Jena bug report [1].

I got an async RDF/XML parser going last week using Jena, and wrote a full NTriples one too, this one using a powerful Scala library called nomo. Then this week Alex Bertails published a Scala library that should allow us to write code against both Jena and Sesame in Scala with very little overhead. It's called "pimp-my-rdf" [2].

So here are some pointers:

- The RDF/XML parser uses the Jena parser, adapted to be non-blocking:
  https://dvcs.w3.org/hg/read-write-web/file/d9c1f87eee55/src/main/scala/cache/WebFetcher.scala
- The NTriples parser, written from scratch, is here:
  https://github.com/betehess/pimp-my-rdf/blob/master/n-triples-parser/src/main/scala/Parser.scala

It should not be that difficult to write a Turtle parser next, so hopefully I will have that working soon too.

Henry

[1] More on the Jena bug report: https://issues.apache.org/jira/browse/JENA-203
[2] https://github.com/betehess/pimp-my-rdf

Btw, notice how simple the RDF model is when expressed in Scala:
https://github.com/betehess/pimp-my-rdf/blob/master/core/src/main/scala/RDF.scala

Social Web Architect
http://bblfish.net/
Received on Friday, 10 February 2012 23:06:28 UTC