- From: Hugo Mills <hugo@carfax.org.uk>
- Date: Tue, 7 Mar 2023 13:58:41 +0000
- To: Martynas Jusevičius <martynas@atomgraph.com>
- Cc: Semantic Web <semantic-web@w3.org>
On Tue, Mar 07, 2023 at 02:26:43PM +0100, Martynas Jusevičius wrote:
> I found an answer from years ago saying "you can convert quads to
> triples with sed/perl", but no actual example on how to do it. Does
> anyone have such a script, ideally as shell-native as possible,
> without additional dependencies?
>
> I've tried Jena's riot command, It doesn't do what I need because when
> reading quads and writing triples it writes the default graph, which
> is empty.
>
> Currently I'm using a CONSTRUCT query and Jena's sparql command, but
> it's rather slow on large files.

What I've found most useful when doing basic RDF processing in the
shell is to convert everything to N-Triples or N-Quads at the start,
and then convert back afterwards. I usually use rapper for this --
although, like most RDF tooling, it has an annoying habit of trying to
load everything into RAM, which limits the size of the input data
files. Somewhere I've got a small python tool I wrote that can split a
Turtle file into smaller self-contained files on a purely syntactic
basis, which makes it easier to convert to N-Triples.

Once you've got your data in one line per triple, it's then much
easier to deal with the RDF data using shell tools. But be very
careful of string literals with embedded newlines -- those aren't easy
to deal with in basic shell tools.

There's a few interesting solutions I've come up with, like using
rev|cut|rev to reliably and simply pull out things from the end of
each line (e.g. the graph in N-Quads).

In general, I've found this kind of thing to be useful for simple
ad-hoc data-mangling or data-analysis, but it's not an approach I'd
adopt for an actual repeatable data pipeline.

   Hugo.

-- 
Hugo Mills             | Reading Mein Kampf won't make you a Nazi. Reading
hugo@... carfax.org.uk | Das Kapital won't make you a communist. But most
http://carfax.org.uk/  | trolls started out with a copy of Lord of the Rings.
PGP: E2AB1DE4          |
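
For illustration, the rev|cut|rev idea mentioned above might look
roughly like the sketch below. This is not a tested script from the
original thread; the function names quads_to_triples and graph_of are
made up for the example. It assumes the input is already N-Quads with
one statement per line, terms separated by single spaces, each line
ending in " .", and a graph term present on every line. A line in the
default graph (no fourth term) would lose part of its object, so
filter those out first if your data has them.

```shell
# Sketch only: relies on the assumptions stated above.

# Drop the graph term from each N-Quads line, leaving N-Triples.
# Reversing the line makes the final "." field 1 and the graph field 2,
# so fields 3 onwards (reversed back) are subject, predicate and object,
# even when the object literal itself contains spaces.
quads_to_triples() {
    rev | cut -d' ' -f3- | rev | sed 's/$/ ./'
}

# Print only the graph term of each line (same assumptions).
graph_of() {
    rev | cut -d' ' -f2 | rev
}

# Example: one quad whose object literal contains a space.
printf '%s\n' '<http://example.org/s> <http://example.org/p> "hello world"@en <http://example.org/g> .' \
    | quads_to_triples
```

Because an IRI or blank node label cannot contain a space, counting
fields from the end of the line is safe in a way that counting from
the start is not. Something like "graph_of < data.nq | sort -u" then
lists the distinct graphs in a file (data.nq being a placeholder
name).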
Received on Tuesday, 7 March 2023 14:00:11 UTC