- From: Sandro Hawke <sandro@w3.org>
- Date: Tue, 21 May 2013 13:33:03 -0400
- To: Jan Wielemaker <J.Wielemaker@vu.nl>
- CC: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-comments@w3.org
- Message-ID: <519BAFCF.1000403@w3.org>
On 05/17/2013 08:09 AM, Jan Wielemaker wrote:
> Hi Sandro,
>
> On 05/17/2013 01:38 PM, Sandro Hawke wrote:
>> On 05/17/2013 06:00 AM, Jan Wielemaker wrote:
>>> On 05/17/2013 11:49 AM, Andy Seaborne wrote:
>>>
>>> [this fragment is from Charles Greer, not answered by Andy]
>>>
>>>> 1. Could the spec be modified to allow TriG to be a superset of
>>>> turtle? Specifically, could the production rules be modified to allow
>>>> a set of triples outside of any '{' '}' to be the same as triples in a
>>>> default anonymous graph? It seems that even now, the rules allow
>>>> multiple anonymous graph productions, whose union would be the unnamed
>>>> graph. It would be convenient if we could dispense with these anonymous
>>>> curly braces altogether if possible.
>>>
>>> Having implemented TriG yesterday on top of the Turtle parser, I must
>>> say that I was happily surprised that TriG does not allow for triples
>>> outside {}. This means you can detect whether a document is a Turtle
>>> or TriG document at the first triple.
>>
>> Why do you want to do that? I'm imagining a world where people load
>> data by URL, not necessarily knowing if it's going to have named graphs
>> in it.
>>
>> I'd think in a load_graph operation, you'd accept TriG as well, using
>> the default graph as the output graph. Maybe have a flag about whether
>> to ignore or raise on error if there are some named graphs as well.
>>
>> And in a load_dataset operations, I'd think you'd accept Turtle as well,
>> and just not get any named graphs out of it.
>
> I am not yet sure. Having to deal with files, loading of which can
> create or extend multiple graphs is something new in the design of
> SWI-Prolog's RDF store. There are two things for which I do not yet
> have a good answer: implementing `unloading' the data and dealing with
> the persistent backup.
>
> The system currently loads a source into a named graph named after the
> source.
> After loading, the graph is saved in a fast and compact binary
> format into a file named after the graph-name. Subsequent modifications
> are saved in a `journal' file, also named after the graph-name.
> Unloading a source finds the graph, removes all triples from memory and
> deletes the backup files.
>

(Yes, I have fond memories of using swipl.)

> This schema won't fly easily with TriG files. TriG files can create
> multiple graphs and/or add triples to multiple graphs. TriG files are
> also likely to change the granularity of named graphs, which makes the
> file-per-named-graph backup module inadequate. I don't know yet how I'm
> going to solve that, but I think it is likely that knowing beforehand
> that I'm dealing with a TriG file will be useful information.
>

Interesting problem. Brainstorming a bit....

== Design-1 ==

Treat a TriG file as a set of Turtle files.

User loads x.trig

    { <s> <p> 1 }
    <g1> { <s> <p> 1,2 }

so you treat that as if they loaded a turtle file called "x.trig"

    <s> <p> 1

and a turtle file called "g1"

    <s> <p> 1,2

You cache and back them up just like that. Somewhere internally you
remember that unloading x.trig really means to also unload g1.

== Design-2 ==

Explicit metadata.

User loads x.trig and ends up with a new graph called "x.trig"
containing triples like:

    <x.trig> ds:defaultGraph <sk01>
    <g1> ds:nameFor <sk02>

and then graph <sk01> has the default graph triples in it, while <sk02>
has the g1 triples in it. <sk01> and <sk02> are system generated graph
names, or could be blank nodes if that's something you support.

Now unloading doesn't need to remember anything internally. When you
unload a graph, if it has ds:defaultGraph or ds:nameFor triples in it,
you unload the graphs named after the objects of those triples as well.

== Design-3 ==

Use a different operation: load_dataset acts like in design-1, but hands
back the list of all graphs created. That list has to be handed to
unload_dataset, so no private internal storage is needed.
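
To make that first half of design-3 concrete, here is a rough Python sketch. Everything in it is made up for illustration: the QuadStore class, the DEFAULT_GRAPH marker, and the idea that the input has already been parsed into (graph, s, p, o) tuples. It is not SWI-Prolog's API, just the shape of the load/unload contract.

```python
from collections import defaultdict

DEFAULT_GRAPH = None  # hypothetical marker for triples outside any named graph


class QuadStore:
    """Toy in-memory store mapping graph name -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def load_dataset(self, source, quads):
        """Load already-parsed (graph, s, p, o) tuples read from `source`.

        Default-graph triples go into a graph named after the source,
        as in design-1. Returns the sorted list of graph names touched,
        which is the only record of what this load did.
        """
        touched = set()
        for graph, s, p, o in quads:
            name = source if graph is DEFAULT_GRAPH else graph
            self.graphs[name].add((s, p, o))
            touched.add(name)
        return sorted(touched)

    def unload_dataset(self, graph_names):
        """Drop every graph named in the list returned by load_dataset."""
        for name in graph_names:
            self.graphs.pop(name, None)


# The x.trig example from design-1, pre-parsed by hand.
store = QuadStore()
quads = [
    (DEFAULT_GRAPH, "s", "p", 1),
    ("g1", "s", "p", 1),
    ("g1", "s", "p", 2),
]
loaded = store.load_dataset("x.trig", quads)
store.unload_dataset(loaded)
```

Note that this sketch drops whole graphs on unload, so it would also remove triples that were in g1 before x.trig was loaded. A real store that lets a file extend existing graphs would need to track more than the graph names.
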
I'd also provide load_dataset_safe or a "safe=True" option on
load_dataset which makes it behave like design-2 -- putting everything
in newly named graphs. I'd probably return a structure giving the
mapping between the names used in the source and skNNN names assigned,
rather than put that into the quadstore.

Maybe load_dataset is called load_multiple, and it can optionally take a
list of sources. Maybe it could even do some crawling while it's
loading. In either case, it'd have the same API options as load_dataset
above, I think.

== == ==

Okay, I'm pretty happy with design-3. What do you think?

     -- Sandro

> Cheers --- Jan
>
> P.s. still hoping for an
>     @format <http://www.w3.org/TR/2013/CR-turtle-20130219/> .
> or similar.
Received on Tuesday, 21 May 2013 17:33:20 UTC