- From: Sandro Hawke <sandro@w3.org>
- Date: Tue, 21 May 2013 13:33:03 -0400
- To: Jan Wielemaker <J.Wielemaker@vu.nl>
- CC: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-comments@w3.org
- Message-ID: <519BAFCF.1000403@w3.org>
On 05/17/2013 08:09 AM, Jan Wielemaker wrote:
> Hi Sandro,
>
> On 05/17/2013 01:38 PM, Sandro Hawke wrote:
>> On 05/17/2013 06:00 AM, Jan Wielemaker wrote:
>>> On 05/17/2013 11:49 AM, Andy Seaborne wrote:
>>>
>>> [this fragment is from Charles Greer, not answered by Andy]
>>>
>>>> 1. Could the spec be modified to allow TriG to be a superset of
>>>> turtle? Specifically, could the production rules be modified to allow
>>>> a set of triples outside of any '{' '}' to be the same as triples
>>>> in a
>>>> default anonymous graph? It seems that even now, the rules allow
>>>> multiple anonymous graph productions, whose union would be the unnamed
>>>> graph. It would be convenient if we could dispense with these
>>>> anonymous
>>>> curly braces altogether if possible.
>>>
>>> Having implemented TriG yesterday on top of the Turtle parser, I must
>>> say that I was happily surprised that TriG does not allow for triples
>>> outside {}. This means you can detect whether a document is a Turtle
>>> or TriG document at the first triple.
>>
>> Why do you want to do that? I'm imagining a world where people load
>> data by URL, not necessarily knowing if it's going to have named graphs
>> in it.
>>
>> I'd think in a load_graph operation, you'd accept TriG as well, using
>> the default graph as the output graph. Maybe have a flag about whether
>> to ignore or raise on error if there are some named graphs as well.
>>
>> And in a load_dataset operation, I'd think you'd accept Turtle as well,
>> and just not get any named graphs out of it.
>
> I am not yet sure. Having to deal with files, loading of which can
> create or extend multiple graphs is something new in the design of
> SWI-Prolog's RDF store. There are two things for which I do not yet
> have a good answer: implementing `unloading' the data and dealing with
> the persistent backup.
>
> The system currently loads a source into a named graph named after the
> source. After loading, the graph is saved in a fast and compact binary
> format into a file named after the graph-name. Subsequent modifications
> are saved in a `journal' file, also named after the graph-name.
> Unloading a source finds the graph, removes all triples from memory and
> deletes the backup files.
>
(Yes, I have fond memories of using swipl.)
> This schema won't fly easily with TriG files. TriG files can create
> multiple graphs and/or add triples to multiple graphs. TriG files are
> also likely to change the granularity of named graphs, which makes the
> file-per-named-graph backup module inadequate. I don't know yet how I'm
> going to solve that, but I think it is likely that knowing beforehand
> that I'm dealing with a TriG file will be useful information.
>
Interesting problem. Brainstorming a bit....
== Design-1 ==
Treat a TriG file as a set of Turtle files. User loads x.trig:
    { <s> <p> 1 }
    <g1> { <s> <p> 1,2 }
so you treat that as if they loaded a Turtle file called "x.trig":
    <s> <p> 1 .
and a Turtle file called "g1":
    <s> <p> 1,2 .
You cache and back them up just like that. Somewhere internally you
remember that unloading x.trig really means to also unload g1.
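A rough sketch of design-1, in Python rather than Prolog just to keep it
short. Everything here is hypothetical: the Store class, its method names,
and the assumption that some parser has already turned the TriG file into
(graph, s, p, o) tuples, with None standing for the default graph.

```python
class Store:
    """Toy in-memory store: one set of triples per graph name."""

    def __init__(self):
        self.graphs = {}         # graph name -> set of (s, p, o)
        self._loaded_from = {}   # source -> graph names it touched (private bookkeeping)

    def load(self, source, quads):
        # quads: iterable of (graph_or_None, s, p, o); None means default graph.
        touched = set()
        for g, s, p, o in quads:
            # Default-graph triples go into a graph named after the source,
            # exactly as a plain Turtle load would do.
            name = source if g is None else g
            self.graphs.setdefault(name, set()).add((s, p, o))
            touched.add(name)
        self._loaded_from[source] = touched

    def unload(self, source):
        # Unloading the source also unloads every graph it created or extended.
        for name in self._loaded_from.pop(source, ()):
            self.graphs.pop(name, None)
```

Note that in this sketch unloading drops the whole of g1, even if some
other source also added triples to it; a real store would have to decide
what it wants there.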
== Design-2 ==
Explicit metadata. User loads x.trig and ends up with a new graph
called "x.trig" containing triples like:
<x.trig> ds:defaultGraph <sk01>
<g1> ds:nameFor <sk02>
and then graph <sk01> has the default graph triples in it, while <sk02>
has the g1 triples in it. <sk01> and <sk02> are system generated
graph names, or could be blank nodes if that's something you support.
Now unloading doesn't need to remember anything internally. When you
unload a graph, if it has ds:defaultGraph or ds:nameFor triples in it,
you unload the graphs named after the objects of those triples as well.
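Design-2 could look something like the following. Again a sketch under
assumptions: MetaStore and its methods are made up, the input is already
parsed into (graph, s, p, o) tuples with None for the default graph, and
the ds: predicates are kept as plain strings since I haven't defined that
namespace anywhere.

```python
import itertools

DS_DEFAULT_GRAPH = "ds:defaultGraph"   # placeholder predicate names
DS_NAME_FOR = "ds:nameFor"


class MetaStore:
    def __init__(self):
        self.graphs = {}                    # graph name -> set of (s, p, o)
        self._counter = itertools.count(1)  # source of fresh skNN names

    def load(self, source, quads):
        meta = self.graphs.setdefault(source, set())
        assigned = {}  # graph name in the file (None = default) -> skNN name
        for g, s, p, o in quads:
            if g not in assigned:
                sk = "sk%02d" % next(self._counter)
                assigned[g] = sk
                # Record the mapping as ordinary triples in the source's graph.
                if g is None:
                    meta.add((source, DS_DEFAULT_GRAPH, sk))
                else:
                    meta.add((g, DS_NAME_FOR, sk))
            self.graphs.setdefault(assigned[g], set()).add((s, p, o))

    def unload(self, name):
        # No private state: the metadata triples say what else to unload.
        for _, p, o in self.graphs.pop(name, set()):
            if p in (DS_DEFAULT_GRAPH, DS_NAME_FOR):
                self.graphs.pop(o, None)
```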
== Design-3 ==
Use a different operation:
load_dataset acts like in design-1, but hands back the list of all
graphs created. That list has to be handed to unload_dataset, so no
private internal storage is needed.
I'd also provide load_dataset_safe or a "safe=True" option on
load_dataset which makes it behave like design-2 -- putting everything
in newly named graphs. I'd probably return a structure giving the
mapping between the names used in the source and skNNN names assigned,
rather than put that into the quadstore.
Maybe load_dataset is called load_multiple, and it can optionally take a
list of sources. Maybe it could even do some crawling while it's
loading. In either case, it'd have the same API options as load_dataset
above, I think.
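To make design-3 concrete, here is a minimal sketch. The signatures are
my guess at what such an API might look like, not anything that exists:
the store is just a dict, the quads are pre-parsed (graph, s, p, o) tuples
with None for the default graph, and for simplicity both modes return a
mapping from source graph name to stored graph name (in the non-safe case
its values are the "list of all graphs created").

```python
import itertools

_sk_names = itertools.count(1)  # source of fresh skNN graph names


def load_dataset(store, source, quads, safe=False):
    """Load quads into store (dict: graph name -> set of triples).

    Returns the mapping {name used in the source: name used in the store};
    the caller keeps it and hands it back to unload_dataset.
    """
    mapping = {}
    for g, s, p, o in quads:
        if g not in mapping:
            if safe:
                # Like design-2: everything lands in newly named graphs.
                mapping[g] = "sk%02d" % next(_sk_names)
            else:
                # Like design-1: default graph is named after the source.
                mapping[g] = source if g is None else g
        store.setdefault(mapping[g], set()).add((s, p, o))
    return mapping


def unload_dataset(store, mapping):
    # The store keeps no record of the load; the caller's mapping is enough.
    for name in mapping.values():
        store.pop(name, None)
```

A load_multiple would presumably just loop over its sources and merge
the returned mappings.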
== == ==
Okay, I'm pretty happy with design-3. What do you think?
-- Sandro
> Cheers --- Jan
>
> P.s. still hoping for an
> @format <http://www.w3.org/TR/2013/CR-turtle-20130219/> .
> or similar.
>
>
>
Received on Tuesday, 21 May 2013 17:33:20 UTC