Re: API for loading datasets; was Re: TriG being disjoint from Turtle from Sandro Hawke on 2013-05-21 (public-rdf-comments@w3.org from May 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 21 May 2013 14:48:50 -0400
To: Charles Greer <cgreer@marklogic.com>
CC: public-rdf-comments@w3.org
Message-ID: <519BC192.2040109@w3.org>

On 05/21/2013 01:52 PM, Charles Greer wrote:
>
> On 05/21/2013 10:33 AM, Sandro Hawke wrote:
>>
>>
>> == Design-3 ==
>>
>> use a different operation:
>>
>>     load_dataset acts like in design-1, but hands back the list of 
>> all graphs created.  That list has to be handed to unload_dataset, so 
>> no private internal storage is needed.
>>
>> I'd also provide load_dataset_safe or a "safe=True" option on 
>> load_dataset which makes it behave like design-2 -- putting 
>> everything in newly named graphs.    I'd probably return a structure 
>> giving the mapping between the names used in the source and skNNN 
>> names assigned, rather than put that into the quadstore.
>>
>> Maybe load_dataset is called load_multiple, and it can optionally 
>> take a list of sources.  Maybe it could even do some crawling while 
>> it's loading.  In either case, it'd have the same API options as 
>> load_dataset above, I think.
>>
>> == == ==
>>
>> Okay, I'm pretty happy with design-3.   What do you think?
> Where does the expectation that unload_dataset is possible come from?

Jan wanted it.

I think one of the big wins of datasets is they let folks load/unload 
graphs.

So I understand wanting to have that for datasets themselves, too.

> I've got an assumption that anonymous graphs cannot be distinguished 
> from other anonymous graphs once they've been loaded. It seems risky 
> to divert from this assumption.
>

I agree that if you merge graphs (without using named graphs), you can't 
practically "unload".

I could understand not providing an unload for datasets, thinking that 
would require quints, but I think my renaming trick also gets us there 
reasonably enough.
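
To make the renaming trick concrete, here is a rough Python sketch of 
design-3. It is only an illustration under assumptions, not a proposed 
API: the quadstore is modelled as a plain set of (s, p, o, g) tuples, the 
"skNNN" names come from a simple counter, and the return value is a 
mapping from source graph names to the names actually used (so its 
values are the list that has to be handed to unload_dataset).

```python
import itertools

# Hypothetical source of fresh graph names (sk1, sk2, ...).
_fresh = itertools.count(1)


def load_dataset(store, quads, safe=False):
    """Add quads to store (a set of (s, p, o, g) tuples).

    Returns a dict mapping each graph name used in the source to the
    graph name used in the store. With safe=True every source graph is
    put in a newly named graph; otherwise the source names are kept.
    """
    mapping = {}
    for s, p, o, g in quads:
        if g not in mapping:
            mapping[g] = f"sk{next(_fresh)}" if safe else g
        store.add((s, p, o, mapping[g]))
    return mapping


def unload_dataset(store, graph_names):
    """Remove every quad whose graph is one of graph_names."""
    names = set(graph_names)
    # Collect first, then remove, so the set is not mutated mid-iteration.
    for quad in [q for q in store if q[3] in names]:
        store.discard(quad)


store = set()
created = load_dataset(
    store,
    [("a", "p", "b", "g1"), ("c", "p", "d", "g1"), ("e", "p", "f", "g2")],
    safe=True,
)
unload_dataset(store, created.values())
```

Note that in this sketch an unload after a non-safe load removes 
everything in the named graphs, including quads that were there before 
the load. Only the safe variant gives a clean round trip, and the caller 
keeps the mapping, so no private internal storage is needed.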

> Is it possible that, rather than requiring a 'safe' operation, that 
> you've found another compelling use case for labeling graphs with 
> blank node identifiers?  These identifiers would have to be skolemized 
> by the target dataset into unique, and hence safe/unloadable graphs.
>

Oh, that's nice!    Very elegant.    Tie in to ISSUE-131.

Alas, I think my "safe" operation would still be wanted because not all 
datasets will be like that.
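
For datasets that do label graphs with blank nodes, the skolemizing step 
Charles describes might look roughly like the following. This is a 
sketch under assumptions: graph labels are plain strings, a blank node 
label is anything starting with "_:", and the example.org base IRI (in 
the ".well-known/genid" style) is a placeholder.

```python
import uuid


def skolemize_graph_labels(quads, base="http://example.org/.well-known/genid/"):
    """Replace blank-node graph labels with fresh, unique IRIs.

    Returns the rewritten quads and a dict from each blank node label
    to the IRI minted for it. IRI-labelled graphs are left unchanged.
    """
    renamed = {}
    out = []
    for s, p, o, g in quads:
        if isinstance(g, str) and g.startswith("_:"):
            if g not in renamed:
                renamed[g] = base + uuid.uuid4().hex
            g = renamed[g]
        out.append((s, p, o, g))
    return out, renamed
```

The returned dict plays the same role as the mapping from the "safe" 
load: its values are exactly the graphs that can later be unloaded.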

But it does remind me of Linked Data best practice (according to Sandro 
on 21 May 2013):

   1.  Use existing IRIs for things, if they're good.
   2.  Create IRIs to name things if no good ones exist and you can make 
good ones.
   3.  DON'T create bad IRIs for things (ones that you don't even plan to 
support).

Making people use IRIs to name graphs will probably often force them 
to violate point 3.

         -- Sandro

> Charles
>>
>>           -- Sandro
>>
>>
>>
>>>     Cheers --- Jan
>>>
>>> P.s.    still hoping for an
>>>         @format <http://www.w3.org/TR/2013/CR-turtle-20130219/> .
>>>     or similar.
>>>
>>>
>>>
>>
>>
>

Received on Tuesday, 21 May 2013 18:49:05 UTC