Re: API for loading datasets; was Re: TriG being disjoint from Turtle

On 05/21/2013 10:33 AM, Sandro Hawke wrote:
>
>
> == Design-3 ==
>
> use a different operation:
>
>     load_dataset acts like in design-1, but hands back the list of all 
> graphs created.  That list has to be handed to unload_dataset, so no 
> private internal storage is needed.
>
> I'd also provide load_dataset_safe or a "safe=True" option on 
> load_dataset which makes it behave like design-2 -- putting everything 
> in newly named graphs.    I'd probably return a structure giving the 
> mapping between the names used in the source and skNNN names assigned, 
> rather than put that into the quadstore.
>
> Maybe load_dataset is called load_multiple, and it can optionally take 
> a list of sources.  Maybe it could even do some crawling while it's 
> loading.  In either case, it'd have the same API options as 
> load_dataset above, I think.
>
> == == ==
>
> Okay, I'm pretty happy with design-3.   What do you think?
Where does the expectation that unload_dataset is possible come from?  
My assumption has been that, once loaded, anonymous graphs cannot be 
distinguished from other anonymous graphs.  It seems risky to depart 
from that assumption.

Is it possible that, rather than needing a 'safe' operation, you've 
found another compelling use case for labeling graphs with blank node 
identifiers?  The target dataset would have to skolemize these 
identifiers into unique graph names, which would make the resulting 
graphs safe to load and possible to unload.
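
To make that concrete, here is a rough Python sketch of the behaviour 
I have in mind.  Everything in it is hypothetical: the QuadStore class, 
the dict-of-sets storage, and the example.org genid prefix are my own 
assumptions for illustration, not any existing API.  Graphs labeled with 
blank node identifiers get a fresh skolem name on every load, and only 
those graphs are handed back for later unloading; IRI-named graphs are 
simply merged.

```python
import uuid

# Assumed skolem IRI prefix; a real store would use its own authority.
GENID_PREFIX = "http://example.org/.well-known/genid/"


class QuadStore:
    """Hypothetical in-memory store: graph name -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = {}

    def load_dataset(self, quads):
        """Load (s, p, o, graph_label) tuples.

        Blank-node graph labels (starting with "_:") are skolemized into
        fresh, unique names.  Returns the mapping from each blank-node
        label in the source to the skolem name assigned by this load.
        """
        mapping = {}
        for s, p, o, label in quads:
            if label.startswith("_:"):
                if label not in mapping:
                    mapping[label] = GENID_PREFIX + uuid.uuid4().hex
                name = mapping[label]
            else:
                name = label  # IRI-named graph: merge into the existing graph
            self.graphs.setdefault(name, set()).add((s, p, o))
        return mapping

    def unload_dataset(self, mapping):
        """Drop only the skolemized graphs created by one earlier load."""
        for name in mapping.values():
            self.graphs.pop(name, None)
```

The mapping is returned to the caller rather than written into the 
store, so no private internal bookkeeping is needed.  Triples merged 
into IRI-named graphs are deliberately not undone by unload_dataset in 
this sketch, since they can no longer be told apart from what was 
already there.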

Charles
>
>           -- Sandro
>
>
>
>>     Cheers --- Jan
>>
>> P.s.    still hoping for an
>>         @format <http://www.w3.org/TR/2013/CR-turtle-20130219/> .
>>     or similar.
>>
>>
>>
>
>

-- 
Charles Greer
Senior Engineer
MarkLogic Corporation
charles.greer@marklogic.com
Phone: +1 707 408 3277
www.marklogic.com

Received on Tuesday, 21 May 2013 17:52:32 UTC