Re: using con-neg for a genid view, maybe solving the PATCH dilemma from Sandro Hawke on 2014-02-05 (public-ldp-patch@w3.org from February 2014)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 05 Feb 2014 14:29:49 -0500
To: Joe Presbrey <presbrey@MIT.EDU>
CC: LDP Patch Discussion List <public-ldp-patch@w3.org>, Andrei Sambra <andrei@fcns.eu>
Message-ID: <52F2912D.4080301@w3.org>
[this is a digression, thinking about genid management]

On 02/04/2014 08:34 PM, Joe Presbrey wrote:
> It's commonplace for SQL apps to insert blank rows (without `id`) to 
> master fact tables and use the last_insert_id() from the server (eg. 
> usually uint64 primary auto-increment) as a foreign key in followup 
> metadata.
>
> Similarly, materializing IDs for blank RDF nodes seems fine, but I do 
> not see the advantages of /.well-known/ prefixing. Doesn't this common 
> URI space create a potential for ID collision? Is that a bug or 
> feature? :)
>

RDF 1.1 Concepts says:

    Systems may wish to mint Skolem IRIs in such a way that they can
    recognize the IRIs as having been introduced solely to replace blank
    nodes. This allows a system to map IRIs back to blank nodes if needed.

    Systems that want Skolem IRIs to be recognizable outside of the
    system boundaries /SHOULD/ use a well-known IRI [RFC5785] with the
    registered name |genid|.

It seems to me like it would be more useful to know these are genid IRIs 
than to not know that, although I don't have a use case in mind right 
now for knowing it.

We should think about whether TurtlePatch says you MAY, SHOULD, or MUST 
use .well-known/genid.

Mostly I think it'll manifest in UIs, where genid IRIs should probably 
not be shown to users.  It's questionable whether any IRIs should be 
shown to users, though.    So I don't know.   If you're outputting 
Turtle, like for debugging, it'll sure be nice to de-Skolemize for 
users.   Especially for lists.
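As a rough illustration of de-Skolemizing for display, here is a minimal Python sketch. It assumes the genid IRIs follow the .well-known/genid pattern and appear as full IRIs in angle brackets. The names GENID_IRI and deskolemize are made up for this example. It works purely on text, so it would not catch prefixed names like genid:517, and it would wrongly rewrite a matching IRI inside a string literal.

```python
import re

# Assumption: genid IRIs look like <scheme://host/.well-known/genid/...>.
# Anything that does not match this pattern is left untouched.
GENID_IRI = re.compile(r"<(https?://[^<>\s/]+/\.well-known/genid/[^<>\s]+)>")

def deskolemize(turtle_text):
    """Replace each genid IRI with a blank node label (stable within one call)."""
    labels = {}

    def to_bnode(match):
        iri = match.group(1)
        # Reuse the same label every time the same genid IRI shows up.
        if iri not in labels:
            labels[iri] = "_:b%d" % len(labels)
        return labels[iri]

    return GENID_IRI.sub(to_bnode, turtle_text)
```

A real implementation would presumably do this on parsed triples rather than on text, which would also let a Turtle writer fold the blank nodes back into [ ] and list syntax.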

> Some URI endpoints want to generate <sha1:x> for the blank data coming 
> in. For the SQL backed graph example, perhaps simply <#1>, <#2>, with 
> paging, or even separate graphs per record would be fine.
>
> Is there a recipe I missed for using genids without the ID management 
> overhead? Are you supposed to append the full resource URL to the 
> genid space?

There's no standard.  I'm not seeing a need for one.

I guess the big design question is whether you want the genid IRIs to be 
dereferenceable.   For TurtlePatch, I don't see much need for that, so 
one can just use uuids or a lighter-weight custom version.   So maybe:

http://example.com/.well-known/genid/b873cace-8e99-11e3-bcb0-0024e8b4f183
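Minting an IRI of that shape takes only a few lines. This is a sketch, not a recommendation: the function name mint_genid is hypothetical, and the choice of random (version 4) UUIDs is an assumption. Any scheme that avoids collisions within the host would do.

```python
import uuid

def mint_genid(authority):
    """Return a fresh, non-dereferenceable genid IRI under the given host."""
    # Assumption: plain http and a random UUID as the local part.
    return "http://%s/.well-known/genid/%s" % (authority, uuid.uuid4())
```

For example, mint_genid("example.com") returns a new IRI under http://example.com/.well-known/genid/ on every call.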

If you want dereferenceability, you would need some domain-wide 
persistent database mapping non-reusable database ids to the processes 
which can answer for genids starting with that id.  Maybe the database 
id is year-month-servicename, and maybe some service called 'userinfo' 
has sequential blank node ids, so you might end up with:

http://example.com/.well-known/genid/2014-01-userinfo/7021

... the webserver would proxy IRIs starting 
http://example.com/.well-known/genid/2014-01-userinfo/ over to the 
userinfo system so it can serve up some triples which happen to contain 
bnode 7021.    If userinfo only ever has each bnode in one graph, it 
could serve up that graph, I guess.   If it doesn't have that 
restriction, I don't know what it should serve up.
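The proxying step might look roughly like the Python sketch below. Everything in it is an assumption for illustration: the BACKENDS registry stands in for the domain-wide persistent database, the internal userinfo URL is invented, and route_genid is a hypothetical name. It only decides where a request should go, not what the backend serves.

```python
from urllib.parse import urlparse

GENID_PREFIX = "/.well-known/genid/"

# Hypothetical registry: database id -> backend that answers for its genids.
BACKENDS = {"2014-01-userinfo": "http://userinfo.internal.example.com/bnode/"}

def route_genid(iri, backends=BACKENDS):
    """Return the backend URL that should answer for a genid IRI, or None."""
    path = urlparse(iri).path
    if not path.startswith(GENID_PREFIX):
        return None
    # Split "2014-01-userinfo/7021" into the database id and the local bnode id.
    database_id, sep, local_id = path[len(GENID_PREFIX):].partition("/")
    if not sep or not local_id or database_id not in backends:
        return None
    return backends[database_id] + local_id
```

With this registry, the 2014-01-userinfo/7021 IRI above routes to the userinfo backend with local id 7021, while a bare uuid-style genid has no database id segment and returns None, which matches the non-dereferenceable case.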

    -- Sandro

>
> -- 
> Joe Presbrey
>
> On Fri, 31 Jan 2014, Sandro Hawke wrote:
>
>> Yesterday, Cody got me thinking about PATCH again and I had an idea 
>> which I think threads the ldp-patch needle nicely.
>>
>> Cody's LDP implementation is using TurtlePatch [1], which you'll 
>> recall is a tiny subset of SPARQL 1.1 Update which is easy to parse 
>> and execute in linear time.   It has no variables and cannot operate 
>> on blank nodes.   So, it's great, as long as you have no blank 
>> nodes.  To me, in recent years, that made it basically useless. But 
>> now I see how to make it okay.
>>
>> My idea was that the server could provide a Skolemized (genid) view 
>> of the data, suitable for easy patching with TurtlePatch, but not 
>> affecting how everyone else interacts with the data. Specifically, 
>> I'm thinking that if the client does a GET with "Accept: 
>> text/turtlepatch" then it gets a patch from the empty graph to 
>> the current graph, with all the blank nodes Skolemized in some way 
>> the server will accept for future PATCH operations. You can also 
>> think about what it gets as slightly-restricted Turtle with three 
>> extra boilerplate lines.
>>
>> For example:
>>
>> GET http://example.org/alice
>> Accept: text/turtle
>>
>>    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>    PREFIX : <http://example.org/>
>>    :me foaf:knows [ foaf:name "Bob" ].
>>
>>
>> GET http://example.org/alice
>> Accept: text/turtlepatch
>>
>>    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>    PREFIX : <http://example.org/>
>>    PREFIX genid: <http://example.org/.well-known/genid/d7432/>
>>    INSERT DATA {
>>    :me foaf:knows genid:517.
>>    genid:517 foaf:name "Bob".
>>    }
>>
>> The difference here is that text/turtlepatch has the INSERT DATA 
>> boilerplate and is Skolemized.   (It's also restricted to not have 
>> any newlines in string literals, but use \n instead.)  The idea is 
>> that clients can ask for this Skolemized version, and then patch it 
>> easily and efficiently with TurtlePatch.   If they're expecting to be 
>> patching, I imagine they'll generally fetch with turtlepatch instead 
>> of turtle.
>>
>> My sense is this is pretty easy to implement assuming the server (1) 
>> has some kind of access to .well-known/genid URI space on its host 
>> and (2) has stable internal identifiers for blank nodes which it can 
>> expose or efficiently map to/from something it can expose.  (There's 
>> probably a workaround if (1) isn't true.)
>>
>> What do you think?
>>
>> There are other ways one could ask for a skolemized view, of course.  
>> One could have parallel resources, and have Link headers between the 
>> two.   In practice, it might turn into adding a ?genid=true to the 
>> URLs or something. Or one could make a variation of turtle, with 
>> media-type text/genid-turtle, or something like that.    But it seems 
>> to me like a fairly elegant hack to use TurtlePatch for this, since 
>> the point in your code where you need both things is the same.  (I 
>> note that SPARQL's INSERT DATA syntax does allow blank nodes, as a 
>> way to create fresh blank nodes.  I suggest they not be used when one 
>> is using text/turtlepatch like this, as an RDF serialization syntax.)
>>
>>     -- Sandro
>>
>> [1] https://www.w3.org/2001/sw/wiki/TurtlePatch
>>
>>
>
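
The Skolemized view described in the quoted 31 January message could be produced along these lines. This is a minimal Python sketch under stated assumptions: triples arrive as tuples of already-serialized terms, blank nodes are written as "_:label", and the server's stable internal bnode label is used directly as the local part of the genid IRI. The function name skolemized_patch is hypothetical, and the output uses full IRIs rather than PREFIX declarations.

```python
def skolemized_patch(triples, genid_base):
    """Serialize triples as an INSERT DATA block, with blank nodes as genid IRIs."""
    def term(t):
        # Assumption: blank nodes are "_:label"; every other term is already a
        # valid token (IRI in angle brackets, or a quoted literal).
        if t.startswith("_:"):
            return "<%s%s>" % (genid_base, t[2:])
        return t

    lines = ["INSERT DATA {"]
    for s, p, o in triples:
        lines.append("%s %s %s." % (term(s), term(p), term(o)))
    lines.append("}")
    return "\n".join(lines)
```

Called with the two triples from the alice example and the base http://example.org/.well-known/genid/d7432/, this yields an INSERT DATA block in which _:517 becomes the same genid IRI in both places.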
Received on Wednesday, 5 February 2014 19:29:59 UTC