Re: Problem with auto-generated fragment IDs for graph names from Andy Seaborne on 2013-02-20 (public-rdf-wg@w3.org from February 2013)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Wed, 20 Feb 2013 10:15:37 +0000
To: public-rdf-wg@w3.org
Message-ID: <5124A249.7080902@epimorphics.com>

On 19/02/13 22:03, Manu Sporny wrote:
> On 02/19/2013 10:10 AM, Andy Seaborne wrote:
>>> I read a dataset somewhere on the Web and it has IRIs in it,  how
>>> would I know that they had been "defined" in this way so that I
>>> knew they were intended to denote a graph?
>>
>> Either know the provenance of the data or look at the URI pattern and
>> know it's JSON-LD generated.
>
> Andy, all the variations of this idea that we've covered on this list
> either don't work, or lead to strange outcomes in RDF. I tried to
> explain this here:
>
> http://lists.w3.org/Archives/Public/public-rdf-wg/2013Feb/0073.html

I didn't get a reply to my comments on that.

Your key argument is the cost on developers.

I have outlined a scheme that makes *zero* changes to your examples.

Your example is:

[{
   "@graph": {
     "source": "http://mybank.com/accounts/manu",
     "destination": "http://yourbank.com/accounts/richard",
     "amount": "5.00",
     "currency": "USD"
   }
},{
   "@graph": {
     "source": "http://mybank.com/accounts/manu",
     "destination": "http://yourbank.com/accounts/kingsley",
     "amount": "5.00",
     "currency": "USD"
   }
}]

which has no bNode labels.

[[
> So, JSON-LD developers can happily use
> the first bit of markup and can remain completely unaware that graph
> name identifiers are automatically created for them when they normalize
> to the NQuad serialization format:
]]

Yep.

I proposed that the *parser* generate fragments or URIs.  In fact, label
generation is already what happens in your example: _:c14n1 and _:c14n2
came from somewhere.  They are generated.  Use <#g1>, <#g2> instead.
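
As a rough illustration of what I mean by parser-side label generation,
here is a minimal sketch in Python.  The function name label_graphs and
the "#g" + counter naming are my own assumptions for illustration; this
is not how any existing JSON-LD processor works, and it only handles
top-level "@graph" entries.

```python
import json

def label_graphs(doc):
    """Pair each top-level @graph entry with a graph name.

    Entries that already carry an "@id" keep it; the rest get a
    generated fragment (#g1, #g2, ...) in document order.
    (Hypothetical helper, not part of any JSON-LD library.)
    """
    labelled = []
    counter = 0
    for entry in doc:
        if "@graph" not in entry:
            continue  # not a graph container, nothing to name
        name = entry.get("@id")
        if name is None:
            counter += 1
            name = "#g%d" % counter
        labelled.append((name, entry["@graph"]))
    return labelled

# The two-payment example from the message, unchanged.
doc = json.loads("""
[{
   "@graph": {
     "source": "http://mybank.com/accounts/manu",
     "destination": "http://yourbank.com/accounts/richard",
     "amount": "5.00",
     "currency": "USD"
   }
},{
   "@graph": {
     "source": "http://mybank.com/accounts/manu",
     "destination": "http://yourbank.com/accounts/kingsley",
     "amount": "5.00",
     "currency": "USD"
   }
}]
""")

names = [name for name, _ in label_graphs(doc)]
```

The input document is untouched; the names exist only in the parser's
output, the same place the _:c14n labels appear today.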

> Fragment IDs are not a good solution for us because they don't work in
> instances where there is no base IRI for the document (which is our
> primary use case in the Web Payments work).

Create one ... it's purely internal, as you said.  The choice does not
matter.  (Aside from the yuckiness, because you use bNode identifiers -
use the SAME one for all baseless documents!)  Your use of bNode
identifiers is equivalent to a base URI of "_:" because, as Markus said,
it is about navigation of the local document.
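
To make the "create one" suggestion concrete, a minimal sketch in
Python, assuming a fixed fallback base shared by every baseless
document.  The value of FALLBACK_BASE and the helper name graph_iri are
placeholders I made up; any agreed constant would do.

```python
from urllib.parse import urljoin

# Hypothetical constant: every implementation would have to agree on
# the same value so that generated names match across parsers.
FALLBACK_BASE = "http://example.invalid/baseless"

def graph_iri(fragment, base=None):
    """Resolve a generated fragment such as '#g1' against the document
    base, or against the shared fallback when the document has none."""
    return urljoin(base or FALLBACK_BASE, fragment)

baseless = graph_iri("#g1")
with_base = graph_iri("#g1", "http://mybank.com/tx/1")
```

Because the fallback is a constant, two implementations given the same
baseless document produce the same graph names.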

>
> Skolemization doesn't work because the IDs must be generated in a
> decentralized manner when normalizing, there is no base IRI, and if the
> IDs picked between two implementations for the same data differ, the
> digital signatures won't match.

What is the scope of the normalization?  The bnode identifiers are
local - the fragments are local.

> Minting a new IRI scheme has the downside that nobody on this mailing
> list, rightfully, thinks that we need a new IRI scheme just for naming
> graphs. This is especially true now that nobody on this list has been
> able to claim that there is any technical problem with using blank nodes
> as graph names. New IRI schemes would also have to be interpreted in a
> special way so that RDF statements aren't merged accidentally (what
> happens when there are two <graph:1> IRIs in the same quad-store?).
>
> UUIDs do not work for us because of the possibility of collision, even
> if it is a very remote possibility. There is a solution on the table
> that doesn't have any global collision possibility (bnode ids), we'd
> rather use that.

You need uniqueness, not unguessability (the bnode labels are
guessable).  There are 122 bits of randomness in a V4 UUID.  Have you
checked your error detecting RAM recently?

(You can use a seed of any length anyway as you do not need to comply 
with any standard)

There are UUID schemes that do not clash (V1).
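
For a sense of scale, a back-of-envelope sketch in Python using the
standard birthday-bound approximation p ~= n(n-1) / 2^(bits+1).  The
helper name collision_probability and the figure of 10^12 graph names
are my own choices for illustration, not numbers from the Web Payments
work.

```python
import uuid

RANDOM_BITS = 122  # random bits in a version 4 UUID

def collision_probability(n, bits=RANDOM_BITS):
    """Approximate chance that at least two of n random identifiers
    collide (birthday bound, valid while the result is much below 1)."""
    return n * (n - 1) / 2 ** (bits + 1)

# A graph name minted from a V4 UUID, written as a urn:uuid: IRI.
graph_name = uuid.uuid4().urn

# Illustrative volume: one trillion generated graph names.
p = collision_probability(10 ** 12)
```

Even at that volume p stays below 1e-12, which is the point of the RAM
remark above.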

> Allowing blank nodes to serve as a name for a graph seems to be the best
> solution, as we seem to have exhausted all of the other possibilities.
>
> -- manu
>
Received on Wednesday, 20 February 2013 10:16:13 UTC