- From: Gavin Carothers <gavin@carothers.name>
- Date: Mon, 25 Feb 2013 18:24:10 -0800
- To: RDF-WG WG <public-rdf-wg@w3.org>
- Message-ID: <CAPqY83xbmJA5HD4OwHNWr-HobmEQnhK79irPZ4pChr1u9+HDNA@mail.gmail.com>
{
  "@context": ...,
  "@graph": [
    { "@graph": { "name": "Joe" } },
    { "@graph": { "name": "Susan" } }
  ]
}

These are two graphs, and we need to create unique names for them for a normalization algorithm.

Graph 1 expressed as Turtle:

  @prefix : <http://example.com/ns/> .
  [] :name "Joe" .

Graph 2 expressed as Turtle:

  @prefix : <http://example.com/ns/> .
  [] :name "Susan" .

So far so good. The normalization is in terms of N-Quads, however, and therefore needs both names for the graphs and labels for the blank nodes. Let's start by putting each graph into N-Triples.

Graph 1 expressed as N-Triples:

  _:c14n1 <http://example.com/ns/name> "Joe" .

Graph 2 expressed as N-Triples:

  _:c14n1 <http://example.com/ns/name> "Susan" .

Ah ha, we've used the same blank node label for both! But that's okay at the moment, since both exist as graphs in their own right. Let's take the md5sum of both:

  Graph 1 md5sum: 12e775c37a0e6a327ace2114bb5a1b47
  Graph 2 md5sum: a44173cbf95beeee164add48a1201b24

Now, let's create that N-Quads document:

  _:c14n-12e775c37a0e6a327ace2114bb5a1b47-1 <http://example.com/ns/name> "Joe" <urn:hash:application/n-triples:md5:12e775c37a0e6a327ace2114bb5a1b47> .
  _:c14n-a44173cbf95beeee164add48a1201b24-1 <http://example.com/ns/name> "Susan" <urn:hash:application/n-triples:md5:a44173cbf95beeee164add48a1201b24> .

So why exactly don't hashes work for identifying graphs that ALREADY have to be normalized? If we're very worried about collisions (we shouldn't be; there is assumed to be a better cryptographic method being used to really sign these documents in your use case), replace md5 with sha512 or Whirlpool.

The argument I see is: what if blank nodes are shared between graphs in a dataset? That seems to be a "mere" matter of designing the normalization method to be stable at some point and then defining all the labels you need. Changing from labels into hash-based IRIs at the last moment shouldn't make it any harder.
(Which is not the same thing as saying it's easy.)

That said, processing using blank nodes INTERNALLY makes perfect sense. Many reasoners and other software use blank nodes in their internals in places where RDF doesn't allow them.

This is NOT a generic argument for how to handle unlabeled graphs in JSON-LD, which I still think are a poor idea that can't be expressed in any other graph syntax, nor used with SPARQL. I'm just saying that blank nodes as graph labels are NOT required for the normalization use case.

Cheers,
Gavin
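
For anyone who wants to try the hash-based naming described above, here is a minimal Python sketch. It is an illustration under stated assumptions, not a normalization algorithm: it assumes each graph has already been normalized to N-Triples with blank node labels of the form _:c14nN, and it hashes the lines joined with trailing newlines. The exact bytes hashed are not specified in this message, so the digests it produces may not match the md5sums quoted above. The function name name_graph is hypothetical.

```python
import hashlib
import re

# Assumption: normalized blank node labels look like _:c14n1, _:c14n2, ...
# This simple pattern does not guard against the same text inside a literal.
BNODE = re.compile(r"_:c14n(\d+)")


def name_graph(ntriples_lines, algorithm="md5"):
    """Return (digest, quads) for one already-normalized N-Triples graph.

    The graph name is a urn:hash IRI built from the digest, and every
    blank node label is rewritten to include the digest so that labels
    from different graphs cannot clash in the combined N-Quads document.
    """
    # Assumption: the hash covers each line followed by a newline.
    doc = "".join(line + "\n" for line in ntriples_lines)
    digest = hashlib.new(algorithm, doc.encode("utf-8")).hexdigest()
    graph_iri = f"<urn:hash:application/n-triples:{algorithm}:{digest}>"

    quads = []
    for line in ntriples_lines:
        relabeled = BNODE.sub(lambda m: f"_:c14n-{digest}-{m.group(1)}", line)
        statement = relabeled.rstrip()
        if not statement.endswith("."):
            raise ValueError(f"not an N-Triples statement: {line!r}")
        # Drop the terminating '.', append the graph name, re-terminate.
        statement = statement[:-1].rstrip()
        quads.append(f"{statement} {graph_iri} .")
    return digest, quads


if __name__ == "__main__":
    graphs = [
        ['_:c14n1 <http://example.com/ns/name> "Joe" .'],
        ['_:c14n1 <http://example.com/ns/name> "Susan" .'],
    ]
    for graph in graphs:
        _, quads = name_graph(graph)
        print("\n".join(quads))
```

Passing algorithm="sha512" swaps the hash function without changing anything else, which is the substitution suggested above for anyone worried about collisions.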
Received on Tuesday, 26 February 2013 02:24:38 UTC