- From: Peter Ansell <ansell.peter@gmail.com>
- Date: Thu, 17 Jan 2013 09:44:04 +1000
- To: Jim McCusker <mccusj@rpi.edu>
- Cc: David Booth <david@dbooth.org>, Peter.Hendler@kp.org, "Mead, Charlie (NIH/NCI) [C]" <meadch@mail.nih.gov>, Conor Dowling <conor-dowling@caregraf.com>, Dietrich Rebholz-Schuhmann <d.rebholz.schuhmann@gmail.com>, Joanne Luciano <jluciano@gmail.com>, Michel Dumontier <michel.dumontier@gmail.com>, w3c semweb HCLS <public-semweb-lifesci@w3.org>, Renato Iannella <ri@semanticidentity.com>, Rafael Richards <rmrich5@gmail.com>, Tom Morris <tfmorris@gmail.com>
On 17 January 2013 08:27, Jim McCusker <mccusj@rpi.edu> wrote:
> http://www.hpl.hp.com/techreports/2003/HPL-2003-235R1.html

That algorithm doesn't seem very clean: the simple version relies on none of the parties ever changing the blank node identifiers, and the complex version relies on every party that does modify blank node identifiers first adding extra statements to the graph to track the original identifiers. Neither assumption can be relied on in general across an RDF pipeline.

How can you be sure that there was never a real, pre-existing triple in the original graph of the form _:blanknode hasLabel "xyz"? To use the algorithm, it seems you must substitute every "_:blanknode" reference with "_:xyz" and discard all triples with hasLabel as the predicate before computing the digest. In controlled situations the "hasLabel" trick works well, but it isn't a general solution by any means.

Keeping the blank node mapping triples separate from the original set at all times would be a better solution, but it wouldn't be compatible with typical RDF processing workflows, which may still assume that all triples can be merged into a single RDF graph. In addition, the main reason people use blank nodes is to avoid having to create identifiers; otherwise everyone would just use URIs. The underlying premise, that every digestible statement has a unique serialisation assigned by the original RDF serialiser and receives custom handling from every subsequent parser and serialiser, would require tight control over the RDF serialisers and parsers in use across a system.

> But there's a faster way to compute bnode identities that was presented at
> ISWC this year, I still need to incorporate it:
>
> http://iswc2012.semanticweb.org/sites/default/files/paper_16.pdf

That paper describes a mapping algorithm between two full sets of RDF statements. If you are not sure that what you have is the unmodified original set of RDF statements, and the digest is your only known quantity, how can you utilise this algorithm to help regenerate the digest for verification?

Cheers,

Peter
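A minimal sketch of the substitute-then-digest step discussed above, assuming Python with rdflib; the ex:hasLabel predicate, the SHA-256 choice, and the digest() helper are illustrative stand-ins, not the HP paper's actual algorithm:

    import hashlib
    from rdflib import Graph, BNode, URIRef

    HAS_LABEL = URIRef("http://example.org/hasLabel")  # hypothetical labelling predicate

    def digest(graph: Graph) -> str:
        # Collect the label asserted for each blank node via hasLabel triples.
        labels = {s: str(o) for s, _p, o in graph.triples((None, HAS_LABEL, None))}

        def rename(term):
            if isinstance(term, BNode) and term in labels:
                return BNode(labels[term])
            return term

        # Rebuild the graph with relabelled blank nodes, discarding the
        # hasLabel triples themselves. Note the caveat raised above: a
        # genuine pre-existing hasLabel triple is indistinguishable from
        # the bookkeeping triples and would be silently lost here.
        canonical = Graph()
        for s, p, o in graph:
            if p == HAS_LABEL:
                continue
            canonical.add((rename(s), p, rename(o)))

        # Sorted N-Triples lines give a deterministic byte stream to hash.
        lines = sorted(f"{s.n3()} {p.n3()} {o.n3()} ." for s, p, o in canonical)
        return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

The sort is what makes the byte stream deterministic once the blank node labels are fixed by the external hasLabel triples; without those supplied labels there is no stable ordering in general, which is the crux of the objection above.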
Received on Wednesday, 16 January 2013 23:44:37 UTC