From: Peter Ansell <ansell.peter@gmail.com>
Date: Thu, 17 Jan 2013 09:44:05 +1000
To: Jim McCusker <mccusj@rpi.edu>
Cc: David Booth <david@dbooth.org>, Peter.Hendler@kp.org, "Mead, Charlie (NIH/NCI) [C]" <meadch@mail.nih.gov>, Conor Dowling <conor-dowling@caregraf.com>, Dietrich Rebholz-Schuhmann <d.rebholz.schuhmann@gmail.com>, Joanne Luciano <jluciano@gmail.com>, Michel Dumontier <michel.dumontier@gmail.com>, w3c semweb HCLS <public-semweb-lifesci@w3.org>, Renato Iannella <ri@semanticidentity.com>, Rafael Richards <rmrich5@gmail.com>, Tom Morris <tfmorris@gmail.com>
On 17 January 2013 08:27, Jim McCusker <mccusj@rpi.edu> wrote:
> http://www.hpl.hp.com/techreports/2003/HPL-2003-235R1.html

That algorithm doesn't seem very clean. The simple version relies on no
party ever changing the blank node identifiers, while the complex version
relies on every party that does modify blank node identifiers first adding
extra statements to the graph to track the original identifiers, which you
may not be able to rely on in general across an RDF pipeline. How can you
be sure that the original graph never contained a real, pre-existing RDF
triple of the form _:blanknode hasLabel "xyz"? To use the algorithm, it
seems you must substitute every "_:blanknode" reference with "_:xyz" and
discard every triple with hasLabel as the predicate before computing the
digest (see the sketch at the end of this message). In controlled
situations the "hasLabel" trick works well, but it isn't a general
solution by any means. Keeping the blank node mapping triples separate
from the original set at all times would be a better solution, but it
wouldn't be compatible with typical RDF processing workflows that may
still assume that all triples can be merged into a single RDF Graph.

In addition, the main reason that people use blank nodes is to avoid
having to create identifiers; otherwise everyone would just use URIs. The
premise that every digestable statement will have a unique serialisation
assigned to it by the original RDF serialiser, together with the custom
handling required of every subsequent parser and serialiser, would demand
tight control over the RDF serialisers and RDF parsers in use across a
system.

> But there's a faster way to compute bnode identities that was presented at
> ISWC this year, I still need to incorporate it:
>
> http://iswc2012.semanticweb.org/sites/default/files/paper_16.pdf

That paper describes a mapping algorithm between two full sets of RDF
statements. If you are not sure that what you have is the unmodified
original set of RDF statements, and the digest is the only known quantity,
how can you utilise this algorithm to help with regenerating the digest to
verify it?
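For concreteness, here is a rough Python sketch of what that verification
step would have to look like. The hasLabel predicate URI and the helper
names are placeholders for illustration, not the exact terms from the tech
report, and the blank node rewriting is deliberately naive:

import hashlib
import re

# Placeholder URI standing in for the label-tracking predicate; the
# actual predicate from HPL-2003-235R1 may differ.
HAS_LABEL = "<http://example.org/hasLabel>"

def digest_ntriples(nt_text):
    """Recompute a digest over an N-Triples document that uses the
    'hasLabel' trick: each blank node's pre-modification identifier is
    recorded as the literal object of a HAS_LABEL triple."""
    label_for = {}    # current bnode label -> recorded original label
    data_triples = []

    for line in nt_text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = re.match(r'^(_:\S+)\s+' + re.escape(HAS_LABEL) +
                     r'\s+"([^"]*)"\s*\.$', line)
        if m:
            # A tracking triple: remember the original identifier and
            # discard the triple before hashing. Note this also silently
            # discards any genuine, pre-existing hasLabel triple in the
            # data, which is exactly the ambiguity described above.
            label_for[m.group(1)] = m.group(2)
        else:
            data_triples.append(line)

    def rename(triple):
        # Substitute each current _:id with the recorded _:original.
        # Naive: a literal containing "_:" would be mangled as well.
        return re.sub(r'_:\S+',
                      lambda bn: '_:' + label_for.get(bn.group(0),
                                                      bn.group(0)[2:]),
                      triple)

    canonical = sorted(rename(t) for t in data_triples)
    return hashlib.sha256('\n'.join(canonical).encode('utf-8')).hexdigest()

Even this much only works if nothing downstream renamed the blank nodes a
second time without updating the tracking triples.

Cheers,

Peter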
Received on Wednesday, 16 January 2013 23:44:42 UTC