Re: Digest URI's from Sergey Melnik on 2000-04-06 (www-rdf-interest@w3.org from April 2000)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Thu, 06 Apr 2000 11:20:47 -0700
To: "McBride, Brian" <bwm@hplb.hpl.hp.com>
CC: Dan Brickley <danbri@w3.org>, "'www-rdf-interest@w3.org'" <www-rdf-interest@w3.org>
Message-ID: <38ECD57F.67CE3D11@db.stanford.edu>

Dan, thanks for taking care of the questions. A good summary of the
digest algorithm was presented by Peter and Reinhold at
http://nestroy.wi-inf.uni-essen.de/rdf/sum_rdf_api/. Some add-ons to
Dan's explanation:

> > 1)  Which entities can have digest URIs?

On the model level, digests are useful for generating

- "canonical" URIs for statements themselves
- URIs for sets of statements (models)

On the syntactic level, digests can be used to give explicit URIs to
"unnamed" resources used in some serialization syntax like the current
official RDF/XML syntax or a strawman syntax.

> > 2)  What are they for?

- URIs for statements increase interoperability
- the content (as opposed to the serialized representation) of the
models can be signed
- unnamed resources can be referred to; their URIs depend on the context
and are resistant to some changes of the serialization

> > 3)  What does a digest URI denote?

Well, whatever it stands for: a statement, a model, or an unnamed resource.

> > 4)  What properties do they need to have?

Most importantly, uniqueness. More research is needed here. The
algorithms that I proposed are easy to implement and fast, but I'm not
sure about their quality. A simple XOR might not be sufficient. For
model URIs, the obvious requirement is that the digest is independent of
the order in which statements are fed to the algorithm. Unnamed resource
URIs should depend on the context in which these resources were used;
"context" needs to be clearly defined.
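
To make the order-independence requirement concrete, here is a minimal
sketch, not the exact algorithm I proposed. It assumes SHA-1 as the hash,
a length-prefixed UTF-8 encoding of the three parts of a statement, and a
made-up "urn:x-digest:sha1:" prefix; the function names are hypothetical
as well.

```python
import hashlib

def statement_digest(subject: str, predicate: str, obj: str) -> bytes:
    """Digest of one statement (assumed encoding: length-prefixed UTF-8)."""
    h = hashlib.sha1()
    for part in (subject, predicate, obj):
        data = part.encode("utf-8")
        # The length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()

def model_digest(statements) -> bytes:
    """XOR of the statement digests, so feeding order does not matter."""
    # A model is a set: drop duplicates first, otherwise a statement
    # fed twice would cancel itself out under XOR.
    digests = {statement_digest(*s) for s in statements}
    acc = bytes(hashlib.sha1().digest_size)  # all-zero start value
    for d in digests:
        acc = bytes(a ^ b for a, b in zip(acc, d))
    return acc

def digest_uri(digest: bytes) -> str:
    """Turn a digest into a URI; the prefix is a placeholder, not a standard."""
    return "urn:x-digest:sha1:" + digest.hex()
```

Feeding the same statements in a different order gives the same model
URI. The sketch also shows why a plain XOR may be too weak: it is only as
strong as the combination step, and XOR makes it comparatively easy to
construct different statement sets whose digests combine to the same
value.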

> > 5) I understand there is an algorithm for computing them given an RDF syntax
> > representation of a model.

The above-mentioned algorithms for statement and model URIs operate on
the model; the one for unnamed resources depends on the serialization.

> 'triple digest' seems to be sergey's approach. I'm not sure what we'd do
> about implied arcs in the graph, language tagging etc to ensure we had a
> canonicalisation strategy before computing the triple and model
> digests. In other words, two models could be RDF model equivalent but have
> trivial differences in their actual storage (missing but implied rdf:type
> arcs, variations in representation of XML literals, xml:lang etc) giving
> them different triple/model digests.
> 
> Does this help? Sergey, was this a fair characterisation?

Oh yes, thanks! Model-based algorithms (for statement and model URIs) do
not depend on the serialization syntax. For syntax-based ones, a
well-defined mapping from syntax to model is needed. This mapping should
be described by the spec. The current RDFMS spec does so, but it is
deficient in places: it does not properly address language tagging, for
example.
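
As an illustration of why the mapping has to be pinned down before any
digest is computed, here is a small sketch of one possible canonical form
for literals. The form itself (escaped, quoted value plus lower-cased
language tag) and the function names are assumptions of mine, not
something the spec defines.

```python
import hashlib
from typing import Optional

def canonical_literal(value: str, lang: Optional[str] = None) -> str:
    """Hypothetical canonical form: escaped, quoted value plus language tag."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # Language tags are case-insensitive, so fold them to lower case.
    tag = "@" + lang.lower() if lang else ""
    return '"' + escaped + '"' + tag

def literal_digest(value: str, lang: Optional[str] = None) -> str:
    """SHA-1 (assumed hash) over the canonical form, as a hex string."""
    canonical = canonical_literal(value, lang)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()
```

With this form, "chat" tagged "EN" and "chat" tagged "en" get the same
digest, while "chat" tagged "fr" and untagged "chat" get different ones.
Without an agreed form, two stores holding equivalent models could easily
end up with different statement and model digests.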

> > [My assumption here is that the model was constructed in the database, and
> > was not derived from a serialised RDF input stream]

The model-based algorithms can be applied directly to the DB data.

Best,
Sergey

Received on Thursday, 6 April 2000 14:11:11 UTC