- From: Sergey Melnik <melnik@db.stanford.edu>
- Date: Thu, 06 Apr 2000 11:20:47 -0700
- To: "McBride, Brian" <bwm@hplb.hpl.hp.com>
- CC: Dan Brickley <danbri@w3.org>, "'www-rdf-interest@w3.org'" <www-rdf-interest@w3.org>
Dan, thanks for taking care of the questions. A good summary of the digest algorithm was presented by Peter and Reinhold at http://nestroy.wi-inf.uni-essen.de/rdf/sum_rdf_api/.

Some add-ons to Dan's explanation:

> > 1) Which entities can have digest URIs?

On the model level, digests are useful for generating
- "canonical" URIs for statements themselves
- URIs for sets of statements (models)

On the syntactic level, digests can be used to give explicit URIs to "unnamed" resources used in some serialization syntax, such as the current official RDF/XML syntax or a strawman syntax.

> > 2) What are they for?

- URIs for statements increase interoperability
- the content (as opposed to the serialized representation) of the models can be signed
- unnamed resources can be referred to; their URIs depend on the context and are resistant to some changes of the serialization

> > 3) What does a digest URI denote?

Well, whatever it stands for: a statement, a model or an unnamed resource.

> > 4) What properties do they need to have?

Most importantly, uniqueness. More research is needed here. The algorithms that I proposed are easy to implement and fast, but I'm not sure about their quality. A simple XOR might not be sufficient. For model URIs, the obvious requirement is that the digest is independent of the order in which statements are fed to the algorithm. Unnamed resource URIs should depend on the context in which these resources were used; "context" needs to be clearly defined.

> > 5) I understand there is an algorithm for computing them given an RDF syntax
> > representation of a model.

The above-mentioned algorithm for the statement and model URIs operates on the model; the one for the unnamed resources depends on the serialization.

> 'triple digest' seems to be Sergey's approach. I'm not sure what we'd do
> about implied arcs in the graph, language tagging etc. to ensure we had a
> canonicalisation strategy before computing the triple and model
> digests. In other words, two models could be RDF model equivalent but have
> trivial differences in their actual storage (missing but implied rdf:type
> arcs, variations in representation of XML literals, xml:lang etc.) giving
> them different triple/model digests.
>
> Does this help? Sergey, was this a fair characterisation?

Oh yes, thanks! Model-based algorithms (for statement and model URIs) do not depend on the serialization syntax. For syntax-based ones, a well-defined mapping from syntax to model is needed. This mapping should be described by the spec. The current RDFMS spec does so, but it has some deficiencies: it does not properly address language tagging etc.

> > [My assumption here is that the model was constructed in the database, and
> > was not derived from a serialised RDF input stream]

The model-based algorithms can be applied directly to the DB data.

Best,
Sergey
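
The order-independence requirement from point 4 can be illustrated with a minimal sketch. This is not the algorithm proposed in the thread; it only shows the general idea of hashing each statement and combining the per-statement digests with XOR. The choice of SHA-1, the length-prefixed encoding of the triple, the function names and the `urn:x-rdf-digest:` URI scheme are all assumptions made for illustration.

```python
import hashlib

DIGEST_SIZE = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def statement_digest(subject: str, predicate: str, obj: str) -> bytes:
    """Digest of a single (subject, predicate, object) statement.

    Each part is length-prefixed so that ("ab", "c", ...) and
    ("a", "bc", ...) cannot produce the same input to the hash.
    """
    h = hashlib.sha1()
    for part in (subject, predicate, obj):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()


def model_digest(statements) -> bytes:
    """Order-independent digest of a set of statements.

    XOR is commutative and associative, so the result does not depend
    on the order in which statements are fed in. Duplicates are removed
    first, because a statement XORed with itself would cancel out.
    """
    acc = bytes(DIGEST_SIZE)  # all-zero starting value
    for s in set(statements):
        d = statement_digest(*s)
        acc = bytes(a ^ b for a, b in zip(acc, d))
    return acc


def digest_uri(digest: bytes, kind: str) -> str:
    """Hypothetical URI form for a digest; the scheme is made up here."""
    return f"urn:x-rdf-digest:{kind}:{digest.hex()}"


s1 = ("http://example.org/doc", "http://purl.org/dc/elements/1.1/creator", "Sergey")
s2 = ("http://example.org/doc", "http://purl.org/dc/elements/1.1/title", "Digests")

# Same model, statements supplied in a different order.
same = model_digest([s1, s2]) == model_digest([s2, s1])
```

The sketch also shows why a plain XOR might not be sufficient: without the de-duplication step, feeding the same statement twice would cancel it out of the model digest, and carefully chosen sets of statements can be combined to yield the same XOR value.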
Received on Thursday, 6 April 2000 14:11:11 UTC