From: Dan Brickley <danbri@w3.org>
Date: Thu, 6 Apr 2000 06:53:12 -0400 (EDT)
To: "McBride, Brian" <bwm@hplb.hpl.hp.com>
Cc: "'www-rdf-interest@w3.org'" <www-rdf-interest@w3.org>
Hi Brian,

On Thu, 6 Apr 2000, McBride, Brian wrote:

> I've been thinking of implementing digest URIs but I can't get a good
> enough understanding from the mailing list archives to do so, so can
> someone help me out please:

Digest URIs are a proposal floating around on the mailing list; currently they have no status in the W3C RDF specifications. They may nevertheless be a useful technique for implementors; quite how they fit into the RDF picture is still up for discussion. Sergey has some notes at http://WWW-DB.Stanford.EDU/~melnik/rdf/api.html (although he uses a fictional URN scheme for urn:rdf:* identifiers, which I'm not persuaded by).

> 1) Which entities can have digest URIs?

This would be determined by a combination of the top-level URI scheme used (uuid:, foobar:, http:, doi:, handle:, etc.) and the policies operating over the subset of that URI space (e.g. urn:rdf: etc., if we had URNs) for naming entities.

I think the main proposal was to use computed URIs for 'RDF models' based on the abstract contents of the graph. I suspect further work is needed here on canonicalising the graph representation (e.g. treatment of language-tagged content). There was also some discussion of computed URIs for 'anonymous' or so-called 'no-name' resources, i.e. nodes that are 'mentioned in passing' in a chunk of XML/RDF without their URI being included in the markup. A related approach would be to use these digests as properties of resources instead of as identifiers...

> 2) What are they for?

Does the above help? Briefly, RDF applications benefit from a data model that allows for aggregation of data from multiple sources. Since we (try to) use URIs for node identifiers, RDF allows us to aggregate data simply by joining on uniquely identified nodes. So the idea behind using digests is that we can do more data joins, and therefore better data aggregation.

> 3) What does a digest URI denote?

I'm not aware of a specific URI scheme proposal for these, so I can't comment on this one.

> 4) What properties do they need to have?
>
> 5) I understand there is an algorithm for computing them given an RDF
> syntax representation of a model.

There are algorithms for doing this sort of thing given any blob of XML markup. I took Sergey's proposal to be operating over the abstract RDF data model:

    Currently, the model digest is an XOR of triple digests. A triple
    digest is computed as XOR with rotation on digests of the
    predicate, subject and object. This approach provides a
    straightforward way of digitally signing RDF content (as opposed
    to signing serialized RDF), facilitating the "Web of Trust"...

> Given a model stored in a database, I could serialise that many
> different ways. How do I compute digest URIs for a model stored in a
> database, or is that an unnecessary thing to do?

The 'triple digest' seems to be Sergey's approach. I'm not sure what we'd do about implied arcs in the graph, language tagging etc. to ensure we had a canonicalisation strategy before computing the triple and model digests. In other words, two models could be RDF-model-equivalent but have trivial differences in their actual storage (missing but implied rdf:type arcs, variations in the representation of XML literals, xml:lang etc.), giving them different triple/model digests.

Does this help? Sergey, was this a fair characterisation?
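For concreteness, here's a rough sketch in Python of the XOR-with-rotation scheme as I read it. It is untested, and the choice of MD5, the rotation amounts, and the node-to-string mapping are my own assumptions rather than part of Sergey's proposal:

    import hashlib

    DIGEST_BITS = 128                 # MD5-sized digests; any fixed-width hash would do
    MASK = (1 << DIGEST_BITS) - 1

    def node_digest(node):
        # Assumes each node has already been canonicalised to a string
        # (URI or literal value); that mapping is itself an open question.
        return int.from_bytes(hashlib.md5(node.encode('utf-8')).digest(), 'big')

    def rotate(x, n):
        # Rotate a DIGEST_BITS-wide integer left by n bits.
        n %= DIGEST_BITS
        return ((x << n) | (x >> (DIGEST_BITS - n))) & MASK

    def triple_digest(subject, predicate, obj):
        # "XOR with rotation": rotating each position by a different
        # amount keeps the digest sensitive to node position, so
        # (s, p, o) and (o, p, s) hash differently.
        return (node_digest(subject)
                ^ rotate(node_digest(predicate), 8)
                ^ rotate(node_digest(obj), 16))

    def model_digest(triples):
        # XOR over all triple digests: order-independent, so two stores
        # holding the same set of triples agree on the digest (provided
        # the triples themselves have been canonicalised).
        d = 0
        for s, p, o in triples:
            d ^= triple_digest(s, p, o)
        return d

    # Hypothetical example data:
    triples = [('http://example.org/doc',
                'http://purl.org/dc/elements/1.1/creator',
                'danbri')]
    print(hex(model_digest(triples)))

Note that XOR makes the model digest independent of triple ordering, which fits the set-of-triples model nicely, though it also means a triple asserted twice cancels itself out, and none of this addresses the canonicalisation problems above.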
There's a paper by Clifford Lynch in D-Lib Magazine, September 1999, that touches on similar areas; in particular, the issue of different layers of representation. We'd need to give some careful thought to canonicalisation of literal XML data, for example...

    http://www.dlib.org/dlib/september99/09lynch.html
    Canonicalization: A Fundamental Tool to Facilitate Preservation
    and Management of Digital Information

Brief excerpts:

    [...] For example, UNICODE, which is the underlying character set
    for a growing number of current storage standards, allows multiple
    bit streams to represent the same stream of logical characters. Any
    canonicalization of objects that include data represented in
    UNICODE must enforce a standard choice of encoding on that data.
    [...] Canonicalizations for other types of digital objects that
    have less clear formal models would seem to be a likely near term
    research area. For example, is it reasonable to think about an
    RTF-based or ASCII-based canonicalization for certain types of
    documents, or even about a hierarchy of such canonicalizations,
    with algorithms higher up in the hierarchy capturing more of the
    document's intrinsic meaning? This is likely to be difficult [...]
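To make the UNICODE point concrete: the same logical string can arrive as different codepoint sequences (a precomposed 'é' versus 'e' followed by a combining accent, say), so any literal canonicalisation step would have to normalise text before hashing it. A minimal sketch; the choice of normalisation form C here is my own assumption about what a canonicaliser might pick:

    import unicodedata

    def canonical_literal(text):
        # Normalise to one Unicode form (NFC) so that logically
        # identical literals produce identical byte streams.
        return unicodedata.normalize('NFC', text).encode('utf-8')

    a = 'caf\u00e9'    # precomposed e-acute, U+00E9
    b = 'cafe\u0301'   # 'e' followed by combining acute, U+0301
    assert a != b                                        # different codepoints...
    assert canonical_literal(a) == canonical_literal(b)  # ...same canonical bytes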
Dan

> [My assumption here is that the model was constructed in the database,
> and was not derived from a serialised RDF input stream]
>
> Brian McBride
> HPLabs

Received on Thursday, 6 April 2000 06:54:14 UTC