Re: Arguments against digest URIs from Jonas Liljegren on 2000-02-27 (www-rdf-interest@w3.org from February 2000)

From: Jonas Liljegren <jonas@paranormal.o.se>
Date: Sun, 27 Feb 2000 15:57:45 +0100
To: Sergey Melnik <melnik@db.stanford.edu>
CC: RDF Interest Group <www-rdf-interest@w3.org>
Message-ID: <38B93B69.9A742B74@paranormal.o.se>
Sergey Melnik wrote:
> 
> Jonas Liljegren wrote:
> >
> > It's about digest URIs. A number of considerations have come up
> > against the use of digest URIs, and not only digest URIs but any kind
> > of algorithm for common URIs. That includes the x-pointer suggestion.

...

> > The three things that need a calculated URI are:
> >   * model URI
> >   * statement URI
> >   * anonymous resource URI
> 
> (1) anonymous resource URI: definitely true
> (2) statement URI: I believe so
> (3) model URI: an optional goody
> 
> > I think that digest URIs are not the complete solution for the problems
> > they try to confront. A complete solution still has to incorporate
> > more layers of metadata. It's better simply not to have digest URIs.
> 
> An incremental attempt to find the solution is better than nothing at
> all ;)



As I rethink this, I agree that digest URIs could be used in some
cases. It's basically about sharing common URIs without having to
state those URIs explicitly.




I was pondering the use of digest URIs for internal management within
the relational database. For that use, I could just use a sequence of
numbers. All exported data would be named in full.

I think that this takes care of 1 and 3.
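A minimal sketch of that idea, assuming an SQLite table stands in for the relational database. The table layout, the function names and the `#r<number>` naming scheme used on export are all invented for illustration:

```python
import sqlite3

def open_store():
    # In-memory database; "resource" is a hypothetical table name.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resource (id INTEGER PRIMARY KEY, uri TEXT UNIQUE)")
    return conn

def new_resource(conn, uri=None):
    # Anonymous resources are stored with uri = NULL and are known
    # internally only by their sequence number.
    cur = conn.execute("INSERT INTO resource (uri) VALUES (?)", (uri,))
    return cur.lastrowid

def exported_name(conn, res_id, base):
    # On export every resource is named in full. The "#r<id>" pattern
    # is an assumption, not part of any RDF specification.
    row = conn.execute("SELECT uri FROM resource WHERE id = ?", (res_id,)).fetchone()
    if row is None:
        raise KeyError(res_id)
    uri = row[0]
    return uri if uri is not None else f"{base}#r{res_id}"

conn = open_store()
anon = new_resource(conn)
named = new_resource(conn, "http://www.w3.org/Home/Lassila")
print(exported_name(conn, anon, "http://example.org/store"))
print(exported_name(conn, named, "http://example.org/store"))
```

Internally only the sequence number is used; a full name is produced at export time, so no digest is needed for this purpose.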

Now, if only you could give an explicit URL for the statement, all
would be great. I would like to be able to write something like this:



  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:s="http://description.org/schema/">
    <rdf:Description about="http://www.w3.org/Home/Lassila">
      <s:Creator rdf:StatementID="48">Ora Lassila</s:Creator>
    </rdf:Description>
  </rdf:RDF>


And that would be shorthand for having to (in addition to the above)
explicitly identify the triple like this:


  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:a="http://description.org/schema/">
    <rdf:Description ID="48">
      <rdf:subject resource="http://www.w3.org/Home/Lassila" />
      <rdf:predicate resource="http://description.org/schema/Creator" />
      <rdf:object>Ora Lassila</rdf:object>
      <rdf:type
        resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
    </rdf:Description>
  </rdf:RDF>






>  BTW, let me reiterate what the rationales for the digest URIs are
> in the first place.
>
> (1) We need to refer to anonymous resources used by other people (or the
> things that they represent).

You could argue that it is up to the service (application) to
explicitly state the URIs for all resources that others would have an
interest in referring to.


> Currently, every RDF parser generates a
> different URI for an anonymous resource, thus the only way for a third
> party to speak about this resource would be to reify the context where
> this resource was used. Under unlucky circumstances, you'd have to cite
> the whole document (model). This is a very verbose solution and it
> indeed requires an additional layer of metadata. Seems like an overkill
> to me. IMO, standard digests solve this problem in an elegant way.

Well. You would in either case want to examine the original model. If
all you have is a digest URI, what could you do to determine the
properties of the referenced resource?

This means that you would still have to either copy the whole model or
have extra metadata pointing to a location on the web.

And that would make the digest URI redundant.


> (2) Statement URIs: a certain consensus has been reached on this list
> w.r.t. that reified statements can be adequately treated as having
> unique, context free URIs. A cryptographic digest is just a convenient
> abbreviation for these URIs.

Would that be a digest of the first or the second RDF example above?
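One possible reading is that the digest is computed over the triple itself (subject, predicate, object) rather than over either XML form. A minimal sketch under that assumption follows. The `urn:x-digest:sha1:` prefix and the length-prefixed field encoding are invented here for illustration and do not come from any proposal in this thread:

```python
import hashlib

def statement_digest_uri(subject: str, predicate: str, obj: str) -> str:
    # Length-prefix each field so that field boundaries are unambiguous,
    # e.g. ("ab", "c") and ("a", "bc") hash differently.
    parts = []
    for field in (subject, predicate, obj):
        data = field.encode("utf-8")
        parts.append(str(len(data)).encode("ascii") + b":" + data)
    digest = hashlib.sha1(b"".join(parts)).hexdigest()
    # Hypothetical URI scheme, for illustration only.
    return "urn:x-digest:sha1:" + digest

uri = statement_digest_uri(
    "http://www.w3.org/Home/Lassila",
    "http://description.org/schema/Creator",
    "Ora Lassila",
)
print(uri)
```

Under this reading both XML examples would yield the same statement URI, since they describe the same triple.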


> (3) Model URIs: having a digest for a model allows signing a set of
> statements using public key technology. Note that given an algorithm to
> compute a digest of a model, we sign the content itself rather than its
> representation using some serialization syntax. I believe model digests
> is a powerful lever for the Web of Trust.

That would be one way to do it.
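A minimal sketch of a serialization-independent model digest, assuming a model is simply a set of (subject, predicate, object) triples. The canonical form used here (SHA-1 over the sorted per-statement digests) is one illustrative choice, not the algorithm proposed on this list:

```python
import hashlib

def statement_digest(subject: str, predicate: str, obj: str) -> bytes:
    # Hash one triple with length-prefixed fields (illustrative encoding).
    h = hashlib.sha1()
    for field in (subject, predicate, obj):
        data = field.encode("utf-8")
        h.update(str(len(data)).encode("ascii") + b":" + data)
    return h.digest()

def model_digest(statements) -> str:
    # A model is treated as a set: duplicates and ordering are ignored,
    # so any serialization of the same statements gives the same digest.
    digests = sorted({statement_digest(*st) for st in statements})
    return hashlib.sha1(b"".join(digests)).hexdigest()

s1 = ("http://www.w3.org/Home/Lassila",
      "http://description.org/schema/Creator", "Ora Lassila")
s2 = ("http://some.org/Earth", "http://some.org/shape", "http://some.org/Flat")
print(model_digest([s1, s2]) == model_digest([s2, s1]))
```

A signature over this value would cover the statements themselves, whichever syntax they were delivered in.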



> Now let me briefly address some of the issues you raised:
> 
> >  Higher threshold for implementation
> >  -----------------------------------
> >
> > There will hopefully be many implementations of RDF. Some will just be
> > able to read a specific form of the XML serialization. Others will be
> > more generic. There is a point in not requiring too much from an
> > implementation. MD5 or SHA-1 is maybe not that hard to use (there is
> > support for both in Perl modules), but it does limit the ways to
> > implement RDF for a specific purpose.
> 
> Note that the digest algorithm for anonymous resources depends on the
> serialization syntax, whereas statement URIs and model URIs do not. This
> means that an application that uses a very simple and straightforward
> serialization (without anonymous resources) need not know anything about
> digests. Statement URIs and model URIs are still character strings,
> aren't they?

That's true. There could be a subset of parsers using digest URIs, and
these would not disturb those that don't use them.


> > And you can't depend on digest URIs if not everyone is using them.
> 
> True. Hard to argue against it. If there are no standards,
> interoperation is impossible.


> >  URI aliases
> >  -----------
> >
> > What about URI aliases? Two URIs could be used to denote the same
> > thing.  Persons often have a different identifier for every membership
> > register. There will have to be ways to express the relationships
> > between resources, regardless of whether it's about the same sort of
> > statement, the same model or the same thing.
> >
> > It's not enough to have a common algorithm to give unique identifiers
> > for anonymous resources. You will still have to be able to say that two
> > URIs are aliases for the same resource. So why not use this handling
> > of aliases to handle other cases where you want to say that one URI
> > for, say, a model is an alias for another URI?
> 
> This is correct and is definitely a requirement. Let me elaborate the
> point I made above to clarify the problem. Imagine someone stated
> something about an anonymous resource, say A, mentioned on his/her RDF
> page. No doubt, you can pick some unused name, say B, use it throughout
> your descriptions and state that B is equivalent to A. How do you refer
> to A to state the equivalence? The only way to do that would be to say
> "a resource used in this and that particular context". For example, A
> could be "a resource that has an anonymous dc:Creator X, which belongs
> to an anonymous organization Y, which is labelled 'W3C'". For anonymous
> resources, you'd have to find the complete information (context) needed
> to fully characterize it. This is exactly what the digest algorithm does
> in a transparent fashion. So you don't have to quote the whole thing.

You have a point there. :)

But I would rather see that the resource creator didn't use
anonymous resources for things that others might want to
refer to.

And you would still want to be able to retrieve the original
resource. That would mean that you would still have to copy the
whole context, have metadata pointing to the original, or both.

But there would be a few cases where just digest URIs would
suffice.


> >  Value equivalence
> >  -----------------
...
> Digest URIs are not meant to provide a general solution for specifying
> or computing the equivalence of resources.

> >  Not really unique
> >  -----------------
> >
> > A digest is not guaranteed to be unique. There is a theoretical
> > chance that two different things will get the same URI.  There would
> > still have to be an extra layer for determining URI equivalence.
> 
> Legal issues are out of scope. For most other practical purposes,
> 160-bit (or X-bit) hash seems to be a good approximation.

It still leaves me unsatisfied. Why would you accept errors in
some cases?

In existing uses, digests always serve as a checksum. You already
know that two documents are supposed to be equivalent, but want to
make sure. You already know which user is trying to log in, but want to
check with an extra password string.  In all cases, the digest is a
complement to the unique identifier. It's not the identifier
itself.
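For a sense of scale in this disagreement, the chance of an accidental collision can be estimated with the birthday bound. This is a rough sketch that assumes digests behave like uniformly random values, which is an idealization; the figure of 10**12 items is an arbitrary choice for illustration:

```python
def collision_probability(n_items: int, bits: int) -> float:
    # Birthday-bound approximation p ~ n(n-1) / 2^(bits+1).
    # Only meaningful while the result is much smaller than 1.
    return n_items * (n_items - 1) / 2 ** (bits + 1)

p_sha1 = collision_probability(10**12, 160)  # 160-bit digest such as SHA-1
p_md5 = collision_probability(10**12, 128)   # 128-bit digest such as MD5
print(p_sha1, p_md5)
```

The estimate covers accidental collisions only. It says nothing about deliberately constructed ones, and it does not remove the theoretical possibility raised above.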


> >  The nature of the statement
> >  ---------------------------
> >
> > In a reification of a statement, every reification should be handled
> > separately, as separate events. They have properties like source,
> > time, probability and context of statement. Even if the statement in
> > itself would have a unique URI, there would have to be separate URIs
> > for every stating event.
> 
> I disagree with that. See also
> http://lists.w3.org/Archives/Public/www-rdf-interest/1999Dec/0070.html

I think that this post confirms what I said. Let me clarify:

Let's say that there are two persons stating that the earth is
flat. Let's say that the two statings are described in two different
models. (This model is not the same as the example in the post referred
to above):

No 1:

  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:a="http://description.org/schema/">
    <rdf:Description>
      <rdf:subject resource="http://some.org/Earth" />
      <rdf:predicate resource="http://some.org/shape" />
      <rdf:object resource="http://some.org/Flat" />
      <rdf:type
        resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
      <a:statedBy>Fred</a:statedBy>
      <a:statedOn>20000227T1507</a:statedOn>
    </rdf:Description>
  </rdf:RDF>

No 2:

  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:a="http://description.org/schema/">
    <rdf:Description>
      <rdf:subject resource="http://some.org/Earth" />
      <rdf:predicate resource="http://some.org/shape" />
      <rdf:object resource="http://some.org/Flat" />
      <rdf:type
        resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
      <a:statedBy>Tom</a:statedBy>
      <a:statedOn>19991230T1905</a:statedOn>
    </rdf:Description>
  </rdf:RDF>


This has been discussed before. The question is whether the two statements
have the same URI or not. Is it the same reified statement?

Think about this. It WOULD get the same URI if you broke out
the a:statedBy and a:statedOn, and created a digest URI from what remains:

No 3:

  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:a="http://description.org/schema/">
    <rdf:Description>
      <rdf:subject resource="http://some.org/Earth" />
      <rdf:predicate resource="http://some.org/shape" />
      <rdf:object resource="http://some.org/Flat" />
      <rdf:type
        resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
    </rdf:Description>
    <rdf:Description about="calculatedDigestURI">
      <a:statedBy>Tom</a:statedBy>
      <a:statedOn>19991230T1905</a:statedOn>
    </rdf:Description>
  </rdf:RDF>


But if you have a unique URI for the reified statement, you would mix
up the properties for it. It would be like this:

No 4:

    <rdf:Description about="calculatedDigestURI">
      <a:statedBy>Fred</a:statedBy>
      <a:statedOn>20000227T1507</a:statedOn>
      <a:statedBy>Tom</a:statedBy>
      <a:statedOn>19991230T1905</a:statedOn>
    </rdf:Description>

Now: Who made what statement on what date? You can't tell anymore.
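The loss of information in No 4 can be shown with a small sketch that treats the merged description as a plain set of triples. The tuple representation is an assumption made for illustration; the property names and values are those of the example:

```python
from itertools import product

DIGEST = "calculatedDigestURI"  # placeholder name taken from the example

# Properties from No 1 and No 2, all attached to the same URI.
triples = {
    (DIGEST, "a:statedBy", "Fred"),
    (DIGEST, "a:statedOn", "20000227T1507"),
    (DIGEST, "a:statedBy", "Tom"),
    (DIGEST, "a:statedOn", "19991230T1905"),
}

def values(subject, predicate):
    # All objects recorded for one subject/predicate pair.
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

who = values(DIGEST, "a:statedBy")
when = values(DIGEST, "a:statedOn")

# Every (who, when) combination is consistent with the merged triples,
# so the original pairing cannot be recovered.
candidates = list(product(who, when))
print(len(candidates))
```

Four pairings fit the data equally well, although only two stating events actually took place.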


There are (as pointed out in the previous posts on this topic) two
solutions for this:

1. Let the reified statements have individual URIs.

2. Create a statement resource pointing to the global URI representing
   the reified statement.


The reified statements would have individual URIs if you used digest
URIs on No 1 and No 2 above. They would also have individual URIs if a
parser generated a local URI for them. But the application could
choose to treat the reified statement resources as the same, making
them the same resource. This would be a part of the handling of aliases
that the application would have to have.

The second solution would give you a model like this:

No 5:

  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:a="http://description.org/schema/">

    <rdf:Description ID="calculatedDigestURI">
      <rdf:subject resource="http://some.org/Earth" />
      <rdf:predicate resource="http://some.org/shape" />
      <rdf:object resource="http://some.org/Flat" />
      <rdf:type
        resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement" />
    </rdf:Description>

    <rdf:Description about="calculatedDigestURI">
      <a:stating>
         <rdf:Description>
           <a:statedBy>Fred</a:statedBy>
           <a:statedOn>20000227T1507</a:statedOn>
         </rdf:Description>
      </a:stating>      
      <a:stating>
         <rdf:Description>
           <a:statedBy>Tom</a:statedBy>
           <a:statedOn>19991230T1905</a:statedOn>
         </rdf:Description>
      </a:stating>      
    </rdf:Description>

  </rdf:RDF>



I would like this (No 5) way to handle reified statements. The a:stating
would represent the stating event. This would let the application use
a globally unique URI for the reified statement.
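A minimal sketch of the No 5 structure as a set of triples, assuming each stating event gets a locally generated node identifier. The `_:stating<n>` naming and the helper functions are invented for illustration:

```python
from itertools import count

_ids = count(1)

def add_stating(triples, statement_uri, by, on):
    # Create one node per stating event and hang the properties on it,
    # instead of attaching them to the shared statement URI.
    node = f"_:stating{next(_ids)}"  # hypothetical local identifier
    triples.add((statement_uri, "a:stating", node))
    triples.add((node, "a:statedBy", by))
    triples.add((node, "a:statedOn", on))
    return node

def statings(triples, statement_uri):
    # Recover the (who, when) pair of every stating event.
    nodes = [o for s, p, o in triples if s == statement_uri and p == "a:stating"]
    pairs = []
    for node in nodes:
        props = {p: o for s, p, o in triples if s == node}
        pairs.append((props["a:statedBy"], props["a:statedOn"]))
    return sorted(pairs)

model = set()
add_stating(model, "calculatedDigestURI", "Fred", "20000227T1507")
add_stating(model, "calculatedDigestURI", "Tom", "19991230T1905")
print(statings(model, "calculatedDigestURI"))
```

Because each event has its own node, the pairing of person and date survives even though the reified statement has a single URI.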

It would be easier to implement internally if you didn't have to
handle multiple URIs for the same statement. But it would require some
extra work when importing reified statements, as those would
have to be recognized and mapped to the internal URIs. If they already
have a URI, they would be given an alias to the internal URI.

Digest URIs could solve some of this if a lot of things could be
agreed upon. But you would still have to handle aliases some day...



> >  Version handling
> >  ----------------
> >
> > Statements, resources, literals and models will come in different
> > versions.  Some versions will be chronological. Others will be
> > variations of the content, like different languages or different
> > target groups.
> >
> > There are many ways to handle new versions. Many applications would
> > like to keep a statement URI, even if the object part of it changes.
> > They would often like to keep the URI of a resource, even if its
> > content changed. They would like to keep the URI of the model, even if
> > new statements would be added.
> >
> > Some applications would like to handle a history of versions, of
> > statements at different times. Others would only concern themselves with
> > the present.
> >
> > The use of digest URIs for statements and models will force every
> > application to deal with history, and to deal with it in a way that
> > could be incompatible with what is needed. I think that it would be
> > better to let the version handling be a separate layer, that could be
> > included or excluded, and that could evolve by itself to meet the
> > needs.
> 
> Digest-based model URIs provide a way to refer to the RDF content
> directly, rather than to the location of its serialization. No force,
> please! ;) One can still refer to an RDF document using a URL. But its
> contents may have changed... A version handling mechanism can be easily
> built on top of digest-based URIs and URL of models.

Digest-based model URIs would be appropriate in some cases. But I think
that there are more cases where you would like to refer directly to the
document URL. In a simple implementation, you would have to choose
between a simple location URL and the digest URI.



> >  Open / closed models
> >  --------------------
> >
> > How will you maintain metadata about a model, with digest URIs?  The
> > metadata would have to be linked to the model. But every change in the
> > model would modify the model URI.
> 
> This is exactly the intention. There is a difference between saying "I
> trust the fact that Ralph Swick works at W3C" and saying "I trust
> whatever information is contained in this page". The latter may be
> appropriate for many cases, though.

Yep.


> >  Statements as models
> >  --------------------
> >
> > A model is a group of statements. You could reify a single statement,
> > but you would maybe more often like to say something about a group of
> > statements.  This group could be given an explicit URI.  That would be
> > the same thing as giving an explicit URI to a model. The grouping of
> > the statements could be done on one site and used on other sites.
> >
> > The handling of those things is something that belongs on a higher
> > level. It's not something to be handled with digest URIs.
> 
> Why not? This is exactly how it works: you drop a bunch of statements
> into an empty model and compute its digest-based URI. That gives an
> explicit URI for a group of statements. I don't see any contradiction in
> that.

Ehum.. You are sort of right there...


> An application that does not care about the uniqueness etc. of generated
> URIs should not bother. On the other hand, the implementation of this
> mechanism enables other developers to evaluate its usefulness in
> different application scenarios.

Right.

I was thinking about its usefulness for the internal workings of my
DB-based Perl modules...

-- 
/ Jonas  -  http://paranormal.o.se/perl/proj/rdf/schema_editor/
Received on Sunday, 27 February 2000 09:55:23 UTC