Re: Meaning of document closure from Daniel LaLiberte on 1999-09-21 (w3c-ietf-xmldsig@w3.org from July to September 1999)

From: Daniel LaLiberte <liberte@w3.org>
Date: Tue, 21 Sep 1999 15:17:32 -0400 (EDT)
To: "Tim Berners-Lee" <timbl@w3.org>
Cc: "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>
Message-ID: <14311.55756.46889.652758@alceste.w3.org>
Tim Berners-Lee writes:
 > I am worried that this "meaning of document closure" thread is
 > suggesting that signing parts should be construed as parts having
 > meaning in context.

Here is my short version:

1. I agree that the meaning of parts of documents should not be assumed
   out of context.

2. I believe that a document part *can* be paired with a reference to
   its context.  I think that is what a "document closure" is, if I
   understand how the term is being used here.

3. But, signing documents or parts of documents should have nothing to
   do with the meaning of the documents.  We only sign the bits and
   bytes of the document, not the meaning.  The signature itself may
   have meaning, but that is separate from the meaning of the document
   being signed.  We can sign a package containing a document and a
   reference to its intended meaning, but that is different from the
   meaning of the signature itself.

And the long version....

 > Basically, logically, a document is a sentence in
 > a language, and we have applications which process documents
 > according to specs, and we say that documents have meaning.
 > 
 > Parts of documents do not have meaning per se.

I would say that parts of documents have meaning only in context.  But
even whole documents can have a different meaning depending on how they
are referenced, and how they relate to other documents.  The difference
is only whether the data is immediately contained or externally
referenced.  This one difference seems to imply that there is also
something more authoritative about immediate data, and hence the
relationship to the signing.  But just because you found some bytes next
to other bytes doesn't by itself make them more authoritative.  Rather,
it is the *signing* of bytes, whether they are found together or apart,
that gives us the authority to say that the two forms, together or
apart, are equivalent.

But in addition to having meaning, documents also are composed of bits
and bytes, and this is the level at which signatures operate.  Why does
it matter to a signature what a string of bytes means?
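To make this concrete, here is a minimal Python sketch (not part of the original discussion). It uses SHA-256 from the standard library as a stand-in for the digest step of a signature. The two made-up serializations differ only in attribute order. An XML parser treats them as equivalent, but their digests differ, because the digest sees only bytes.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two hypothetical serializations with the same parsed content:
# only the attribute order differs.
doc_a = b'<order item="book" qty="1"/>'
doc_b = b'<order qty="1" item="book"/>'

# At the parsed level, the attribute dictionaries compare equal.
same_content = ET.fromstring(doc_a).attrib == ET.fromstring(doc_b).attrib

# At the byte level, where a digest (and hence a signature) operates,
# the two documents are different.
digest_a = hashlib.sha256(doc_a).hexdigest()
digest_b = hashlib.sha256(doc_b).hexdigest()

print(same_content)          # True
print(digest_a == digest_b)  # False
```

Whether such differences should be normalized away before signing is a separate, syntactic decision. The digest itself never consults the meaning.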

The difference between immediate vs referenced data is relevant for
signing, because a referenced document must be dereferenced before its
bytes can be signed, and the dereferencing introduces another risk.  It
may also be important to sign the reference itself, in the context of
the document containing the reference.

 > It is for a trust system to determine the algorithm for defining what
 > can be inferred from a document signed with a given key.

Right, but what can be inferred from a document (regardless of a
signature) is different from what can be inferred from the signature of
a document.

 > When we talk about signing parts of a document, then the only way
 > I can see of giving meaning to this is to say that we are signing
 > some document which is not actually given, but is formed by making
 > a particular transformation on the document given.

Transformations may be at various levels of syntax and semantics,
depending on where you draw the line.  But mere "extraction" of the
bytes of a part of a document seems like a fairly straightforward
syntactic operation.  The only way it can get more complex is if the
bytes of the part are somehow different depending on the context.
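One way the bytes of a part can depend on their context is an XML namespace prefix declared on an ancestor element. The Python sketch below uses a made-up document and namespace URI. It extracts a child element by plain byte slicing. The slice is the same string of bytes wherever it came from, but it no longer parses on its own because the prefix declaration stayed behind in the context.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "a" prefix is declared on the root, not on the part.
doc = b'<root xmlns:a="urn:example:a"><a:part>hello</a:part></root>'

# Purely syntactic extraction: slice out the bytes of the <a:part> element.
start = doc.index(b'<a:part>')
end = doc.index(b'</a:part>') + len(b'</a:part>')
part = doc[start:end]
print(part)  # b'<a:part>hello</a:part>'

# In context, the element resolves to a fully qualified name.
in_context = ET.fromstring(doc)[0].tag
print(in_context)  # {urn:example:a}part

# Out of context, the same bytes are not namespace-well-formed
# (the parser reports an unbound prefix).
try:
    ET.fromstring(part)
    standalone_ok = True
except ET.ParseError:
    standalone_ok = False
print(standalone_ok)  # False
```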

 > One can try to talk about the "semantics of a part of the document
 > in its context in the document" as much as one likes but one can only
 > define what it means by showing or defining that it is equivalent to some
 > other notional document.  

Showing semantic equivalence would be hard, but I don't think it is
relevant to signing bytes.

 > Life is then simplified.  A signature is over a document.

But life often refuses to be so simplified.  

If a document is merely a representation of a resource, and the resource
contains other resources, each with its own separate representations,
then we are back to considering whether a part of a document that
corresponds to a separate resource might be worthy of a signature.


On a slightly different tangent, now that I am thinking about all this,
how useful or necessary is it to sign parts of documents?  I'd always
thought of XML signatures as being contained in XML documents where the
thing being signed was part of that very same document.  Obviously, in
this case, you don't want to sign the whole document including the
signature because the signature would be partly a function of itself.
So this assumes we can sign parts of documents from the outset.  That
doesn't mean it is necessary, however, since we could always sign
something external to the XML signature document, referenced by a URI.
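A common way to embed a signature in the document it signs is to exclude the signature element from the signed bytes. The following Python sketch is illustrative only. It assumes a hypothetical `<sig>...</sig>` element, and it uses an HMAC with a shared key as a stand-in for a real public-key signature. Neither the element name nor the key handling comes from any specification.

```python
import hashlib
import hmac
import re

KEY = b"demo-shared-key"  # assumption: HMAC stands in for a public-key signature

# Hypothetical element that carries the signature inside the signed document.
SIG_PATTERN = re.compile(rb"<sig>[0-9a-f]*</sig>")


def strip_signature(doc: bytes) -> bytes:
    """Return the document bytes with any embedded <sig> element removed."""
    return SIG_PATTERN.sub(b"", doc)


def sign_enveloped(doc: bytes) -> bytes:
    """Compute a MAC over the document minus the signature, then embed it."""
    mac = hmac.new(KEY, strip_signature(doc), hashlib.sha256).hexdigest().encode()
    # Insert the signature just before the closing root tag.
    return doc.replace(b"</doc>", b"<sig>" + mac + b"</sig></doc>")


def verify_enveloped(doc: bytes) -> bool:
    """Strip the signature, recompute the MAC, and compare with the embedded one."""
    match = SIG_PATTERN.search(doc)
    if match is None:
        return False
    claimed = match.group(0)[len(b"<sig>"):-len(b"</sig>")]
    expected = hmac.new(KEY, strip_signature(doc), hashlib.sha256).hexdigest().encode()
    return hmac.compare_digest(claimed, expected)


signed = sign_enveloped(b"<doc><body>pay 10</body></doc>")
print(verify_enveloped(signed))                                # True
print(verify_enveloped(signed.replace(b"pay 10", b"pay 99")))  # False
```

The signed bytes are the whole document except the signature itself. In that sense the scheme presupposes the ability to sign a part of a document.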

But sometimes we want to be able to sign not only a document referenced
by a URI, but the combination of the URI and the document, so that we
know that neither has changed.  In that case, we have to package up the
URI and either the whole document, or a reliable hash of the document,
and sign the package.  That package could be a document with its own
URI, but we would be back at step one if we relied on resolving the URI
for the package.  So it seems we MUST have signatures of anonymous
packages.  Is this true?
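Here is a minimal Python sketch of such an anonymous package. It rests on two assumptions: a JSON object with sorted keys serves as a deterministic byte encoding for the package, and an HMAC with a shared key stands in for a real signature. The URI and document are made up. The signature covers the URI and the document digest together, and verifying it never requires resolving a URI for the package itself.

```python
import hashlib
import hmac
import json

KEY = b"demo-shared-key"  # assumption: HMAC stands in for a public-key signature


def make_package(uri: str, document: bytes) -> bytes:
    """Bundle a URI with a digest of the document it resolved to."""
    package = {"uri": uri, "sha256": hashlib.sha256(document).hexdigest()}
    # Sorted keys and fixed separators give a deterministic byte encoding.
    return json.dumps(package, sort_keys=True, separators=(",", ":")).encode()


def sign(package: bytes) -> str:
    return hmac.new(KEY, package, hashlib.sha256).hexdigest()


def verify(uri: str, document: bytes, signature: str) -> bool:
    """Rebuild the package from what we hold now and check the signature."""
    return hmac.compare_digest(sign(make_package(uri, document)), signature)


uri = "http://example.org/terms"
doc = b"<terms>v1</terms>"
sig = sign(make_package(uri, doc))

print(verify(uri, doc, sig))                         # True
print(verify(uri, b"<terms>v2</terms>", sig))        # False: document changed
print(verify("http://example.org/other", doc, sig))  # False: URI changed
```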

 > We don't have (here) to discuss what modifications may or may not be
 > made to a document later.  A particular sentence has been
 > signed. According to the language, one may be able to deduce other
 > valid things and create other believable documents by further
 > manipulations but this spec doesn't have to worry about that.

Belief in what a document says (its meaning) must be distinguished from
the belief only that it has been signed by some authority.  If the
meaning of a signature provided by some authority is that everything it
signs is true, and if you trust that authority, then you would probably
believe in any document it signs.  But not all signatures have the
meaning that the thing signed is true.  A notary public only asserts
that some particular individual signed a particular document on some
particular date, not that what the document says is even meaningful.

 > (By the way, I think of closure in the sense of the set of all objects
 > obtained by repeated application of an operation. 

That sounds like a transitive closure operation.  
Is it meaningful to distinguish a closure which includes references 
from a transitive closure with no (meaningful) references?

 > I expected the term to represent the repeated operation of finding
 > all dependent references within the document and signing them.

Certainly in some cases you want to sign both a document (whether a part
or a whole) together with things it is dependent on, whether or not
those things are explicitly referenced by the document or implicit in
some application context.  You also sometimes want to sign a reference
itself together with the document it resolves to (as discussed above). 
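The "repeated operation of finding all dependent references" can be sketched as a reachability search over a reference graph. In the Python sketch below, the graph is a hypothetical in-memory dictionary that maps each URI to the URIs it references. It assumes that the decision about which references count as dependent has already been made. The sorted result is the set of resources a higher-level application might hand to the signing step.

```python
from collections import deque


def reference_closure(start: str, refs: dict[str, list[str]]) -> list[str]:
    """Return every URI reachable from `start` by following references,
    including `start` itself, in sorted order. Cycles are handled by
    tracking which URIs have already been seen."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in refs.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)


# Hypothetical reference graph: "main" uses a stylesheet and a schema,
# and the schema imports another schema that points back to "main".
refs = {
    "main": ["style.css", "schema.xsd"],
    "schema.xsd": ["base.xsd"],
    "base.xsd": ["main"],
}

print(reference_closure("main", refs))
# ['base.xsd', 'main', 'schema.xsd', 'style.css']
```

Note that the traversal itself is generic. Deciding which edges belong in `refs` is the part that requires knowledge of the documents.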

 > Dependent references meaning something which affects the meaning and
 > you won't already know and trust.

This is where it gets tricky.  A dumb, generic signature function should
not be required to figure out any of that, of course.  Some appropriate
higher-level application, together with your preferences and your web of
trust, would decide what package of things should be signed.  Given
that, the meaning of documents is irrelevant to signing them at the
level of the dumb generic signature function.

Is your real concern that we will have problems writing those
appropriate higher-level applications that attempt to understand some of
the semantics of documents to determine what package of things needs to
be signed?  I agree it will be a challenge.

-- 
Daniel LaLiberte
liberte@w3.org
Received on Tuesday, 21 September 1999 15:17:34 UTC