
RE: minimal canonicalization

From: Phillip M Hallam-Baker <pbaker@verisign.com>
Date: Thu, 14 Oct 1999 14:11:04 -0400
To: "Greg Whitehead" <gwhitehead@signio.com>, <w3c-ietf-xmldsig@w3.org>
Message-ID: <004c01bf166f$7a8b6580$6e07a8c0@pbaker-pc.verisign.com>

> We can't dictate a transport encoding, since we are expected to 
> sign content
> by reference.  In theory, HTTP should probably require the 
> rfc2049 canonical
> encoding for text/*, but it doesn't seem to.

It doesn't and shouldn't. In fact HTTP requires that the server not 
meddle with the text it is transporting.

HTTP adopted the novel idea that canonical encoding was part of the
problem and not the solution.

> Keep in mind that we're not modifying the content on disk (or on 
> its way to
> disk).  This is just part of the digest computation.

Actually this is what we should do. The signed bits should be the
bits delivered. I accept however that the bits that are sent may not
be the bits that arrive :-(

The CRLF -> LF, CR -> LF convention can at least be formally described
as an FSM:

	Tokens:
	CR	[Carriage Return]
	LF	[Line Feed]
	NULL	[the terminal token]
	*	[everything else]

	State Start:
	CR	Return	[]
	LF	Start	[LF]
	*	Start	[*]
	NULL	End	[]

	State Return:
	CR	Return	[LF]
	LF	Start	[LF]
	*	Start	[LF *]
	NULL	End	[LF]

Where the transitions from each state are written as:
	Received token, Next state, [Emitted tokens]
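The two-state machine above can be sketched directly in code. This is an illustrative transcription, not part of the original mail; the function name canonicalize and the use of "\0" to stand in for the NULL terminal token are my own choices:

```python
def canonicalize(data: str) -> str:
    """CRLF -> LF, CR -> LF canonicalization via the Start/Return FSM."""
    out = []
    state = "Start"
    for ch in data + "\0":          # "\0" plays the NULL terminal token
        if state == "Start":
            if ch == "\r":          # CR: defer output, go to Return
                state = "Return"
            elif ch == "\0":        # NULL: end of input, emit nothing
                break
            else:                   # LF or *: emit unchanged, stay in Start
                out.append(ch)
        else:                       # state == "Return": one CR is pending
            if ch == "\r":          # CR CR: flush one LF, stay in Return
                out.append("\n")
            elif ch == "\n":        # CR LF: collapse to a single LF
                out.append("\n")
                state = "Start"
            elif ch == "\0":        # trailing CR: flush LF, then end
                out.append("\n")
                break
            else:                   # CR *: emit LF, then the character
                out.append("\n")
                out.append(ch)
                state = "Start"
    return "".join(out)
```

Since the output never contains a CR, a second pass through the machine stays in Start and emits every token unchanged, which is the fixed-point property discussed below.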

The only problem with this approach is that it requires the object
being signed to use the LF convention when the IETF adopts the CRLF
convention.

With a formal construction such as the above I can construct 
a formal proof that the canonicalization has the fixed-point 
property [I can show that for any given initial state and 
sequence of input tokens, f(x) and f(f(x)) 
arrive at the same state having emitted the same tokens].
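A quick empirical check of the fixed-point claim. This sketch uses a one-line regex version of the CRLF -> LF, CR -> LF mapping rather than the FSM itself (the alternation tries "\r\n" before the bare "\r", so a CRLF pair collapses to one LF):

```python
import re

def f(x: str) -> str:
    # CRLF -> LF, then any remaining bare CR -> LF, in a single pass.
    return re.sub(r"\r\n|\r", "\n", x)

# f is idempotent: applying it twice changes nothing further.
for x in ["a\r\nb", "a\rb\r", "\r\r\n", "already\nclean"]:
    assert f(f(x)) == f(x)
```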

Received on Thursday, 14 October 1999 14:09:44 UTC
