RE: minimal canonicalization from Ed Simon on 1999-10-08 (w3c-ietf-xmldsig@w3.org from October to December 1999)

From: Ed Simon <ed.simon@entrust.com>
Date: Fri, 8 Oct 1999 16:06:57 -0400
To: "'w3c-ietf-xmldsig@w3.org'" <w3c-ietf-xmldsig@w3.org>
Message-ID: <01E1D01C12D7D211AFC70090273D20B101C4A8B7@sothmxs06.entrust.com>
Hi Mark (et al),

Mark, it looks great.  Since I only had a few
minor comments, I've gone ahead and copied the
archive as you suggested.

Just to try and ensure that everyone has a
common understanding of what this is about...
It looks like there is an early consensus on
removing the canonicalization element from
<SignatureInfo> and simply requiring a DOM
canonicalization instead.  However, we still
need a minimal canonicalization algorithm
for use in <Transformations>, so the following
text is applicable to that.

(I wonder if we will want the option of
minimal canonicalization for embedded
<Object> elements as well?)

Ed
-----Original Message-----
From: Mark Bartel [mailto:mbartel@thistle.ca]
Sent: October 8, 1999 3:36 PM
To: 'Ed Simon '
Subject: RE: minimal canonicalization


Ed,

There does not seem to be any disagreement about removing the attribute.  So,
we should post a suggested minimal canonicalization section to the list.
May I propose:

--start section--

7.5.2 Minimal Canonicalization

The algorithm identifier for the minimal canonicalization is
http://www.w3.org/1999/10/signature-core/minimal. An example of a minimal
canonicalization CanonicalizationAlg element is

<CanonicalizationAlg
Algorithm="http://www.w3.org/1999/10/signature-core/minimal"/>
***[Ed: should we use the following instead
***<CanonicalizationAlgorithm
name="http://www.w3.org/1999/10/signature-core/minimal"/>
***]


The minimal canonicalization algorithm: 

* converts the character encoding to UTF-8, removing the encoding
pseudo-attribute
* normalizes line endings, changing CR/LF sequences to LF only
***[Ed:  We also need to change each CR not followed by an LF.
***  e.g., change "Blah\rBlah." to "Blah\nBlah."
***]


NOTE If possible, characters that are decomposed in the source character set
remain decomposed when converted to UTF-8; similarly with composed
characters.  No Unicode composition/decomposition normalization is done.

; should we instead require Unicode Canonical Composition (Normalization
Form C) as specified for W3C Text Normalization (see
http://www.w3.org/TR/WD-charmod#TextNormalization)? -MB
***[Ed: I admit I'm somewhat out of my depth here but from my reading of
***"http://www.unicode.org/unicode/reports/tr15/tr15-10.html", I'd
***say keep the NOTE as it is and see if anyone disagrees.
***]
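
To illustrate what the NOTE leaves untouched, here is a small Python sketch
using the standard unicodedata module.  The sample character (e-acute) and
the variable names are assumptions chosen for illustration only; nothing here
is part of the proposed text.

```python
import unicodedata

composed = "\u00e9"      # U+00E9, precomposed e-acute
decomposed = "e\u0301"   # U+0065 followed by U+0301 combining acute accent

# Without any composition/decomposition normalization, the two spellings
# remain distinct byte sequences once encoded as UTF-8.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Normalization Form C would fold the decomposed spelling into the composed one.
nfc_bytes = unicodedata.normalize("NFC", decomposed).encode("utf-8")
```

Under the NOTE as written, two documents that differ only in this way would
canonicalize to different octets; requiring Normalization Form C would make
them identical.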

This algorithm is only applicable to XML resources.
***[Ed:  Well it would be OK for any text resource.
***For example, it might be useful for XPointers. 
***]

--end section--
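
A minimal Python sketch of the two steps in the proposed section above, for
discussion only.  It assumes the caller already knows the source encoding,
that any XML declaration sits at the very start of the text, and that Ed's
suggestion (a lone CR also becomes LF) is adopted.  The function name
minimal_canonicalize and the regular expression are hypothetical and not part
of the proposal.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# Assumption: the declaration, if present, is the first thing in the text.
_ENCODING_DECL = re.compile(
    r'^(<\?xml\b[^?]*?)\s+encoding\s*=\s*("[^"]*"|\'[^\']*\')'
)

def minimal_canonicalize(data: bytes, source_encoding: str) -> bytes:
    """Re-encode as UTF-8, drop the encoding pseudo-attribute, normalize line endings."""
    text = data.decode(source_encoding)
    # Remove the encoding pseudo-attribute, keeping the rest of the declaration.
    text = _ENCODING_DECL.sub(r"\1", text, count=1)
    # CR/LF -> LF first, then any remaining lone CR -> LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")

source = '<?xml version="1.0" encoding="ISO-8859-1"?>\r\n<a>caf\u00e9</a>\rBlah.'
canonical = minimal_canonicalize(source.encode("iso-8859-1"), "iso-8859-1")
```

No Unicode composition or decomposition is applied, in keeping with the NOTE.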

If you agree, let me know and I'll post it (or you could just post it if you
like).

-Mark

-----Original Message-----
From: Ed Simon
To: 'Mark Bartel'
Sent: 10/7/99 3:00 PM
Subject: RE: minimal canonicalization

Hi Mark,

I agree with the scenario and that minimal canonicalization
should convert the data to UTF-8.  I think the only
question is whether the "encoding" pseudo-attribute should
be changed to "UTF-8".  In my original append, I was in
favour of changing it but Don made a reasonable argument
against that and I now lean somewhat to his viewpoint.

On the issue of specifying the encoding,
I can see reasons for and against and that's why,
in my original append, I asked for debate about this.
It is good you are bringing up the question.

However, I am quite opinionated regarding whitespace
canonicalization.  Normalizing character encoding and
line breaks is reasonable text-level normalization.
If one is talking about whitespace normalization, then
I have to assume that we're getting more into XML-aware
canonicalization as opposed to just text-aware
canonicalization.  (Correct me if my assumption is
wrong.)  Now, I am very much for requiring,
or strongly recommending, XML-aware canonicalization
for XML objects but it seems to me that if one wants
to do XML-aware canonicalization, you have to go the
whole way which means normalization of character encoding,
newlines, whitespace, namespaces, attribute ordering,
attribute specification (explicit with defaults), and
other things like those discussed in DOMHASH and Canonical
XML.  (Note:  I oppose the information loss in Canonical
XML.)  To me, requiring whitespace normalization for minimal
canonicalization is unnecessary for text-level
canonicalization and insufficient (by itself) for
XML-level canonicalization.

(I recognize that changing the encoding pseudo-attribute
implies some level of XML awareness but I thought it
might be a reasonable exception.)

Anyway,
I've just re-posted Don's commentary of my original
append.  Perhaps you could take a look at that and
then send your opinion to the archive.  (You might
include the text of this note as reference.)

Perhaps, you could indicate exactly what is meant
by "whitespace normalization".

Regards, Ed
-----Original Message-----
From: Mark Bartel [mailto:mbartel@thistle.ca]
Sent: October 7, 1999 1:30 PM
To: 'Ed Simon '
Subject: minimal canonicalization


Ed, here's the kind of scenario I was thinking of:

You create a detached signature over a web page which is in some non-UTF-8
character set.  You then decide to change the character set to, say, UTF-8.
I think the signature should still verify if you have chosen minimal
canonicalization, because part of the decision to use minimal
canonicalization is that the character set encoding doesn't matter.

Note that sometimes the character set of web pages gets converted en route
(because the web server is attempting to be smart).
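
A rough Python sketch of the scenario, under stated assumptions: the digest
algorithm (SHA-1 here) is not named in this thread, the helper
digest_after_minimal is hypothetical, and the sample page has no XML
declaration, so the encoding pseudo-attribute question does not arise.

```python
import hashlib

def digest_after_minimal(data: bytes, source_encoding: str) -> str:
    # Hypothetical helper: re-encode as UTF-8, change CR/LF to LF, then hash.
    text = data.decode(source_encoding).replace("\r\n", "\n")
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

page = "<p>caf\u00e9</p>\r\n"
as_latin1 = page.encode("iso-8859-1")   # the page as originally signed
as_utf8 = page.encode("utf-8")          # the same page after conversion

# The raw octets differ, but the digests over the canonical forms agree.
raw_differs = hashlib.sha1(as_latin1).digest() != hashlib.sha1(as_utf8).digest()
same_digest = (
    digest_after_minimal(as_latin1, "iso-8859-1")
    == digest_after_minimal(as_utf8, "utf-8")
)
```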

I got the impression that you and I didn't actually disagree.  Should I post
something to the list describing this scenario asking for further comment?
Is there a counter-scenario?

My proposal for minimal canonicalization section:
--start section--
The minimal canonicalization algorithm:
* converts the character encoding to UTF-8, removing the encoding
pseudo-attribute
* normalizes line endings

[something about ambiguous code points here, need to look into this...
haven't looked at your code].

Line ending normalization will change any CR/LF (0x0d/0x0a) line endings to
LF (0x0a).

This algorithm is only applicable to text resources.
--end section--

-Mark Bartel
JetForm
Received on Friday, 8 October 1999 16:07:10 UTC