c14n and normalize-space(), was RE: Insignificant whitespace from John Boyer on 2000-08-29 (w3c-ietf-xmldsig@w3.org from July to September 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Tue, 29 Aug 2000 15:23:05 -0700
To: "Petteri Stenius" <Petteri.Stenius@done360.com>, "IETF/W3C XML-DSig WG (E-mail)" <w3c-ietf-xmldsig@w3.org>
Message-ID: <BFEDKCINEPLBDLODCODKMELICEAA.jboyer@PureEdge.com>
Hi all,

The XPath function normalize-space() seems to provide a fair bit of the
functionality needed to deal with the insignificant-whitespace problem.

If desired, space normalization could be another boolean flag for C14N (like
comment vs. no comment).  We would probably want both as parameters
(currently, 'with comments' is triggered by a separate URI, but we'd want to
account for all combinations without an explosion of URIs).

The change would be that, when rendering a text node, render

		normalize-space(string(textnode))

if the normalize space flag is true. It's a little trickier though because
we'd have to keep track of whether we are rendering the content of an
element for which xml:space="preserve" exists and is in the output node-set.
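
A minimal sketch of the text-node rule above, in Python with the standard
library's xml.dom.minidom rather than a real C14N implementation. The names
normalize_space and render_text are hypothetical, the whole document is
assumed to be in the output node-set, and treating xml:space="default" as an
override is an assumption taken from the XML specification, not from this
proposal. It only concatenates text and does not emit canonical markup.

```python
import re
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"


def normalize_space(s):
    """Mimic XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def render_text(node, normalize=False, preserve=False):
    """Concatenate the text under `node`.

    Text nodes are normalized only when the (hypothetical) `normalize` flag
    is set and no xml:space="preserve" is in effect for them.
    """
    if node.nodeType == node.TEXT_NODE:
        if normalize and not preserve:
            return normalize_space(node.data)
        return node.data
    if node.nodeType == node.ELEMENT_NODE:
        # Track the xml:space value in effect for this element's content.
        space = node.getAttributeNS(XML_NS, "space")
        if space == "preserve":
            preserve = True
        elif space == "default":  # assumption: "default" cancels "preserve"
            preserve = False
    return "".join(render_text(c, normalize, preserve) for c in node.childNodes)


doc = minidom.parseString('<a>  x   y  <b xml:space="preserve">  p  q  </b></a>')
print(repr(render_text(doc.documentElement, normalize=True)))
# prints 'x y  p  q  '
```

Note that the text inside b is left untouched, which is the bookkeeping the
paragraph above describes.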

Question:  Do we want C14N to specify this additional functionality, and if
so should it be RECOMMENDED or REQUIRED to implement?

Note that the following simple XPath transform can do most of the work (using
the currently posted new processing model):

<XPath>not(self::text()[normalize-space()=""])</XPath>

This expression eliminates all text nodes that contain only whitespace.
This gets rid of whitespace between one start tag and the next, between one
end tag and the next, and between start and end tags, unless intervening
character data appears in the text.
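
To illustrate what the filter keeps and drops, here is a rough Python
equivalent using the standard library's xml.dom.minidom. It is not an XPath
engine; the helper names are hypothetical, and it only mirrors the effect of
the expression on text nodes.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the four XML whitespace characters


def is_whitespace_only(text_node):
    """True when normalize-space() of the node would be the empty string."""
    return text_node.data.strip(XML_WS) == ""


def kept_text_nodes(node):
    """Yield the text nodes that survive the whitespace-only filter."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if not is_whitespace_only(child):
                yield child
        else:
            yield from kept_text_nodes(child)


doc = minidom.parseString("<a>\n  <b>  hello   world </b>\n  <c> </c>\n</a>")
print([n.data for n in kept_text_nodes(doc.documentElement)])
# prints ['  hello   world ']
```

The whitespace-only nodes between the tags and inside c are gone, while the
surviving node keeps its leading, trailing and internal spaces verbatim.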

Although the function normalizes spaces within such character data, there is
no way to change the string as it appears in the node-set/parse tree.  The
only way to turf that kind of whitespace is to modify C14N.

Any votes in favour?  Chairs?

John Boyer
Development Team Leader,
Distributed Processing and XML
PureEdge Solutions Inc.
Creating Binding E-Commerce
v: 250-479-8334, ext. 143  f: 250-479-3772
1-888-517-2675   http://www.PureEdge.com <http://www.pureedge.com/>



-----Original Message-----
From: w3c-ietf-xmldsig-request@w3.org
[mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of John Boyer
Sent: Tuesday, August 29, 2000 12:52 PM
To: Petteri Stenius; IETF/W3C XML-DSig WG (E-mail)
Subject: RE: Insignificant whitespace


Hi Petteri,

The upcoming C14N draft is quite a bit clearer about retention of all
whitespace in character content (except \r characters that disappear due to
line delimiter normalization).  It is both specified in the prose and shown
by example.  In other words, xml:space is an attribute needed by other
applications.

However, I like this as a transform, though I don't know whether the chairs
will approve of adding it at this time.  Chairs?

Seems to me that if you have a DTD, then you can decide what constitutes
insignificant whitespace, which is whitespace found in content models other
than mixed or those containing #PCDATA.  This could get tricky in the second
case, since it's not the whole content model containing a #PCDATA, but just
the parts where #PCDATA is allowed.

However, we cannot count on having the DTD, nor on being able to read it at
that level if we did.  Instead, it might make sense to define the transform
in terms of the XPath data model.  Any text node containing only whitespace
could be considered insignificant.

This rule could be constrained by xml:space="preserve" provided the
attribute is in the node-set and contained in an element ancestor of a
'whitespace' text node.
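
A sketch of that constrained rule, again in Python with xml.dom.minidom. The
function name strip_insignificant is hypothetical, the whole document is
assumed to be in the node-set, and letting xml:space="default" on a nearer
ancestor cancel an outer "preserve" is an assumption borrowed from the XML
specification.

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_WS = " \t\r\n"


def strip_insignificant(node, preserve=False):
    """Remove whitespace-only text nodes unless xml:space="preserve" applies."""
    if node.nodeType == node.ELEMENT_NODE:
        space = node.getAttributeNS(XML_NS, "space")
        if space == "preserve":
            preserve = True
        elif space == "default":  # assumption: nearest declaration wins
            preserve = False
    # Copy the child list because nodes are removed while iterating.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if not preserve and child.data.strip(XML_WS) == "":
                node.removeChild(child)
        else:
            strip_insignificant(child, preserve)


doc = minidom.parseString('<a>\n <b xml:space="preserve"> </b>\n <c> </c>\n</a>')
strip_insignificant(doc.documentElement)
```

After the call, the only text node left in the tree is the single space
inside b; the whitespace inside c and between the child elements is removed.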

How does this sound to the WG?


John Boyer



-----Original Message-----
From: w3c-ietf-xmldsig-request@w3.org
[mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of Petteri Stenius
Sent: Monday, August 28, 2000 10:50 PM
To: IETF/W3C XML-DSig WG (E-mail)
Subject: Insignificant whitespace



Hello,

The issue with whitespace in XML element content has been very briefly
discussed during interop testing. I would like to raise this issue on the
list, as I believe many people expect insignificant whitespace to be cleaned
up by the C14N algorithms.

The current specification provides no way for an application to skip or
clean up insignificant whitespace from a signed XML document before
digesting. However, 'insignificant' comment elements are skipped by the
default C14N algorithm!

I think we could make use of a transformation algorithm that cleans up
insignificant whitespace; the algorithm should obviously detect xml:space
attributes.

Petteri

--
Petteri Stenius                          Petteri.Stenius@done360.com
Done Information, Ltd.                         Office +358-9-5259240
                                                 Fax +358-9-52592411
http://www.doneinformation.com/               Mobile +358-50-5506161
Received on Tuesday, 29 August 2000 18:23:06 UTC