Re: line feed normalization in C14N from Joseph Reagle on 2003-05-14 (w3c-ietf-xmldsig@w3.org from April to June 2003)

From: Joseph Reagle <reagle@w3.org>
Date: Wed, 14 May 2003 10:34:00 -0400
To: Aleksey Sanin <aleksey@aleksey.com>, John Boyer <jboyer@PureEdge.com>
Cc: w3c-ietf-xmldsig@w3.org
Message-Id: <200305141034.00019.reagle@w3.org>

On Saturday 10 May 2003 20:07, Aleksey Sanin wrote:
> During discussion on the xmlsec mailing list we came up with two
> possibilities:
>     1) All '\r' characters from the document should be removed
>        when the document is parsed by the XML processor.
>     2) All '\r' should be converted to "&#xD;" by the parser.

My recollection of such nuances has long gone stale -- I hope John
remembers better -- but I'd opt for the "conversion" option. Not only does
the XML Recommendation say:

  To simplify the tasks of applications, the characters passed to an
  application by the XML processor must be as if the XML processor
  normalized all line breaks in external parsed entities (including the
  document entity) on input, before parsing, by translating both the
  two-character sequence #xD #xA and any #xD that is not followed
  by #xA to a single #xA character.
  http://www.w3.org/TR/REC-xml#sec-line-ends

but XPath says:

  The normalize-space function returns the argument string with
  whitespace normalized by stripping leading and trailing whitespace
  and replacing sequences of whitespace characters by a single space.
  http://www.w3.org/TR/xpath#function-normalize-space
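
For illustration only, here is a minimal Python sketch of the two rules
quoted above. The function names are hypothetical, and the whitespace set
(space, tab, #xD, #xA) is assumed from the XML definition of white space;
this is not taken from any particular parser.

```python
import re

def normalize_line_ends(text: str) -> str:
    """Sketch of XML line-end handling: "\r\n" and any lone "\r" become "\n"."""
    return re.sub(r"\r\n?", "\n", text)

def normalize_space(text: str) -> str:
    """Sketch of XPath normalize-space(), assuming XML whitespace only."""
    # Collapse each run of space, tab, CR, or LF to a single space,
    # then strip the leading and trailing space left behind.
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")
```

Note that normalize_space() still has to treat '\r' as whitespace: a '\r'
written as a character reference is not touched by line-end normalization
and can therefore reach the application.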

I read the text that you cite from C14N:

  -  All whitespace in character content is retained (excluding 
  characters removed during line feed normalization)  
  http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Terminology

as not requiring any special (normative) removal of '\r', but as only
(non-normatively) referring to the removal of '#xD' when '#xD #xA' was
replaced with '#xA'.
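
As a rough sketch of what the "conversion" option looks like on output,
assuming the Canonical XML 1.0 rule that text nodes are written with '&',
'<', '>' and #xD replaced by character references (the function name below
is hypothetical):

```python
def c14n_escape_text(text: str) -> str:
    """Sketch of text-node escaping on canonical output."""
    # '&' must be replaced first so that the references introduced
    # by the later replacements are not escaped a second time.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\r", "&#xD;")
    )
```

Under this reading, any '\r' that survives parsing is preserved as "&#xD;"
in the canonical form rather than dropped.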

Received on Wednesday, 14 May 2003 10:34:41 UTC