Re: Followup on I18N Last Call comments and disposition from Joseph M. Reagle Jr. on 2000-06-29 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: Joseph M. Reagle Jr. <reagle@w3.org>
Date: Wed, 28 Jun 2000 20:56:33 -0400
To: "Martin J. Duerst" <duerst@w3.org>
Cc: w3c-ietf-xmldsig@w3.org, "John Boyer" <jboyer@PureEdge.com>
Message-Id: <3.0.5.32.20000628205633.019f53f8@localhost>
At 11:49 6/28/00 +0900, Martin J. Duerst wrote:
 >>BTW: back in [a], what did you mean by:
 >> >- General
 >> >   The treatment of xml:lang, eg during transforms, is unclear.
 >>[a] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JanMar/0254.html
 >What we meant is that because xml:lang is an attribute
 >that inherits, but transforms (e.g. XPath, XSLT) don't

[John's taken this thread up.]

 >>Presently, the specification states:
 >>
 >>         We RECOMMEND that signature applications produce XML
 >>         content in Normalized Form C [NFC] and check that any XML
 >>         being consumed is in that form as well (if not, signatures may
 >>         consequently fail to validate).
 >>

 >>http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-XML-Canonicalization
 >>
 >>Consequently, any application that states it is conformant with the XML
 >>Signature specification SHOULD do the above.
 >
 >I can imagine two different kinds of signature applications:
 >
 >1) Applications that create their content and sign it.
 >2) Applications that take content created elsewhere and
 >   sign it.
 >
 >For 1), the above is the right recommendation.
 >For 2), the above is wrong.
 >
 >There may not be a need to make a distinction between
 >1) and 2) in the rest of the spec, but here this is
 >needed.

Sorry, I still don't understand. The distinction seems immaterial,
particularly depending on your view of the architecture where the signature
engine is separate from the content-content regardless. The content we are
talking about is Signatures, not word processing files and such. Signature
applications create XML Signatures, that's the content they concern
themselves with.

Now, we may be speaking more to the generic point of C14N XML (if so, that's
why I'm trying to separate these two threads).

 >>   document is considered valid. Consequently, while we RECOMMEND all
 >>   documents operated upon and generated by signature applications be in
 >>   [NFC] (otherwise intermediate processors can unintentionally break the
 >>   signature) encoding normalizations SHOULD NOT be done as part of a
 >>   signature transform.
 >>   http://www.w3.org/Signature/2000/06/section-8-I18N.html
 >
 >I think this puts two different kinds of concerns into the
 >same pot (but I'm not exactly sure, because I'm not really
 >familiar with the security language).

Well, it probably isn't even correct to call this a "Birthday Attack." I'm
hoping someone else jumps in and tweaks the text, but I think the gist of
what you are after is there.
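
To make the "signatures may consequently fail to validate" point concrete, here is a minimal Python sketch. It assumes SHA-1 over UTF-8 bytes as the digest and uses a made-up sample string; the helper names are illustrative only and are not taken from the specification.

```python
import hashlib
import unicodedata

# "é" as one precomposed code point (U+00E9); this form is already in NFC.
composed = "caf\u00e9"
# The same text as "e" followed by a combining acute accent (U+0301).
decomposed = "cafe\u0301"


def sha1_of_utf8(text: str) -> str:
    """Hex SHA-1 digest of the UTF-8 bytes of text (hypothetical helper)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def is_nfc(text: str) -> bool:
    """True if normalizing to NFC leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


# The two strings look identical to a reader but differ byte for byte,
# so their digests differ.
digests_match_raw = sha1_of_utf8(composed) == sha1_of_utf8(decomposed)

# After normalizing the decomposed form to NFC, the digests agree.
digests_match_nfc = sha1_of_utf8(composed) == sha1_of_utf8(
    unicodedata.normalize("NFC", decomposed)
)
```

An intermediate processor that silently rewrites one form into the other would change the digest, which is the breakage the RECOMMEND text is trying to head off.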

 >>  >    It should be mandated that, when a document is transcoded from a
 >>  >    non-Unicode encoding to Unicode as part of C14N, normalization must be
 >>  >    performed (At the bottom of section 2 and also in A.4 in the June 1st
 >>  >    2000 draft).
 >>  >
 >>  >**** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases,
 >>  >      the transcoding result is normalized just by design of Normalization
 >>  >      Form C, but there are some exceptions.
 >>  >
 >>  >**** The above also applies to other transcodings, e.g. done as part
 >>  >      of the evaluation of an XPath transform or the minimal canonicalization.
 >>
 >>Is this precluded by the security concern raised above?
 >
 >No, this is something different. It just helps to make
 >transcodings as deterministic as possible. If some transcoder
 >e.g. would convert a precomposed character in iso-8859-1 to
 >a decomposed representation (non-normalized), checking
 >would just not work.

John, given that we've specified the format and syntax of a Signature, is it
likely that a Signature will be in a non-Unicode format? Regardless, if so,
Martin, what is a "transcoding"? Moving from one character encoding scheme to
another? I'm certainly getting an education in characters! From my reading
[1,2] I understand the following:
          + Character Repertoire (CR) = a set of abstract characters
          + Coded Character Set (CCS) = a mapping of code values (space,
            points, positions) to a Character Repertoire
          + Character Encoding Scheme (CES) = scheme for representing a
            character repertoire in a code space. Frequently |CR| >
            |code space|, so one has to use various extensions and
            escaping to represent those extra abstract characters. UTF-8
            is a CES.
          + Charset = CCS + CES

     [1] http://www.w3.org/MarkUp/html-spec/charset-harmful
     [2] http://czyborra.com/utf/
     
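One way to read these terms, and Martin's point about deterministic transcoding, is the rough Python sketch below. The function names are hypothetical, and the second transcoder is a deliberately bad one invented for illustration; it is not a description of any real converter.

```python
import unicodedata

ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Coded character set: the abstract character is assigned a code value.
# ISO-8859-1 and Unicode happen to agree on this one.
code_point = ord(ch)

# Character encoding scheme: the code value is serialized as bytes.
latin1_bytes = ch.encode("iso-8859-1")  # one byte
utf8_bytes = ch.encode("utf-8")         # two bytes


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode from source_encoding, normalize to NFC, re-encode as UTF-8."""
    text = data.decode(source_encoding)
    return unicodedata.normalize("NFC", text).encode("utf-8")


def transcode_decomposed(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical transcoder that emits a decomposed (non-NFC) result."""
    text = data.decode(source_encoding)
    return unicodedata.normalize("NFD", text).encode("utf-8")


source = "caf\u00e9".encode("iso-8859-1")
normalized = transcode_to_utf8(source, "iso-8859-1")
decomposed = transcode_decomposed(source, "iso-8859-1")
```

Both outputs are valid UTF-8 for the same text, but the bytes differ, so a signer and a verifier using different transcoders would compute different digests unless both normalize.
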
 >>  >In 4.3.3, the text on URI-Reference and "non-ASCII" characters
 >>  >should be aligned with that in XPointer
 >>  >http://www.w3.org/TR/xptr#uri-escaping to make sure all the
 >>  >details are correct.
 >>
 >>I believe the present text is sufficient:
 >>
 >>         (Non-ASCII characters in a URI should be represented in
 >>         UTF-8 [UTF-8] as one or more bytes, and then escaping these
 >>         bytes with the URI escaping mechanism. [XML])
 >>         http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-Reference
 >>
 >>To state that all fragment identifiers (even from other MIME types) must be
 >>escaped as defined by XPTr would be inappropriate. If someone uses XPTr,
 >>they should follow the XPtr spec of course.
 >
 >I didn't mean to state 'as defined by XPTr'. I was asking to take
 >the text from XPTr. I should better have asked you to take the
 >text from XLink (currently W3C member-only pre-published version),
 >this may have avoided confusion.

Can you point me to the text (even if a Member URL)? You are asking me to
align our text with something I haven't seen (and consequently I don't
understand why it is necessary).

 >In XPointer, the conversion is described for the case of constructing
 >an XPointer and making a legal URI fragment out of it. This may indeed
 >be different for different fragment identifiers.
 >
 >You are on the other side, what you have to describe is how to take
 >URI References appearing in DSig syntax (which may contain characters
 >outside ASCII) and converting them to URI References formally conforming
 >to URI syntax. In the case of XPointer (and some other cases), the fact
 >that both procedures are identical will allow users to put XPointers
 >(and other stuff) containing e.g. non-ASCII characters into DSig syntax
 >as characters (without constant %HH escaping). It's two sides of the
 >same medal, so to say.

I'm sort of lost, but hopefully the text will help me out.
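
For what it's worth, the procedure in our present text (represent non-ASCII characters as UTF-8 bytes, then %HH-escape those bytes) can be sketched in Python as below. The function name and the example.org URI are made up for illustration, and the choice of which ASCII characters to leave alone is my assumption, not wording from the XPointer or XLink text.

```python
from urllib.parse import quote


def escape_uri_reference(uri_ref: str) -> str:
    """Escape a URI reference that may contain non-ASCII characters.

    Each non-ASCII character is converted to its UTF-8 bytes and each
    byte is written as %HH. ASCII characters with a role in URI syntax
    are left alone (an assumption of this sketch), including "%" so
    that existing %HH escapes are not escaped a second time.
    """
    return quote(uri_ref, safe="/:?#[]@!$&'()*+,;=%~", encoding="utf-8")


# Hypothetical reference with non-ASCII characters in the fragment.
escaped = escape_uri_reference("http://example.org/doc.xml#r\u00e9sum\u00e9")
```

With something like this applied when the reference is dereferenced, the URI could appear in the signature syntax with the characters written directly, without constant %HH escaping.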

 >>  >4.5 The Encoding attribute of <Object> is not described. Is that
 >>  >something like 'charset', or something like base64, or what.
 >>  >This needs to be clearly described or removed.
 >>
 >>I propose, "The Object's Encoding attribute may be used to provide a URI
 >>that identifies the method by which the object is encoded."
 >
 >What would that be? Any such URI examples? Any examples
 >where this is needed?
 
The scenario is that of a binary data format, like GIF or like a Word97
document. Something like

<Object Encoding="http://www.w3.org/2000/02/xmldsig#base64">
 AE34SDFW34234...
</Object>
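
A rough Python sketch of how an application might produce and read back such an element follows. It reuses the base64 URI from my example, omits the signature namespace and any other attributes for brevity, and the function names are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Encoding identifier as written in the example in this message.
BASE64_URI = "http://www.w3.org/2000/02/xmldsig#base64"


def wrap_binary_in_object(data: bytes) -> str:
    """Base64-encode binary data and place it in an Object element."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"<Object Encoding={quoteattr(BASE64_URI)}>{encoded}</Object>"


def unwrap_object(xml_text: str) -> bytes:
    """Recover the binary data, checking the Encoding attribute first."""
    element = ET.fromstring(xml_text)
    if element.get("Encoding") != BASE64_URI:
        raise ValueError("unsupported Object encoding")
    return base64.b64decode(element.text or "")


# The first six bytes of a GIF89a file stand in for a binary payload.
payload = b"GIF89a"
object_xml = wrap_binary_in_object(payload)
round_tripped = unwrap_object(object_xml)
```

The Encoding attribute tells the consumer which decoding to apply before it does anything else with the content.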


_________________________________________________________
Joseph Reagle Jr.   
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/People/Reagle/
Received on Wednesday, 28 June 2000 20:57:53 UTC