Re: Followup on I18N Last Call comments and disposition

At 00/06/28 20:56 -0400, Joseph M. Reagle Jr. wrote:
>At 11:49 6/28/00 +0900, Martin J. Duerst wrote:

>  >>Presently, the specification states:
>  >>
>  >>         We RECOMMEND that signature applications produce XML
>  >>         content in Normalized Form C [NFC] and check that any XML
>  >>         being consumed is in that form as well (if not, signatures may
>  >>         consequently fail to validate).
>  >>
>  >>Consequently, any application that states it is conformant with the XML
>  >>Signature specification SHOULD do the above.
>  >
>  >I can imagine two different kinds of signature applications:
>  >
>  >1) Applications that create their content and sign it.
>  >2) Applications that take content created elsewhere and
>  >   sign it.
>  >
>  >For 1), the above is the right recommendation.
>  >For 2), the above is wrong.
>  >
>  >There may not be a need to make a distinction between
>  >1) and 2) in the rest of the spec, but here this is
>  >needed.
>Sorry, I still don't understand. The distinction seems immaterial,
>particularly depending on your view of the architecture, where the signature
>engine is separate from the content regardless. The content we are
>talking about is Signatures, not word processing files and such. Signature
>applications create XML Signatures; that's the content they concern
>themselves with.

Then that should be made clearer.
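The produce-and-check recommendation quoted above can be sketched in Python with the standard `unicodedata` module (a minimal illustration of the idea, not anything the spec itself mandates; the function names are mine):

```python
import unicodedata

def produce_nfc(xml_text: str) -> str:
    """Normalize outgoing XML content to Normalization Form C."""
    return unicodedata.normalize("NFC", xml_text)

def check_nfc(xml_text: str) -> bool:
    """Check that consumed XML is already in NFC; if it is not,
    a signature computed over the NFC form may fail to validate."""
    return unicodedata.normalize("NFC", xml_text) == xml_text

# 'e' followed by a combining acute accent is the decomposed
# (non-NFC) form of the precomposed character U+00E9.
decomposed = "e\u0301"
assert not check_nfc(decomposed)
assert produce_nfc(decomposed) == "\u00e9"
assert check_nfc(produce_nfc(decomposed))
```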

>Now, we may be speaking more to the generic point of C14N XML (if so, that's
>why I'm trying to separate these two threads).
>  >>   document is considered valid. Consequently, while we RECOMMEND all
>  >>   documents operated upon and generated by signature applications be in
>  >>   [NFC] (otherwise intermediate processors can unintentionally break the
>  >>   signature), encoding normalizations SHOULD NOT be done as part of a
>  >>   signature transform.
>  >>
>  >
>  >I think this puts two different kinds of concerns into the
>  >same pot (but I'm not exactly sure, because I'm not really
>  >familiar with the security language).
>Well, it probably isn't even correct to call this a "Birthday Attack";
>I'm hoping someone else jumps in and tweaks the text, but I think the gist
>of what you are after is there.
>  >>  >    It should be mandated that, when a document is transcoded from a
>  >>  >    non-Unicode encoding to Unicode as part of C14N, normalization
>  >>  >    must be performed (at the bottom of section 2 and also in A.4 in the June
>  >>  >    2000 draft).
>  >>  >
>  >>  >**** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases,
>  >>  >      the transcoding result is normalized just by design of
>  >>  >      Form C, but there are some exceptions.
>  >>  >
>  >>  >**** The above also applies to other transcodings, e.g. done as part
>  >>  >      of the evaluation of an XPath transform or the minimal
>  >>  >      canonicalization.
>  >>
>  >>Is this precluded by the security concern raised above?
>  >
>  >No, this is something different. It just helps to make
>  >transcodings as deterministic as possible. If some transcoder
>  >e.g. would convert a precomposed character in iso-8859-1 to
>  >a decomposed representation (non-normalized), checking
>  >would just not work.
>John, given that we've specified the format and syntax of a Signature, is it
>likely that a Signature will be in a non-Unicode format? Regardless, if so,
>Martin what is a "transcoding"?

Converting from one character encoding (e.g. iso-8859-1 or
shift_jis) to another (e.g. UTF-8).
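The pitfall described above can be made concrete. Below is a sketch (in Python, my example, not from the thread) of transcoding iso-8859-1 to UTF-8: the precomposed character comes out already in NFC by design, while a hypothetical transcoder that emitted the decomposed form would break byte-for-byte signature checking even though the text is canonically equivalent:

```python
import unicodedata

# A precomposed 'é' in iso-8859-1 is the single byte 0xE9.
legacy_bytes = b"caf\xe9"

# Transcoding: decode from the legacy charset, re-encode as UTF-8.
text = legacy_bytes.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")
assert utf8_bytes == b"caf\xc3\xa9"  # precomposed U+00E9: already NFC

# A transcoder that emitted the decomposed representation instead
# would produce different bytes for canonically equivalent text,
# so checking "would just not work". Normalizing to NFC repairs it.
decomposed = "cafe\u0301".encode("utf-8")
assert decomposed != utf8_bytes
renormalized = unicodedata.normalize("NFC", decomposed.decode("utf-8"))
assert renormalized.encode("utf-8") == utf8_bytes
```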

>Moving from one character encoding scheme to
>another? I'm certainly getting an education in characters! From my reading
>[1,2] I understand the following:
>           + Character Repertoire (CR) = a set of abstract characters
>           + Coded Character Set (CCS) = a mapping of code values (space,
>             points, positions) to a Character Repertoire
>           + Character Encoding Scheme (CES) = scheme for representing a
>             character repertoire in a code space. Frequently |CR| >
>             |code space|, so one has to do various extensions and
>             escaping to represent those extra abstract characters. UTF-8 is a
>             CES, for example.
>           + Charset = CCS + CES
>      [1]
>      [2]

These details don't really matter too much. What matters are the
overall character encodings, usually identified by the 'charset'
parameter; these are a combination/layering of the above points.
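The layering under discussion can be illustrated in a few lines of Python (my example): the coded character set assigns a code point, and each character encoding scheme serializes that same code point to different bytes:

```python
# CCS layer: the coded character set (here, Unicode/ISO 10646)
# assigns the code point U+00E9 to the character 'é'.
assert ord("\u00e9") == 0xE9

# CES layer: different encoding schemes serialize the same code
# point to different byte sequences.
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"      # two bytes
assert "\u00e9".encode("utf-16-be") == b"\x00\xe9"  # one 16-bit unit
assert "\u00e9".encode("iso-8859-1") == b"\xe9"     # one byte (legacy charset)
```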

>  >>  >In 4.3.3, the text on URI-Reference and "non-ASCII" characters
>  >>  >should be aligned with that in XPointer to make sure all the
>  >>  >details are correct.
>  >>
>  >>I believe the present text is sufficient:
>  >>
>  >>         (Non-ASCII characters in a URI should be represented in
>  >>         UTF-8 [UTF-8] as one or more bytes, and these bytes then
>  >>         escaped with the URI escaping mechanism. [XML])
>  >>
>  >>
>  >>To state that all fragment identifiers (even from other MIME types) must
>  >>be escaped as defined by XPtr would be inappropriate. If someone uses XPtr,
>  >>they should follow the XPtr spec, of course.
>  >
>  >I didn't mean to state 'as defined by XPtr'. I was asking to take
>  >the text from XPtr. It would have been better to ask you to take the
>  >text from XLink (currently a W3C member-only pre-published version);
>  >this may have avoided confusion.
>Can you point me to the text (even if a Member URL)? You are asking me to
>align our text with something I haven't seen (and consequently don't
>understand why it is necessary).

Please note that 'crosshatch' should be changed to 'number sign',
because this is the official ISO 10646/Unicode name.

>  >In XPointer, the conversion is described for the case of constructing
>  >an XPointer and making a legal URI fragment out of it. This may indeed
>  >be different for different fragment identifiers.
>  >
>  >You are on the other side, what you have to describe is how to take
>  >URI References appearing in DSig syntax (which may contain characters
>  >outside ASCII) and converting them to URI References formally conforming
>  >to URI syntax. In the case of XPointer (and some other cases), the fact
>  >that both procedures are identical will allow users to put XPointers
>  >(and other stuff) containing e.g. non-ASCII characters into DSig syntax
>  >as characters (without constant %HH escaping). It's two sides of the
>  >same medal, so to say.
>I'm sort of lost, but hopefully the text will help me out.

In short:

URI references have been used to represent all kinds of characters,
but: 1) not in a uniform way, and 2) not in an easily readable way.
For XPointer, the question is how to represent the characters used
in XPointers (e.g. characters corresponding to element names) in
URIs. Same for new URI schemes. For XBase, XLink, and your case,
the question is how to define a conversion of something in an
attribute field in such a way as to allow that attribute to be
easily readable, rather than just formally corresponding to
the URI syntax. In both cases, the same procedure is used, in
order to make things work in a predictable way.
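The conversion both sides describe (UTF-8-encode each non-ASCII character, then %HH-escape the resulting bytes) can be sketched with Python's standard `urllib.parse` (the URI itself is a hypothetical example, not one from the thread):

```python
from urllib.parse import quote, unquote

# A URI reference containing a non-ASCII character, as it might
# appear (readably, as characters) in an attribute value:
readable = "http://example.org/doc#caf\u00e9"  # hypothetical URI

# Conversion to a formally conforming URI reference: UTF-8-encode
# each non-ASCII character, then %HH-escape the resulting bytes.
escaped = quote(readable, safe=":/#?&=")
assert escaped == "http://example.org/doc#caf%C3%A9"

# The same procedure on both sides makes the readable form
# round-trip predictably.
assert unquote(escaped) == readable
```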

>  >>  >4.5 The Encoding attribute of <Object> is not described. Is that
>  >>  >something like 'charset', or something like base64, or what.
>  >>  >This needs to be clearly described or removed.
>  >>
>  >>I propose, "The Object's Encoding attributed may be used to provide a URI
>  >>that identifies the method by which the object is encoded."
>  >
>  >What would that be? Any such URI examples? Any examples
>  >where this is needed?
>The scenario is that of a binary data format, like GIF or like a Word97
>document. Something like
><Object Encoding=" ">
>  AE34SDFW34234...

Please make sure the spec says so.
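The scenario sketched above (binary data such as a GIF carried inside an Object, with the Encoding attribute naming how it was encoded) might look like the following Python sketch; the Encoding URI value is illustrative only, since the spec text under discussion had not fixed one:

```python
import base64
from xml.sax.saxutils import escape

def wrap_binary_object(data: bytes, encoding_uri: str) -> str:
    """Wrap binary data in an Object element whose Encoding attribute
    identifies the method (here base64) by which it is encoded.
    Illustrative sketch only, not normative spec syntax."""
    b64 = base64.b64encode(data).decode("ascii")
    attr = escape(encoding_uri, {'"': "&quot;"})
    return '<Object Encoding="%s">%s</Object>' % (attr, b64)

# First bytes of a GIF file, base64-encoded inside the element:
gif_header = b"GIF89a"
xml = wrap_binary_object(gif_header, "http://www.w3.org/2000/09/xmldsig#base64")
assert xml == ('<Object Encoding="http://www.w3.org/2000/09/xmldsig#base64">'
               'R0lGODlh</Object>')
```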

Regards,   Martin.

Received on Monday, 10 July 2000 05:58:48 UTC