- From: Mark Bartel <mbartel@thistle.ca>
- Date: Fri, 15 Oct 1999 17:28:33 -0400
- To: "'John Boyer '" <jboyer@uwi.com>, "'IETF/W3C XML-DSig WG '" <w3c-ietf-xmldsig@w3.org>
I agree with John here. I must admit that the whole argument about types and transformations confused me until I realized that the idea was to redefine what a transformation is. So, in the interest of clarifying the problem, I'll explain my thinking.

As I understand it, the original idea was that the transformations form a pipeline, taking in an octet stream at one end and producing an output octet stream at the other. Very simple and straightforward: one input and one output.

The new idea is to augment the octet stream by adding a parallel type. In other words, each algorithm would have two inputs and two outputs, one of which is an octet stream and the other a "type". The idea comes from the HTTP world, where one retrieves a document (analogous to our octet stream) and also gets a whole whack of other information along with it in the HTTP headers, including the MIME type of the document. Browsers also sometimes use the file extension to determine type. The motivation for the new concept is that some transformations require knowledge of the input type. The main example is the fragment id, which is defined per type. Another example is character set encodings.

I strongly feel that transformations should have one input and one output. This makes the specification and implementation of each transformation much simpler, and therefore simplifies interoperability and testing. The document being transformed is not going to switch from being text/html to image/jpeg betwixt the signing and the verification, and if it does, the signature won't verify anyway. The HTML or JPEG parser needs to throw errors for invalid input regardless of what we choose. For transformations that need the input type, the input type or character set can be specified as a parameter on that transformation when the signature is created.

For character set encodings, I don't think there is much we can do in the general case. Frequently it won't be an issue: the character set will stay constant between signing and verification. If the input is an entire XML document, the character set can be determined from the prolog. Otherwise, there is no way of knowing. The HTTP protocol may give us some information that could be passed through in the dual-input/output model, but any other protocol probably won't give us useful information.

In the XML case, we could recommend dealing with this issue by canonicalizing the entire document into UTF-8, and then picking out the appropriate fragment. Doing it the other way around would lose the character set encoding information.

-Mark Bartel
JetForm
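To make the contrast concrete, here is a minimal sketch of the one-input/one-output pipeline described above, with any type or charset knowledge fixed as a per-transform parameter at signing time. The names and code are illustrative only, not taken from any draft:

    import base64

    def base64_decode(octets: bytes) -> bytes:
        # base64 does not parse the document it carries, so it needs
        # no knowledge of the input type (the "couldn't care less" case).
        return base64.b64decode(octets)

    def recode_to_utf8(charset: str):
        # Type-specific knowledge (here, the charset) is recorded as a
        # parameter of the transform when the signature is created,
        # not threaded through the chain as a second input/output.
        def transform(octets: bytes) -> bytes:
            return octets.decode(charset).encode("utf-8")
        return transform

    def apply_transforms(octets: bytes, transforms) -> bytes:
        # The whole pipeline is octets in, octets out; a stage handed
        # data of the wrong type simply raises an error, which is the
        # desired behaviour at verification time anyway.
        for transform in transforms:
            octets = transform(octets)
        return octets

    # e.g. apply_transforms(data, [base64_decode, recode_to_utf8("iso-8859-1")])

Under the dual-input/output proposal, each stage would instead consume and produce an (octets, type) pair, which is precisely the extra contract being argued against specifying and testing.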
-----Original Message-----
From: John Boyer
To: IETF/W3C XML-DSig WG
Sent: 10/15/99 1:09 PM
Subject: RE: Quick Comments on Types/Encoding of XML

Hi Joseph, Don and others on the teleconference,

I was not completely with it yesterday because my wife had been in hospital the day before (she had a miscarriage (which really hurts) with some complications).

Anyway, I wanted to further clarify the 'input/output type' problem we were discussing on the telecon. It seems that there was some confusion because there were two issues at hand. First was the charset encoding and second was the MIME type. For charset encoding in the XML transformations, I solved the problem (interestingly, in the way that Peter Norman suggested). Some on the call indicated that this was not a solution they preferred. OK, so we can tweak it a bit, but the charset issue is solved.

By the way, the charset transformations require knowledge of how the document type deals with changes in charset (unlike base64, which couldn't care less). I classified them as c14n algorithms to be defined in section 7.5 because XML c14n seems to have set the precedent on charset changes, and we should stick to the standards as much as possible.

For other types of documents, it seemed exceedingly pedantic to me to define an input type (such as a MIME type) for the algorithm when the input type is evident from the algorithm itself. I do not believe that the same transform can process multiple types of documents unless the transform doesn't actually parse the document (like base64), in which case it does not need to know the input type. A JPEG decompressor just isn't going to work on an XFDL form. It is the responsibility of the entity that creates the transformation sequence to match up the transformations based on the expected input and output types of the algorithms that *the entity* strings together. The signature system simply will not have need of this information. If the transformation code receives invalid data, it will generate errors (e.g. if you push non-XML data into an XPath transform, the XML processor will blow chunks).

Basically, the transformations I defined did not need this information at all, and the application-specific transformations are 1) quite unlikely to need the type information, and 2) quite capable of placing the type information in the Transformation element content even if they do need it.

It's really a question of who does the error checking. When I say the type information is not needed, I'm not saying that data of the wrong type shouldn't generate errors; I'm just saying that the core signature syntax and processing rules should not be burdened with trying to figure out the input document type of a Transformation (which would be necessary in order to check whether it matches a recorded input type).

It has been said repeatedly that we need to be quite specific about the processing rules of these transformations. I was quite specific, but I think some mistook my not restating the rules of XML or other specifications for ambiguity. I would encourage a reread of the section before further discussion, and please address concerns about the actual proposed material to me. It doesn't make sense to have a telecon where we spend 40 minutes discussing things that are solved either by the spec or implicitly by the specs to which the spec refers.

Thanks,
John Boyer
Software Development Manager
UWI.Com -- The Internet Forms Company

Note: if the XML is in anything other than UTF-8 or UTF-16, the encoding declaration should be present: http://www.w3.org/TR/REC-xml#sec-guessing

"Because each XML entity not in UTF-8 or UTF-16 format must begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply."

_________________________________________________________
Joseph Reagle Jr.
Policy Analyst           mailto:reagle@w3.org
XML-Signature Co-Chair   http://w3.org/People/Reagle/
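As a reference point for the REC-xml passage quoted above, here is a minimal sketch of the two-to-four-octet auto-detection it describes. It covers only the common byte-order-mark and '<?xml' cases from the spec's guessing appendix and is illustrative, not a conforming detector:

    def guess_xml_encoding_family(octets: bytes) -> str:
        # Per http://www.w3.org/TR/REC-xml#sec-guessing, the first two to
        # four octets are enough to tell the encoding *family*; the exact
        # charset then comes from the encoding declaration itself.
        if octets[:3] == b"\xef\xbb\xbf":
            return "UTF-8 (byte order mark)"
        if octets[:2] == b"\xfe\xff":
            return "UTF-16, big-endian (byte order mark)"
        if octets[:2] == b"\xff\xfe":
            return "UTF-16, little-endian (byte order mark)"
        if octets[:4] == b"\x00\x3c\x00\x3f":
            return "UTF-16-like, big-endian, no BOM ('<?' as 16-bit units)"
        if octets[:4] == b"\x3c\x00\x3f\x00":
            return "UTF-16-like, little-endian, no BOM ('<?' as 16-bit units)"
        if octets[:4] == b"\x3c\x3f\x78\x6d":
            return "8-bit ASCII family ('<?xm'); read the declaration for the charset"
        return "unrecognized; without a declaration, UTF-8 must be assumed"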