Re: Quick Comments on Types/Encoding of XML

Hi,

My problem stemmed from a concern that we were throwing away the
fragment context.  That is, if we decide that you can't use

    <Location>scheme://authority/path#fragment</Location>

for Location, I'd like for there to be something I can put in Location
and in Transformations that would have exactly the same effect.  So if
instead I use

    <Location>scheme://authority/path</Location>
    <Transformations>
         <Transform Algorithm="...">
             <Parameter>fragment</Parameter>
         </Transform>
    </Transformations>

Is there an Algorithm I can define so that this has the same effect as
the fragment on the end of a URI?  The answer is no if Content-Type
information is not forwarded, because a fragment on the end of a URI
is defined to invoke the fragment processor for the MIME type of the
data pointed to, and that type is determined dynamically at run time.
In fact, I think that treat-as-a-fragment is sufficiently useful that
it would make sense to make it the default Transform (and Parameter
type).  On the other hand, the message below seems to say that this
isn't worth much or, if it is, that it should be accommodated by
explicitly providing the MIME type, as in

    <Location>scheme://authority/path</Location>
    <Transformations>
        <Transform InputType="text/html">
            <Parameter>fragment</Parameter>
        </Transform>
    </Transformations>

I guess the concept of a dynamically determined type being passed
along isn't particularly valuable for signatures, so I'm willing to go
for the InputType form above.  That way of handling things would also
solve cases like the following for charset

    <Location>ftp://example.com/miscellaneous/file</Location>
    <Transformations>
        <Transform Algorithm="urn:minimalCanon" Charset="shift-jis" />
    </Transformations>

So, since there wasn't much support for it and substantial opposition,
I'm willing to drop the idea of passing along type/charset/etc. info
and have those just be provided as Transform input.
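
As a rough sketch of the InputType form (in Python; the registry, the
function names, and the rule that a fragment selects the element with a
matching id attribute are all assumptions made for illustration, not
anything the draft defines), a fragment Transform could pick its
processor from the explicitly supplied type rather than from forwarded
Content-Type information:

```python
import xml.etree.ElementTree as ET


def xml_fragment(octets: bytes, fragment: str) -> bytes:
    """Return the first element whose id attribute equals the fragment.

    Simplification: real XML ID semantics depend on a DTD or schema;
    here any attribute literally named "id" is accepted.
    """
    root = ET.fromstring(octets)
    for elem in root.iter():
        if elem.get("id") == fragment:
            return ET.tostring(elem)
    raise LookupError(f"no element with id={fragment!r}")


# Hypothetical registry: one fragment processor per MIME type.
FRAGMENT_PROCESSORS = {
    "text/xml": xml_fragment,
}


def apply_fragment(octets: bytes, input_type: str, fragment: str) -> bytes:
    """Apply the fragment processor named by the InputType parameter."""
    try:
        processor = FRAGMENT_PROCESSORS[input_type]
    except KeyError:
        raise ValueError(f"no fragment processor for {input_type!r}") from None
    return processor(octets, fragment)
```

The type is fixed when the signature is created, so signer and verifier
select the same processor regardless of what the transport reports.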

Thanks,
Donald

PS: I put MIME types and charsets as attributes above because it felt
natural to do so.  I put Location as an Element even though it is a
URI and I think should probably be an attribute.

PPS:  A few more not very important comments of mine below...

From:  Mark Bartel <mbartel@thistle.ca>
Resent-Date:  Fri, 15 Oct 1999 17:28:40 -0400 (EDT)
Resent-Message-Id:  <199910152128.RAA17363@www19.w3.org>
Message-ID:  <91F20911A6C0D2118DF80040056D77A2032A56@arren.roke.thistle.ca>
To:  "'John Boyer '" <jboyer@uwi.com>,
     "'IETF/W3C XML-DSig WG '" <w3c-ietf-xmldsig@w3.org>
Date:  Fri, 15 Oct 1999 17:28:33 -0400

>I agree with John here.  I must admit that the whole argument about types &
>transformations confused me until I realized that the idea was to redefine
>what a transformation was.
>
>So, in the interest of clarifying the problem, I'll explain my thinking.
>
>As I understand it, the original idea was that the transformations were a
>pipeline taking in an octet stream at one end and producing an output octet
>stream.  Very simple and straightforward; one input and one output.
>
>The new idea is to augment the octet stream, by adding a parallel type.  In
>other words, each algorithm would have two inputs and two outputs, one of
>which is an octet stream and the other a "type".  The idea comes from the
>http world where one retrieves a document (analogous to our octet stream)
>and also gets a whole whack of other information along with it in the http
>headers, including the mime type of the document.  Browsers also sometimes
>utilize the file extension to determine type.
>
>The motivation for the new concept is that some transformations require
>knowledge of the input type.  The main example is the fragment id, which is
>defined per type.  Another example is character set encodings.
>
>I strongly feel that transformations should have one input and one output.
>This makes the specification and implementation for each transformation much
>simpler and therefore simplifies interoperability and testing.  The document
>being transformed is not going to switch from being text/html to image/jpeg
>betwixt the signing and the verification, and if it does the signature
>wouldn't verify anyway.  The html or jpeg parser needs to throw errors for
>invalid input regardless of what we choose.  For transformations that need
>the input type, the input type or character set can be specified as a
>parameter on that transformation when the signature is created.

Well, being pedantic, if you can specify Parameters, the Transforms
have more than one input.  But they have only one output, and there is
only one thing that is passed to the next Transform in sequence, which
is a simplification.
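
To make that shape concrete, here is a minimal sketch in Python.  The
two transforms and the run_pipeline helper are hypothetical names
chosen for illustration; the point is only that each step receives the
octets plus its own fixed Parameters and hands exactly one octet string
to the next step:

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# A transform takes octets plus keyword parameters and returns octets.
Transform = Callable[..., bytes]
Step = Tuple[Transform, Dict[str, Any]]


def transcode_to_utf8(octets: bytes, charset: str) -> bytes:
    """Decode using the charset given as a parameter, re-encode as UTF-8."""
    return octets.decode(charset).encode("utf-8")


def normalize_newlines(octets: bytes) -> bytes:
    """Illustrative stand-in for a canonicalization step: CRLF to LF."""
    return octets.replace(b"\r\n", b"\n")


def run_pipeline(octets: bytes, steps: Iterable[Step]) -> bytes:
    """Feed the single output of each transform into the next one."""
    for transform, params in steps:
        octets = transform(octets, **params)
    return octets
```

No type information travels between the steps; anything a step needs
beyond the octets is written into its parameters by the signer.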

>For character set encodings, I don't think there is much we can do for the
>general case.  Frequently it won't be an issue, the character set will stay
>constant between signing and verification.  If they have an entire XML
>document, the character set can be determined from the prolog.  Otherwise,
>there is no way of knowing.  The http protocol may give us some information
>that could be passed through in the dual-input/output model, but any other
>protocol probably won't give us useful information.
>
>In the XML case, we could recommend dealing with this issue by
>canonicalizing the entire document into UTF-8, and then picking out the
>appropriate fragment.  Doing it the other way around would lose the
>character set encoding information.

It is possible that you have a piece of XML and independent knowledge
that it is in some particular charset.  I don't see how this is much
different from MIME type.  If needed, it can be a Transform parameter.
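
A small Python sketch of that rule, under stated assumptions: the
function names are hypothetical, the explicit parameter wins when
present, and the fallback reads the encoding declaration with a simple
regular expression that only works for ASCII-compatible encodings (it
is not a full implementation of XML encoding detection):

```python
import re
from typing import Optional

# Matches an encoding declaration at the very start of the octets.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def resolve_charset(octets: bytes, charset_param: Optional[str] = None) -> str:
    """Prefer the Transform parameter, then the XML declaration."""
    if charset_param is not None:
        return charset_param
    match = _ENCODING_DECL.match(octets)
    if match:
        return match.group(1).decode("ascii")
    raise ValueError("charset unknown: no parameter and no XML declaration")


def to_utf8(octets: bytes, charset_param: Optional[str] = None) -> bytes:
    """Transcode to UTF-8; any encoding declaration is left unrewritten."""
    charset = resolve_charset(octets, charset_param)
    return octets.decode(charset).encode("utf-8")
```

A bare fragment with no prolog fails unless the signer supplied the
charset, which is the case the parameter is meant to cover.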

>-Mark Bartel
>JetForm

Received on Wednesday, 20 October 1999 22:47:03 UTC