Re: First draft of Section 7.6 and proposed sections 4.3.1 and 4.3.3 from Lightning on 1999-10-14 (w3c-ietf-xmldsig@w3.org from October to December 1999)

From: Lightning <lightning@pacificcoast.net>
Date: Thu, 14 Oct 1999 06:40:37 -0700
To: "Donald E. Eastlake 3rd" <dee3@torque.pothole.com>
Cc: <jboyer@csr.csc.UVic.CA>, "DSig Group" <w3c-ietf-xmldsig@w3.org>
Message-ID: <003a01bf1649$c46d8a00$30628e8b@ace>

Hi Don,

>>=====================================
>>Regarding the addition of fragments to Location:
>>
>>I was also quite troubled by not being able to do a simple ID reference in
>>the Location.  Either way this ends up going, it shouldn't impact a lot of
>>the material in 7.6.  However, I did do a fair amount of background reading
>>through many of the specs (which is why it took me so long to write, even
>>though for the sake of terseness it doesn't necessarily come across).
>
>[Just as background, the argument had been that you need the DTD to
>tell what is an id and you might not have the DTD so, since a fragment
>can be just an id, let's prohibit them.  In retrospect, I don't think
>this is a very good argument because a signature verification program
>always has the DTD for XML digital signatures built in and can find
>any ids in elements we specify.  In addition some applications may
>similarly just know the DTD for their stuff.  And even if not, a DTD
>might be available.]

I am certain this was not the background argument for whether or not to put
ID in Location.  The argument for not putting ID in Location was simply that
it was insufficient for solving all problems since transformational steps
may need to precede the application of the ID filter.

The argument about not having the DTD was with regards to canonicalization.
If you canonicalize, and the algorithm strips out the DTD, then a subsequent
ID filter transformation will be unable to identify the ID attribute based
solely on the output of the canonicalizer.
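To make the DTD point concrete, here is a toy Python model (not part of any spec). It assumes the DTD's ATTLIST information has been reduced to a hypothetical mapping from element names to their ID-typed attribute; once a canonicalizer has discarded the DTD, that mapping is empty and an ID filter has nothing to go on.

```python
import xml.etree.ElementTree as ET


def id_filter(root, id_value, id_attrs):
    """Toy ID filter: return the element whose ID-typed attribute equals id_value.

    id_attrs is a hypothetical stand-in for what the DTD's ATTLIST
    declarations say, e.g. {"MyData": "id"}.  An attribute's name alone
    does not make it an ID; only its declared type does.
    """
    if not id_attrs:
        raise ValueError("no ID attribute declarations available (DTD was stripped)")
    for elem in root.iter():
        attr = id_attrs.get(elem.tag)
        if attr is not None and elem.get(attr) == id_value:
            return elem
    raise KeyError(f"no element with ID {id_value!r}")


doc = ET.fromstring('<Doc><MyData id="Data1">abc</MyData></Doc>')

# With the DTD information available, the lookup succeeds.
print(id_filter(doc, "Data1", {"MyData": "id"}).text)  # abc

# After canonicalization has removed the DTD, the same filter must fail.
try:
    id_filter(doc, "Data1", {})
except ValueError as err:
    print("error:", err)
```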

Finally, the fact that some applications may know their DTD, and that other
DTD's might be available, means that some signatures might verify in some
limited application domains.  A digital signature will be much more useful
for verification programs that may live outside of the immediate application
domain if the DTD information (internal or external) is maintained as part
of the resource that actually got digested (in other words, even if the DTD
lives outside of the document, the information required by an XML parser to
find that DTD should be in the *secured* information, where secured means
'covered by the digest').

>
>>According to RFC2396, the fragment part after URI# is not assumed to be an
>>ID reference into XML.  "It is a property of the data resulting from a
>>retrieval action, regardless of the type of URI used in the reference.
>>Therefore, the format and interpretation of fragment identifiers is
>>dependent on the media type [RFC2046] of the retrieval result".  So, if we
>>follow the consistency angle to the end, it seems that we have to be
>>prepared for someone putting a full-blown XPointer after the URI#, or for
>>that matter any document-specific, arbitrarily complicated reference
>>expression after the URI#.
>
>So, if it's a complex XPointer or something else you don't support,
>can't you just give an error?

At this point of my document, I was only pointing out that allowing #frag
after a URI meant doing more than just allowing #ID.  Since you are now
putting an error system in place for it, I assume we agree on this point.
One reason I slightly prefer to have the #frag in a Transformation step is
that the Transformation element has an algorithm identifier, so it is easier
to pick a parser, and the failure of that parser immediately means that we
can generate an error.  With the #frag in Location, we really don't know
which parser should be used.  The Type element could help narrow it down,
but still one may have to run more than one parser without generating errors
before finding the right one for the job.  And if all of them fail, which
one do you use to report the syntax error?  It's that kind of guesswork that
somehow always leads to trouble (maybe it wouldn't in this case, but who
knows).

>
>>Actually, this isn't overly troublesome since applications will still have
>>the Type information (Section 4.3.2) to help decide which parser to run on
>>the material after the # (or if the application can process the fragment at
>>all).  The parts that are a little more troublesome are as follows:
>
>For just an id fragment as a Location in ObjectReference in
>SignedInfo, you know that it's XML.  For other cases, you may have to
>figure out the type.  In particular, for a URI Location, I think you
>generally need to come up with a type just like a browser does.  I
>would expect some XMLDSIG applications to have a call out for most
>URI schemes (they could probably handle data:) and expect a byte array
>and a type back.  Is there any difference between
>
> <Location>ftp://host.example.org/bar#IDxyz</Location>
>
>and
>
> <Location>ftp://host.example.org/bar</Location>
> <Transformations>
> <Transform type="...">#IDxyz</Transform>
> </Transformations>
>
>?  I don't think so.

There are processing differences (like the one described above), but I used
an example like this in a recent email to Dave Burdett trying to show that
representationally, they were quite equivalent.  However, since they can
represent the same information, wouldn't it be preferable in the interest of
processing simplicity to have just one way of doing it? Further, since we
need the #frag step in the transformations (so it can be after other
transformations), if we are to choose one way, then that way should be the
latter.
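The representational equivalence can be sketched in a few lines of Python: a Location carrying a #frag is split into a bare Location plus an explicit transform step. The Algorithm URI "urn:example:id-filter" is a placeholder, not a proposed identifier.

```python
from urllib.parse import urldefrag

# Hypothetical Algorithm URI for an ID-filter Transformation.
ID_FILTER = "urn:example:id-filter"


def split_location(location):
    """Rewrite 'URI#frag' as (URI, [transform]) so the fragment becomes an explicit step."""
    url, frag = urldefrag(location)
    transforms = [(ID_FILTER, "#" + frag)] if frag else []
    return url, transforms


print(split_location("ftp://host.example.org/bar#IDxyz"))
# ('ftp://host.example.org/bar', [('urn:example:id-filter', '#IDxyz')])
```

The reverse rewrite is not always possible: a fragment step that has to run after other transforms has no place in Location.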

>
>And in support of my claim that you need to pass along type information,
>what about
>
> <Location>ftp://host.example.org/base64bar</Location>
> <Transformations>
> <Transform type="...XPath">...</Transform>
> <Transform type="...">#IDxyz</Transform>
> </Transformations>
>

I get what you mean, and you are right that the ID transform is unlikely to
be viable after the XPath, but it's not like you can't do the ID transform
as part of the XPath transform.  I think we agree that if the input to a
transform does not provide sufficient information, then it should be an
error.  The difference is that I would prefer not to create an artificial
context that is bound to be inadequate no matter how hard we try.  I would
prefer to push the work off onto the creator of the transformation sequence,
who is at least supposed to know the specific task to be achieved.  If the
creator shoots himself in the foot, it should be
sufficient to provide an error message telling him that his foot is bleeding
and recommending that he phone the hospital if he can't get the bullet out
(at around 4:30am, this sounds pretty humorous; I apologize if it turns out
not to be funny later on).  The end result is that the creator of the
transformation sequence is responsible for making sure that the actual
output of one transform is sufficient input for the next transform.
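The division of responsibility described above amounts to a very simple pipeline runner. The Python sketch below is an illustration only: each transform is an ordinary callable, nothing is type-checked in advance, and a step that receives unsuitable input simply raises, with the error reported back to whoever built the sequence.

```python
from typing import Any, Callable, Sequence

Transform = Callable[[Any], Any]


def run_transforms(data: Any, transforms: Sequence[Transform]) -> Any:
    """Feed the output of each transform into the next and return the final output."""
    for index, step in enumerate(transforms):
        name = getattr(step, "__name__", repr(step))
        try:
            data = step(data)
        except Exception as err:
            # Tell the sequence creator which step could not use its input.
            raise ValueError(f"transform {index} ({name}) failed: {err}") from err
    return data


print(run_transforms(b"  hello  ", [bytes.strip, bytes.upper]))  # b'HELLO'
```

Here `[bytes.decode, bytes.upper]` would fail at step 1, because the first step's output (a str) is not valid input for the second.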

This is a parallel to the idea that the output of the final transform has to
be sufficient input to the digest to truly secure what the transformation
sequence creator intended to secure.  The former idea is far less demanding
than the latter, for which there will be no error message prior to arrival
in the courts.

>(my syntax above is, no doubt, totally screwed up but hopefully you
>get what I mean).
>
>Furthermore, what about
>
>        <Location>ftp://host.example.org/base64bar</Location>
>        <Transformations>
>                <Transform type="...base64decode"/>
>                <Transform type="...">#IDxyz</Transform>
>        </Transformations>
>
>Doesn't this example show that <Transform> needs to have an optional
>OutputType attribute so that following Transforms will know what to do
>and what is legal?  (If you know something is an MPEG, trying to apply
>an XSLT to it should be an error.)

Right, and the XML parser that tries to read an MPEG will have no problem
generating an error.  The type of output for the first transform is implicit
in the input requirements of the next transform, which is given explicitly
by the Algorithm attribute (given as type above) of the latter transform.
The creator of the transformation sequence is responsible for creating a
sequence that works.  To be honest, I expect that the transformation
sequence creator will often be a program aiding a user.  While the user is
naive, the company that writes the program is not, and it is quite
reasonable to assume that they've designed the user interface and the
transformation creation code to work together.  If there are design
loopholes in such a program, any reasonable QA process will catch the
errors before a user ever gets the program (or the company won't be one for
very long).
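Don's base64 example can be run as-is to show that no OutputType attribute is needed for the failure to surface: the XML-consuming step rejects non-XML input on its own. In this Python sketch the "MPEG" payload is just four bytes resembling an MPEG pack start code followed by junk, and ElementTree stands in for whatever XML parser a real transform would use.

```python
import base64
import xml.etree.ElementTree as ET


def base64_decode(data: bytes) -> bytes:
    """Transform 1: undo the base64 content transfer encoding."""
    return base64.b64decode(data, validate=True)


def parse_xml(data: bytes) -> ET.Element:
    """Transform 2: its input requirement (well-formed XML) is enforced by parsing."""
    return ET.fromstring(data)


xml_resource = base64.b64encode(b'<bar><x ID="IDxyz"/></bar>')
print(parse_xml(base64_decode(xml_resource)).tag)  # bar

mpeg_resource = base64.b64encode(b"\x00\x00\x01\xba not XML at all")
try:
    parse_xml(base64_decode(mpeg_resource))
except ET.ParseError as err:
    print("rejected:", err)
```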

>And in fact, doesn't Location need
>that optional attribute also so if the Location is a URI pointing to
>some file and you don't want to depend on or can't guess the type from
>a file extension, for example, you can specify the type?
>

Actually, Location already does.  It is the Type element in section 4.3.2.

>>1) It is not possible to distinguish between XPath and XPointer in Location
>>whereas it is possible under the current formulation of section 7.6.  Aside
>>from inconsistency, this actually could be useful for those who are in those
>>constrained situations and feel that XPath support is sufficient whereas
>>XPointer is too burdensome.
>
>This may be an artifact of XPath never being designed to be used
>outside of the context of XSLT or XPointer.  See other comments of
>mine below.

Up to now I've been talking a lot about XPath in an effort to shave off all
unnecessary complexity, because I keep hearing that any extra work is too
much.  It could be cleaner to just stick to XPointer (which gives all of the
benefits of XPath that I've been talking about, since XPath is a proper
subset).

>
>>2) Applications would need two quite different algorithms for determining
>>whether they could support partial document signatures.
>
>Sorry, you've lost me here...

If the #frag appears in the Location, then you have to obtain the Type and
decide which fragment parsers are applicable.  If the result of this query
is the empty set, then you don't support the fragment.  Otherwise, you run
each one until one generates no errors, and then you hope that it is the
algorithm intended by the document creator.

If the #frag is part of a Transformation, then you obtain the Algorithm
attribute value and decide whether you support the indicated parser.
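The difference in the two processing models can be sketched in Python. Everything here is hypothetical: the Algorithm URIs, the crude bare-ID and "XPointer" recognizers, and the registry itself are placeholders chosen only to show the control flow.

```python
import re


def parse_bare_id(frag: str) -> str:
    """Placeholder recognizer for a bare ID such as 'IDxyz'."""
    if not re.fullmatch(r"[A-Za-z_][\w.-]*", frag):
        raise ValueError("not a bare ID")
    return frag


def parse_xpointer(frag: str) -> str:
    """Deliberately crude placeholder for a real XPointer parser."""
    if "(" not in frag:
        raise ValueError("not an XPointer expression")
    return frag


# Hypothetical registry keyed by Algorithm URI.
PARSERS = {
    "urn:example:id": parse_bare_id,
    "urn:example:xpointer": parse_xpointer,
}


def parse_in_transformation(algorithm: str, frag: str) -> str:
    """#frag in a Transformation: one lookup decides support; one parser reports errors."""
    parser = PARSERS.get(algorithm)
    if parser is None:
        raise NotImplementedError(f"unsupported Algorithm: {algorithm}")
    return parser(frag)


def parse_in_location(frag: str) -> tuple:
    """#frag in Location: try each parser and hope the first success was the intended one."""
    errors = []
    for algorithm, parser in PARSERS.items():
        try:
            return algorithm, parser(frag)
        except ValueError as err:
            errors.append(f"{algorithm}: {err}")
    # Every candidate failed; it is unclear which error the user should see.
    raise ValueError("; ".join(errors))


print(parse_in_location('id("Data1")'))  # ('urn:example:xpointer', 'id("Data1")')
```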

>>===============================================
>>Regarding default canonicalization versus no Transformations
>>
>>>I believe consensus was no transformation.  If some default
>>>canonicalization was applied, it would have to be data type dependent
>>>since even minimal canonicalization doesn't make much sense for, say,
>>>a JPEG file.
>>
>>I agree wholeheartedly and will be quite happy (relieved in fact) to change
>>this.  The comment was actually copied from the existing spec.  Given the
>>other changes I recommended, perhaps it would've been wiser to recommend
>>changing that too.  However, I assumed the default would end up being null
>>c14n anyway.  It seems best to leave the data alone unless an 'explicit'
>>statement of Transformation is made.
>
>I'm OK with leaving it to default to no transformation but right now
>my personal opinion is that it would be better to make this type
>dependent.  If something is of type text/*, including text/plain or
>text/xml, minimal canonicalization would really be better.
>

That's really subjective.
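For comparison, Don's type-dependent default could look like the Python sketch below. It assumes "minimal canonicalization" means decoding with the declared charset, normalizing CR and CRLF line endings to LF, and re-encoding as UTF-8; that is an assumption for illustration, not the draft's normative definition.

```python
def minimal_c14n(data: bytes, charset: str = "utf-8") -> bytes:
    """Assumed minimal canonicalization: normalize line endings, emit UTF-8."""
    text = data.decode(charset)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


def default_transform(media_type: str, data: bytes, charset: str = "utf-8") -> bytes:
    """Type-dependent default: canonicalize text/*, leave everything else untouched."""
    base_type = media_type.split(";")[0].strip().lower()
    if base_type.startswith("text/"):
        return minimal_c14n(data, charset)
    return data  # e.g. image/jpeg: digest the retrieved bytes exactly


print(default_transform("text/plain", b"a\r\nb\rc"))      # b'a\nb\nc'
print(default_transform("image/jpeg", b"\xff\xd8\r\n"))  # b'\xff\xd8\r\n'
```

The "no transformation" default corresponds to always taking the final `return data` branch, whatever the type.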

>>==================================
>>Regarding Handling of Encoding Information
>>
>>>><p>The <code>Transformations</code> element contains an ordered list
>>>>of <code>Transformation</code> elements.  The output of each
>>>><code>Transformation</code> serves as input to the next
>>>><code>Transformation</code>.  The input to the first
>>>><code>Transformation</code> is the raw data result of obtaining the
>>>>resource given by <code>Location</code>.
>>>>The output from the last <code>Transformation</code> is the input for the
>>>>digest algorithm.</p>
>>>
>>>I believe that encoding information should be input to the first
>>>transformation and passed along, possibly changed by some
>>>transformations.
>>
>>I don't understand. The two places where encoding comes into play are 1)
>>encoding the actual Transformation element's content, and 2) encoding of the
>>object indicated by Location, which will be decoded by some Transformation.
>
>I wasn't talking about the Transformation element content at all.

Yes but I was.

>
>I believe "encoding" also affects the transformed data being passed along.
>
>Actually, I think we have been misusing encoding.  The MIME community
>has a lot of smart people who have thought about these things for a
>long time and they distinguish "content transfer encoding" and
>"charset".  Mostly what we are talking about is charset.  We need
>transforms to undo Base64 and possibly Hex and Quoted-Printable
>content transfer encodings.  But changes between various UTF-x and ISO
>and other character sets is really different.  And it is this
>character set information that needs to be passed along.  At any point
>in the Transform pipeline, you might want to change the charset
>although most likely just to normalize it towards UTF-8 or UTF-16.
>This could also be right after a decode Base64 operation or the like
>where the charset might not be immediately obvious.  Therefore, I think
>that both Location and Transformation need optional OutputCharset
>attributes.
>

Actually, I had a similar concern about the need for encoding transforms
that represent a shift of character set, but then, I started thinking about
the fact that XML documents carry their own character set encoding
information.  At that point, I felt that character set transformations would
be handled by some version of Section 7.5 on canonicalization, which I was
not writing.
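The distinction between content transfer encoding and charset, and Boyer's point that XML carries its own encoding declaration, can both be seen in a short Python sketch. The sample text and the choice of ISO-8859-1 are arbitrary.

```python
import base64
import xml.etree.ElementTree as ET

# A text payload in ISO-8859-1, then base64 as its content transfer encoding.
payload = "Grüße\r\n".encode("iso-8859-1")
wire = base64.b64encode(payload)

# Undoing the transfer encoding gives back the same bytes; the charset is
# untouched, and nothing in those bytes reliably says which charset they use.
decoded = base64.b64decode(wire)

# Changing charset is a separate step that needs an explicit charset label.
as_utf8 = decoded.decode("iso-8859-1").encode("utf-8")

# An XML document, by contrast, labels its own charset in the XML declaration.
xml_bytes = '<?xml version="1.0" encoding="ISO-8859-1"?><t>Grüße</t>'.encode("iso-8859-1")
print(ET.fromstring(xml_bytes).text)  # Grüße
```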

>>In the former case it is obvious that encoding information should not be
>>passed along since it applies only to the immediate transform, which must be
>>decoded so we can find out what the transform is supposed to do (e.g. a Java
>>class for decompression).  In the latter case, it seems neither necessary
>>nor always feasible to pass encoding information along.  Suppose an
>>application puts some base64 encoded data into an element as follows:
>>
>><MyData id="Data1">
>>    asdfasdfasdfasdfasdfasdfadsf
>></MyData>
>>
>>If they wanted to mark Data1 as base64 encoded, they would have to use *our*
>>base64 encoding marker (currently urn:dsig:base64) rather than their own.
>>This is why I thought it would be best to denote the encoding in one of
>>*our* elements (namely, a Transformation that brings about base64 decoding).
>>Furthermore, this means that the decoding can be preceded by other
>>transformations.  For example, to meet requirement 3.1.7, the necessary
>>transformation sequence for recovering the original data out of Data1 is
>>
>><Transformations>
>>    <Transformation Algorithm="urn:dsig:xpointer">id("Data1")/descendant::text()</Transformation>
>>    <Transformation Algorithm="urn:dsig:base64"/>
>></Transformations>
>>
>>or, if one allows fragments in Location
>>
>><Location>#Data1</Location>
>><Type>text/xml</Type>
>><Transformations>
>>    <Transformation Algorithm="urn:dsig:xpath">descendant::text()</Transformation>
>>    <Transformation Algorithm="urn:dsig:base64"/>
>></Transformations>
>>
>>Either way, it seems that the easiest way for an application to indicate
>>that the content was base64 encoded is to put a base64 decoding
>>transformation at the appropriate place in the list rather than having an
>>attribute on MyData that must be passed through the descendant::text()
>>transform (despite the intended semantic of throwing out the start and end
>>tags and the attributes).
>
>You're right on encodings but I think I was actually talking about
>charset's.

Yes, that's good, but those aren't represented by the Encoding attribute of
the Transformation element.  They would be represented by a Transformation
element and specified by its Algorithm attribute value.  Furthermore, I
believe they belong in Section 7.5 as a kind of canonicalization.
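The two-step transformation sequence for Data1 can be approximated in Python. ElementTree's limited XPath support stands in for the XPointer/XPath transform (note that it matches the attribute named id, not an attribute declared as an ID), and the base64 payload is an arbitrary sample.

```python
import base64
import xml.etree.ElementTree as ET

doc = ET.fromstring('<Doc><MyData id="Data1">\n    aGVsbG8sIHdvcmxk\n</MyData></Doc>')

# Step 1, approximating id("Data1")/descendant::text(): select the element and
# keep only its descendant text, dropping start/end tags and attributes.
target = doc.find(".//*[@id='Data1']")
text = "".join(target.itertext())

# Step 2, the base64-decoding transform; whitespace is removed explicitly first.
original = base64.b64decode("".join(text.split()), validate=True)
print(original)  # b'hello, world'
```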

>>=====================================
>>Regarding  the Canonicalization Transformation
>>
>>>
>>>I think there will be few enough standard canonicalization algorithms
>>>that they can have different algorithm values.
>>>
>>>Null, Minimal, DOMcanon, W3Ccannon.  DOMcanon might take a parameter
>>>or have versions to determine (1) if it discards comments and (2) if
>>>it discards Processing Instructions.
>>>
>>
>>I agree that there are few c14n algorithms, but we already enumerate them in
>>the c14nalg element (or whatever it will end up being called).  It would be
>>fine with me if we wanted to enumerate them again as Algorithm values for
>>Transformation elements, but it seemed more appropriate at the time of the
>>first draft to reuse the existing markup definitions in Sections 4.1 and 7.5
>>so that changes to those sections would not imply changes to sections 4.3.3
>>and 7.6.
>
>I think c14nAlg is going away as soon as we have a good definition of
>the appropriate flavor of DOM canon to use there.

True, it will no longer be needed for SignedInfo.  This leaves
canonicalization of the object, so the material in Section 7.5 will still be
needed.  The question is, does this mean that Section 7.5 will be rolled
into the section on Transformations (currently 7.6)?
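If canonicalization does move into the Transformations list, each algorithm simply becomes an Algorithm value. A Python sketch with hypothetical URIs: it swaps in the standard library's xml.etree.ElementTree.canonicalize (Python 3.8+, which implements W3C Canonical XML 2.0 and so post-dates this discussion) for the DOM-based canonicalizers under debate, and exposes only its comment switch; it has no option for discarding processing instructions.

```python
import xml.etree.ElementTree as ET
from functools import partial

# Hypothetical Algorithm URIs, each mapped to a str -> str transform.
C14N_TRANSFORMS = {
    "urn:example:c14n:null": lambda xml: xml,
    "urn:example:c14n:keep-comments": partial(ET.canonicalize, with_comments=True),
    "urn:example:c14n:drop-comments": partial(ET.canonicalize, with_comments=False),
}

source = "<a><!--note--><b/></a>"
for algorithm, transform in C14N_TRANSFORMS.items():
    print(algorithm, transform(source))
```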

>
>>====================================
>>Regarding the XPath Transformation Algorithm
>>
>>
>>>
>>>The above is not enough to specify how the output is formed.  Are
>>>there any new lines?
>>>
>>
>>I believe the statement I gave is precisely what is required, though I could
>>explain more about my readings of the XPath spec, which could be helpful to
>>readers of the dsig spec.  The linefeeds are in the XPath node-set if the
>>XPath specifies them as being in the node set.  They are represented by text
>>nodes just like all the other text in the document.  Actually, they do not
>>appear as separate text nodes if there is other text in the element.  So, if
>>your character sequence is
>>
>><Parent>\n\t<Child>multiline\ncontent</Child>\n</Parent>
>>
>>Then you would have three text nodes, one for the "\n\t", one for
>>"multiline\ncontent" and one for "\n".
>
>This has apparently changed.  At least the latest XPath, in section
>5.7, explicitly says that a text node never has a text node sibling.
>As much text as is contiguous is always stuffed into a single text node.
>

I'm quite aware of this and assumed that my sentence would be interpreted in
the context of section 5.7.  My sentence does not say that the three text
nodes are contiguous.  There are three text nodes *because* they are *not*
contiguous.  Have another look at the snippet of XML.  There is a Parent
node whose children are 1) a text node containing "\n\t", 2) an element node
called Child, and 3) a text node containing "\n".  The Child element has the
following children:  a text node containing "multiline\ncontent".  As you
can see, there are three text nodes, which in document order are:  "\n\t",
"multiline\ncontent" and "\n".

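The three-text-node reading can be checked mechanically. This Python sketch uses xml.dom.minidom, which keeps whitespace-only text nodes and merges adjacent character data into one text node, and walks the tree in pre-order, which is document order.

```python
from xml.dom import minidom


def text_nodes_in_document_order(node):
    """Collect text node contents in a depth-first, pre-order walk."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            found.append(child.data)
        else:
            found.extend(text_nodes_in_document_order(child))
    return found


doc = minidom.parseString("<Parent>\n\t<Child>multiline\ncontent</Child>\n</Parent>")
print(text_nodes_in_document_order(doc))
# ['\n\t', 'multiline\ncontent', '\n']
```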
>It is still the case that the latest XPath Proposed Recommendation
>clearly states that what is returned by an XPath is an unordered set
>of nodes.  Now, node sets in XPath are rich enough to record the
>document order of their nodes.  A function that reconstitutes this node set in
>document order as XML is not hard to imagine but is not specified
>anywhere.

It is true that XPath has this statement, which they make for the purpose of
generality with XSLT.  As for where it is specified, your point on the
previous telecon was well-taken.  I had thought that this was part of what
you wanted out of section 7.6.  Moreover, I found nothing in the XPointer
spec that changes this.  Unless I missed the sentence where it gets changed to a
document ordered set, we will have to define our interpretation as being in
document order even if we only support XPointer and not XPath.  Perhaps we
should consider utilizing our liaison with the XPath and XPointer groups to
point out that for digital signatures, order matters.
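Defining our interpretation as document order is straightforward to implement. A Python sketch, with ElementTree standing in for an XPath engine: document order is taken to be the position in a pre-order traversal, and an unordered result set is sorted by that position before it is serialized or digested.

```python
import xml.etree.ElementTree as ET


def in_document_order(root, node_set):
    """Sort an unordered set of elements by their pre-order (document) position."""
    position = {elem: i for i, elem in enumerate(root.iter())}
    return sorted(node_set, key=position.__getitem__)


root = ET.fromstring("<r><a/><b><c/></b><d/></r>")
node_set = {root.find("d"), root.find("b/c"), root.find("a")}  # a set has no defined order
ordered = in_document_order(root, node_set)
print([e.tag for e in ordered])  # ['a', 'c', 'd']
```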

>
>Note that some namespace canonicalization is implied anyway.  An XPath
>of "//" is every node but note that the location of original xmlns:
>attributes cannot be fully reconstituted, so there is no way to
>guarantee output of the same document.

I don't understand this.  XPath doesn't lose any information about
namespaces.  It's just trying to describe the document (including its
namespace attributes) as an XML application would see it after being parsed
by an XML processor (except that nobody knows yet how DTDs will be
represented; a single DTD node with no inner hierarchy would suffice for now,
since it would represent whether or not to sign the whole DTD, and subsetting
the DTD seems like a weird thing to do).

>Similarly, if you use XPath to
>pick some piece of a document and then serialize it back into output
>XML, you rarely can tell where original namespace attributes were in
>it, if there were any, so you can't output the same XML for the piece
>as you read in.

It is the responsibility of the transformation sequence creator to retain
sufficient information.  We are never going to plug up all the ways that
people can shoot themselves in their feet.  This was the essence of that C
language example I gave some time back.  C is a standard, yet it is trivial
to write software that loops endlessly or does a stray pointer reference.
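Don's serialization concern is easy to reproduce with a tool whose data model does not retain prefixes. In this Python sketch, ElementTree, which keeps only namespace URIs, re-serializes a selected piece under a generated prefix, so the output differs from the original even though the namespace itself is preserved. This illustrates a property of ElementTree; it does not claim that the XPath data model loses namespace information.

```python
import xml.etree.ElementTree as ET

original = '<p:doc xmlns:p="urn:example"><p:item>x</p:item></p:doc>'
root = ET.fromstring(original)
item = root[0]

# Only the namespace URI survives parsing; the prefix "p" and the location of
# the xmlns:p declaration are not recorded.
print(item.tag)  # {urn:example}item

# Serializing just this piece invents a prefix and a new declaration.
piece = ET.tostring(item, encoding="unicode")
print(piece)  # <ns0:item xmlns:ns0="urn:example">x</ns0:item>
```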

>
>This all may not be a problem but we would certainly need to write
>something very explicit about what it means to use XPath outside of
>XSLT or XPointer since it is currently only defined for use in those
>contexts.
>

I don't think this is a problem with using XPath outside of XSLT or XPointer
(and particularly the latter).  Whether we use XPath or XPointer, we must
state that the node set is in document order.  At that point, XPath and
XPointer are on the same footing as XSLT and stylesheet application.  The
real issue here is loss of information, which is something that can happen
with any partial document transform that you use.  I don't think we need to
be more specific than saying that the output of one transform goes into the
next transform (or into the digest algorithm for the last transform).  If
someone writes a transform that loses too much information, then that is
their problem.

Actually, if you get where I'm coming from, then the digest algorithm is
just another transformation, except that it has the following special
properties:

1) the syntax guarantees that there will be one.
2) one semantic guarantees that it will be run last
3) another semantic guarantees that the result will be stored in the
document as base64 encoded content of a digest value.

We could easily generalize what we are doing such that people are not
required to use a digest at all.  If we change the semantic to say "Whatever
comes out of the last Transformation will be put in a TransformationValue
element", then we could support something like this:

<Transformations>
    <Transformation Algorithm="urn:dsig:xpath">...</Transformation>
    <Transformation Algorithm="urn:nist-gov:sha1"/>
    <Transformation Algorithm="urn:dsig:base64encode"/>
</Transformations>

I don't think people want to go this far, but it illustrates the point that
the final result isn't even XML anymore and has lost all namespace
information, attributes, pointy brackets and so forth.  If someone puts
another Transform after the last one, then they will probably need crutches
and some new shoes.
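The observation that the digest is "just another transformation" can be sketched by treating SHA-1 and base64 encoding as the last two callables in the same pipeline. This is a Python illustration only; the placeholder input stands in for the output of an XPath transform, and no TransformationValue element is modeled.

```python
import base64
import hashlib


def sha1(data: bytes) -> bytes:
    """Digest step: 20 raw bytes, no longer XML."""
    return hashlib.sha1(data).digest()


def base64_encode(data: bytes) -> bytes:
    """Encoding step: what would end up as element content."""
    return base64.b64encode(data)


def run_transforms(data, transforms):
    """Feed each step's output into the next step."""
    for step in transforms:
        data = step(data)
    return data


selected = b"<Child>multiline\ncontent</Child>"  # stand-in for an XPath result
value = run_transforms(selected, [sha1, base64_encode])
print(value)
```

Any XML-consuming transform appended after `base64_encode` would have nothing XML-shaped to work on, which is the crutches-and-new-shoes case.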

Thanks,
John Boyer
Software Development Manager
UWI.Com -- The Internet Commerce Company
Received on Thursday, 14 October 1999 09:39:36 UTC