RE: XMLP/XMLE Use cases and processing models from David Orchard on 2002-02-13 (www-xenc-xmlp-tf@w3.org from February 2002)

From: David Orchard <david.orchard@bea.com>
Date: Wed, 13 Feb 2002 15:33:46 -0800
To: "'Takeshi Imamura'" <IMAMU@jp.ibm.com>, <reagle@w3.org>
Cc: "'xenc'" <xml-encryption@w3.org>, <www-xenc-xmlp-tf@w3.org>, <xml-dist-app@w3.org>, "'Hiroshi Maruyama'" <MARUYAMA@jp.ibm.com>
Message-ID: <004901c1b4e6$e29afb50$190ba8c0@beasys.com>
Takeshi, Joseph, et al,

In this note I will focus on one particular issue, context information about
an XML Document and its contents.  I'd like to point out that the TAG is
currently examining the issue of dispatch based upon content-type, namespace
names, and root element names [1].  Further, the TAG has an outline of an
architecture document that defines a generic dispatch model based upon
content-type and namespaces.  There is a proposed algorithm contained
therein[2].  The TAG has not reached consensus yet on these issues though.
There are a number of aspects of the TAG discussions that may be relevant,
particularly the registering of content-types and dispatch mechanisms.  I
had hoped that the TAG would be able to publish a finding on dispatch based
upon our F2F meeting on Tuesday Feb 12th, but we need some more tweaking on
the wording.  I do feel comfortable saying that registration of a content
type for an XML vocabulary that may be embedded within another and is
intended to be dispatched on - like xenc and xslt - is likely to be an
outcome.

My principal concern on the namespace name issue is to ensure that an XML
document with its content information is accurate and self-describing.

The net is I have 2 proposals for XMLE and would like to entertain
discussion on 2 others:
1. XML Encryption registers a content-type, perhaps
application/xmlencrypted.
2. XML Encryption include wording to the effect that a document containing a
vocabulary that also contains encrypted content, where decryption is
required to make the instance conform with the vocabulary namespace name,
MUST provide some metadata to indicate that decryption is required.  Changed
namespace names, content-type (this is my step #7 in the algorithm
following), and SOAP mustUnderstand intermediaries (step #6 in the algorithm
following) are valid examples of this.
3. The XMLE group entertain the definition of algorithms for dispatch based
upon content-type, namespace names, and/or other metadata.  It may be that
this is an XMLE/XMLP liaison issue or a web services architecture wg issue
though.
4. The XMLE/XMLP liaison group discuss the definition of a metadata solution
to the problem described in proposal #2 or ask the wsawg to suggest/allocate
resources or wgs to define such a solution.  Given the lack of traffic on
xenc-xmlp, we may wish to raise this to the wsawg.  But maybe this note will
spawn some traffic ;-)

I find your option #4 to be interesting and worthwhile pursuing and very
appropriate for SOAP, though I suggest a different approach that allows
decryption to occur without the use of a decrypting actor.  The solution of
creating metadata is exactly the right path to go down.  We could try to
merge the content-type for messages concept with the content-type for header
blocks.  The use of a content-type for the whole message solves this problem
for messages outside the SOAP context, so I think we could apply this to
headers.  What I think we might need is a "content-type" for the block
content.  For now, my thought is that a new SOAP attribute of content-type
would be sufficient.  This would contain a content-type value as defined
by MIME content-type.  BTW, this may have nice side effects as charset
information could be encoded in the content-type.  This also should work
well with existing infrastructure to do content-type dispatch.
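
As a rough illustration of the charset side effect: a MIME content-type
value carries its parameters alongside the media type, so a single attribute
could convey both.  The sketch below uses Python's standard email package.
The media type application/xmlencrypted is only the proposal above, not a
registered type, and the helper name parse_content_type is hypothetical.

```python
from email.message import Message


def parse_content_type(value):
    """Split a MIME content-type value into (media type, charset).

    `value` is assumed to be the raw string carried by the proposed
    content-type attribute; charset is None when no parameter is given.
    """
    msg = Message()
    msg["Content-Type"] = value
    # Both accessors lower-case their result.
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = parse_content_type(
    'application/xmlencrypted; charset="UTF-8"'
)
print(media_type, charset)
```

A dispatcher would key on the first element of the pair and hand the second
to whatever decodes the block.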

Taking a look at the header more closely, my proposal is that the header is
labeled with a content-type of application/xmlencryption, i.e.

  <header><po:po xmlns:po="..."
  soap:content-type="application/xmlencryption"
  ><xenc:encrypteddata/></po:po></header>

In this fragment, the po namespace name knows nothing about the encrypted
contents.  Decryption of the xenc block is required to reconstruct the po.
This is very similar to the case where a message has content type
application/xmlencryption and contains a PO that also contains encrypted
data.
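
A minimal sketch of how a receiver might read that label from each header
block, using Python's xml.etree.ElementTree.  The SOAP and po namespace URIs
are placeholders (the fragment above elides the po URI and names no SOAP
URI), the xenc URI is the XML Encryption namespace, and the function name
block_content_types is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder URI for this sketch; the fragment above does not name one.
SOAP_NS = "urn:example:soap"
# XML Encryption namespace; the spec spells the element EncryptedData.
XENC_NS = "http://www.w3.org/2001/04/xmlenc#"

HEADER = f"""
<header xmlns:soap="{SOAP_NS}" xmlns:xenc="{XENC_NS}">
  <po:po xmlns:po="urn:example:po"
         soap:content-type="application/xmlencryption">
    <xenc:EncryptedData/>
  </po:po>
</header>
"""


def block_content_types(header_xml):
    """Map each header block's qualified tag to its declared content-type.

    Blocks without the (proposed) attribute map to None.
    """
    header = ET.fromstring(header_xml)
    attr = f"{{{SOAP_NS}}}content-type"
    # Iterating an element yields its direct children, i.e. the header blocks.
    return {block.tag: block.get(attr) for block in header}


print(block_content_types(HEADER))
```

The point of the sketch is that the receiver only inspects the block's own
attributes; it never has to look inside the po content to decide where the
block goes.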

Arguably the content-type concept is not a SOAP concept, more of an XML
concept brought on by mixing namespaces.  But I believe it belongs in SOAP
because SOAP has an explicit message path model that may require
decryption/encryption at various points in the message path AND we wish to
keep SOAP as content-type application/soap.

There is a 5th option that I propose to add to your list of options.
5. The namespace name is left as is, and a Content-type of
application/xmlencryption is used for messages and/or blocks.  A receiver
then dispatches to a decrypting service for decryption.  Presumably this
service would know which encrypted elements to decrypt, as well as how to
dispatch to the next node.

I see a rough algorithm of dispatch relating to XMLE:

1. If the entire message is encrypted and sent then the root element will be
changed to the encryption element.  The first receiving node can do dispatch
based upon the root element or the encryption namespace name.

2. If a portion of a message is encrypted, and the message cannot be
interpreted without decryption first, then a content-type of
application/xmlencryption must be used.  The namespace name can be left as
the unencrypted namespace name.  The receiving node dispatches based upon
the content-type.  Your second example fits here.  This technique has been
used for compression as well, with content type application/gzip used.

3. If the encryption is baked into the application context, then the
application's namespace name accurately reflects the fact that the
application "understands" encryption.  Dispatch based upon namespace name or
root element is possible.

4. In the SOAP context, decryption of the content of a header may be required
before the node can process it.  In this case, complete encryption of the
header contents is done.  The header handler dispatches to the decryptor
based upon the header block's "root element", which happens to be an xmle
element.  This is similar to my algorithm element #1.  This does not use an
explicit actor.  Hiroshi's example of SOAP-SEC body does NOT follow this
model [3] as it uses an explicit actor.
<header><xenc:encrypteddata/></header> is a sample.

5. In the SOAP context, the encryption may be baked into the header context.
Dispatch is based upon the namespace name or root element.  This is similar
to my algorithm element #3.
<header><encryptionawareheaderns:foo/></header> is a sample.

6. SOAP dispatch is done as per the actor attribute.  The intermediary
specified by the actor will know what to do with any encryption/decryption
required, including other portions.  Hiroshi's example of SOAP-SEC header
follows this model [3].
<header><SOAP-SEC:Encryption actor="some-URI"
mustUnderstand="1"/></header> is a sample.

7. In the SOAP context, a descendant of a header block may be encrypted,
with decryption required at the header node before handling is possible,
IFF meta-data is supplied.  A header without some meta-data attribute(s) to
indicate decryption processing for cases of a descendant needing decryption
is NOT allowed.  Thus

  <header><po:po xmlns:po="..."><xenc:encrypteddata/></po:po></header>

is NOT allowed where the encrypteddata is not understood by the po
vocabulary.  I've suggested that a content-type field would be useful here,
so

  <header><po:po xmlns:po="..."
  soap-env:content-type="application/xmlencrypted"
  ><xenc:encrypteddata/></po:po></header>

would be sufficient to do dispatch.  This scenario allows intermediaries to
be built that do not require an explicit actor to do decryption, as step #6
and the soap-sec example do.  Targeting decryption actors worries me as we
start adding many different types of headers, some of which may be encrypted
and some not.  This eerily feels like targeting a schema validator.
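
The SOAP header steps above (#4 through #7) can be sketched as one dispatch
function.  This is only an illustration under stated assumptions, not a
proposed implementation: the SOAP namespace URI, the set of
"encryption-aware" namespaces, the returned handler labels, and the function
names are all hypothetical, and the actor attribute is read unqualified as
in the fragment in step #6.

```python
import xml.etree.ElementTree as ET

XENC_NS = "http://www.w3.org/2001/04/xmlenc#"  # XML Encryption namespace
SOAP_NS = "urn:example:soap"                   # placeholder URI
# Both spellings appear in this note; neither is a registered media type.
ENCRYPTED_TYPES = {"application/xmlencrypted", "application/xmlencryption"}
# Hypothetical vocabularies that have encryption "baked in" (step #5).
ENCRYPTION_AWARE_NS = {"urn:example:encryption-aware"}


def namespace_of(elem):
    """Return the namespace URI of an element's tag, or '' if unqualified."""
    return elem.tag[1:].split("}")[0] if elem.tag.startswith("{") else ""


def dispatch_header_block(block):
    """Return a label for the handler that a header block is routed to."""
    ns = namespace_of(block)
    if block.get("actor") is not None:
        return "actor"          # step 6: the targeted intermediary decides
    if ns == XENC_NS:
        return "decryptor"      # step 4: the block itself is xenc
    if block.get(f"{{{SOAP_NS}}}content-type") in ENCRYPTED_TYPES:
        return "decryptor"      # step 7: metadata announces decryption
    if ns in ENCRYPTION_AWARE_NS:
        return "application"    # step 5: vocabulary understands encryption
    if any(namespace_of(d) == XENC_NS for d in block.iter()):
        # step 7: encrypted descendant without metadata is NOT allowed
        raise ValueError("encrypted descendant without content-type metadata")
    return "application"        # nothing encrypted: ordinary dispatch


block = ET.fromstring(
    f'<po:po xmlns:po="urn:example:po" xmlns:soap="{SOAP_NS}" '
    f'soap:content-type="application/xmlencrypted">'
    f'<x:EncryptedData xmlns:x="{XENC_NS}"/></po:po>'
)
print(dispatch_header_block(block))
```

Note that the only branch that looks below the block's root is the one that
rejects the block; every accepted case is decided from the root element name,
its namespace, or its attributes.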

Step #7 is the key example that I have been concerned with wrt namespace
names.  It is almost exactly like the xslt example where xhtml is the
containing element, yet an xhtml processor cannot understand the xslt
content.  Dispatch to an XSLT processor is required, but this can't be done
by the namespace name nor the root element.  Hence the discussion about
content-types.  In example #7 dispatch to the decrypting software is
required, but the software can't determine this from the po header; it has
to *magically* know based upon the presence of an xenc namespace name
inside!
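
For comparison, here is roughly what that "magic" costs: without metadata,
the receiver has to scan the block for any element in the xenc namespace
before it can decide where to dispatch.  A minimal sketch with Python's
xml.etree.ElementTree.iterparse; the function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XENC_NS = "http://www.w3.org/2001/04/xmlenc#"  # XML Encryption namespace


def contains_encrypted_content(xml_bytes):
    """Scan a serialized block and report whether any xenc element occurs.

    Stops at the first hit, but in the worst case (no encrypted content)
    it reads the entire block before dispatch can happen.
    """
    prefix = f"{{{XENC_NS}}}"
    stream = io.BytesIO(xml_bytes)
    for _event, elem in ET.iterparse(stream, events=("start",)):
        if elem.tag.startswith(prefix):
            return True
    return False


sample = (
    b'<po:po xmlns:po="urn:example:po">'
    b'<x:EncryptedData xmlns:x="http://www.w3.org/2001/04/xmlenc#"/>'
    b"</po:po>"
)
print(contains_encrypted_content(sample))
```

With a content-type label on the block, none of this scanning is needed.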

Another algorithm would be to define something in the XMLE namespace along
the lines of "secured", but that would percolate xmle into the header block.

Cheers,
Dave
[1] http://lists.w3.org/Archives/Public/www-tag/2002Jan/0177.html
[2] http://www.w3.org/2001/tag/doc/toc
[3] http://lists.w3.org/Archives/Public/www-xenc-xmlp-tf/2001Dec/0001.html


> -----Original Message-----
> From: Takeshi Imamura [mailto:IMAMU@jp.ibm.com]
> Sent: Tuesday, February 12, 2002 8:20 AM
> To: David Orchard; reagle@w3.org
> Cc: 'xenc'; www-xenc-xmlp-tf@w3.org; xml-dist-app@w3.org; Hiroshi
> Maruyama
> Subject: Re: XMLP/XMLE Use cases and processing models
>
>
>
>
> >> Questions
> >> ----------
> >> 1. I have anticipated that the intermediary has to know about the
> >> receiver schema and must expose the "merged" schema or a schema without
> >> the unencrypted credit card info to senders.  There is tight coupling
> >> there, but how else would the intermediary know which messages and which
> >> portions to decrypt?  Is this valid or does XMLE expect that a SOAP
> >> encrypting/decrypting intermediary would not know about the receiver
> >> schemas nor would it expose a "merged" schema?  This assumes the first or
> >> second solution in the processing requirements (3.2.2) [1].
> >
> >I'd be interested in some of the xenc implementors' thoughts on this since
> >they best know how they want to process the data. However, given my
> >ignorance I could see a few options:
> >
> >1. The namespace is changed to indicate the change of the instance. This
> >means you will have to have a namespace for every encrypted variant. This
> >is probably fine for some applications (e.g., they know they only care
> >about encrypting the credit-card data and nothing else so creating a new
> >namespace and schema isn't very difficult.) Others may care less about
> >validation and more about flexibility.
> >2. One "pre-scans" the document looking for xenc:* elements. For example,
> >during parsing (e.g., XNI [1]) one could flag such instances that trigger
> >subsequent action.
> >3. The encryption is "baked" in to the application context. For instance,
> >if you know that you'll be sending credit card data over an open network
> >there needn't be a choice between the credit card data in the plain and
> >the encrypted form. The credit card is *always* sent encrypted and is
> >always decrypted by the recipient at the other end.
> >4. Meta-data is used to indicate that some of the data has been encrypted.
> >For instance, to make option 3 a little more flexible, one could create a
> >SOAP confidentiality header that indicates a decryptor actor with
> >mustUnderstand="1".
> >
> >In these options there are two issues: (1) how to know if parts of the
> >document have been encrypted, (2) how to know which agent is supposed to
> >do the decryption.
>
> These issues are not only ones of XML Encryption but can be ones of others.
> So we should address them as generally and extensibly as possible.  From
> such a point of view, I think option 4 (e.g., [1]) looks the most
> reasonable.
>
> [1]
> http://lists.w3.org/Archives/Public/www-xenc-xmlp-tf/2001Dec/0001.html
>
> Thanks,
> Takeshi IMAMURA
> Tokyo Research Laboratory
> IBM Research
> imamu@jp.ibm.com
>
>
>
Received on Wednesday, 13 February 2002 18:38:25 UTC