- From: by way of Joseph Reagle <ross@contivo.com>
- Date: Wed, 16 Oct 2002 18:44:27 -0400
- To: XML Encryption <xml-encryption@w3.org>
Greetings, encryption folk!

The XML Schema Working Group has reviewed the XML Encryption Requirements document, and I have been asked to send you the following comments. Please feel free to forward any questions of clarity or intent back to me.

Thanks,
- Ross Thompson, on behalf of the XML Schema Working Group

--------------------

When validating an XML document, XML Schema assumes that the document is in the form of an infoset, not an XML data stream. The purpose of an XML Schema is to express constraints on the form that the infoset can take.

Some members of the Schema WG feel that trying to validate an infoset that contains encrypted data amounts to a mixture of levels: the "natural" schema you would write for an XML document describes the information derived from the unencrypted data. Encrypting data hides it from an XML processor that does not possess the decryption keys, and therefore changes the infoset derived from the serialized document. Virtually all of the issues below arise from this mixing of levels.

The Schema WG has, from time to time, considered the viability of co-occurrence constraints, which might alleviate some of these problems, but Schema has no immediate plans to include such constraints. We also discussed using complex type unions to address some of the concerns, but we similarly have no immediate plans to introduce such types.

Finally, substep 1 of step 3 of the encryption processing rules (section 4.1) specifies the encryption of character strings. Would it be better to sign or encrypt pieces of the infoset instead? For example, if ignorable whitespace is introduced into the document's serialized form, do you want the encrypted form of the document to be sensitive to it? Schema does not presume to tell Encryption how to do its business, but we felt this issue was worth raising.
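The whitespace question can be made concrete with a small Python sketch. It uses a SHA-256 digest as a stand-in for a cipher (an assumption made only to keep the example in the standard library): any transform applied to the serialized characters changes when ignorable whitespace is added, whereas a crude infoset-level view, which here just trims surrounding whitespace from text, is unchanged. The documents and the `byte_fingerprint` and `tree_view` helpers are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two serializations of the same logical content; the second adds whitespace
# between elements that a schema would typically treat as ignorable.
compact = "<order><item>widget</item><qty>3</qty></order>"
spaced = "<order>\n  <item>widget</item>\n  <qty>3</qty>\n</order>"

def byte_fingerprint(serialized: str) -> str:
    # Stand-in for "encrypt the character string": any byte-level transform
    # (digest or cipher) changes whenever the serialized bytes change.
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

def tree_view(elem: ET.Element) -> tuple:
    # Crude infoset-level view: element names plus text and tails with
    # surrounding whitespace trimmed (assumption: that whitespace is ignorable).
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    return (elem.tag, text, tail, tuple(tree_view(child) for child in elem))

print(byte_fingerprint(compact) == byte_fingerprint(spaced))  # False
print(tree_view(ET.fromstring(compact)) == tree_view(ET.fromstring(spaced)))  # True
```

A real design would rely on a defined canonicalization or on the infoset itself rather than this ad hoc trimming; the sketch only shows why byte-level encryption is sensitive to serialization details.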
To amplify this point, consider the following two cases:

1) Infosets may exist for which no XML serialization is ever created. Consider a document created through a DOM and stored in an XML database that uses an optimized internal representation of the infoset. The consuming application could presumably be given a DOM or SAX interface without an "<...>" serialization ever being created. If that database is the backing store for a workflow application, it is extremely useful to be able to encrypt fragments of the document, but creating and storing an XML 1.0 or XML 1.1 serialization merely in order to encrypt it is artificial. Note that Schema has taken some trouble to base itself on the Infoset, so that such non-serialized documents can indeed be validated.

2) Consider this instance:

    <xx:foo xmlns:xx="namespace-uri1"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <xx:bar xsi:type="xx:myBoolean">0</xx:bar>
    </xx:foo>

with a schema that includes

    <xs:simpleType name="myBoolean">
      <xs:restriction base="xs:boolean">
        <xs:pattern value="0|1"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="myUnion">
      <xs:union memberTypes="xs:integer myBoolean"/>
    </xs:simpleType>
    <xs:element name="bar" type="myUnion"/>

If the element bar is encrypted, rebinding the prefixes before decryption will cause validation to fail (among other things): first, because the prefix on element 'bar' will be wrong, and second, because the prefix in the value of 'xsi:type' will be wrong, in a way that can affect validation or even the interpreted value of the element (the literal "0" is the integer 0 under xs:integer, but the boolean false under myBoolean).

This implies a strong requirement that no application ever change namespace prefixes in a document that contains encrypted elements. At the very least, we think this deserves a very salient warning to users and implementors. We would encourage a mechanism that is less fragile in this area and that does not introduce non-compositional processing.
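The prefix problem can be sketched in Python. Because the ciphertext is opaque, a tool that renames prefixes in the surrounding document cannot rewrite the `xx:` that appears inside the encrypted `bar` element, either in its name or in the QName-valued `xsi:type`. The `resolve_qname` helper and the binding maps below are illustrative assumptions (real processors track in-scope namespaces through their parser), and `namespace-uri2` is a hypothetical second namespace.

```python
XSI = "http://www.w3.org/2001/XMLSchema-instance"

def resolve_qname(qname, bindings):
    """Expand 'prefix:local' against a map of in-scope prefix bindings.

    Returns (namespace_uri, local_name), or None if the prefix is unbound.
    An unprefixed name uses the '' (default namespace) entry, if any.
    """
    prefix, _, local = qname.rpartition(":")
    uri = bindings.get(prefix)
    return None if uri is None else (uri, local)

# Names recovered from the decrypted fragment keep the prefix that was in
# scope at encryption time, because nobody could rewrite the ciphertext.
ELEMENT_NAME = "xx:bar"
XSI_TYPE_VALUE = "xx:myBoolean"

contexts = {
    # Bindings in scope when <xx:bar> was encrypted.
    "original": {"xx": "namespace-uri1", "xsi": XSI},
    # A tool renamed prefix xx to yy on <foo>: xx is now unbound.
    "renamed": {"yy": "namespace-uri1", "xsi": XSI},
    # Worse: xx was reused for a different (hypothetical) namespace.
    "reused": {"xx": "namespace-uri2", "yy": "namespace-uri1", "xsi": XSI},
}

for label, bindings in contexts.items():
    print(label,
          resolve_qname(ELEMENT_NAME, bindings),
          resolve_qname(XSI_TYPE_VALUE, bindings))
```

In the "renamed" context both names fail to resolve, so validation fails outright; in the "reused" context they resolve silently to the wrong namespace, which is the more dangerous outcome because nothing reports an error.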
We recognize that the namespace prefix is part of the infoset, and that resolving this issue will require some hard thinking. It is a nasty problem and we don't have a ready solution, but we think that is what makes it worth considering. Taken together, these two points argue strongly that it is the contents of the infoset that should be signed or encrypted, not its serialization.

------------------------------------------

Some observations that came to mind when reading through the proposed specification:

- If the XML processor knows the decryption keys, then the infoset for the document is the same as if the plaintext XML were in place. In this case there is no impact on Schema, because the encryption has been hidden from the schema processor. In short, this is not an issue.

- If the XML processor does not know the decryption keys, then the XML infoset will contain the elements that represent the data in its encrypted form. In this case there are severe limitations on schema validation, because as far as validation is concerned, the encryption elements have no special status. In particular:

  - The schema validator will have no way to verify that the encrypted XML conforms to the schema.

  - Unless the schema is written with encryption in mind, the processor will not be able to strictly assess even the unencrypted portions of the document against the schema. If lax validation is allowed, certain cases will validate correctly, but most won't. Skip validation will obviously pass, but it provides no information about the document's correctness.

  - Writing a schema that allows encryption will be difficult unless encryption is allowed only at a few specific points in the document.
Consider the following schema:

    <xs:schema>
      <xs:element name="the_corn">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="kernel" type="xs:string"/>
            <xs:element name="husk" type="xs:string"/>
            <xs:element name="cob" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

To add the ability to encrypt the children of the_corn, you would have to write:

    <xs:schema>
      <xs:element name="the_corn">
        <xs:complexType>
          <xs:choice>
            <xs:sequence>
              <xs:choice>
                <xs:element name="kernel" type="xs:string"/>
                <xs:element ref="enc:EncryptedData"/>
              </xs:choice>
              <xs:choice>
                <xs:element name="husk" type="xs:string"/>
                <xs:element ref="enc:EncryptedData"/>
              </xs:choice>
              <xs:choice>
                <xs:element name="cob" type="xs:string"/>
                <xs:element ref="enc:EncryptedData"/>
              </xs:choice>
            </xs:sequence>
            <xs:element ref="enc:EncryptedData"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:schema>

And even this doesn't capture it, because you really want to be able to encrypt "kernel" and "husk" in a single EncryptedData block and leave "cob" as plaintext. Capturing that additional flexibility requires violating the Unique Particle Attribution (UPA) constraint, so no legal schema has it. (In fact, the UPA constraint already makes the schema above illegal: a leading EncryptedData element could be attributed either to the first choice inside the sequence or to the alternative that encrypts all of the content. Setting minOccurs != maxOccurs on any of the children of the_corn would introduce further ambiguities of the same kind.)

A possible approach to resolving this problem, which Schema would encourage you to consider, is to specify not a specific element but a complex type for encrypted data. This would allow the schema author to specify element X and an encrypted form of X as alternatives.
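Which child sequences the doctored content model admits can be checked with a small Python sketch that writes each content model as a regular expression over space-separated child names, using `E` for `enc:EncryptedData` (a hypothetical encoding chosen for brevity). Python's regex engine backtracks freely, so this says nothing about UPA itself; it only shows which instances each model accepts.

```python
import re

# choice( sequence(kernel|E, husk|E, cob|E), E ): the doctored schema above.
DOCTORED = re.compile(r"(?:kernel|E) (?:husk|E) (?:cob|E)|E")
# The same model plus "kernel and husk share one EncryptedData, cob is plain".
WANTED = re.compile(r"(?:kernel|E) (?:husk|E) (?:cob|E)|E|E cob")

def accepts(model, children):
    """True if the list of child element names matches the whole model."""
    return model.fullmatch(" ".join(children)) is not None

cases = [
    ["kernel", "husk", "cob"],  # nothing encrypted
    ["E", "husk", "cob"],       # kernel encrypted on its own
    ["E"],                      # all content in one EncryptedData
    ["E", "cob"],               # kernel and husk in one EncryptedData
]
for children in cases:
    print(children, accepts(DOCTORED, children), accepts(WANTED, children))
```

Only the last case separates the two models: the doctored schema rejects it, while the `WANTED` pattern admits it. In XSD, however, that extra branch would begin with the same EncryptedData particle as the others, which is the kind of ambiguity UPA forbids.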
So the original schema might be rewritten thus:

    <xs:schema>
      <xs:element name="the_corn">
        <xs:complexType>
          <xs:sequence>
            <xs:choice>
              <xs:element name="kernel" type="xs:string"/>
              <xs:element name="kernel-enc" type="enc:EncryptedData"/>
            </xs:choice>
            <xs:choice>
              <xs:element name="husk" type="xs:string"/>
              <xs:element name="husk-enc" type="enc:EncryptedData"/>
            </xs:choice>
            <xs:choice>
              <xs:element name="cob" type="xs:string"/>
              <xs:element name="cob-enc" type="enc:EncryptedData"/>
            </xs:choice>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

This schema is fine in terms of the Unique Particle Attribution constraint, and it allows an arbitrary choice of which children of the_corn are encrypted. Unfortunately, this approach still does not allow multiple children to be encrypted in the same encrypted data segment, and adding that flexibility would make the schema unwieldy.

(It was observed in discussing this proposal that developers of encryption processors may prefer an element, which they could be guaranteed to recognize by its QName, to a type, which would require them to run a schema processor upstream. One solution to this dilemma might be to specify a required attribute with a fixed value as part of the complex type, so that an element such as husk-enc would be required to carry the attribute specification enc:EncryptedData="...". The value of the attribute could be a boolean, a version number, information about the key, a public key, or what have you.)

(Another observation made during the discussion: an agent in possession of a schema for the plaintext document will be able to infer which elements have been encrypted. If a schema calls for elements A, B, and C, in that order, and the instance document contains A, B, and an EncryptedData element, it is fairly obvious which element has been encrypted. This could perhaps facilitate some attacks on the encryption, because it gives the attacker knowledge of some of the plaintext.
In particular, it is very likely that the plaintext begins with "<C" and ends with "</C>". We recommend a note warning implementors and users of XML Encryption about this.)

If the number of places where encrypted content can appear in an instance document is fairly small, then doctoring the schema as above is practical, though perhaps painful. If it is not small, it is really impractical, which in turn means that validating documents containing encrypted content is not practical for a processor that does not have access to the decryption keys.
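The known-plaintext observation about elements A, B, and C can be illustrated with a short Python sketch: an observer who holds the plaintext schema, but no keys, lines up the required element names against the observed ones. The `infer_encrypted` helper is hypothetical and only handles the simple case in which one EncryptedData element replaces exactly one required element.

```python
ENCRYPTED = "EncryptedData"

def infer_encrypted(expected, observed):
    """Return the required element name that an EncryptedData element replaced.

    expected: element names the plaintext schema requires, in order.
    observed: element names in the instance, with ENCRYPTED standing in for
    at most one required element. Returns None if nothing was replaced.
    """
    if len(expected) != len(observed):
        raise ValueError("sketch only handles one-for-one replacement")
    for want, got in zip(expected, observed):
        if got == ENCRYPTED:
            return want
        if got != want:
            raise ValueError(f"unexpected element {got!r}")
    return None

hidden = infer_encrypted(["A", "B", "C"], ["A", "B", ENCRYPTED])
# Known plaintext: the decrypted text very likely starts and ends like this.
print(hidden, f"<{hidden}", f"</{hidden}>")  # C <C </C>
```

Grouping several elements into one EncryptedData block, or allowing optional elements, makes this inference less exact, but the schema still narrows down what the plaintext must look like.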
Received on Wednesday, 16 October 2002 18:45:01 UTC