- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 09 Jan 2002 14:06:49 +0900
- To: reagle@w3.org, dee3@torque.pothole.com, "Takeshi Imamura" <IMAMU@jp.ibm.com>
- Cc: w3c-i18n-ig@w3.org, xml-encryption@w3.org
Hello Joseph,

Thanks for your quick and mostly positive reaction.

At 18:18 02/01/08 -0500, Joseph Reagle wrote:
>[resulting document
> http://www.w3.org/Encryption/2001/Drafts/xmlenc-core/Overview.html
> new revision: 1.100;
>]
>
>On Tuesday 08 January 2002 03:45, Martin Duerst wrote:
> > It is already addressed by the spec. But there is no corresponding
> > requirement. While the requirement that one gets the same stuff
> > back that one encrypted seems pretty straightforward, at least
> > for character encoding issues, it might be easy to get wrong.
>
>The question here is then what exactly is it that you are getting back? In
>the generic scenario above, you are getting back the same document (some
>context might've been lost) but definitely not the same octets.

Yes.

> > You can get rid of mentioning the infoset by saying that the
> > following two sequences of operations must lead to the same
> > document:
> >
> > a) 1) Encrypting part of the document
> >    2) Transcoding it from one character encoding to another
> >    3) Decrypting what was encrypted in step 1)
> > b) 1) Encrypting part of the document
> >    2) Decrypting what was encrypted in step 1)
> >    3) Transcoding it from one character encoding to another
>
>In this specific scenario, you are getting back the same octets.
>
>So, how about if I add the following text to the requirements?
>
>    If a document is (partially) encrypted, transcoded, and then
>    decrypted, the resulting octets must be the same as if the document
>    had been (partially) encrypted, decrypted, and then transcoded.

I think this is one special case. It's easy to make it a bit
more general by just saying:

    If a document is (partially) encrypted, transcoded, and then
    decrypted, the resulting octets must be the same as if the document
    had been just transcoded (assuming the target character encoding
    is the same in both cases).
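To make the proposed requirement concrete, here is a minimal Python sketch of the check it implies. Everything in it is an assumption for illustration only: the XOR "cipher", the key, the brace-delimited placeholder, and the sample document are hypothetical and are not taken from the xmlenc draft. The only point is that the encrypted part is serialized as UTF-8 before encryption and re-serialized in the document's current encoding after decryption.

```python
import base64
from itertools import cycle

KEY = b"toy-key"  # hypothetical key; XOR is a stand-in, NOT real encryption


def xor(data: bytes) -> bytes:
    """Toy reversible transform standing in for a real cipher."""
    return bytes(b ^ k for b, k in zip(data, cycle(KEY)))


def encrypt_part(doc: str, secret: str) -> str:
    """Replace `secret` with base64 of its 'encrypted' UTF-8 octets."""
    cipher = base64.b64encode(xor(secret.encode("utf-8"))).decode("ascii")
    return doc.replace(secret, "{" + cipher + "}")


def decrypt_part(doc: str) -> str:
    """Decrypt the {...} placeholder back into characters."""
    start, end = doc.index("{"), doc.index("}")
    plain = xor(base64.b64decode(doc[start + 1:end])).decode("utf-8")
    return doc[:start] + plain + doc[end + 1:]


def transcode(octets: bytes, src: str, dst: str) -> bytes:
    """Re-encode octets from character encoding `src` to `dst`."""
    return octets.decode(src).encode(dst)


original = "<p>Gr\u00fc\u00dfe, <b>D\u00fcrst</b></p>".encode("iso-8859-1")

# Path 1: encrypt part, transcode the whole document, then decrypt.
encrypted = encrypt_part(original.decode("iso-8859-1"), "D\u00fcrst")
moved = transcode(encrypted.encode("iso-8859-1"), "iso-8859-1", "utf-16")
via_encryption = decrypt_part(moved.decode("utf-16")).encode("utf-16")

# Path 2: just transcode, with the same target encoding.
just_transcoded = transcode(original, "iso-8859-1", "utf-16")

print(via_encryption == just_transcoded)
```

The comparison holds here because the decrypted UTF-8 octets are turned back into characters and written out in the target encoding; pasting the raw UTF-8 octets into the UTF-16 document would break it.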
I added the parenthesis to make sure it's the same transcoding,
and to help people understand what we mean by transcoding here
(some people use the term e.g. for HTML->WML conversion, ...).

> > >In the spec we say on
> > >encryption, "obtain the octets by serializing the data in UTF-8 as
> > >specified in [XML]" and on decryption if the data was an XML element or
> > >content it's UTF-8, if not it's just octets. BTW: Did you have a chance to
> > >look at:
> > > http://www.w3.org/Encryption/2001/Drafts/xmlenc-decrypt.html
> > >There's a lot of XML/character processing involved there.
> >
> > I had a look at it. I haven't found anything about
> > character encoding issues. Did I miss something?
>
>No. It's just slightly tricky on decryption with: serialize the XML, wrap
>it in dummy nodes if not well formed, decrypt octets corresponding to the
>encrypted data, parse the decrypted octet stream, remove the wrapping. Just
>want to make sure we didn't miss anything!

I'll have another look at it.

> > - This does not yet say anything about NFC when something being
> >   encoded is serialized in UTF-8. In that case, it should say
> >   that NFC MUST be applied when this involves conversion from
> >   a legacy (i.e. non-Unicode) encoding.
>
>Ok, in 4.1 step 3.1 (Encrypt the Data) now says just that:
>
>If the data is an 'element' [XML, section 3] or element 'content' [XML,
>section 3.1], obtain the octets by serializing the data in UTF-8 as
>specified in [XML]. ([NFC] MUST be applied when this involves conversion
>from a legacy (i.e. non-Unicode) encoding.) Serialization MAY be done by
>the encryptor. If the encryptor does not serialize, then the application
>MUST perform the serialization.

This looks good to me.

> > >I agree. Takeshi (and others), do we want to say, "If the document into
> > >which the replacement is occurring is not UTF-8, the decryptor MUST
> > >transcode the UTF-8 encoded characters into the target encoding." ?
> >
> > This would be nice. Please add this text.
>
>Ok, done.

Thanks.

> > > > - There needs to be some text about security risks associated
> > > >   with UTF-8. Assume that somebody knows that the encrypted
> > > >   text is Old Italic (http://www.unicode.org/charts/PDF/U10300.pdf,
> > > >   no spaces or punctuation). In this case, UTF-8 uses four bytes per
> > > >   character, and three of them are always the same, and the top
> > > >   two (or three if there are no numbers) bits of the last byte
> > > >   are also always the same.

> > I think that the Nonce can help quite a bit in some situations.
> > But I'm not really sure at all that it will help much in the
> > situation I have described. Let's assume the attacker knows
> > that most of the encrypted text (rather than all) is in Old
> > Italic. What you are saying is that if the non-Old Italic
> > text is at the start of the data, attacks are much more
> > difficult than if the non-Old Italic text is in the
> > middle or at the end. This may indeed be true for attacks
> > that are based on looking at the start of the encoded sequence.
> > But there are most probably also attacks that can look at
> > any part of the data and try to find out something about it.
> > In other terms, the nonce doesn't really increase the entropy,
> > it just conceals it.
> > Of course, I'm not an expert here, but I'd rather be sure.
>
>Ok, I will defer this to the crypto experts.

I'm looking forward to the discussion.

> > >I'm not sure where we would do that. In xmldsig, we had a whole URI
> > > section of the spec:
> > > http://www.w3.org/TR/2001/PR-xmldsig-core-20010820/#sec-URI
> > >so it makes sense to specify this as completely as possible. However,
> > > xenc makes more casual use of URIs with a few "this is like xmldsig
> > > Reference processing." So I can't find a place where those four
> > > paragraphs on RFC2396+RFC2732+encoding_of_disallowed-characters, etc.
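As an aside on the Old Italic point raised earlier in this message, the byte-level claim can be checked with a few lines of Python. This is only a sketch: the code point ranges (letters U+10300 to U+1031E, numerals U+10320 to U+10323) are read off the code chart cited above, and the variable names are made up for illustration.

```python
# Old Italic letters and numerals, per the U10300 code chart.
letters = [chr(cp) for cp in range(0x10300, 0x1031F)]   # U+10300..U+1031E
numerals = [chr(cp) for cp in range(0x10320, 0x10324)]  # U+10320..U+10323

letter_bytes = [c.encode("utf-8") for c in letters]
all_bytes = letter_bytes + [c.encode("utf-8") for c in numerals]

# Every character takes four bytes, and the first three never change.
lengths = {len(b) for b in all_bytes}
prefixes = {b[:3] for b in all_bytes}

# Top two bits of the last byte (letters and numerals together) ...
top2 = {b[3] >> 6 for b in all_bytes}
# ... and top three bits when only letters occur.
top3_letters = {b[3] >> 5 for b in letter_bytes}

print(lengths, prefixes, top2, top3_letters)
```

With letters only, 27 of every 32 plaintext bits (24 prefix bits plus 3 bits of the last byte) are fixed and known in advance, which is the redundancy the question to the crypto experts is about.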
> >
> > Because you use XML Schema, and XML Schema now does this when
> > you use the anyURI datatype, you don't have to include the
> > actual text.
>
>Ok.
>
> > >As you say, we're
> > >using anyURI, so do we still need this text in every spec that uses a
> > > URI?
> >
> > No, actually not. But I guess it would be better to mention at
> > least once that what you mean by URI throughout the spec is
> > not exactly the same as the URI in RFC 2396.
> >
> > Actually, I just went through the whole spec looking for the
> > string 'URI'. It turns up many times, but it's never defined.
> > There is a reference with a label 'URI' (RFC 2396), but it's
> > never actually referenced in the text. Also, RFC 2732 is never
> > referenced.
>
>Ok. I've added to section 1.3, "Versions, Namespaces, URIs, and Identifiers"
>
>    Finally, URIs [URI] MUST abide by the [XML-Schema] anyURI type
>    definition and the [XML-DSIG, 4.3.3.1 The URI Attribute] specification
>    (i.e., permitted characters, character escaping, scheme support, etc.).
>
>I'm not inclined to reference RFC2732; I'm not sure if it's needed unless
>we wanted to make sure all W3C specs should treat RFC2732 as MANDATORY?
>Otherwise, I'd prefer to keep it "orthogonal."

Very good, dealing with a lot of issues in a very compact way!

> > > > - In 2.2.1, 'media type URI' is mentioned, but there is neither
> > > >   an explanation nor a reference. In addition, it would be good
> > > >   to check/explain that this can include parameters (such as
> > > >   charset).
> > >
> > >Ok, I'll say, now, "Other alternatives include 'content' of an element,
> > > or an external octet sequence that is identified by a media type URI
> > > [IANA], such as the example in Encrypting Arbitrary Data and XML
> > > Documents (section 2.1.4)." Presently, we have no provision for a
> > > charset. (The IANA directory does not provide URIs for these
> > > distinctions.)
> > > In xmldsig we have both a Type (to describe a higher
> > > level aspect, like it's a particular XML structure) and
> > > MIMEType/Encoding which I've tried to avoid here. However, if people
> > > feel we should have both here as well we could revert to the xmldsig
> > > approach or have some other way of describing the encoding -- are there
> > > URIs for the types of encoding?
> >
> > I don't exactly understand why you need the type. If it's just
> > an arbitrary octet stream, then it's just that, or isn't it.
>
>Presently, while we do make a distinction between an Element and its
>Content we're not making a great deal of use of this from the processing
>point of view. However, it is useful in the prose, and I feel that this
>Type will be of use in the future.
>
> > For the 'charset', IANA has a registry, too, but because registering
> > a charset is a bit more lightweight than registering a mime type,
> > there are no separate files for each charset. One possibility
> > might be to add the parameters as URI parameters, i.e.
> > http://www.isi.edu/in-notes/iana/assignments/media-types/text/plain;charset=iso-8859-1
>
>Ok, we'll take this under consideration as another option over creating a
>Charset attribute.

It looks indeed very convenient. But I'm not sure how exactly
parameters are to be used. Anyway, a test with 5 different browsers
showed that in all cases, the whole thing was sent to the server,
resulting in a 404. This is probably good enough for our purposes.

>Are there other parameters that we MUST account for
>beyond the charset? Do we need some extensible mechanism?

There are only a few of them, and besides 'charset', they are
very specific to particular types, as far as I know.

> > Also, the use of IANA URIs for identifying the types is in some
> > way definitely how things should be, but it's not really clearly
> > specified, it just says 'do the same thing as in the example'.
>
>Also made more explicit in section 1.3:
>
>    Finally, this specification identifies IANA registered media-types
>    via the set of URIs that match:
>    'http://www.isi.edu/in-notes/iana/assignments/media-types/*/*'.

This also looks good.

Regards, Martin.
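P.S. For illustration, the media-type URI pattern and the suggested ';charset=' parameter could be handled roughly as in the Python sketch below. The helper names are hypothetical, and the parameter syntax is only the option floated in this thread, not something the draft defines.

```python
BASE = "http://www.isi.edu/in-notes/iana/assignments/media-types/"


def media_type_uri(media_type, charset=None):
    """Build a media-type URI, optionally with a ';charset=' parameter."""
    uri = BASE + media_type
    if charset:
        uri += ";charset=" + charset
    return uri


def parse_media_type_uri(uri):
    """Return (media type, parameters) for a URI matching BASE + '*/*'."""
    if not uri.startswith(BASE):
        raise ValueError("not an IANA media-type URI: " + uri)
    type_part, _, param_part = uri[len(BASE):].partition(";")
    top, sep, sub = type_part.partition("/")
    if not (top and sep and sub) or "/" in sub:
        raise ValueError("expected type/subtype, got: " + type_part)
    params = {}
    for item in param_part.split(";") if param_part else []:
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return type_part, params


uri = media_type_uri("text/plain", "iso-8859-1")
print(uri)
print(parse_media_type_uri(uri))
```

Keeping the parameters in a dictionary leaves room for the few type-specific parameters other than 'charset' without a separate attribute for each.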
Received on Wednesday, 9 January 2002 01:27:08 UTC