- From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
- Date: Fri, 17 Dec 2004 15:41:56 +0000
- To: Ed Simon <edsimon@xmlsec.com>
- Cc: www-xkms@w3.org
Hi Ed,

I guess we're all in agreement, but none of us seems to know
exactly how to write down what we want!

I agree that we have to support non-Latin characters - if we
didn't, then nearly everyone in certain parts of the world would
use the same key (since their entire string would be highly likely
to be reduced to nothing!), which'd be a bit of a security flaw as
well as an I18N-nasty.

If we drop XML encoding (good, let's remove that degree of
freedom), are we then ok to directly use "<" characters in our
strings?

As for the canonical UTF-8, that's what the stringprep RFC does,
and apparently it involves ~30Kb of object code (from memory, so
may be wrong there), so it has been found to be complicated.

Stephen.

Ed Simon wrote:
> I generally agree with Jose and Guillermo's recommendations EXCEPT for the
> one about filtering UTF-8 characters outside ASCII 32-127. Unless there
> is a verifiable case to be made for disallowing non-Latin characters (e.g.
> Korean pass phrases) I would not include that possibility. Ultimately, the
> pass phrase is just '1's and '0's and all we are doing is saying how a
> human-readable/writable phrase can be consistently converted into binary;
> that MAY not always mean the end device has to understand Unicode, just
> binary. (I say MAY because I'm not a mobile device expert, I just want
> someone who is to say non-ASCII is a problem before we try to accommodate
> it.)
>
> I would drop mention of "XML Encoding" and call it "UTF-8" encoding; not
> only do I think this is sensible from the outset but it also gets rid of
> trying to process XMLese like entities etc. I confess that I have one
> question: I am not absolutely sure (e.g. due to combining sequences) that
> there is always one and only one binary representation for every unique
> UTF-8-encoded pass phrase; Jose, can you verify that with a W3C UTF-8
> expert? A follow-up question would be whether we could use rules to
> canonicalise the UTF-8 (e.g. do not use combining characters) if there is
> more than one binary representation.
>
> Regards, Ed
> ========================================
> Ed Simon
> (613) 726-9645
> edsimon@xmlsec.com
> Interested in XML, Web Services, or Security? Visit "www.xmlsec.com".
> Now available! "Web Services Security" published by Osborne (ISBN#
> 0072224711)
>
>
> -----Original Message-----
> From: www-xkms-request@w3.org [mailto:www-xkms-request@w3.org] On Behalf Of
> Stephen Farrell
> Sent: December 17, 2004 7:48 AM
> To: jose.kahan@w3.org
> Cc: www-xkms@w3.org
> Subject: Re: Again, confusing 8.1
>
>
> Jose,
>
> I agree that this bit needs more work. A few points.
>
> - Do we want to maintain interop with any existing implementations?
>   I believe we do. But I think it's fair to assume that most
>   existing code hasn't taken the corner cases into account so we
>   should be ok to make non-interoperable changes for corner cases.
> - Reducing the keyspace isn't a real issue. English has something
>   like 1.5 bits of entropy per character, so unless you're using
>   really long strings it makes no difference - the space is
>   searchable anyway.
> - Case folding (H->h) is IMO worthwhile simply to avoid the
>   CAPSLOCK problem.
> - Some whitespace shrinkage is needed, e.g. "^t" vs " ".
>   Most other specs shrink all consecutive whitespace characters
>   to one space, we currently eat 'em all which is a bit weird
>   but ok. (If Phill's listening maybe he had a reason for that?)
> - Punctuation character handling. Current spec is weird there.
>   They'd normally be included in the output.
> - We do have to determine how to handle XML encoding, e.g. of "&",
>   "%20", "<" etc. I've no clue how to properly do that.
> - We do have to determine how to define and handle control chars.
>   The latter is easy, the former I dunno how to do.
> - I don't think mobile devices etc is a real issue for us, since
>   input device limitations can be taken into account when the
>   strings are selected/generated.
> - We have to decide how to handle I18N. I think the current spec
>   is probably broken for countries which don't use Latin-1
>   characters at all. Again I'm not sure of the right thing here,
>   but there has been some (non-trivial) work done on this for
>   DNS [1], which is being taken up in various security related
>   specs, (and for which there's source code available) so maybe
>   using that is a good idea.
>
> Stephen.
>
> [1] http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt
>
> Jose Kahan wrote:
>
>>Hi folks,
>>
>>Per last meeting's action item:
>>
>>  we will include a test case validating the string2key algorithm in
>>  section 8.1. AI: Guillermo and Jose to generate such test cases
>>
>>After talking with Guillermo, we both found section 8.1 confusing.
>>This section uses terms that are known in the security field. What is
>>confusing is how they apply to XKMS. In particular:
>>
>>----------
>>- Is this algorithm meant to generate a one-time use pass phrase that
>>  can be read over the phone?
>>
>>- All shared string values are encoded as XML
>>
>>  What is a shared string here? Is it a limited-use string? [242]
>>  proposes a user-generated authentication phrase for revoking a public
>>  key: "Help I have revealed my key". However, when looking at section
>>  C.2.1, we find that the 8.1 algorithm was used to convert it to
>>  "helpih...".
>>
>>  If this phrase was a shared string, shouldn't it have been converted
>>  into XML, regardless of its content, and then the result converted
>>  into hex, without dropping spaces, punctuation, etc.?
>>
>>  What is the meaning of "encoded as XML"? Accented characters and
>>  "&'<> symbols encoded as entities (we would not be able to specify the
>>  charset otherwise)? Accented characters encoded as UTF-8?
>>
>>  I couldn't find what the spec defines as "shared string" or why 8.1
>>  has to be applied always, regardless of who generated the shared
>>  secret.
>>
>>- All punctuation, space and control characters are removed.
>>
>>  I can understand why we remove control characters, but I can't
>>  understand why we remove punctuation and spaces. We can read them on
>>  the phone easily, I think.
>>  Moreover, by thus simplifying the pass phrase, aren't we making it
>>  more vulnerable to oracle attacks?
>>
>>- All upper case characters in the Latin-1 alphabet (A-Z) are converted
>>  to lower case. No other characters, including accented characters,
>>  are converted
>>
>>  Why must uppercase be converted into lowercase? One can read them
>>  easily on the phone I think :)
>>  It's not clear what is done to the other characters or what was the
>>  rationale. From reading this, it seems that if my name is spelled Jos ,
>>  it would be converted to jos .
>>
>>  This conversion also reduces the keyspace, and imo makes it more
>>  vulnerable to oracle attacks.
>>--------
>>
>>IMO, what we need to define is:
>>
>>- What is a limited-use shared secret
>>- When does the 8.1 algorithm need to be applied (make it an explicit
>>  reference in concerned sections)
>>- Decide if all such secrets should be speakable on the phone or be
>>  typed with a device that doesn't allow all those characters; use it as
>>  a rationale for removing punctuation, etc.
>>- Remove the ambiguities of the algorithm in section 8.1.
>>- Decide if we need to define a minimum size for the shared secret
>>  string. What is its relationship with entropy?
>>
>>In my opinion, what we are looking for is an algorithm to
>>canonicalize shared-secret strings (whether they are limited-use or
>>not) that produces an XML valid string. I would propose the following
>>one:
>>
>>1. Remove all the control characters from the string
>>   --> reason: I feel that those characters could cause problems and
>>   they could not be typed all the time
>>2. Encode the string in UTF-8
>>   This will take into account accented characters
>>   --> reason: it's the only way to convert those characters into
>>   portable ASCII
>>3. Put the hex equivalent for each of those characters, using lowercase
>>   letters. Note that here we don't remove any punctuation symbols. We
>>   are just converting them.
>>
>>This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
>>
>>XKMS could be used by mobile devices too. If for some reason, we
>>believe that it will be too much overhead to make UTF-8
>>conversions, we can just suppress all the characters above 127 ASCII.
>>Another reason could be if the user has to type those characters in a
>>phone and he doesn't have the full character set available. It could
>>be that an operator at the other end cannot read a decoded
>>UTF-8 string if it's stored as such. This is some rationale as to why
>>reduce the strings to ASCII 32-127.
>>
>>I don't know what would be the rationale for converting the strings to
>>lower-case and suppressing all the punctuation symbols.
>>
>>Tommy had written:
>>
>>>Four implementors have independently implemented the "Limited Use
>>>Shared Secret" algorithm in a way that interoperates so I have not
>>>seen a break down yet. However, both the spec and the existing shared
>>>secret distribution points (at least my service) avoid cases that lead
>>>to ambiguous interpretation.
>>
>>This seems to imply that the places where 8.1 applies are already
>>identified. I think it would be good to make this explicit in the spec.
>>
>>>>Maybe change the spec to only allow a smaller subset of strings to
>>>>become keys
>>
>>>I'm in favor of this option, provided that the recommendations in
>>>Section 10.4 can still be followed.
>>
>>Ditto :)
>>
>>-jose
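
For illustration, here is a minimal Python sketch of the three-step proposal quoted above (remove control characters, encode as UTF-8, write each byte as lowercase hex). It also illustrates Ed's combining-sequence question. Several things here are assumptions that do not come from the thread: "control character" is taken to mean Unicode general category Cc, the function names are hypothetical, and NFC normalisation is shown only as one possible canonicalisation rule, not something the list agreed on.

```python
import unicodedata


def strip_controls(phrase: str) -> str:
    """Step 1: drop control characters (assumed here: Unicode category Cc)."""
    return "".join(ch for ch in phrase if unicodedata.category(ch) != "Cc")


def phrase_to_hex(phrase: str, normalise: bool = False) -> str:
    """Steps 2 and 3: UTF-8 encode, then write each byte as lowercase hex."""
    if normalise:
        # Assumption: NFC as the canonical form; the thread leaves this open.
        phrase = unicodedata.normalize("NFC", phrase)
    return strip_controls(phrase).encode("utf-8").hex()


# The byte sequence quoted in the proposal, [4a] [6f] [73] [26] [c3] [89],
# corresponds to J, o, s, & followed by U+00C9.
print(phrase_to_hex("Jos&\u00c9"))

# The same visible text can be typed as one precomposed code point (U+00E9)
# or as "e" plus a combining acute accent (U+0301); the bytes then differ.
precomposed = "Jos\u00e9"
combining = "Jose\u0301"
print(phrase_to_hex(precomposed), phrase_to_hex(combining))

# With normalisation applied first, both spellings give the same bytes.
print(phrase_to_hex(precomposed, True) == phrase_to_hex(combining, True))
```

Without a normalisation step, two users who see the same pass phrase on screen could derive different keys. That is the interoperability risk behind Ed's question, and the problem stringprep-style profiles are meant to address.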
Received on Friday, 17 December 2004 15:37:16 UTC