Re: Again, confusing 8.1 from Stephen Farrell on 2004-12-17 (www-xkms@w3.org from December 2004)

From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Date: Fri, 17 Dec 2004 16:00:47 +0000
To: Ed Simon <edsimon@xmlsec.com>
Cc: www-xkms@w3.org
Message-ID: <41C302AF.8070701@cs.tcd.ie>

Hi Ed,

I'd agree with both of those, except that both would break
interop, and it's getting late in the day to do that...

Stephen.

Ed Simon wrote:

> As the use of XML-sensitive characters is a problem, can we not, and
> should we not anyway, require that pass phrases be base64-encoded when used
> within XML?  In fact, it would seem to me that this would be good practice
> so the pass phrase does not get messed up by XML processing whether it
> contains XML-ese or not.
> 
> BTW, I also think trailing and leading whitespace MUST be removed and
> internal whitespace reduced to one space character (not zero).
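
A minimal sketch of the two suggestions above (collapse whitespace, then base64-encode for transport inside XML), assuming UTF-8 as the byte encoding. The function names are hypothetical and nothing here is taken from the XKMS spec:

```python
import base64


def normalize_whitespace(phrase: str) -> str:
    # str.split() with no arguments splits on runs of whitespace and
    # discards leading/trailing whitespace; re-join with single spaces.
    return " ".join(phrase.split())


def encode_for_xml(phrase: str) -> str:
    # Assumption: the pass phrase is serialized as UTF-8 before base64.
    raw = normalize_whitespace(phrase).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    print(repr(normalize_whitespace("  Help \t I   have  ")))
    print(encode_for_xml("a < b & c"))
```

The base64 alphabet contains no "<", "&" or whitespace, so the encoded value passes through XML processing unchanged.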
> 
> Ed
> ========================================
> Ed Simon
> (613) 726-9645
> edsimon@xmlsec.com 
> Interested in XML, Web Services, or Security?  Visit "www.xmlsec.com".
> Now available!  "Web Services Security" published by Osborne (ISBN#
> 0072224711)
> 
> 
> -----Original Message-----
> From: Stephen Farrell [mailto:stephen.farrell@cs.tcd.ie] 
> Sent: December 17, 2004 10:42 AM
> To: Ed Simon
> Cc: www-xkms@w3.org
> Subject: Re: Again, confusing 8.1
> 
> 
> Hi Ed,
> 
> I guess we're all in agreement, but none of us seems to know exactly how to
> write down what we want!
> 
> I agree that we have to support non-Latin characters - if we didn't then
> nearly everyone in certain parts of the world would use the same key (since
> their entire string would be highly likely to be reduced to nothing!)
> which'd be a bit of a security flaw as well as an I18N-nasty.
> 
> If we drop xml encoding (good, let's remove that degree of freedom), we're
> then ok to directly use "<" characters in our strings?
> 
> As for the canonical UTF-8, that's what the stringprep RFC does, and
> apparently it involves ~30Kb of object code (from memory, so may be wrong
> there), so it has been found to be complicated.
> 
> Stephen.
> 
> Ed Simon wrote:
> 
> 
>>I generally agree with Jose and Guillermo's recommendations EXCEPT for
>>the one about filtering UTF-8 characters outside ASCII 32-127.
>>Unless there is a verifiable case to be made for disallowing non-Latin
>>characters (e.g. Korean pass phrases), I would not include that possibility.
>>Ultimately, the pass phrase is just '1's and '0's and all we are doing
>>is saying how a human-readable/writable phrase can be consistently
>>converted into binary; that MAY not always mean the end device has to
>>understand Unicode, just binary.  (I say MAY because I'm not a mobile
>>device expert, I just want someone who is to say non-ASCII is a
>>problem before we try to accommodate it.)
>>
>>I would drop mention of "XML Encoding" and call it "UTF-8" encoding; 
>>not only do I think this is sensible from the outset but it also gets 
>>rid of trying to process XMLese like entities etc.  I confess that I 
>>have one question which is I am not absolutely sure (eg. due to 
>>combining sequences) there is always one and only one binary 
>>representation for every unique UTF-8-encoded pass phrase; Jose, can 
>>you verify that with a W3C UTF-8 expert?  A follow-up question would 
>>be whether we could use rules to canonicalise the UTF-8 (eg. do not 
>>use combining characters) if there is more than one binary representation.
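
To illustrate the combining-sequence question: Unicode allows some characters to be written either as one precomposed code point or as a base letter plus a combining mark, and the two spellings give different UTF-8 bytes. The sketch below uses Python's standard unicodedata module and NFC normalization as one possible canonical form; choosing NFC is an assumption made here for illustration, not something the thread settles on.

```python
import unicodedata

composed = "Jos\u00e9"      # "é" as the single code point U+00E9
decomposed = "Jose\u0301"   # "e" followed by COMBINING ACUTE ACCENT U+0301


def utf8_hex(text: str) -> str:
    # Lowercase hex of the UTF-8 bytes.
    return text.encode("utf-8").hex()


def canonical_utf8_hex(text: str) -> str:
    # Assumption: NFC is the canonical form chosen before encoding.
    return utf8_hex(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    print(utf8_hex(composed), utf8_hex(decomposed))
    print(canonical_utf8_hex(composed), canonical_utf8_hex(decomposed))
```

Without a normalization step the same visible pass phrase can yield two different keys; with one, both spellings map to the same bytes.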
>>
>>Regards, Ed
>>========================================
>>Ed Simon
>>(613) 726-9645
>>edsimon@xmlsec.com
>>Interested in XML, Web Services, or Security?  Visit "www.xmlsec.com".
>>Now available!  "Web Services Security" published by Osborne (ISBN#
>>0072224711)
>>
>>
>>-----Original Message-----
>>From: www-xkms-request@w3.org [mailto:www-xkms-request@w3.org] On 
>>Behalf Of Stephen Farrell
>>Sent: December 17, 2004 7:48 AM
>>To: jose.kahan@w3.org
>>Cc: www-xkms@w3.org
>>Subject: Re: Again, confusing 8.1
>>
>>
>>
>>Jose,
>>
>>I agree that this bit needs more work. A few points.
>>
>>- Do we want to maintain interop with any existing implementations?
>>   I believe we do. But I think it's fair to assume that most
>>   existing code hasn't taken the corner cases into account so we
>>   should be ok to make non-interoperable changes for corner cases.
>>- Reducing the keyspace isn't a real issue. English has something
>>   like 1.5 bits of entropy per character, so unless you're using
>>   really long strings it makes no difference - the space is
>>   searchable anyway.
>>- Case folding (H->h) is IMO worthwhile simply to avoid the
>>   CAPSLOCK problem.
>>- Some whitespace shrinkage is needed, e.g. "^t" vs "    ".
>>   Most other specs shrink all consecutive whitespace characters
>>   to one space, we currently eat 'em all which is a bit weird
>>   but ok. (If Phill's listening maybe he had a reason for that?)
>>- Punctuation character handling. Current spec is weird there.
>>   They'd normally be included in the output.
>>- We do have to determine how to handle XML encoding, e.g. of "&",
>>   "%20", "<" etc. I've no clue how to properly do that.
>>- We do have to determine how to define and handle control chars.
>>   The latter is easy, the former I dunno how to do.
>>- I don't think mobile devices etc is a real issue for us, since
>>   input device limitations can be taken into account when the
>>   strings are selected/generated.
>>- We have to decide how to handle I18N. I think the current spec
>>   is probably broken for countries which don't use Latin-1
>>   characters at all. Again I'm not sure of the right thing here,
>>   but there has been some (non-trivial) work done on this for
>>   DNS [1], which is being taken up in various security related
>>   specs, (and for which there's source code available) so maybe
>>   using that is a good idea.
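
A back-of-envelope sketch of the keyspace point in the list above, taking the rough figure of 1.5 bits of entropy per character of English at face value. The helper names and the 128-bit target are assumptions chosen for illustration only:

```python
import math

BITS_PER_CHAR = 1.5  # rough estimate quoted above, not a measured value


def estimated_entropy_bits(phrase: str, bits_per_char: float = BITS_PER_CHAR) -> float:
    # Entropy estimate grows linearly with the phrase length.
    return len(phrase) * bits_per_char


def chars_needed(target_bits: int, bits_per_char: float = BITS_PER_CHAR) -> int:
    # Smallest phrase length whose estimate reaches target_bits.
    return math.ceil(target_bits / bits_per_char)


if __name__ == "__main__":
    print(estimated_entropy_bits("Help I have revealed my key"))
    print(chars_needed(128))
```

Under this estimate a 27-character phrase carries about 40 bits, which is why folding case or dropping spaces changes little: the space is searchable either way unless the phrase is very long.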
>>
>>Stephen.
>>
>>[1] 
>>http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt
>>
>>Jose Kahan wrote:
>>
>>
>>>Hi folks,
>>>
>>>Per last meeting's action item:
>>>
>>> we will include a test case validating the string2key algorithm in section
>>> 8.1. AI: Guillermo and Jose to generate such test cases
>>>
>>>
>>>After talking with Guillermo, we both found section 8.1 confusing. 
>>>This section uses terms that are known in the security field. What is 
>>>confusing is how they apply to XKMS. In particular:
>>>
>>>----------
>>>- Is this algorithm meant to generate a one-time use pass phrase that can be
>>> read over the phone?
>>> 
>>>- All shared string values are encoded as XML
>>>
>>> What is a shared string here? Is it a limited-use string? [242] proposes
>>> a user-generated authentication phrase for revoking a public
>>> key: "Help I have revealed my key". However, when looking at section C.2.1,
>>> we find that the 8.1 algorithm was used to convert it to "helpih...".
>>>
>>> If this phrase was a shared string, shouldn't it have been converted into
>>> XML, regardless of its content, and then the result converted into
>>> hex, without dropping spaces, punctuation, etc.?
>>>
>>> What is the meaning of "encoded as XML"? Accented characters and
>>> "&'<> symbols encoded as entities (we would not be able to specify
>>> the charset otherwise)? Accented characters encoded as UTF-8?
>>>
>>> I couldn't find what the spec defines as "shared string" or why 8.1 has to be
>>> applied always, regardless of who generated the shared secret.
>>>
>>>- All punctuation, space and control characters are removed.
>>>
>>> I can understand why we remove control characters, but I can't understand why
>>> we remove punctuation and spaces. We can read them on the phone easily, I think.
>>> Moreover, by simplifying the pass phrase this way, aren't we making it more
>>> vulnerable to oracle attacks?
>>> 
>>>- All upper case characters in the Latin-1 alphabet (A-Z) are converted to lower case.
>>> No other characters, including accented characters are converted
>>>
>>> Why must uppercase be converted into lowercase? One can read them easily on
>>> the phone I think :)
>>> It's not clear what is done to the other characters or what was the
>>> rationale. From reading this, it seems that if my name is spelled JosÉ, it
>>> would be converted to josÉ.
>>>
>>> This conversion also reduces the keyspace, and imo makes it more vulnerable
>>> to oracle attacks.
>>>--------
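
For reference, the two section 8.1 rules quoted above (drop punctuation, space and control characters; lowercase only A-Z) can be sketched as follows. This is one reading of the quoted text, not the spec's definition: mapping "punctuation, space and control" onto the Unicode general categories P*, Z* and C* is an assumption, and normalize_8_1 is a hypothetical name.

```python
import unicodedata


def normalize_8_1(phrase: str) -> str:
    out = []
    for ch in phrase:
        # Assumption: punctuation = P*, space = Z*, control = C* categories.
        if unicodedata.category(ch)[0] in "PZC":
            continue
        # Only the 26 letters A-Z are folded; accented capitals are kept.
        out.append(ch.lower() if "A" <= ch <= "Z" else ch)
    return "".join(out)


if __name__ == "__main__":
    print(normalize_8_1("Help I have revealed my key"))
    print(normalize_8_1("Jos\u00c9"))
```

Under these assumptions the revocation phrase comes out starting with "helpih", matching the prefix quoted from C.2.1, and an accented capital survives unchanged. Characters such as "<", which Unicode classes as a symbol rather than punctuation, are kept by this reading, which is one of the ambiguities raised in the thread.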
>>>
>>>IMO, what we need to define is:
>>>
>>>- What is a limited-use shared secret
>>>- When does the 8.1 algorithm need to be applied (make it an explicit reference
>>> in concerned sections)
>>>- Decide if all such secrets should be speakable on the phone or be typed with
>>> a device that doesn't allow all those characters; use it as
>>> a rationale for removing punctuation, etc.
>>>- Remove the ambiguities of the algorithm in section 8.1.
>>>- Decide if we need to define a minimum size for the shared secret string. What
>>> is its relationship with entropy?
>>>
>>>In my opinion, what we are looking for is an algorithm to
>>>canonicalize shared-secret strings (whether they be limited or not) that
>>>produces a valid XML string. I would propose the following one:
>>>
>>>1. Remove all the control characters from the string
>>>  --> reason: I feel that those characters could cause problems and they could
>>>      not be typed all the time
>>>2. Encode the string in UTF-8
>>>  This will take into account accented characters
>>>  --> reason: it's the only way to convert those characters into portable ASCII
>>>3. Put the hex equivalent for each of those characters, using lowercase
>>>  letters. Note that here we don't remove any punctuation symbols. We are just
>>>  converting them.
>>> 
>>>This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
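
The three proposed steps can be sketched in a few lines. The input is taken here to be "Jos&É", inferred from the trailing bytes c3 89 (the UTF-8 encoding of U+00C9); treating "control characters" as Unicode category Cc is an assumption, and canonicalize_secret is a hypothetical name.

```python
import unicodedata


def canonicalize_secret(phrase: str) -> str:
    # Step 1: drop control characters (assumed to mean category Cc).
    kept = "".join(ch for ch in phrase if unicodedata.category(ch) != "Cc")
    # Steps 2 and 3: encode as UTF-8, then emit lowercase hex per byte.
    return kept.encode("utf-8").hex()


if __name__ == "__main__":
    print(canonicalize_secret("Jos&\u00c9"))  # 4a6f7326c389
```

Punctuation such as "&" is converted rather than removed, so nothing XML-sensitive remains in the output.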
>>>
>>>XKMS could be used by mobile devices too. If, for some reason, we
>>>believe that it will be too much overhead to do UTF-8
>>>conversions, we can just suppress all the characters above ASCII 127.
>>>Another reason could be if the user has to type those characters on a
>>>phone and doesn't have the full character set available. It could
>>>be that an operator at the other end cannot read a decoded
>>>UTF-8 string if it's stored as such. This is some rationale as to why
>>>we might reduce the strings to ASCII 32-127.
>>>
>>>I don't know what would be the rationale for converting the strings to 
>>>lower-case and suppressing all the punctuation symbols.
>>>
>>>Tommy had written:
>>>
>>>
>>>
>>>
>>>>Four implementors have independently implemented the "Limited Use 
>>>>Shared Secret" algorithm in a way that interoperates so I have not 
>>>>seen a breakdown yet.  However, both the spec and the existing 
>>>>shared secret distribution points (at least my service) avoid cases 
>>>>that lead to ambiguous interpretation.
>>>
>>>
>>>This seems to imply that the places where the 8.1 algorithm applies are
>>>already identified. I think it would be good to make this explicit in the spec.
>>>
>>>
>>>
>>>
>>>>>Maybe change the spec to only allow a smaller subset of strings to 
>>>>>become keys
>>>
>>>
>>>>I'm in favor of this option, provided that the recommendations in 
>>>>Section 10.4 can still be followed.
>>>
>>>
>>>Ditto :)
>>>
>>>-jose
>>
>>
>>
>>
> 
> 
>
Received on Friday, 17 December 2004 15:56:07 UTC