Re: Again, confusing 8.1 from Stephen Farrell on 2004-12-17 (www-xkms@w3.org from December 2004)

From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Date: Fri, 17 Dec 2004 12:48:29 +0000
To: jose.kahan@w3.org
Cc: www-xkms@w3.org
Message-ID: <41C2D59D.1080400@cs.tcd.ie>

Jose,

I agree that this bit needs more work. A few points.

- Do we want to maintain interop with any existing implementations?
   I believe we do. But I think it's fair to assume that most
   existing code hasn't taken the corner cases into account so we
   should be ok to make non-interoperable changes for corner cases.
- Reducing the keyspace isn't a real issue. English has something
   like 1.5 bits of entropy per character, so unless you're using
   really long strings it makes no difference - the space is
   searchable anyway.
- Case folding (H->h) is IMO worthwhile simply to avoid the
   CAPSLOCK problem.
- Some whitespace shrinkage is needed, e.g. "^t" vs "    ".
   Most other specs shrink all consecutive whitespace characters
   to one space, we currently eat 'em all which is a bit weird
   but ok. (If Phill's listening maybe he had a reason for that?)
- Punctuation character handling. Current spec is weird there.
   They'd normally be included in the output.
- We do have to determine how to handle XML encoding, e.g. of "&",
   "%20", "<" etc. I've no clue how to properly do that.
- We do have to determine how to define and handle control chars.
   The latter is easy, the former I dunno how to do.
- I don't think mobile devices etc is a real issue for us, since
   input device limitations can be taken into account when the
   strings are selected/generated.
- We have to decide how to handle I18N. I think the current spec
   is probably broken for countries which don't use Latin-1
   characters at all. Again I'm not sure of the right thing here,
   but there has been some (non-trivial) work done on this for
   DNS [1], which is being taken up in various security related
   specs, (and for which there's source code available) so maybe
   using that is a good idea.
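
To put a rough number on the keyspace point above, here is a minimal
sketch. It takes the 1.5 bits-per-character figure from this message as
an estimate, uses the revocation phrase quoted further down as sample
input, and the 128-bit target is purely an illustrative assumption, not
something the spec states.

```python
import math

# Estimate quoted in this message; treat it as a rough figure, not a constant.
BITS_PER_CHAR_ENGLISH = 1.5


def estimated_bits(phrase, bits_per_char=BITS_PER_CHAR_ENGLISH):
    """Rough entropy estimate for an English pass phrase, in bits."""
    return len(phrase) * bits_per_char


def chars_needed(target_bits, bits_per_char=BITS_PER_CHAR_ENGLISH):
    """Number of English characters needed to reach target_bits."""
    return math.ceil(target_bits / bits_per_char)


phrase = "Help I have revealed my key"
print(len(phrase), "characters ->", estimated_bits(phrase), "bits (estimated)")
print("characters for 128 bits (illustrative target):", chars_needed(128))
```

Under that estimate, case and spacing add very little: the phrase is
short enough that the space is searchable either way.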

Stephen.

[1] http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt
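
For a feel of what the stringprep-style approach looks like in code,
here is a hedged sketch using Python's standard-library `stringprep`
module. That module exposes the tables of RFC 3454 (the published RFC,
which the draft in [1] appears from its name to revise), so this is not
an implementation of the draft itself. The function name `prep_sketch`
and the choice of tables are assumptions for illustration only; a real
profile would also have to deal with unassigned code points and
bidirectional text.

```python
import stringprep
import unicodedata


def prep_sketch(s):
    """Illustrative stringprep-like preparation; not a complete profile."""
    # Map step: drop characters that table B.1 maps to nothing,
    # then fold case using table B.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in s
        if not stringprep.in_table_b1(ch)
    )
    # Normalize step: Unicode normalization form KC.
    normalized = unicodedata.normalize("NFKC", mapped)
    # Prohibit step: reject control characters (tables C.2.1 and C.2.2).
    for ch in normalized:
        if stringprep.in_table_c21_c22(ch):
            raise ValueError("control character U+%04X not allowed" % ord(ch))
    return normalized


print(prep_sketch("Jos\u00c9"))
```

Note that case folding here covers accented letters too, unlike the
A-Z-only rule in the current text.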

Jose Kahan wrote:
> Hi folks,
> 
> Per last meeting's action item:
> 
>   we will include a test case validating the string2key algorithm in section
>   8.1. AI: Guillermo and Jose to generate such test cases
> 
> 
> After talking with Guillermo, we both found section 8.1 confusing. This section
> uses terms that are known in the security field. What is confusing is how they
> apply to XKMS. In particular:
> 
> ----------
> - Is this algorithm meant to generate a one-time use pass phrase that can be
>   read over the phone?
>   
> - All shared string values are encoded as XML
> 
>   What is a shared string here? Is it a limited-use string? [242] proposes
>   a user-generated authentication phrase for revoking a public
>   key: "Help I have revealed my key". However, when looking at section C.2.1,
>   we find that the 8.1 algorithm was used to convert it to "helpih...".
> 
>   If this phrase was a shared string, shouldn't it have been converted into
>   XML, regardless of its content, and then the result converted into hexa,
>   without dropping spaces, punctuation, etc.?
> 
>   What is the meaning of "encoded as XML"? Accented characters and
>   "&'<> symbols encoded as entities (we would not be able to specify the
>   charset otherwise)? Accented characters encoded as UTF-8?
>  
>   I couldn't find what the spec defines as "shared string" or why 8.1 has to be
>   applied always, regardless of who generated the shared secret.
> 
> - All punctuation, space and control characters are removed.
> 
>   I can understand why we remove control characters, but I can't understand why
>   we remove punctuation, spaces. We can read them on the phone easily, I think.
> 
>   Moreover, by simplifying the pass phrase this way, aren't we making it more
>   vulnerable to oracle attacks?
>   
> - All upper case characters in the Latin-1 alphabet (A-Z) are converted to lower case.
>   No other characters, including accented characters are converted
> 
>   Why must uppercase be converted into lowercase? One can read them easily on
>   the phone I think :)
>   It's not clear what is done to the other characters or what was the
>   rationale. From reading this, it seems that if my name is spelled José, it
>   would be converted to josé.
>   
>   This conversion also reduces the keyspace, and imo makes it more vulnerable
>   to oracle attacks.
> --------
> 
> IMO, what we need to define is:
> 
> - What is a limited-use shared secret
> - When does the 8.1 algorithm need to be applied (make it an explicit reference
>   in concerned sections)
> - Decide if all such secrets should be speakable on the phone or be typed with 
>   a device that doesn't allow all those characters; use it as
>   a rationale for removing punctuation, etc.
> - Remove the ambiguities of the algorithm in section 8.1.
> - Decide if we need to define a minimum size for the shared secret string. What
>   is its relationship with entropy?
> 
> In my opinion, what we are looking for is an algorithm to canonicalize
> shared-secret strings (whether they are limited-use or not) that produces an
> XML-valid string. I would propose the following one:
> 
> 1. Remove all the control characters from the string
>    --> reason: I feel that those characters could cause problems and they could
>        not be typed all the time
> 2. Encode the string in UTF-8
>    This will take into account accented characters
>    --> reason: it's the only way to convert those characters into portable ASCII
> 3. Put the hex equivalent for each of those characters, using lowercase
>    letters. Note that here we don't remove any punctuation symbols. We are just
>    converting them.
>   
> This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
> 
> XKMS could be used by mobile devices too. If for some reason, we believe that
> it will be too much overhead to make UTF-8 conversions, we can just suppress
> all the characters above 127 ASCII. Another reason could be if the user has to
> type those characters in a phone and he doesn't have the full character set
> available. It could be that an operator at the other end cannot read a decoded
> UTF-8 string if it's stored as such. This is some rationale as to why reduce
> the strings to ASCII 32-127.
> 
> I don't know what would be the rationale for converting the strings to
> lower-case and suppressing all the punctuation symbols.
> 
> Tommy had written:
> 
> 
>>Four implementors have independently implemented the "Limited Use
>>Shared Secret" algorithm in a way that interoperates so I have not
>>seen a break down yet.  However, both the spec and the existing shared
>>secret distribution points (at least my service) avoid cases that lead
>>to ambiguous interpretation.
> 
> 
> This seems to imply that the places where 8.1 applies are already identified. I
> think it would be good to make this explicit in the spec.
> 
> 
>>>Maybe change the spec to only allow a smaller subset of
>>>strings to become keys
> 
> 
>>I'm in favor of this option, provided that the recommendations in
>>Section 10.4 can still be followed.
> 
> 
> Ditto :)
> 
> -jose
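
The two treatments discussed in the quoted message can be sketched side
by side. This is only an illustration under stated assumptions: the
first function follows the section 8.1 rules as they are quoted in this
thread (not the spec text itself), it takes "punctuation" to mean ASCII
punctuation and "control characters" to mean Unicode category Cc, and
both function names are hypothetical. The second function follows the
three-step proposal quoted above.

```python
import string
import unicodedata

# Lower-case only A-Z; accented letters are deliberately left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def is_control(ch):
    # Assumption: "control character" means Unicode general category Cc.
    return unicodedata.category(ch) == "Cc"


def section_8_1_as_quoted(s):
    """Drop punctuation, space and control characters, then fold A-Z."""
    kept = [
        ch for ch in s
        if not (ch in string.punctuation or ch.isspace() or is_control(ch))
    ]
    return "".join(kept).translate(_ASCII_LOWER)


def proposed_canonicalize(s):
    """Quoted proposal: drop control characters, UTF-8 encode, lowercase hex."""
    kept = "".join(ch for ch in s if not is_control(ch))
    return kept.encode("utf-8").hex()


print(section_8_1_as_quoted("Help I have revealed my key"))
print(proposed_canonicalize("Jos&\u00c9"))
```

The first call yields a string beginning "helpih", in line with the
C.2.1 example mentioned in the quoted message, and the second reproduces
the byte sequence given there for Jos&É.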
Received on Friday, 17 December 2004 12:43:49 UTC