Again, confusing 8.1 from Jose Kahan on 2004-12-15 (www-xkms@w3.org from December 2004)

From: Jose Kahan <jose.kahan@w3.org>
Date: Wed, 15 Dec 2004 19:18:54 +0100
To: www-xkms@w3.org
Message-ID: <20041215181854.GF20230@inrialpes.fr>
Hi folks,

Per last meeting's action item:

  we will include a test case validating the string2key algorithm in section
  8.1. AI: Guillermo and Jose to generate such test cases


After talking with Guillermo, we both found section 8.1 confusing. This section
uses terms that are known in the security field. What is confusing is how they
apply to XKMS. In particular:

----------
- Is this algorithm meant to generate a one-time use pass phrase that can be
  read over the phone?
  
- All shared string values are encoded as XML

  What is a shared string here? Is it a limited-use string? [242] proposes a
  user-generated authentication phrase for revoking a public
  key: "Help I have revealed my key". However, when looking at section C.2.1,
  we find that the 8.1 algorithm was used to convert it to "helpih...".

  If this phrase was a shared string, shouldn't it have been converted into
  XML, regardless of its content, and then the result converted into hexadecimal,
  without dropping spaces, punctuation, etc.?

  What is the meaning of "encoded as XML"? Accented characters and the
  "&'<> symbols encoded as entities (we would not be able to specify the
  charset otherwise)? Accented characters encoded as UTF-8?
 
  I couldn't find what the spec defines as "shared string" or why 8.1 has to be
  applied always, regardless of who generated the shared secret.

- All punctuation, space and control characters are removed.

  I can understand why we remove control characters, but I can't understand why
  we remove punctuation, spaces. We can read them on the phone easily, I think.

  Moreover, by simplifying the pass phrase in this way, aren't we making it more
  vulnerable to oracle attacks?
  
- All upper case characters in the Latin-1 alphabet (A-Z) are converted to lower case.
  No other characters, including accented characters are converted

  Why must uppercase be converted into lowercase? One can read them easily on
  the phone I think :)
  It's not clear what is done to the other characters or what the
  rationale was. From reading this, it seems that if my name is spelled JosÉ, it
  would be converted to josÉ.
  
  This conversion also reduces the keyspace, and IMO makes it more vulnerable
  to oracle attacks.
--------
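
To make the questions above concrete, here is a minimal sketch of one possible
reading of the 8.1 character rules. It is not taken from the spec: the function
name is hypothetical, and the choice of what counts as "punctuation, space and
control" (ASCII punctuation plus the Unicode P*, Z* and C* categories) is an
assumption, which is exactly the kind of ambiguity discussed here.

```python
import string
import unicodedata


def string2key_normalize(shared: str) -> str:
    """Hypothetical reading of the 8.1 rules; not the normative algorithm."""
    kept = []
    for ch in shared:
        major = unicodedata.category(ch)[0]
        # Assumption: "punctuation, space and control characters" means ASCII
        # punctuation, any whitespace, and Unicode categories P*, Z*, C*.
        if ch in string.punctuation or ch.isspace() or major in ("P", "Z", "C"):
            continue
        # Only A-Z is lowercased; accented capitals such as É are left alone.
        if "A" <= ch <= "Z":
            ch = ch.lower()
        kept.append(ch)
    return "".join(kept)


print(string2key_normalize("Help I have revealed my key"))
print(string2key_normalize("JosÉ"))
```

Under these assumptions the revocation phrase collapses to a string starting
with "helpih", and JosÉ becomes josÉ, as described above.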

IMO, what we need to define is:

- What a limited-use shared secret is
- When the 8.1 algorithm needs to be applied (make it an explicit reference
  in the sections concerned)
- Decide if all such secrets should be speakable on the phone or typeable on
  a device that doesn't allow all those characters; use that as
  a rationale for removing punctuation, etc.
- Remove the ambiguities of the algorithm in section 8.1.
- Decide if we need to define a minimum size for the shared secret string. What
  is its relationship with entropy?
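
As a rough illustration of the size/entropy question, the sketch below compares
the per-character entropy of printable ASCII with that of the reduced alphabet
left after dropping spaces and punctuation and folding A-Z to lowercase. It
assumes every character is chosen uniformly at random, which real pass phrases
are not, so the figures are upper bounds only. The 80-bit target and the
function names are hypothetical.

```python
import math
import string

# 95 printable ASCII characters: letters, digits, punctuation and space.
full_alphabet = set(string.ascii_letters + string.digits + string.punctuation + " ")
# What remains after removing space/punctuation and lowercasing A-Z.
reduced_alphabet = set(string.ascii_lowercase + string.digits)


def bits_per_char(alphabet) -> float:
    """Entropy per character, assuming a uniform random choice."""
    return math.log2(len(alphabet))


def min_length(target_bits: float, alphabet) -> int:
    """Shortest string length that reaches target_bits under that assumption."""
    return math.ceil(target_bits / bits_per_char(alphabet))


print(len(full_alphabet), round(bits_per_char(full_alphabet), 2))
print(len(reduced_alphabet), round(bits_per_char(reduced_alphabet), 2))
print(min_length(80, full_alphabet), min_length(80, reduced_alphabet))
```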

In my opinion, what we are looking for is an algorithm to canonicalize
shared-secret strings (whether limited-use or not) that produces a valid XML
string. I would propose the following one:

1. Remove all the control characters from the string
   --> reason: I feel that those characters could cause problems and they could
       not be typed all the time
2. Encode the string in UTF-8
   This will take accented characters into account
   --> reason: it's the only way to convert those characters into portable ASCII
3. Write the hexadecimal equivalent of each resulting byte, using lowercase
   letters. Note that here we don't remove any punctuation symbols. We are just
   converting them.
  
This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
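
The three steps above can be sketched as follows. The function name is
hypothetical, and "control characters" is taken here to mean the Unicode Cc
category, which is an assumption of this sketch.

```python
import unicodedata


def canonicalize_secret(shared: str) -> str:
    """Sketch of the proposed steps: drop controls, UTF-8 encode, lowercase hex."""
    # Step 1: remove control characters (assumed: Unicode category Cc).
    no_controls = "".join(ch for ch in shared if unicodedata.category(ch) != "Cc")
    # Step 2: encode as UTF-8. Step 3: bytes.hex() yields lowercase hex digits.
    return no_controls.encode("utf-8").hex()


print(canonicalize_secret("Jos&É"))  # 4a6f7326c389
```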

XKMS could be used by mobile devices too. If for some reason we believe that
UTF-8 conversions would be too much overhead, we can just suppress all the
characters above ASCII 127. Another reason could be that the user has to
type those characters on a phone and doesn't have the full character set
available. It could also be that an operator at the other end cannot read a
decoded UTF-8 string if it's stored as such. These are some reasons for
reducing the strings to ASCII 32-127.
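
A minimal sketch of that ASCII-only variant, with a hypothetical function name.
It keeps code points 32 to 126 rather than 32 to 127, on the assumption that
127 (DEL) is a control character and would already be removed by step 1.

```python
def to_printable_ascii(shared: str) -> str:
    """Keep only printable ASCII (code points 32..126); drop everything else."""
    # Assumption: DEL (127) is treated as a control character and removed.
    return "".join(ch for ch in shared if 32 <= ord(ch) <= 126)


reduced = to_printable_ascii("Jos&É")
print(reduced)                        # Jos&
print(reduced.encode("ascii").hex())  # 4a6f7326
```

With this variant the É in the earlier example is simply lost, so two secrets
that differ only in such characters would map to the same string.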

I don't know what would be the rationale for converting the strings to
lower-case and suppressing all the punctuation symbols.

Tommy had written:

>Four implementors have independantly implemented the "Limited Use
>Shared Secret" algorithm in a way that interoperates so I have not
>seen a break down yet.  However, both the spec and the existing shared
>secret distribution points (at least my service) avoid cases that lead
>to ambigous interpretation.

This seems to imply that the places where the 8.1 algorithm applies are already
identified. I think it would be good to make this explicit in the spec.

>> Maybe change the spec to only allow a smaller subset of
>> strings to become keys

>I'm in favor of this option, provided that the recommendations in
>Section 10.4 can still be followed.

Ditto :)

-jose
Received on Wednesday, 15 December 2004 18:19:28 UTC