Re: Again, confusing 8.1 from Stephen Farrell on 2004-12-17 (www-xkms@w3.org from December 2004)

From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Date: Fri, 17 Dec 2004 15:41:56 +0000
To: Ed Simon <edsimon@xmlsec.com>
Cc: www-xkms@w3.org
Message-ID: <41C2FE44.8020500@cs.tcd.ie>

Hi Ed,

I guess we're all in agreement, but none of us seems to know
exactly how to write down what we want!

I agree that we have to support non-Latin characters - if we
didn't then nearly everyone in certain parts of the world would
use the same key (since their entire string would be highly
likely to be reduced to nothing!) which'd be a bit of a security
flaw as well as an I18N-nasty.
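
To make that concrete, here is a minimal sketch of the filter under discussion. The function name and the two sample phrases are made up for illustration; the 32-127 range is the one quoted below.

```python
def strip_non_ascii(phrase: str) -> str:
    """Keep only characters in the ASCII 32-127 range (hypothetical filter)."""
    return "".join(ch for ch in phrase if 32 <= ord(ch) <= 127)

# Two different Korean pass phrases (illustrative values only).
a = strip_non_ascii("비밀번호")
b = strip_non_ascii("안녕하세요")

# Both collapse to the empty string, so both users end up with the same key.
print(repr(a), repr(b), a == b)
```

Any two pass phrases written entirely outside that range would collide in the same way.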

If we drop xml encoding (good, let's remove that degree
of freedom), we're then ok to directly use "<" characters
in our strings?
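
As a rough sketch of what I understand Jose's three steps (quoted below) to be, with no XML encoding involved: drop control characters, encode as UTF-8, emit lowercase hex. The function name is mine, and treating "control character" as Unicode category Cc is an assumption, since the spec does not define it yet.

```python
import unicodedata

def string_to_hex(phrase: str) -> str:
    """Hypothetical rendering of the proposed steps; not the spec's text."""
    # Step 1: remove control characters (assumed here to mean category "Cc").
    no_controls = "".join(
        ch for ch in phrase if unicodedata.category(ch) != "Cc"
    )
    # Steps 2 and 3: UTF-8 encode, then lowercase hex for every byte.
    return no_controls.encode("utf-8").hex()

print(string_to_hex("a<b&c"))       # "<" and "&" are just bytes 3c and 26
print(string_to_hex("Jos&\u00c9"))  # ends in c3 89, the UTF-8 bytes for U+00C9
```

With this approach "<" needs no special treatment; it is simply one more byte in the output.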

As for the canonical UTF-8, that's what the stringprep RFC
does, and apparently it involves ~30Kb of object code (from
memory, so may be wrong there), so it has been found to be
complicated.
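
On Ed's question below about combining sequences: the same visible phrase can indeed have more than one UTF-8 byte sequence. A small sketch, using plain NFC normalization from Python's standard library as a stand-in (stringprep itself uses NFKC plus mapping and prohibition tables, so this is not a stringprep implementation):

```python
import unicodedata

precomposed = "Jos\u00e9"   # e-acute as the single code point U+00E9
combining = "Jose\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Same text on screen, different bytes on the wire.
print(precomposed.encode("utf-8").hex())
print(combining.encode("utf-8").hex())

def canonical_bytes(phrase: str) -> bytes:
    """Normalize to NFC before encoding (an assumed canonicalization rule)."""
    return unicodedata.normalize("NFC", phrase).encode("utf-8")

# After normalization both spellings give the same bytes.
print(canonical_bytes(precomposed) == canonical_bytes(combining))
```

So some normalization rule is needed if both ends are to derive the same key from the same phrase.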

Stephen.

Ed Simon wrote:

> I generally agree with Jose and Guillermo's recommendations EXCEPT for the
> one about filtering UTF-8 characters outside ASCII 32-127.  Unless there
> is a verifiable case to be made for disallowing non-Latin characters (e.g.
> Korean pass phrases) I would not include that possibility.  Ultimately, the
> pass phrase is just '1's and '0's and all we are doing is saying how a
> human-readable/writable phrase can be consistently converted into binary;
> that MAY not always mean the end device has to understand Unicode, just
> binary.  (I say MAY because I'm not a mobile device expert, I just want
> someone who is to say non-ASCII is a problem before we try to accommodate
> it.)
> 
> I would drop mention of "XML Encoding" and call it "UTF-8" encoding; not
> only do I think this is sensible from the outset but it also gets rid of
> trying to process XMLese like entities etc.  I confess that I have one
> question: I am not absolutely sure (e.g. due to combining sequences) that
> there is always one and only one binary representation for every unique
> UTF-8-encoded pass phrase.  Jose, can you verify that with a W3C UTF-8
> expert?  A follow-up question would be whether we could use rules to
> canonicalise the UTF-8 (e.g. do not use combining characters) if there is
> more than one binary representation.
> 
> Regards, Ed
> ========================================
> Ed Simon
> (613) 726-9645
> edsimon@xmlsec.com 
> Interested in XML, Web Services, or Security?  Visit "www.xmlsec.com".
> Now available!  "Web Services Security" published by Osborne (ISBN#
> 0072224711)
> 
> 
> -----Original Message-----
> From: www-xkms-request@w3.org [mailto:www-xkms-request@w3.org] On Behalf Of
> Stephen Farrell
> Sent: December 17, 2004 7:48 AM
> To: jose.kahan@w3.org
> Cc: www-xkms@w3.org
> Subject: Re: Again, confusing 8.1
> 
> 
> 
> Jose,
> 
> I agree that this bit needs more work. A few points.
> 
> - Do we want to maintain interop with any existing implementations?
>    I believe we do. But I think it's fair to assume that most
>    existing code hasn't taken the corner cases into account so we
>    should be ok to make non-interoperable changes for corner cases.
> - Reducing the keyspace isn't a real issue. English has something
>    like 1.5 bits of entropy per character, so unless you're using
>    really long strings it makes no difference - the space is
>    searchable anyway.
> - Case folding (H->h) is IMO worthwhile simply to avoid the
>    CAPSLOCK problem.
> - Some whitespace shrinkage is needed, e.g. "^t" vs "    ".
>    Most other specs shrink all consecutive whitespace characters
>    to one space, we currently eat 'em all which is a bit weird
>    but ok. (If Phill's listening maybe he had a reason for that?)
> - Punctuation character handling. Current spec is weird there.
>    They'd normally be included in the output.
> - We do have to determine how to handle XML encoding, e.g. of "&",
>    "%20", "<" etc. I've no clue how to properly do that.
> - We do have to determine how to define and handle control chars.
>    The latter is easy, the former I dunno how to do.
> - I don't think mobile devices etc is a real issue for us, since
>    input device limitations can be taken into account when the
>    strings are selected/generated.
> - We have to decide how to handle I18N. I think the current spec
>    is probably broken for countries which don't use Latin-1
>    characters at all. Again I'm not sure of the right thing here,
>    but there has been some (non-trivial) work done on this for
>    DNS [1], which is being taken up in various security related
>    specs, (and for which there's source code available) so maybe
>    using that is a good idea.
> 
> Stephen.
> 
> [1] http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt
> 
> Jose Kahan wrote:
> 
>>Hi folks,
>>
>>Per last meeting's action item:
>>
>>  we will include a test case validating the string2key algorithm in section
>>  8.1. AI: Guillermo and Jose to generate such test cases
>>
>>
>>After talking with Guillermo, we both found section 8.1 confusing. 
>>This section uses terms that are known in the security field. What is 
>>confusing is how they apply to XKMS. In particular:
>>
>>----------
>>- Is this algorithm meant to generate a one-time use pass phrase that can be
>>  read over the phone?
>>  
>>- All shared string values are encoded as XML
>>
>>  What is a shared string here? Is it a limited-use string? [242] proposes a
>>  user-generated authentication phrase for revoking a public
>>  key: "Help I have revealed my key". However, when looking at section C.2.1,
>>  we find that the 8.1 algorithm was used to convert it to "helpih...".
>>
>>  If this phrase was a shared string, shouldn't it have been converted into
>>  XML, regardless of its content, and then the result converted into hex,
>>  without dropping spaces, punctuation, etc.?
>>
>>  What is the meaning of "encoded as XML"? Accented characters and
>>  "&'<> symbols encoded as entities (we would not be able to specify the
>>  charset otherwise)? Accented characters encoded as UTF-8?
>> 
>>  I couldn't find what the spec defines as "shared string" or why 8.1 has to be
>>  applied always, regardless of who generated the shared secret.
>>
>>- All punctuation, space and control characters are removed.
>>
>>  I can understand why we remove control characters, but I can't understand why
>>  we remove punctuation and spaces. We can read them on the phone easily, I think.
>>  Moreover, by simplifying the pass phrase in this way, aren't we making it more
>>  vulnerable to oracle attacks?
>>  
>>- All upper case characters in the Latin-1 alphabet (A-Z) are converted to lower case.
>>  No other characters, including accented characters are converted
>>
>>  Why must uppercase be converted into lowercase? One can read them easily on
>>  the phone I think :)
>>  It's not clear what is done to the other characters or what the
>>  rationale was. From reading this, it seems that if my name is spelled José, it
>>  would be converted to josé.
>>  
>>  This conversion also reduces the keyspace, and IMO makes it more vulnerable
>>  to oracle attacks.
>>--------
>>
>>IMO, what we need to define is:
>>
>>- What is a limited-use shared secret
>>- When does the 8.1 algorithm need to be applied (make it an explicit reference
>>  in concerned sections)
>>- Decide if all such secrets should be speakable on the phone or be typed with
>>  a device that doesn't allow all those characters; use it as
>>  a rationale for removing punctuation, etc.
>>- Remove the ambiguities of the algorithm in section 8.1.
>>- Decide if we need to define a minimum size for the shared secret string. What
>>  is its relationship with entropy?
>>
>>In my opinion, what we are looking for is an algorithm to
>>canonicalize shared-secret strings (whether they are limited-use or not) that
>>produces a valid XML string. I would propose the following one:
>>
>>1. Remove all the control characters from the string
>>   --> reason: I feel that those characters could cause problems and they could
>>       not be typed all the time
>>2. Encode the string in UTF-8
>>   This will take into account accented characters
>>   --> reason: it's the only way to convert those characters into
>>       portable ASCII
>>3. Put the hex equivalent for each of those characters, using lowercase
>>   letters. Note that here we don't remove any punctuation symbols. We are just
>>   converting them.
>>  
>>This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
>>
>>XKMS could be used by mobile devices too. If for some reason we
>>believe that UTF-8 conversions would be too much overhead,
>>we can just suppress all the characters above ASCII 127.
>>Another reason could be if the user has to type those characters on a
>>phone and doesn't have the full character set available. It could
>>also be that an operator at the other end cannot read a decoded
>>UTF-8 string if it's stored as such. This is some rationale for
>>reducing the strings to ASCII 32-127.
>>
>>I don't know what would be the rationale for converting the strings to 
>>lower-case and suppressing all the punctuation symbols.
>>
>>Tommy had written:
>>
>>
>>
>>>Four implementors have independently implemented the "Limited Use 
>>>Shared Secret" algorithm in a way that interoperates so I have not 
>>>seen a breakdown yet.  However, both the spec and the existing shared 
>>>secret distribution points (at least my service) avoid cases that lead 
>>>to ambiguous interpretation.
>>
>>
>>This seems to imply that the places where 8.1 applies are already 
>>identified. I think it would be good to make this explicit in the spec.
>>
>>
>>
>>>>Maybe change the spec to only allow a smaller subset of strings to 
>>>>become keys
>>
>>
>>>I'm in favor of this option, provided that the recommendations in 
>>>Section 10.4 can still be followed.
>>
>>
>>Ditto :)
>>
>>-jose
> 
> 
> 
>
Received on Friday, 17 December 2004 15:37:16 UTC