RE: Again, confusing 8.1

I generally agree with Jose and Guillermo's recommendations EXCEPT for the
one about filtering out UTF-8 characters outside ASCII 32-127.  Unless there
is a verifiable case to be made for disallowing non-Latin characters (e.g.
Korean pass phrases), I would not include that possibility.  Ultimately, the
pass phrase is just 1s and 0s, and all we are doing is saying how a
human-readable/writable phrase can be consistently converted into binary;
that MAY not always mean the end device has to understand Unicode, just
binary.  (I say MAY because I'm not a mobile device expert; I just want
someone who is to say non-ASCII is a problem before we try to accommodate
it.)

I would drop mention of "XML Encoding" and call it "UTF-8" encoding; not
only do I think this is sensible from the outset, but it also gets rid of
trying to process XMLese like entities etc.  I confess that I have one
question: I am not absolutely sure (e.g. due to combining sequences) that
there is always one and only one binary representation for every unique
UTF-8-encoded pass phrase; Jose, can you verify that with a W3C UTF-8
expert?  A follow-up question would be whether we could use rules to
canonicalise the UTF-8 (e.g. do not use combining characters) if there is
more than one binary representation.
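On the combining-sequence question: UTF-8 by itself does not guarantee a
single binary form.  Unicode lets "é" be written either as the precomposed
code point U+00E9 or as "e" followed by U+0301 COMBINING ACUTE ACCENT, and
the two encode to different UTF-8 bytes.  Unicode Normalization Form C (NFC)
is one standard rule that maps both to the same sequence.  The Python sketch
below assumes NFC is the canonicalisation rule chosen; `passphrase_bytes` is
a hypothetical helper name, not something defined by the spec.

```python
import unicodedata


def passphrase_bytes(phrase: str) -> bytes:
    """UTF-8 bytes of a pass phrase after NFC normalisation (assumed rule)."""
    return unicodedata.normalize("NFC", phrase).encode("utf-8")


precomposed = "Jos\u00e9"   # 'é' as the single code point U+00E9
combining = "Jose\u0301"    # 'e' followed by COMBINING ACUTE ACCENT U+0301

# The raw UTF-8 encodings differ...
print(precomposed.encode("utf-8").hex())  # 4a6f73c3a9
print(combining.encode("utf-8").hex())    # 4a6f7365cc81
# ...but after NFC both phrases map to the same bytes.
print(passphrase_bytes(precomposed) == passphrase_bytes(combining))  # True
```

NFC composes characters where possible; NFKC would additionally fold
compatibility characters (e.g. full-width letters), which may or may not be
wanted for pass phrases.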

Regards, Ed
========================================
Ed Simon
(613) 726-9645
edsimon@xmlsec.com 
Interested in XML, Web Services, or Security?  Visit "www.xmlsec.com".
Now available!  "Web Services Security" published by Osborne (ISBN#
0072224711)


-----Original Message-----
From: www-xkms-request@w3.org [mailto:www-xkms-request@w3.org] On Behalf Of
Stephen Farrell
Sent: December 17, 2004 7:48 AM
To: jose.kahan@w3.org
Cc: www-xkms@w3.org
Subject: Re: Again, confusing 8.1



Jose,

I agree that this bit needs more work. A few points.

- Do we want to maintain interop with any existing implementations?
   I believe we do. But I think it's fair to assume that most
   existing code hasn't taken the corner cases into account, so we
   should be OK to make non-interoperable changes for corner cases.
- Reducing the keyspace isn't a real issue. English has something
   like 1.5 bits of entropy per character, so unless you're using
   really long strings it makes no difference - the space is
   searchable anyway.
- Case folding (H->h) is IMO worthwhile simply to avoid the
   CAPSLOCK problem.
- Some whitespace shrinkage is needed, e.g. "^t" vs "    ".
   Most other specs shrink all consecutive whitespace characters
   to one space; we currently eat 'em all, which is a bit weird
   but OK. (If Phill's listening, maybe he had a reason for that?)
- Punctuation character handling. Current spec is weird there.
   They'd normally be included in the output.
- We do have to determine how to handle XML encoding, e.g. of "&",
   "%20", "<" etc. I've no clue how to properly do that.
- We do have to determine how to define and handle control chars.
   The latter is easy, the former I dunno how to do.
- I don't think mobile devices etc. are a real issue for us, since
   input device limitations can be taken into account when the
   strings are selected/generated.
- We have to decide how to handle I18N. I think the current spec
   is probably broken for countries which don't use Latin-1
   characters at all. Again I'm not sure of the right thing here,
   but there has been some (non-trivial) work done on this for
   DNS [1], which is being taken up in various security-related
   specs (and for which there's source code available), so maybe
   using that is a good idea.
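Taken together, the whitespace, case-folding and punctuation points above
suggest a canonicalisation along these lines.  This is a rough Python sketch,
not the spec's algorithm: `canonicalise` is a hypothetical name, "control
character" is assumed to mean Unicode category Cc, and trimming leading and
trailing whitespace is an extra assumption.  It uses `str.casefold()` rather
than A-Z-only lower-casing, so accented capitals are folded too, and it does
not implement the stringprep work referenced in [1].

```python
import re
import unicodedata


def canonicalise(phrase: str) -> str:
    # 1. Collapse every run of whitespace (spaces, tabs, newlines, ...) into
    #    a single space, then trim both ends (trimming is an assumption).
    s = re.sub(r"\s+", " ", phrase).strip()
    # 2. Drop any remaining control characters (Unicode category Cc).
    s = "".join(ch for ch in s if unicodedata.category(ch) != "Cc")
    # 3. Case-fold so CAPSLOCK does not matter; punctuation is kept as-is.
    return s.casefold()


print(canonicalise("Help\tI  have revealed my key!"))  # help i have revealed my key!
print(canonicalise("HELP I HAVE REVEALED MY KEY!"))    # help i have revealed my key!
```

Collapsing whitespace before removing control characters matters here,
because tab and newline are themselves in category Cc and would otherwise be
deleted rather than turned into a space.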

Stephen.

[1] http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt

Jose Kahan wrote:
> Hi folks,
> 
> Per last meeting's action item:
> 
>   we will include a test case validating the string2key algorithm in
>   section 8.1. AI: Guillermo and Jose to generate such test cases
> 
> 
> After talking with Guillermo, we both found section 8.1 confusing. 
> This section uses terms that are known in the security field. What is 
> confusing is how they apply to XKMS. In particular:
> 
> ----------
> - Is this algorithm meant to generate a one-time-use pass phrase that can
>   be read over the phone?
>   
> - All shared string values are encoded as XML
> 
>   What is a shared string here? Is it a limited-use string? [242] proposes
>   a user-generated authentication phrase for revoking a public
>   key: "Help I have revealed my key". However, when looking at section
>   C.2.1, we find that the 8.1 algorithm was used to convert it to "helpih...".
> 
>   If this phrase was a shared string, shouldn't it have been converted
>   into XML, regardless of its content, and then the result converted into
>   hex, without dropping spaces, punctuation, etc.?
> 
>   What is the meaning of "encoded as XML"? Are accented characters and
>   "&'<> symbols encoded as entities (we would not be able to specify the
>   charset otherwise)? Are accented characters encoded as UTF-8?
>  
>   I couldn't find what the spec defines as a "shared string", or why 8.1
>   always has to be applied, regardless of who generated the shared secret.
> 
> - All punctuation, space and control characters are removed.
> 
>   I can understand why we remove control characters, but I can't
>   understand why we remove punctuation and spaces. We can read them over
>   the phone easily, I think.
> 
>   Moreover, by simplifying the pass phrase this way, aren't we making it
>   more vulnerable to oracle attacks?
>   
> - All upper case characters in the Latin-1 alphabet (A-Z) are converted
>   to lower case.
>   No other characters, including accented characters, are converted
> 
>   Why must uppercase be converted into lowercase? One can read them easily
>   over the phone, I think :)
>   It's not clear what is done to the other characters or what the
>   rationale was. From reading this, it seems that if my name is spelled
>   JosÉ, it would be converted to josÉ.
>   
>   This conversion also reduces the keyspace, and IMO makes it more
>   vulnerable to oracle attacks.
> --------
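To make the reading above concrete, here is a Python sketch of the section
8.1 character handling as quoted in this message.  It skips the "encoded as
XML" step, since what that means is exactly what is unclear, and it assumes
"punctuation, space and control characters" means Unicode categories P*, Z*
and Cc.  Whether symbols such as "$" or "+" (category S*) count as
punctuation is one of the ambiguities; this sketch keeps them.
`spec_8_1_reading` is a hypothetical name.

```python
import unicodedata


def spec_8_1_reading(phrase: str) -> str:
    out = []
    for ch in phrase:
        cat = unicodedata.category(ch)
        # Drop punctuation (P*), separators such as space (Z*) and
        # control characters (Cc).
        if cat.startswith("P") or cat.startswith("Z") or cat == "Cc":
            continue
        # Lower-case only A-Z; accented capitals such as 'É' are untouched.
        if "A" <= ch <= "Z":
            ch = ch.lower()
        out.append(ch)
    return "".join(out)


print(spec_8_1_reading("Help I have revealed my key"))  # helpihaverevealedmykey
print(spec_8_1_reading("JOS\u00c9"))                    # josÉ
```

The first result matches the "helpih..." value from section C.2.1; the second
shows the asymmetry raised above, where "É" survives in upper case.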
> 
> IMO, what we need to define is:
> 
> - What is a limited-use shared secret?
> - When does the 8.1 algorithm need to be applied? (Make it an explicit
>   reference in the concerned sections.)
> - Decide if all such secrets should be speakable over the phone or typed
>   with a device that doesn't allow all those characters; use that as
>   a rationale for removing punctuation, etc.
> - Remove the ambiguities of the algorithm in section 8.1.
> - Decide if we need to define a minimum size for the shared secret string.
>   What is its relationship with entropy?
> 
> In my opinion, what we are looking for is an algorithm to
> canonicalize shared-secret strings (whether limited-use or not) that
> produces a valid XML string. I would propose the following one:
> 
> 1. Remove all control characters from the string.
>    --> reason: I feel that those characters could cause problems and they
>        cannot always be typed.
> 2. Encode the string in UTF-8.
>    This will take accented characters into account.
>    --> reason: it's the only way to convert those characters into
>        portable ASCII.
> 3. Put the hex equivalent of each resulting byte, using lowercase
>    letters. Note that here we don't remove any punctuation symbols; we
>    are just converting them.
>
> This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
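A minimal Python sketch of the three proposed steps, assuming "control
characters" means Unicode category Cc; `jose_proposal` is a hypothetical
name.  Step 3 works on the UTF-8 bytes, which is why the single character
"É" yields two pairs, [c3] [89].

```python
import unicodedata


def jose_proposal(phrase: str) -> str:
    # Step 1: remove control characters (Cc is an assumed definition).
    s = "".join(ch for ch in phrase if unicodedata.category(ch) != "Cc")
    # Step 2: encode the remaining string as UTF-8.
    data = s.encode("utf-8")
    # Step 3: lower-case hex for each byte; punctuation is converted, not removed.
    return " ".join(f"[{b:02x}]" for b in data)


print(jose_proposal("Jos&\u00c9"))  # [4a] [6f] [73] [26] [c3] [89]
```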
> 
> XKMS could be used by mobile devices too. If for some reason we
> believe that UTF-8 conversion will be too much overhead, we can just
> suppress all the characters above ASCII 127.
> Another reason could be if the user has to type those characters on a
> phone and doesn't have the full character set available. It could
> also be that an operator at the other end cannot read a decoded
> UTF-8 string if it's stored as such. These are some reasons for
> reducing the strings to ASCII 32-127.
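The ASCII-only fallback could look like the Python sketch below
(`ascii_only` is a hypothetical name).  Code point 127 is DEL, itself a
control character, so the sketch keeps only 32-126.  It also shows the cost
of this option: distinct phrases such as "José" and "Jos" collapse to the
same string.

```python
def ascii_only(phrase: str) -> str:
    # Keep printable ASCII only (32..126); DEL (127) is a control character.
    return "".join(ch for ch in phrase if 32 <= ord(ch) <= 126)


print(ascii_only("Jos&\u00c9"))                           # Jos&
print(ascii_only("Jos\u00e9") == ascii_only("Jos"))       # True
```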
> 
> I don't know what would be the rationale for converting the strings to 
> lower-case and suppressing all the punctuation symbols.
> 
> Tommy had written:
> 
> 
>>Four implementors have independently implemented the "Limited Use
>>Shared Secret" algorithm in a way that interoperates, so I have not
>>seen a breakdown yet.  However, both the spec and the existing shared
>>secret distribution points (at least my service) avoid cases that lead
>>to ambiguous interpretation.
> 
> 
> This seems to imply that the places where the 8.1 algorithm applies are
> already identified. I think it would be good to make this explicit in the spec.
> 
> 
>>>Maybe change the spec to only allow a smaller subset of strings to 
>>>become keys
> 
> 
>>I'm in favor of this option, provided that the recommendations in 
>>Section 10.4 can still be followed.
> 
> 
> Ditto :)
> 
> -jose

Received on Friday, 17 December 2004 15:27:40 UTC