RE: Again, confusing 8.1

As the use of XML-sensitive characters is a problem, can we not (and should
we not anyway) require that pass phrases be base64-encoded when used within
XML?  In fact, this would seem to be good practice, so that the pass phrase
does not get messed up by XML processing whether or not it contains XML-ese.

BTW, I also think trailing and leading whitespace MUST be removed and
internal whitespace reduced to one space character (not zero).
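A minimal Python sketch of the two rules above. It assumes that "whitespace" means anything Python's `\s` matches. It also assumes the whitespace rule runs before base64 encoding; the thread does not settle that ordering. The helper names are hypothetical.

```python
import base64
import re


def normalize_whitespace(phrase: str) -> str:
    """Collapse each run of whitespace to one space, then trim both ends."""
    return re.sub(r"\s+", " ", phrase).strip()


def to_xml_safe(phrase: str) -> str:
    """Normalize, UTF-8 encode, then base64 encode.

    The result only uses A-Z, a-z, 0-9, '+', '/' and '=', so XML
    processing cannot alter it, whatever the original phrase contained.
    """
    data = normalize_whitespace(phrase).encode("utf-8")
    return base64.b64encode(data).decode("ascii")


def from_xml_safe(encoded: str) -> str:
    """Reverse of to_xml_safe (returns the normalized phrase)."""
    return base64.b64decode(encoded).decode("utf-8")


print(normalize_whitespace("  Help\tI  have \n revealed  "))  # Help I have revealed
print(to_xml_safe("a<b"))                                     # YTxi
```

Because the base64 output never contains `<`, `&` or quote characters, entity handling stops mattering once the phrase is inside XML.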

Ed
========================================
Ed Simon
(613) 726-9645
edsimon@xmlsec.com 
Interested in XML, Web Services, or Security?  Visit "www.xmlsec.com".
Now available!  "Web Services Security" published by Osborne (ISBN#
0072224711)


-----Original Message-----
From: Stephen Farrell [mailto:stephen.farrell@cs.tcd.ie] 
Sent: December 17, 2004 10:42 AM
To: Ed Simon
Cc: www-xkms@w3.org
Subject: Re: Again, confusing 8.1


Hi Ed,

I guess we're all in agreement, but none of us seems to know exactly how to
write down what we want!

I agree that we have to support non-Latin characters - if we didn't then
nearly everyone in certain parts of the world would use the same key (since
their entire string would be highly likely to be reduced to nothing!)
which'd be a bit of a security flaw as well as an I18N-nasty.

If we drop xml encoding (good, let's remove that degree of freedom), we're
then ok to directly use "<" characters in our strings?
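Suppose the phrase travels as ordinary XML character data. In that case a conforming serializer escapes `<` and `&`, and a parser turns them back, so the application sees the raw characters. The small Python illustration below uses the standard `xml.etree.ElementTree` module. The element name `PassPhrase` is hypothetical and not taken from the XKMS schema.

```python
import xml.etree.ElementTree as ET


def wrap(phrase: str) -> str:
    """Serialize the phrase as the text content of a (hypothetical) element."""
    elem = ET.Element("PassPhrase")
    elem.text = phrase
    return ET.tostring(elem, encoding="unicode")


def unwrap(xml_text: str) -> str:
    """Parse the element back and return its character data."""
    return ET.fromstring(xml_text).text or ""


doc = wrap("a<b&c")
print(doc)          # <PassPhrase>a&lt;b&amp;c</PassPhrase>
print(unwrap(doc))  # a<b&c
```

Problems remain at the edges. XML 1.0 cannot carry most control characters at all. Parsers also normalize line endings (CR LF becomes LF). Those cases would still need an explicit rule.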

As for canonical UTF-8, that's what the stringprep RFC does, and
apparently it involves ~30Kb of object code (from memory, so I may be wrong
there), so it has been found to be complicated.
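The sketch below shows what a stringprep-style preparation involves. It is built on Python's standard `stringprep` module, which exposes the RFC 3454 tables. The choice of tables (map B.1 and B.2, NFKC, prohibit C.2, C.3, C.8 and C.9) is an assumed profile for illustration, not a published one. It also skips the bidirectional-text check, and the function name is hypothetical.

```python
import stringprep
import unicodedata


def prep_passphrase(phrase: str) -> str:
    """Hypothetical stringprep profile for pass phrases (illustration only)."""
    # 1. Mapping: drop table B.1 characters (soft hyphen, zero-width
    #    characters, ...) and case-fold with table B.2, the table meant for NFKC.
    mapped = "".join(
        "" if stringprep.in_table_b1(ch) else stringprep.map_table_b2(ch)
        for ch in phrase
    )
    # 2. Normalization: NFKC using Unicode 3.2 data, as RFC 3454 specifies.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # 3. Prohibition: control characters (C.2.1/C.2.2), private use (C.3),
    #    display-changing or deprecated characters (C.8), tagging (C.9).
    for ch in normalized:
        if (stringprep.in_table_c21_c22(ch)
                or stringprep.in_table_c3(ch)
                or stringprep.in_table_c8(ch)
                or stringprep.in_table_c9(ch)):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return normalized


print(prep_passphrase("JOSE\u0301"))      # josé
print(prep_passphrase("pass\u00adword"))  # password
```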

Stephen.

Ed Simon wrote:

> I generally agree with Jose and Guillermo's recommendations EXCEPT for 
> the one about filtering out UTF-8 characters outside the ASCII 32-127
> range.  Unless there is a verifiable case to be made for disallowing
> non-Latin characters (e.g. Korean pass phrases), I would not include that
> possibility.  Ultimately, the pass phrase is just '1's and '0's, and all
> we are doing is saying how a human-readable/writable phrase can be
> consistently converted into binary; that MAY not always mean the end
> device has to understand Unicode, just binary.  (I say MAY because I'm
> not a mobile device expert; I just want someone who is to say non-ASCII
> is a problem before we try to accommodate it.)
> 
> I would drop mention of "XML Encoding" and call it "UTF-8" encoding; 
> not only do I think this is sensible from the outset, but it also gets 
> rid of trying to process XML-ese like entities etc.  I confess I have one
> open question: I am not absolutely sure (e.g. due to combining sequences)
> that there is always one and only one binary representation for every
> unique UTF-8-encoded pass phrase. Jose, can you verify that with a W3C
> UTF-8 expert?  A follow-up question would be whether we could use rules
> to canonicalise the UTF-8 (e.g. do not use combining characters) if there
> is more than one binary representation.
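On the question above, UTF-8 itself is one-to-one: each code point has exactly one valid byte sequence. However, Unicode lets the same visible text be spelled with different code points, for example a precomposed "é" versus "e" plus a combining accent. The minimal Python sketch below shows the problem. It then applies one possible canonicalization rule before UTF-8 encoding: Unicode Normalization Form C (NFC). Choosing NFC rather than NFKC or a stringprep profile is an assumption, and `canonical_utf8` is a hypothetical name.

```python
import unicodedata

precomposed = "Jos\u00e9"  # 'é' as a single code point (U+00E9)
decomposed = "Jose\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Both render as "José", yet their UTF-8 bytes differ.
print(precomposed.encode("utf-8").hex())  # 4a6f73c3a9
print(decomposed.encode("utf-8").hex())   # 4a6f7365cc81


def canonical_utf8(phrase: str) -> bytes:
    """Normalize to NFC (compose where possible), then encode as UTF-8."""
    return unicodedata.normalize("NFC", phrase).encode("utf-8")


print(canonical_utf8(precomposed) == canonical_utf8(decomposed))  # True
```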
> 
> Regards, Ed
> 
> 
> -----Original Message-----
> From: www-xkms-request@w3.org [mailto:www-xkms-request@w3.org] On 
> Behalf Of Stephen Farrell
> Sent: December 17, 2004 7:48 AM
> To: jose.kahan@w3.org
> Cc: www-xkms@w3.org
> Subject: Re: Again, confusing 8.1
> 
> 
> 
> Jose,
> 
> I agree that this bit needs more work. A few points.
> 
> - Do we want to maintain interop with any existing implementations?
>    I believe we do. But I think it's fair to assume that most
>    existing code hasn't taken the corner cases into account, so we
>    should be ok to make non-interoperable changes for corner cases.
> - Reducing the keyspace isn't a real issue. English has something
>    like 1.5 bits of entropy per character, so unless you're using
>    really long strings it makes no difference - the space is
>    searchable anyway.
> - Case folding (H->h) is IMO worthwhile simply to avoid the
>    CAPSLOCK problem.
> - Some whitespace shrinkage is needed, e.g. "^t" vs "    ".
>    Most other specs shrink all consecutive whitespace characters
>    to one space; we currently eat 'em all, which is a bit weird
>    but ok. (If Phill's listening, maybe he had a reason for that?)
> - Punctuation character handling. Current spec is weird there.
>    They'd normally be included in the output.
> - We do have to determine how to handle XML encoding, e.g. of "&",
>    "%20", "<" etc. I've no clue how to properly do that.
> - We do have to determine how to define and handle control chars.
>    The latter is easy, the former I dunno how to do.
> - I don't think mobile devices etc. are a real issue for us, since
>    input device limitations can be taken into account when the
>    strings are selected/generated.
> - We have to decide how to handle I18N. I think the current spec
>    is probably broken for countries which don't use Latin-1
>    characters at all. Again I'm not sure of the right thing here,
>    but there has been some (non-trivial) work done on this for
>    DNS [1], which is being taken up in various security-related
>    specs (and for which there's source code available), so maybe
>    using that is a good idea.
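The minimal Python sketch below shows one way to read the case-folding, whitespace, punctuation and control-character bullets together. Several choices are assumptions: the order of the steps, the use of `str.casefold()` (full Unicode case folding, which also covers non-Latin scripts), and defining "control character" as Unicode category `Cc`. The function name is hypothetical.

```python
import re
import unicodedata


def canonicalize(phrase: str) -> str:
    # Shrink every run of whitespace (tabs, newlines, spaces) to one space.
    text = re.sub(r"\s+", " ", phrase)
    # Drop any remaining control characters (Unicode category Cc).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cc")
    # Case-fold so CAPSLOCK does not change the key; punctuation is kept.
    return text.casefold()


print(canonicalize("Help\tI  HAVE revealed, my key!"))  # help i have revealed, my key!
```

`casefold()` rather than `lower()` is used because it also equates forms such as "STRASSE" and "straße". That behaviour is a property of Python's Unicode case folding, not something the spec currently says.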
> 
> Stephen.
> 
> [1] http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt
> 
> Jose Kahan wrote:
> 
>>Hi folks,
>>
>>Per last meeting's action item:
>>
>>  we will include a test case validating the string2key algorithm in
>>  section 8.1. AI: Guillermo and Jose to generate such test cases
>>
>>
>>After talking with Guillermo, we both found section 8.1 confusing. 
>>This section uses terms that are known in the security field. What is 
>>confusing is how they apply to XKMS. In particular:
>>
>>----------
>>- Is this algorithm meant to generate a one-time-use pass phrase that can
>>  be read over the phone?
>>  
>>- All shared string values are encoded as XML
>>
>>  What is a shared string here? Is it a limited-use string? [242] proposes
>>  a user-generated authentication phrase for revoking a public
>>  key: "Help I have revealed my key". However, when looking at section
>>  C.2.1, we find that the 8.1 algorithm was used to convert it to
>>  "helpih...".
>>
>>  If this phrase was a shared string, shouldn't it have been converted
>>  into XML, regardless of its content, and then the result converted into
>>  hex, without dropping spaces, punctuation, etc.?
>>
>>  What is the meaning of "encoded as XML"? Are accented characters and
>>  the "&'<> symbols encoded as entities (we would not be able to specify
>>  the charset otherwise)? Are accented characters encoded as UTF-8?
>> 
>>  I couldn't find what the spec defines as a "shared string", or why 8.1
>>  has to be applied always, regardless of who generated the shared secret.
>>
>>- All punctuation, space and control characters are removed.
>>
>>  I can understand why we remove control characters, but I can't
>>  understand why we remove punctuation and spaces. We can read them on
>>  the phone easily, I think.
>>  Moreover, by simplifying the pass phrase this way, aren't we making it
>>  more vulnerable to oracle attacks?
>>  
>>- All upper case characters in the Latin-1 alphabet (A-Z) are converted
>>  to lower case.
>>
>>  No other characters, including accented characters are converted
>>
>>  Why must uppercase be converted into lowercase? One can read them
>>  easily on the phone, I think :)
>>  It's not clear what is done to the other characters or what the
>>  rationale was. From reading this, it seems that if my name is spelled
>>  JosÉ, it would be converted to josÉ.
>>
>>  This conversion also reduces the keyspace, and IMO makes it more
>>  vulnerable to oracle attacks.
>>--------
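The minimal Python sketch below implements section 8.1 as the quoted rules read, to make test cases concrete. It interprets "punctuation", "space" and "control" as Unicode categories P*, Z* and Cc; that is an assumption. Under that reading, symbols such as `$` or `+` fall outside those categories and are kept. The sketch stops before the hex/key step, and `current_8_1` is a hypothetical name.

```python
import unicodedata


def current_8_1(phrase: str) -> str:
    out = []
    for ch in phrase:
        cat = unicodedata.category(ch)
        # Drop punctuation (P*), separators incl. space (Z*), and controls (Cc).
        if cat.startswith("P") or cat.startswith("Z") or cat == "Cc":
            continue
        # Lower-case only A-Z; accented letters such as 'É' are left alone.
        if "A" <= ch <= "Z":
            ch = ch.lower()
        out.append(ch)
    return "".join(out)


print(current_8_1("Help I have revealed my key"))  # helpihaverevealedmykey
print(current_8_1("JosÉ"))                         # josÉ
```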
>>
>>IMO, what we need to define is:
>>
>>- What is a limited-use shared secret
>>- When does the 8.1 algorithm need to be applied (make it an explicit
>>  reference in the concerned sections)
>>- Decide if all such secrets should be speakable on the phone or be typed
>>  with a device that doesn't allow all those characters; use it as
>>  a rationale for removing punctuation, etc.
>>- Remove the ambiguities of the algorithm in section 8.1.
>>- Decide if we need to define a minimum size for the shared secret string.
>>  What is its relationship with entropy?
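The Python sketch below gives a back-of-the-envelope answer to the entropy question. It uses the figure of roughly 1.5 bits per character for English quoted earlier in the thread. That figure is a rough estimate, and the 64-bit target is only an example, not a recommendation. Both function names are hypothetical.

```python
import math


def estimated_bits(length: int, bits_per_char: float = 1.5) -> float:
    """Rough entropy estimate for a natural-language phrase of this length."""
    return length * bits_per_char


def min_length(target_bits: float, bits_per_char: float = 1.5) -> int:
    """Shortest phrase length whose estimate reaches target_bits."""
    return math.ceil(target_bits / bits_per_char)


print(estimated_bits(20))  # 30.0, i.e. about 2**30 (~1e9) guesses
print(min_length(64))      # 43
```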
>>
>>In my opinion, what we are looking for is an algorithm to canonicalize
>>shared-secret strings (whether limited-use or not) that produces a valid
>>XML string. I would propose the following one:
>>
>>1. Remove all the control characters from the string
>>   --> reason: I feel that those characters could cause problems, and they
>>       cannot always be typed
>>2. Encode the string in UTF-8
>>   This will take accented characters into account
>>   --> reason: it's the only way to convert those characters into
>>       portable ASCII
>>3. Put the hex equivalent of each of the resulting bytes, using lowercase
>>   letters. Note that here we don't remove any punctuation symbols. We are
>>   just converting them.
>>  
>>This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
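The minimal Python sketch below implements Jose's three steps and reproduces the example above. Treating "control characters" as Unicode category `Cc` is an assumption. The list-of-hex-strings output simply mirrors the bracketed example, and `jose_proposal` is a hypothetical name.

```python
import unicodedata


def jose_proposal(phrase: str) -> list:
    # Step 1: remove control characters (category Cc is an assumed definition).
    kept = "".join(ch for ch in phrase if unicodedata.category(ch) != "Cc")
    # Step 2: encode in UTF-8; accented characters become multi-byte sequences.
    data = kept.encode("utf-8")
    # Step 3: one lower-case hex pair per byte; punctuation is converted, not removed.
    return [f"{b:02x}" for b in data]


print(jose_proposal("Jos&\u00c9"))  # ['4a', '6f', '73', '26', 'c3', '89']
```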
>>
>>XKMS could be used by mobile devices too. If for some reason we believe
>>that UTF-8 conversions would be too much overhead, we can just suppress
>>all the characters above ASCII 127. Another reason could be that the user
>>has to type those characters on a phone that doesn't have the full
>>character set available. It could also be that an operator at the other
>>end cannot read a decoded UTF-8 string if it's stored as such. This is
>>some rationale for reducing the strings to ASCII 32-127.
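The ASCII-only fallback is a one-line filter. This Python sketch keeps 0x20-0x7E and treats 0x7F (DEL) as a control character; that is an assumed reading of "32-127". The second call illustrates a concern raised earlier in the thread: a phrase written entirely in a non-Latin script collapses to an empty string. `ascii_only` is a hypothetical name.

```python
def ascii_only(phrase: str) -> str:
    # Keep printable ASCII only: space (0x20) through tilde (0x7E).
    return "".join(ch for ch in phrase if 0x20 <= ord(ch) <= 0x7E)


print(repr(ascii_only("Jos&\u00c9")))  # 'Jos&'
print(repr(ascii_only("한국어")))       # '' (Korean text, nothing survives)
```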
>>
>>I don't know what would be the rationale for converting the strings to 
>>lower-case and suppressing all the punctuation symbols.
>>
>>Tommy had written:
>>
>>
>>
>>>Four implementors have independently implemented the "Limited Use
>>>Shared Secret" algorithm in a way that interoperates, so I have not
>>>seen a breakdown yet.  However, both the spec and the existing
>>>shared secret distribution points (at least my service) avoid cases
>>>that lead to ambiguous interpretation.
>>
>>
>>This seems to imply that the places where the 8.1 algorithm applies are
>>already identified. I think it would be good to make this explicit in the spec.
>>
>>
>>
>>>>Maybe change the spec to only allow a smaller subset of strings to 
>>>>become keys
>>
>>
>>>I'm in favor of this option, provided that the recommendations in 
>>>Section 10.4 can still be followed.
>>
>>
>>Ditto :)
>>
>>-jose
> 
> 
> 
> 

Received on Friday, 17 December 2004 15:47:23 UTC