Re: Again, confusing 8.1 from Stephen Farrell on 2004-12-17 (www-xkms@w3.org from December 2004)

From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Date: Fri, 17 Dec 2004 17:00:57 +0000
To: Ed Simon <edsimon@xmlsec.com>
Cc: www-xkms@w3.org
Message-ID: <41C310C9.3020800@cs.tcd.ie>

Ed,

I generally agree. What I meant was that base64 would break
interop, as would (the otherwise acceptable) reduction to
a single internal space.

So how about a scheme with the properties:

- Encode as UTF-8 according to stringprep (and someone has to
   figure out details there maybe)
- Case-fold
- Reduce whitespace (to zero for interop I guess)

As you more-or-less said, I'd expect that that'd be ok with
most of the examples already processed and with most code.
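
A rough sketch of the three properties listed above, in Python. It assumes
that the standard-library stringprep tables (RFC 3454 B.1 and B.2) plus NFKC
normalization can stand in for whatever profile eventually gets written down.
The name normalize_shared_secret is made up for illustration, and the
prohibited-character and bidi checks of a full stringprep profile are left out.

```python
import stringprep
import unicodedata


def normalize_shared_secret(phrase: str) -> bytes:
    """Hypothetical helper: case-fold, drop whitespace, return UTF-8 bytes."""
    # RFC 3454 table B.1: characters commonly mapped to nothing
    # (for example the soft hyphen).
    kept = (ch for ch in phrase if not stringprep.in_table_b1(ch))
    # RFC 3454 table B.2: case folding meant to be used with NFKC.
    folded = "".join(stringprep.map_table_b2(ch) for ch in kept)
    # NFKC gives precomposed and combining forms a single representation.
    composed = unicodedata.normalize("NFKC", folded)
    # Reduce whitespace to zero, as proposed for interop.
    squeezed = "".join(ch for ch in composed if not ch.isspace())
    return squeezed.encode("utf-8")


print(normalize_shared_secret("Help I have revealed my key"))
# prints b'helpihaverevealedmykey'
```

Unlike the current 8.1 text, this sketch keeps punctuation, so it only
agrees with existing output for phrases that contain none.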

So, two things to hopefully close this down:

- Any objections?
- Anyone volunteer to go through stringprep [1], make sure
   there're no gotchas, and write text?

(BTW: Formally, I guess we'd have to refer to the stringprep
RFC [2] and not the I-D, but we ought to check against the I-D
just in case).
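
For whoever volunteers, a few of the RFC 3454 tables can be poked at directly
with Python's standard stringprep module. This is only a starting point under
the assumption that the module's tables match the RFC; it says nothing about
differences in the I-D. The variable names are illustrative.

```python
import stringprep

# B.2 case folding can change the length of a string: sharp s becomes "ss".
folded_sharp_s = stringprep.map_table_b2("\u00df")

# C.1.2 lists non-ASCII spaces such as NO-BREAK SPACE, which a naive
# "strip ASCII whitespace" step would miss.
nbsp_is_non_ascii_space = stringprep.in_table_c12("\u00a0")

# B.1 lists characters that are mapped to nothing, such as SOFT HYPHEN.
soft_hyphen_dropped = stringprep.in_table_b1("\u00ad")

# C.2.1 lists the ASCII control characters, such as BEL.
bel_is_control = stringprep.in_table_c21("\x07")

print(repr(folded_sharp_s), nbsp_is_non_ascii_space,
      soft_hyphen_dropped, bel_is_control)
# prints 'ss' True True True
```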

Cheers,
Stephen.

[1] http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt
[2] http://www.ietf.org/rfc/rfc3454.txt

Ed Simon wrote:

> It seems to me that requiring an XML processor (right?) is going to be
> particularly performance-consuming.  Plus one has to deal with exactly what
> "All shared string values are encoded as XML" means.  To me, it means that
> the pass phrase MUST be valid XML (eg. 
> 
> "<Pass_Phrase xmlns="http://example.com/secrets">my
> <Adjective>little</Adjective>
> <![CDATA[&lt;]]>secret<![CDATA[&gt;]]>!</Pass_Phrase>"
> 
> ) or else it is NOT a valid pass phrase, AND, therefore, pass phrase tools
> must be full-fledged XML parsers capable of dealing with potential attacks
> like entity expansion.  There is also a contradiction that if one requires
> conversion to lower-case, one invalidates XML such as that in my example
> because XML names are case-sensitive.  It seems to me the constraints are
> contradictory.
> 
> I think what was originally intended was something like "encode as UTF-8"; I
> expect requiring this would NOT break the interop cases done thus far
> because I would guess no one is trying to use pass phrases that are, in
> themselves, valid XML.
> 
> Ed
> ========================================
> Ed Simon
> (613) 726-9645
> edsimon@xmlsec.com 
> Interested in XML, Web Services, or Security?  Visit "www.xmlsec.com".
> Now available!  "Web Services Security" published by Osborne (ISBN#
> 0072224711)
> 
> 
> -----Original Message-----
> From: www-xkms-request@w3.org [mailto:www-xkms-request@w3.org] On Behalf Of
> Stephen Farrell
> Sent: December 17, 2004 11:01 AM
> To: Ed Simon
> Cc: www-xkms@w3.org
> Subject: Re: Again, confusing 8.1
> 
> 
> 
> Hi Ed,
> 
> I'd agree with both of those, except that both would break interop and it's
> getting late in the day to do that...
> 
> Stephen.
> 
> Ed Simon wrote:
> 
> 
>>As the use of XML-sensitive characters is a problem, can we not, 
>>and should we not anyway, require that pass phrases be base64-encoded 
>>when used within XML?  In fact, it would seem to me that this would be 
>>good practice so the pass phrase does not get messed up by XML 
>>processing whether it contains XML-ese or not.
>>
>>BTW, I also think trailing and leading whitespace MUST be removed and 
>>internal whitespace reduced to one space character (not zero).
>>
>>Ed
>>
>>
>>-----Original Message-----
>>From: Stephen Farrell [mailto:stephen.farrell@cs.tcd.ie]
>>Sent: December 17, 2004 10:42 AM
>>To: Ed Simon
>>Cc: www-xkms@w3.org
>>Subject: Re: Again, confusing 8.1
>>
>>
>>Hi Ed,
>>
>>I guess we're all in agreement, but none of us seems to know exactly 
>>how to write down what we want!
>>
>>I agree that we have to support non Latin characters - if we didn't 
>>then nearly everyone in certain parts of the world would use the same 
>>key (since their entire string would be highly likely to be reduced to 
>>nothing!) which'd be a bit of a security flaw as well as an I18N-nasty.
>>
>>If we drop xml encoding (good, let's remove that degree of freedom), 
>>we're then ok to directly use "<" characters in our strings?
>>
>>As for the canonical UTF-8, that's what the stringprep RFC does, and 
>>apparently it involves ~30Kb of object code (from memory, so may be 
>>wrong there), so it has been found to be complicated.
>>
>>Stephen.
>>
>>Ed Simon wrote:
>>
>>
>>
>>>I generally agree with Jose and Guillermo's recommendations EXCEPT for
>>>the one about filtering UTF-8 characters outside the ASCII 32-127 range.
>>>Unless there is a verifiable case to be made for disallowing non-Latin
>>>characters (eg. Korean pass phrases) I would not include that
>>>possibility.  Ultimately, the pass phrase is just '1's and '0's and all
>>>we are doing is saying how a human-readable/writable phrase can be
>>>consistently converted into binary; that MAY not always mean the end
>>>device has to understand Unicode, just binary.  (I say MAY because I'm
>>>not a mobile device expert, I just want someone who is to say non-ASCII
>>>is a problem before we try to accommodate it.)
>>>
>>>I would drop mention of "XML Encoding" and call it "UTF-8" encoding; 
>>>not only do I think this is sensible from the outset but it also gets 
>>>rid of trying to process XMLese like entities etc.  I confess that I 
>>>have one question: I am not absolutely sure (eg. due to combining 
>>>sequences) that there is always one and only one binary 
>>>representation for every unique UTF-8-encoded pass phrase.  Jose, can 
>>>you verify that with a W3C UTF-8 expert?  A follow-up question would 
>>>be whether we could use rules to canonicalise the UTF-8 (eg. do not 
>>>use combining characters) if there is more than one binary representation.
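
To illustrate the combining-sequence question with a small Python sketch: the
same visible phrase can have two different UTF-8 byte strings, and Unicode
normalization (NFC is assumed here purely as an example, not as the form the
spec would pick) brings them back together. The sample strings are
illustrative.

```python
import unicodedata

precomposed = "Jos\u00e9"   # e-acute as a single code point
combining = "Jose\u0301"    # plain e followed by COMBINING ACUTE ACCENT

# Two different byte sequences for what a user sees as the same phrase.
print(precomposed.encode("utf-8").hex())  # prints 4a6f73c3a9
print(combining.encode("utf-8").hex())    # prints 4a6f7365cc81

# After NFC normalization both forms encode to the same bytes.
nfc_precomposed = unicodedata.normalize("NFC", precomposed).encode("utf-8")
nfc_combining = unicodedata.normalize("NFC", combining).encode("utf-8")
print(nfc_precomposed == nfc_combining)   # prints True
```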
>>>
>>>Regards, Ed
>>>
>>>
>>>-----Original Message-----
>>>From: www-xkms-request@w3.org [mailto:www-xkms-request@w3.org] On 
>>>Behalf Of Stephen Farrell
>>>Sent: December 17, 2004 7:48 AM
>>>To: jose.kahan@w3.org
>>>Cc: www-xkms@w3.org
>>>Subject: Re: Again, confusing 8.1
>>>
>>>
>>>
>>>Jose,
>>>
>>>I agree that this bit needs more work. A few points.
>>>
>>>- Do we want to maintain interop with any existing implementations?
>>>  I believe we do. But I think it's fair to assume that most
>>>  existing code hasn't taken the corner cases into account so we
>>>  should be ok to make non-interoperable changes for corner cases.
>>>- Reducing the keyspace isn't a real issue. English has something
>>>  like 1.5 bits of entropy per character, so unless you're using
>>>  really long strings it makes no difference - the space is
>>>  searchable anyway.
>>>- Case folding (H->h) is IMO worthwhile simply to avoid the
>>>  CAPSLOCK problem.
>>>- Some whitespace shrinkage is needed, e.g. "^t" vs "    ".
>>>  Most other specs shrink all consecutive whitespace characters
>>>  to one space, we currently eat 'em all which is a bit weird
>>>  but ok. (If Phill's listening maybe he had a reason for that?)
>>>- Punctuation character handling. Current spec is weird there.
>>>  They'd normally be included in the output.
>>>- We do have to determine how to handle XML encoding, e.g. of "&",
>>>  "%20", "<" etc. I've no clue how to properly do that.
>>>- We do have to determine how to define and handle control chars.
>>>  The latter is easy, the former I dunno how to do.
>>>- I don't think mobile devices etc is a real issue for us, since
>>>  input device limitations can be taken into account when the
>>>  strings are selected/generated.
>>>- We have to decide how to handle I18N. I think the current spec
>>>  is probably broken for countries which don't use Latin-1
>>>  characters at all. Again I'm not sure of the right thing here,
>>>  but there has been some (non-trivial) work done on this for
>>>  DNS [1], which is being taken up in various security related
>>>  specs, (and for which there's source code available) so maybe
>>>  using that is a good idea.
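
The keyspace point in the list above can be put into numbers with a short
Python sketch. It takes the 1.5 bits per character figure from the message at
face value; the constant and function names are illustrative, and the sample
string is the normalized form of the "Help I have revealed my key" example
used in this thread.

```python
# Rough estimate quoted in the thread for English text.
ENTROPY_BITS_PER_CHAR = 1.5


def estimated_entropy_bits(phrase: str) -> float:
    """Hypothetical helper: entropy estimate for an English pass phrase."""
    return ENTROPY_BITS_PER_CHAR * len(phrase)


bits = estimated_entropy_bits("helpihaverevealedmykey")
print(bits)  # prints 33.0

# Size of the corresponding search space, for comparison with a 128-bit key.
print(2 ** bits < 2 ** 128)  # prints True
```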
>>>
>>>Stephen.
>>>
>>>[1]
>>>http://www.ietf.org/internet-drafts/draft-hoffman-rfc3454bis-02.txt
>>>
>>>Jose Kahan wrote:
>>>
>>>
>>>
>>>>Hi folks,
>>>>
>>>>Per last meeting's action item:
>>>>
>>>>we will include a test case validating the string2key algorithm in
>>>>section 8.1. AI: Guillermo and Jose to generate such test cases
>>>>
>>>>
>>>>After talking with Guillermo, we both found section 8.1 confusing. 
>>>>This section uses terms that are known in the security field. What is 
>>>>confusing is how they apply to XKMS. In particular:
>>>>
>>>>----------
>>>>- Is this algorithm meant to generate a one-time use pass phrase 
>>>>that can be read over the phone?
>>>>
>>>>- All shared string values are encoded as XML
>>>>
>>>>What is a shared string here? Is it a limited-use string? [242] 
>>>>proposes a user-generated authentication phrase for revoking a public
>>>>key: "Help I have revealed my key". However, when looking at section
>>>>C.2.1, we find that the 8.1 algorithm was used to convert it to
>>>>"helpih...". 
>>>>
>>>>If this phrase was a shared string, shouldn't it have been converted
>>>>into XML, regardless of its content, and then the result converted into 
>>>>hex, without dropping spaces, punctuation, etc.?
>>>>
>>>>What is the meaning of "encoded as XML"? Accented characters and 
>>>>"&'<> symbols encoded as entities (we would not be able to specify 
>>>>the charset otherwise)? Accented characters encoded as UTF-8?
>>>>
>>>>I couldn't find what the spec defines as "shared string" or why 8.1 
>>>>has to be applied always, regardless of who generated the shared secret.
>>>>
>>>>- All punctuation, space and control characters are removed.
>>>>
>>>>I can understand why we remove control characters, but I can't
>>>>understand why we remove punctuation and spaces. We can read them on
>>>>the phone easily, I think.
>>>>
>>>>Moreover, by simplifying the pass phrase in this way, aren't we making
>>>>it more vulnerable to oracle attacks?
>>>>
>>>>- All upper case characters in the Latin-1 alphabet (A-Z) are 
>>>>converted to lower case.
>>>>No other characters, including accented characters, are converted
>>>>
>>>>Why must uppercase be converted into lowercase? One can read them 
>>>>easily on the phone I think :)
>>>>It's not clear what is done to the other characters or what was the 
>>>>rationale. From reading this, it seems that if my name is spelled Jos ,
>>>>it would be converted to jos .
>>>>
>>>>This conversion also reduces the keyspace, and IMO makes it more
>>>>vulnerable to oracle attacks.
>>>>--------
>>>>
>>>>IMO, what we need to define is:
>>>>
>>>>- What is a limited-use shared secret
>>>>- When does the 8.1 algorithm need to be applied (make it an explicit
>>>>reference in concerned sections)
>>>>- Decide if all such secrets should be speakable on the phone or be 
>>>>typed with a device that doesn't allow all those characters; use it as
>>>>a rationale for removing punctuation, etc.
>>>>- Remove the ambiguities of the algorithm in section 8.1.
>>>>- Decide if we need to define a minimum size for the shared secret
>>>>string. What is its relationship with entropy?
>>>>
>>>>In my opinion, what we are looking for is an algorithm to 
>>>>canonicalize shared-secret strings (whether limited-use or not) that 
>>>>produces a valid XML string. I would propose the following one:
>>>>
>>>>1. Remove all the control characters from the string
>>>> --> reason: I feel that those characters could cause problems and 
>>>>     they could not be typed all the time
>>>>2. Encode the string in UTF-8
>>>> This will take into account accented characters
>>>> --> reason: it's the only way to convert those characters into 
>>>>     portable ASCII
>>>>3. Put the hex equivalent for each of those characters, using lowercase
>>>> letters. Note that here we don't remove any punctuation symbols. We 
>>>> are just converting them.
>>>>
>>>>This would convert Jos&É into [4a] [6f] [73] [26] [c3] [89]
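
The three steps proposed above can be sketched in Python as follows. This is
only an illustration: "control characters" is assumed to mean Unicode general
category Cc, which the proposal does not define, and the function name
secret_to_hex is made up.

```python
import unicodedata


def secret_to_hex(phrase: str) -> str:
    """Hypothetical helper following the three proposed steps."""
    # Step 1: remove control characters (assumption: Unicode category Cc).
    cleaned = "".join(ch for ch in phrase if unicodedata.category(ch) != "Cc")
    # Step 2: encode the string as UTF-8.
    encoded = cleaned.encode("utf-8")
    # Step 3: write each byte as two lowercase hex digits; punctuation is
    # converted like any other character, not removed.
    return " ".join(f"[{byte:02x}]" for byte in encoded)


print(secret_to_hex("Jos&\u00c9"))
# prints [4a] [6f] [73] [26] [c3] [89]
```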
>>>>
>>>>XKMS could be used by mobile devices too. If for some reason we 
>>>>believe that it will be too much overhead to make UTF-8 
>>>>conversions, we can just suppress all the characters above ASCII 127.
>>>>Another reason could be if the user has to type those characters in a 
>>>>phone and he doesn't have the full character set available. It could 
>>>>be that an operator at the other end cannot read a decoded
>>>>UTF-8 string if it's stored as such. This is some rationale as to why 
>>>>we might reduce the strings to ASCII 32-127.
>>>>
>>>>I don't know what would be the rationale for converting the strings 
>>>>to lower-case and suppressing all the punctuation symbols.
>>>>
>>>>Tommy had written:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>Four implementors have independently implemented the "Limited Use 
>>>>>Shared Secret" algorithm in a way that interoperates so I have not 
>>>>>seen a breakdown yet.  However, both the spec and the existing 
>>>>>shared secret distribution points (at least my service) avoid cases 
>>>>>that lead to ambiguous interpretation.
>>>>
>>>>
>>>>This seems to imply that the places where 8.1 applies are already 
>>>>identified. I think it would be good to make this explicit in the spec.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>>Maybe change the spec to only allow a smaller subset of strings to 
>>>>>>become keys
>>>>
>>>>
>>>>>I'm in favor of this option, provided that the recommendations in 
>>>>>Section 10.4 can still be followed.
>>>>
>>>>
>>>>Ditto :)
>>>>
>>>>-jose
Received on Friday, 17 December 2004 16:56:18 UTC