Re: Proposed stringprep algorithm from Section 8.1 from Stephen Farrell on 2004-12-22 (www-xkms@w3.org from December 2004)

From: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Date: Wed, 22 Dec 2004 16:56:45 +0000
To: www-xkms@w3.org
Message-ID: <41C9A74D.9050209@cs.tcd.ie>
Jose,

That's nearly fine *except*:

- not folding whitespace means you can't hand out paper versions
   unless you use fixed-width fonts; not deleting leading and
   trailing whitespace is broken for the same reason
- the current spec is IMO correct in using case-folding since
   it avoids the CAPSLOCK problem
- my preference is to allow implementers to choose whether
   to ignore non-conforming characters or to generate
   an error; I can see situations where each is appropriate.

So I wouldn't be in favor of your proposal as it stands. If the
three things there were fixed then I'd be fine with it.

Now putting on the chair-hat:

Given that it's getting so close to the holidays I think we can't
expect to bottom this out until the new year, so I'd like to canvass
implementer opinion as to whether to take the new algorithm as
proposed by Jose below, or that algorithm modified as above.

Can you try to express your opinions by, say, Jan 7th, and we'll decide
then. (I don't want this to be a topic for our Jan concall since
we're supposed to be signing off on the test spec which requires
this to be resolved first.)

There are three options:

1. The current spec's algorithm
2. Jose's proposal below
3. Jose's proposal but as modified above

Please pick one and feel free to add comments, but do pick one,
we need to get this closed soon if we're not to slip the planned
timeframe. If you pick #1 please state how you'd modify the text
since it's currently clearly confusing at best. If you don't like
any of the above, please provide your own algorithm (with detail)
or don't get involved in the discussion!

Meanwhile, enjoy the holidays and have a good new year,
Stephen.

Jose Kahan wrote:

> Hello folks,
> 
> After some discussions and feedback, here's a revised proposal for the
> string2key algorithm in Section 8.1. This proposal takes into account
> Ed Simon's authentication device remarks and the I18N string concerns.
> I checked with Martin Duerst from the W3C I18N WG and he says the I18N 
> part is fine with him.
> 
> See my notes also as to why I argue against adding case, space, and
> punctuation folding into this algorithm.
> 
> Cheers!
> 
> -jose
> 
> ----------
> 
> [329a] The symmetric key data MAY be binary data (as from an
> authentication device) or as a human-readable value (numeric,
> alphabetic, or both).  When it is binary data, no transformation is
> needed; the data can be used directly as input to the MAC function.
> 
> [329b] When the symmetric key data is human-readable, it may be issued to
> a human user in the form of a text string which may in some circumstances
> be read over a telephone line. It may be randomly generated and represent
> an underlying numeric value, or may be a password or phrase. In either
> case, it is often convenient to present the value to the human user as a 
> string of characters in a character set the particular user 
> understands.  To limit the possibility of human error in processing the
> symmetric key data, and to provide a canonical binary representation, 
> the string text must be compliant with the SASLprep stringprep profile
> for user names and passwords[1].
>   
> [329c] The algorithm for canonicalizing a string-text before feeding it
> to the MAC function is the following:
> 
>   1. Convert the input string to a Unicode encoding.
> 
>      This removes the US-ASCII and ISO-LATIN-1 limitations! It lets
>      a user type a password phrase that s/he can remember with ease
>      or that's easy to type with his/her keyboard configuration.
> 
>   2. Verify that the input string is compliant with the SASLprep
>      stringprep profile for user names and passwords [1]. Refuse
>      the string otherwise.
> 
>      This operation consists of mapping and normalizing the characters in
>      the string, and checking that it doesn't have any forbidden characters.
>      In particular, there's no folding of multiple spaces or of
>      case. Punctuation symbols are not removed either. Tabs are
>      control characters and thus are considered to be forbidden.
> 
>   3. Encode the result into UTF-8
>   4. Apply the MAC functions
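
A minimal sketch of the four steps in [329c], in Python, assuming the
stringprep tables shipped in the Python standard library are an acceptable
stand-in for the tables the SASLprep draft [1] refers to. The names
canonicalize and string_to_mac are hypothetical, HMAC-SHA1 and the
caller-supplied mac_key are placeholders for whatever the spec ends up
mandating, and the check for unassigned code points is left out.

```python
import hashlib
import hmac
import stringprep
import unicodedata

# Prohibited-output tables listed by the SASLprep profile (assumption:
# the stdlib stringprep tables match the ones the draft refers to).
_PROHIBITED = (
    stringprep.in_table_c12,  # non-ASCII space characters
    stringprep.in_table_c21,  # ASCII control characters (includes TAB)
    stringprep.in_table_c22,  # non-ASCII control characters
    stringprep.in_table_c3,   # private use
    stringprep.in_table_c4,   # non-character code points
    stringprep.in_table_c5,   # surrogate codes
    stringprep.in_table_c6,   # inappropriate for plain text
    stringprep.in_table_c7,   # inappropriate for canonical representation
    stringprep.in_table_c8,   # change display properties or deprecated
    stringprep.in_table_c9,   # tagging characters
)


def canonicalize(text: str) -> str:
    """Step 2: map, normalize, then refuse prohibited characters.

    No case folding, no space folding, no punctuation removal.
    """
    # Mapping: non-ASCII spaces become U+0020; "mapped to nothing" chars go.
    mapped = "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    # Normalization: NFKC against the Unicode 3.2 database used by stringprep.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # Prohibited output: refuse, naming the offending code points.
    bad = [ch for ch in normalized if any(t(ch) for t in _PROHIBITED)]
    if bad:
        names = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError("prohibited characters: " + names)
    # Bidirectional check: right-to-left text must not mix with
    # left-to-right text and must start and end with a right-to-left char.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right text")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("right-to-left text must start and end RTL")
    return normalized


def string_to_mac(raw: bytes, charset: str, mac_key: bytes) -> bytes:
    """Run steps 1-4 on a human-readable secret given in some charset."""
    text = raw.decode(charset)         # step 1: convert to Unicode
    canonical = canonicalize(text)     # step 2: SASLprep check
    data = canonical.encode("utf-8")   # step 3: encode as UTF-8
    # Step 4: the MAC; HMAC-SHA1 is an illustrative choice only.
    return hmac.new(mac_key, data, hashlib.sha1).digest()
```

With this sketch the same secret typed in ISO-8859-1 or in UTF-8 yields the
same MAC input, while a string containing a tab raises an error instead of
being silently altered.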
> 
> ----
> 
> Adopting this algorithm means that we have to regenerate some of the
> XKRSS examples and the related converted strings given in
> Appendix C.
> 
> For developers, if you stick to US-ASCII ranges 32-126, then you don't
> have to change much, except to remove the upper-lower case conversion
> and space suppression. US-ASCII maps one to one in UTF-8. Your
> application should make sure you don't use characters outside the
> 32-126 range.
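
For the ASCII-only case, the range check could look like the following
sketch. The function name is_printable_ascii is hypothetical; the 32-126
bounds come straight from the paragraph above.

```python
def is_printable_ascii(text: str) -> bool:
    """Return True if every character is in the US-ASCII range 32-126."""
    return all(32 <= ord(ch) <= 126 for ch in text)


def ascii_secret_to_bytes(text: str) -> bytes:
    """Encode an ASCII-only secret as UTF-8, refusing anything else.

    No case conversion and no space suppression are applied.
    """
    if not is_printable_ascii(text):
        raise ValueError("secret contains characters outside 32-126")
    # For this range the UTF-8 bytes are identical to the US-ASCII bytes.
    return text.encode("utf-8")
```

An application that takes this shortcut still has to refuse tabs and any
non-ASCII input rather than pass them through.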
> 
> If you do want to support Unicode, there are system libraries in C,
> Java, Perl, and Win32 that do so already and can help you. Once you
> convert your strings to Unicode, applying the SASLprep stringprep
> profile means going through some tables and checking whether a
> character belongs to them or not. There is also a library that already
> provides this checking. See my notes.
> 
> -----------
> 
> Notes:
> 
> 1. My case against case folding
> 
> Summary: I don't see a reason why we should do case or space folding.
> 
> In 2), SASLprep doesn't propose case folding. I still don't agree with
> imposing case folding (and reducing the password space) unless there
> is a good reason. It seems it's a tradition for Internet applications
> to work with lowercase strings. This makes sense to me if we are
> talking about DNS domain names. Some cases where this tradition has
> never been applied:
> 
>  - HTTP Authentication protocol [2]
>  - Web server passwords in Apache are not caseless.
> 
> A case where case folding is not enforced, but has caused problems:
> 
>  - URLs of pages. For example, we have to run a special Apache module
>    on our servers to take care of this. However, this is not done
>    transparently; if the user types a URL using the wrong case, the
>    server returns a redirection to the correctly spelled page.
> 
> I've asked Thomas Roessler, who recently joined W3C as a security
> consultant, to give additional feedback on why doing case folding here
> is not appropriate. He told me he'll do so once he has finished reading
> the spec.
> 
> If the WG still wants to push folding, it should give a valid reason
> why it has to be done. Note that some languages are caseless too.
> 
> 2. About how to apply the SASLprep stringprep profile
> 
> Summary: I think it is better to have the application refuse
> strings that have forbidden characters rather than just 
> silently removing them.
> 
> Some parts of the profile can be applied transparently to the user,
> such as mapping and normalizing characters. However, we then reach
> a step called "prohibited output" (section 2.3). There are
> two ways to approach this:
> 
> 1. Have the application complain that the string has prohibited
>    characters (mentioning the relevant ones) and ask the user to try
>    again.
> 
> 2. Have the application silently discard the prohibited characters.
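
The difference between the two approaches can be sketched as follows,
assuming the input has already been mapped and normalized and that the
Python stringprep tables stand in for the profile's prohibited-output
tables. The names prohibited_chars, reject and discard are hypothetical.

```python
import stringprep

# Prohibited-output checks (assumption: stdlib tables match the profile).
_CHECKS = (
    stringprep.in_table_c12,      # non-ASCII spaces
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def prohibited_chars(text: str) -> list:
    """List the prohibited characters found in text, in order."""
    return [ch for ch in text if any(check(ch) for check in _CHECKS)]


def reject(text: str) -> str:
    """Approach 1: complain, naming the offending characters."""
    bad = prohibited_chars(text)
    if bad:
        names = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError("prohibited characters: " + names)
    return text


def discard(text: str) -> str:
    """Approach 2: silently drop the prohibited characters."""
    bad = set(prohibited_chars(text))
    return "".join(ch for ch in text if ch not in bad)
```

With a tab in the middle of the input, discard returns a shorter string than
the one that was typed and gives no indication of it, whereas reject reports
U+0009 so the caller can prompt again.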
> 
> In my personal experience, having the application silently discard
> characters is wrong for this context. The user may think that s/he typed
> something when that wasn't the case. If the application prompts the
> user saying that the string has invalid characters, the user knows
> that s/he did something wrong and may try again.
> 
> The only case I know of that works this way is standard Unix
> passwords, which only support 8 characters and drop the rest. But they
> do support control characters. It is recommended not to use control
> characters because they may be hard to type over a telnet
> connection, but this is not enforced. The user is free to type what
> s/he wants.
> 
> What I would propose is that we stick to the SASLprep stringprep
> profile and just say that the application MUST NOT accept strings
> that have illegal characters and recommend that the application return
> an error message in such cases. How this is implemented is out of scope
> for the XKMS spec.
> 
> From the architecture point of view, we assume we have a string that
> conforms to the SASLprep profile. From the implementation point of
> view, by refusing strings that don't conform to this
> profile, we remove the risk of a user being able to type a
> password on some devices and not on others, without knowing why it
> works only sometimes.
> 
> 3. Some programming tools
> 
>    libiconv [3]. This library provides an iconv() implementation, for
>    use on systems which don't have one, or whose implementation cannot
>    convert from/to Unicode.
> 
>    libidn [4]. GNU Libidn is an implementation of the Stringprep,
>    Punycode and IDNA specifications defined by the IETF
>    Internationalized Domain Names (IDN) working group, used for
>    internationalized domain names. Its command line tool is idn.
> 
> 
> Both of these tools also provide command line programs (at least on my
> Debian sarge box) called iconv and idn. They can be used to get
> familiar with stringprep and Unicode.
> 
> 4. Some examples
> 
> - Converting a file from ISO-8859-1 to UTF-8:
>  
>   iconv -c -f ISO-8859-1 -t UTF-8  filename
>   
>   (replace the -f argument with the source charset).
> 
> - Canonicalizing a string according to the SASLprep stringprep profile
>   (assuming a file has the string in UTF-8):
> 
>   cat filename | CHARSET=UTF-8 idn --quiet -s -p SASLprep
> 
> idn returns the canonicalized string (in UTF-8), or returns an error if
> there are forbidden characters inside it.
> 
> Converting:
> 
>   sdal-fjoi-utlk-dsjf-oiae-jasl-dkjk-cjvl-sdui-oasd-ioek-ilij-iore
> 
> Returns:
> 
>   sdal-fjoi-utlk-dsjf-oiae-jasl-dkjk-cjvl-sdui-oasd-ioek-ilij-iore
> 
> (same string, same characters under UTF-8)
> 
> Likewise, converting:
> 
>   sdalfjoiutlkdsjfoiaejasldkjkcjvlsduioasdioekilijiore
> 
> Returns the same string:
> 
>   sdalfjoiutlkdsjfoiaejasldkjkcjvlsduioasdioekilijiore
> 
> Note that there was no folding or discarding of characters in either
> of these examples.
> 
> [1] http://www.ietf.org/internet-drafts/draft-ietf-sasl-saslprep-10.txt
> [2] http://www.ietf.org/rfc/rfc2617.txt
> [3] http://www.gnu.org/software/libiconv/
> [4] http://www.gnu.org/software/libidn
Received on Wednesday, 22 December 2004 16:51:59 UTC