Re: Some issues with the IRI document [NFCsecurity-09] from Simon Josefsson on 2003-05-08 (public-iri@w3.org from May 2003)

From: Simon Josefsson <jas@extundo.com>
Date: Fri, 09 May 2003 01:13:01 +0200
To: Martin Duerst <duerst@w3.org>
Cc: public-iri@w3.org
Message-ID: <ilu4r452fsy.fsf@latte.josefsson.org>
Martin Duerst <duerst@w3.org> writes:

> Hello Simon,
>
> Sorry for the delay in moving this issue forward.

Hello.

>> > NFC is only required in the current draft when encoding something
>> > from e.g. the side of a bus, or when transcoding it from a non-
>> > Unicode encoding. This is only to provide a base level of
>> > predictability.
>>
>>Section 2.4 says "IRIs SHOULD be created using Normalization Form C
>>(NFC)."  I don't interprete that to only apply to the scenarios you
>>mention, rather it seem to imply that whenever a IRI is created, by
>>whatever process, NFC should be used.  Perhaps it can be clarified?
>
> I see what you mean. It definitely should be qualified. The intent
> is NOT to use NFC in all cases. That's why the SHOULD is there.
> If there is some underlying data that is not in NFC, then
> normalizing to NFC when creating the IRI is a bad idea.
> I haven't yet worded this out, but I think that's what the
> document should say. Would that remove your concerns?

I think so, yes.

>> > Still indeed this could lead to security problems with the
>> > mechanisms you describe above, if there are two users that have user
>> > names that only differ in normalization. But it would seem to me
>> > that in this case, the security issue comes from these mechanisms
>> > (or the actual use with these specific user names) rather than from
>> > IRIs.
>>
>>If this is so, I believe the IRI security considerations should
>>mention this so that people can be aware of it, and abstain from
>>deploying IRIs in systems that behave in this way,
>
> I think it's not a problem of systems, but a problem of
> the individual IRIs. I.e. if there are two user names that
> only differ by normalization, then there is no reason not
> to use IRIs for all the other users.

If I were to design the system, and knew IRIs couldn't distinguish
between two international words that may be used in customers deployed
systems, I'd be cautious about using IRIs at all.  I'm mostly only
familiar with NFKC and it would generate these hassles, and perhaps
NFC avoids them.

>>    Whenever fields of an IRI are normalized, the octet representation
>>    is modified.  While this is unavoidable if ambiguities are to be
>>    resolved, it can raise security issues in some situations.  In
>>    particular, if a iuserinfo field is normalized, a security protocol
>>    expecting a certain byte sequence as a username may receive a
>>    different one.  This can lead to interoperability failures, but
>>    also more serious failures in systems, e.g. when the system
>>    performs authorization based on the username.
>
> I'm currently not convinced we need this text (assuming the
> clarifications I outlined at the start of this mail).

Given clarifications, it would have to be modified, and if the
specification defined the exact situations where normalization is used
more clearly, it could probably be dropped alltogether.

> IRIs are not normalized at will. If they are normalized
> when transcoding from legacy encodings, then the problem
> is in the legacy encoding, because the legacy encoding isn't
> able to distinguish e.g. the two user names.
> In addition, as I have said, creating two user names that
> differ only by normalization is a security issue for whoever
> created these user names, not an issue for IRIs.

I'm not convinced by this.  Existing deployed solutions, like SASL,
supports all valid UTF-8 strings as user names.  There has been no
warnings presented that administrators should not pick user names that
only differ by normalization.  So if IRIs may arbitrary NFC strings,
it is IRIs that introduce a security problem that wasn't there before.
People would have a problem to start using IRIs without redesigning
their existing systems.

> Also, a system that performs authorization based on
> user name (alone) is a serious security issue, again not
> the fault of IRIs.

I assumed users are authenticated before the authorization process
starts, which is the case in the systems I'm familiar with.
Received on Thursday, 8 May 2003 19:13:06 UTC