[Bug 4665] Clarify URI equivalence in reference to RFC 3986

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4665





------- Comment #2 from kumarp@microsoft.com  2007-09-20 03:30 -------
The current definition is based on the following proposal, sent to the WG
earlier. The only change is that the current definition uses case-sensitive
comparison instead of case-insensitive comparison.

Proposal: 
URI equivalence in SML-IF should be defined as a case-insensitive simple
string comparison, performed as a codepoint-by-codepoint comparison of the
corresponding characters in the two URIs.
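
As a rough illustration of the comparison described above, here is a minimal
Python sketch. The function names are hypothetical, and the use of
str.casefold() for the case-insensitive variant is an assumption: the
proposal does not say which case-folding rules would apply.

```python
def uri_equal_case_sensitive(a: str, b: str) -> bool:
    """Compare two URI strings codepoint by codepoint, with no normalization.

    In Python this is equivalent to `a == b`; it is spelled out here only
    to make the codepoint-level comparison explicit.
    """
    if len(a) != len(b):
        return False
    return all(ord(x) == ord(y) for x, y in zip(a, b))


def uri_equal_case_insensitive(a: str, b: str) -> bool:
    """Case-insensitive variant, as in the original proposal.

    Assumption: case differences are removed with str.casefold(); the
    proposal itself does not name a folding algorithm.
    """
    return uri_equal_case_sensitive(a.casefold(), b.casefold())
```

Note that neither function treats "%7E" and "~" as equal: no
percent-encoding or path normalization is applied by the consumer.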

Justification: 
1. Performance: Simple string comparison is the fastest form of comparison.
Although two aliases of the same URI may not compare as equal without
normalization, that problem does not arise in the specific context of an
SML-IF producer. When a producer writes out an SML-IF document, it can apply
normalizations (if necessary) so that a given URI always appears in the same
form. This allows consumers to perform a fast string comparison without
performing any normalization themselves.

RFC 3986 section 6.2 (Comparison Ladder) describes several forms of
normalization (syntax-based, case, percent-encoding, path-segment,
scheme-based, and protocol-based). If we require a consumer to perform
normalization, we not only make the consumer less efficient but also need to
add very specific normalization step definitions to the SML-IF spec. On the
other hand, if we leave the burden of normalization to the producer, we can
keep the SML-IF spec much simpler and allow consumers to be more efficient.
The spec then does not need to name any specific comparison ladder step(s) to
be performed by a producer. The producer is free to apply any normalization
steps, or none, as long as it always writes a given URI in the same form.
2. Precise definition: RFC 3986 section 6.2.1 (Simple String Comparison)
discusses the issues involved in performing a string comparison but does not
provide a precise definition of how the comparison must be performed. In other
words, it leaves some room for interpretation. We should avoid this by
presenting an unambiguous definition based on that discussion.
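
To illustrate the division of labor argued for in item 1, here is a minimal
Python sketch of a producer that always writes a given URI in one form, so
that a consumer can use plain string equality. The function names are
hypothetical, and the particular normalization chosen (lowercasing only the
scheme and host) is an assumption made for the example; the proposal
deliberately leaves the choice of steps to the producer.

```python
from urllib.parse import urlsplit, urlunsplit


def producer_canonical_form(uri: str) -> str:
    """Hypothetical producer-side step: write each URI in one fixed form.

    Assumption: this producer lowercases only the scheme and the host,
    and leaves userinfo, path, query, and fragment untouched.
    """
    parts = urlsplit(uri)
    # Keep any userinfo as written; lowercase only the host[:port] part.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


def consumer_equal(a: str, b: str) -> bool:
    """Consumer-side check: simple string comparison, no normalization."""
    return a == b


# Two aliases as the producer might encounter them internally.
first = producer_canonical_form("HTTP://Example.ORG/docs/a.xml")
second = producer_canonical_form("http://example.org/docs/a.xml")
same = consumer_equal(first, second)
```

Because both aliases pass through the same producer-side function before
being written, the consumer's comparison succeeds without the consumer
knowing which normalization steps, if any, were applied.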

Received on Thursday, 20 September 2007 03:30:38 UTC