Re: "canonical" URIs

Tim Bray writes:

> > At 04:00 PM 21/03/02 -0500, Norman Walsh wrote:

> > ...

> >I think a more reasonable approach is to say that the default
> >comparison function is lexicographic identity. 

> I think I agree with Norm. 

+1 (as we say in the protocols WG), I agree. 

Another reason to stick with lexicographic identity:  regardless of 
architectural merit, the anyURI type is part of a published recommendation 
[1], and its equality rules are effectively lexicographic. 
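To make the two candidate defaults in this thread concrete, here is a minimal Python sketch contrasting lexicographic identity with a scheme-aware comparison of the kind David Orchard proposed. The function names are hypothetical. `http_like_equal` covers only a simplified subset of the HTTP URI comparison rules, assuming the RFC 2616 (section 3.2.3) reading: case-insensitive scheme and host, an omitted port treated as 80, and an empty path treated as "/". It ignores percent-encoding equivalence and is meant for http URIs only.

```python
from urllib.parse import urlsplit


def lexicographic_equal(a: str, b: str) -> bool:
    """Default under discussion: character-for-character identity."""
    return a == b


def http_like_equal(a: str, b: str) -> bool:
    """Hypothetical, simplified scheme-aware comparison for http URIs.

    Folds case in the scheme and host, treats a missing port as 80 and an
    empty path as "/". Percent-encoding equivalence is NOT handled, and the
    path, query and fragment stay case-sensitive.
    """
    def key(uri: str):
        parts = urlsplit(uri)
        port = parts.port if parts.port is not None else 80
        path = parts.path or "/"
        # parts.hostname is already lower-cased by urllib.parse
        return (parts.scheme.lower(), parts.username, parts.password,
                parts.hostname, port, path, parts.query, parts.fragment)

    return key(a) == key(b)


a, b = "http://Example.org", "http://example.org:80/"
print(lexicographic_equal(a, b))  # False: a "false negative"
print(http_like_equal(a, b))      # True
```

The lexicographic test never reports two different URI strings as equal; its only failure mode is a false negative, which is the trade-off Tim points to in his note.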

Indeed, the schema datatypes recommendation adopts a universal 
architectural principle that equality is identity in the value space [2]. 
Therefore, the only way to have case-independent comparison for purposes 
of that recommendation would be to fold upper- and lower-case lexical 
forms to a single value in the value space.  (I.e., in the same way we fold 
the lexical forms "003", "03", and "3" to the same integer 3 for the 
integer data type.)  Such folding would strongly signal that case is never 
significant in an anyURI, which I think is contrary to the scheme-based 
rules that apply on the Web.  In any case, that is not how the anyURI type 
works in the recommendation.
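The value-space principle can be sketched in a few lines of Python. This is a hypothetical model, not the API of any schema processor, and it ignores whitespace normalization: each datatype gets a function mapping a lexical form to its value, and equality is tested only on the values. The integer mapping folds "003", "03", and "3" together; the anyURI mapping does no case folding, so its equality is effectively lexicographic.

```python
import re


def integer_value(lexical: str) -> int:
    """Map an xs:integer lexical form to its value (simplified sketch)."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return int(lexical)  # "003", "03" and "3" all become 3


def anyuri_value(lexical: str) -> str:
    """Map an anyURI lexical form to its value: no case folding."""
    return lexical


def schema_equal(to_value, a: str, b: str) -> bool:
    """Equality is identity in the value space."""
    return to_value(a) == to_value(b)


print(schema_equal(integer_value, "003", "3"))  # True
print(schema_equal(anyuri_value, "http://example.org/A",
                   "http://example.org/a"))     # False
```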

Note that the only formal applications of equality in the recommendation 
are (a) for enumerations (e.g. integer "003" matches an enumeration of 
"3") [3] and (b) when comparing keys in the schema structures 
specification [4].  Nothing prevents applications or other software built 
on top of the datatypes from applying their own notions of equality for 
one purpose or another.  Thus, your application may choose to do 
case-independent comparison of anyURIs, but the recommendation itself is 
case-sensitive.
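A hypothetical Python sketch of that split: enumeration matching uses the datatype's own value-space equality, while an application is free to apply a looser, case-insensitive rule of its own. The function names are illustrative only, and the integer mapping assumes its input is already a valid xs:integer lexical form.

```python
def integer_value(lexical: str) -> int:
    # Simplified: assumes a valid xs:integer lexical form.
    return int(lexical)


def anyuri_value(lexical: str) -> str:
    # anyURI: the value keeps the characters exactly, case included.
    return lexical


def matches_enumeration(to_value, lexical: str, enumeration: list[str]) -> bool:
    """Enumeration facet: does the value equal one of the enumerated values?"""
    value = to_value(lexical)
    return any(to_value(e) == value for e in enumeration)


def app_equal_ignoring_case(a: str, b: str) -> bool:
    """An application's own policy, outside the recommendation."""
    return a.casefold() == b.casefold()


print(matches_enumeration(integer_value, "003", ["3"]))         # True
print(matches_enumeration(anyuri_value, "http://Example.org/",
                          ["http://example.org/"]))             # False
print(app_equal_ignoring_case("http://Example.org/",
                              "http://example.org/"))           # True
```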

So, that is a bit more ammunition to back up your position.

[1] http://www.w3.org/TR/xmlschema-2/#anyURI
[2] http://www.w3.org/TR/xmlschema-2/#equal
[3] http://www.w3.org/TR/xmlschema-2/#dt-enumeration
[4] http://www.w3.org/TR/xmlschema-1/#section-Identity-constraint-Definition-Validation-Rules

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Tim Bray <tbray@textuality.com>
Sent by: www-tag-request@w3.org
03/25/2002 02:01 PM

 
        To:     www-tag@w3.org
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        Re: "canonical" URIs


At 04:00 PM 21/03/02 -0500, Norman Walsh wrote:
>/ "David Orchard" <david.orchard@bea.com> was heard to say:
>|| anyURI data type.  I think a reasonable approach would be to say that the
>| default comparison function for anyURI is to use the HTTP URI comparison
>| algorithm, but that it is overridable by any scheme.
>
>I think a more reasonable approach is to say that the default
>comparison function is lexicographic identity. 

I think I agree with Norm.  It's easy, it's cheap, and
since people are highly incented to avoid false negatives,
it's reasonable to expect caution in users. -Tim

Received on Monday, 25 March 2002 15:18:00 UTC