Re: [Fwd: Re: [xml-dev] creating a URI class]

At 07:05 PM 2002-02-19 , Ted Hardie wrote:
>In other words, the URI spec seems to say that semantic equivalence is
>scheme specific and string comparison alone is not enough.
>

For the N-teenth time, RFC-2396 does not define a general rule for 'semantic
equivalence' of URIs.  It does define a specific equivalence of relative URIs
to absolute.

Dan may be convinced that string equality is the only equivalence that should
be regarded as pertinent, but that is a personal opinion without basis in the
governing writings.

The case-insensitivity of certain substrings of URIs is well known and
normatively documented.  It is of broad interest to many processors of URIs. 
The construction and use of a class which encapsulates this knowledge is a
reasonable division of concerns in software construction.

What subset of URI-applicable knowledge a processing environment chooses to
encapsulate in a class is a matter for that processing environment to decide,
so long as it does not presume knowledge (for example, that string inequality
implies the resources differ) which is unfounded in the shared lore of
normative writing.

It is certainly true that string equality is applicable, in that URIs are all
strings.  And string equality implies without question that the two strings
identify the same resource [or both of them alike identify no resource].

But string equality is not a necessary condition for two URIs to be known,
from their text alone, to be equivalent.  A URI belongs to a scheme, and the
scheme is allowed to add semantic rules, such as that a given part is a DNS
domain name.  That in turn imports the knowledge that DNS names are
case-insensitive.  So a URI-processor which handles URIs in terms of a class
which equates URI strings that differ only in DNS-name-case is on solid
ground.  It's more work, but it's all implied by the governing specifications.
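
As a rough sketch of such a class (not anything RFC-2396 prescribes), the
Python below folds case only in the scheme and the host portion of the
authority, and leaves every other character alone.  The class name
HostFoldingURI, the helper _key, and the DNS_HOST_SCHEMES set are all
hypothetical names chosen for illustration; which schemes really carry a DNS
name has to come from each scheme's own definition.

```python
import re

# Assumption: schemes whose authority component holds a DNS host name.
# This set is illustrative; consult each scheme's specification.
DNS_HOST_SCHEMES = {"http", "https", "ftp"}

# scheme ":" [ "//" authority ] rest-of-string
_URI_RE = re.compile(r"^([^:/?#]+):(//([^/?#]*))?(.*)$", re.DOTALL)


def _key(text):
    """Return a comparison key: host case folded for known schemes only."""
    m = _URI_RE.match(text)
    if not m:
        return text  # no scheme: nothing is known, compare verbatim
    scheme = m.group(1).lower()
    authority = m.group(3)
    if scheme not in DNS_HOST_SCHEMES or authority is None:
        return text  # no scheme-specific knowledge: compare verbatim
    # Keep any userinfo as written; fold only the host[:port] part.
    userinfo, at, hostport = authority.rpartition("@")
    return f"{scheme}://{userinfo}{at}{hostport.lower()}{m.group(4)}"


class HostFoldingURI:
    """URI wrapper that equates strings differing only in DNS-name case."""

    def __init__(self, text):
        self.text = text
        self._key = _key(text)

    def __eq__(self, other):
        if not isinstance(other, HostFoldingURI):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        return hash(self._key)
```

With this sketch, HostFoldingURI("http://www.w3.org/") compares equal to
HostFoldingURI("http://WWW.W3.ORG/"), while two 'file' URIs that differ in
case stay unequal, because the class claims no knowledge about that scheme.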

Neither the class using string-equal as defining equivalence groups nor the
class which collapses matching DNS names in recognition of the
case-insensitivity of DNS has a basis in RFC-2396 to claim to be _the_ URI
class.  Either class fits, and neither of these two particular notions of
"equal" will find _all_ cases where two unequal strings can be shown from
the governing specifications to refer to the same thing.  For example, the
newsgroup names in 'news' URLs are also case-insensitive.
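
One way to picture this open-endedness is a table of per-scheme rules, where a
False answer means only "not shown equivalent by the rules on hand", never
"known to differ".  The sketch below is illustrative Python; the names
NORMALIZERS, _fold_news and known_equivalent are hypothetical, and the 'news'
rule simply takes the newsgroup case-insensitivity stated above as given.

```python
def _fold_news(rest):
    # Assumption: a 'news' URL names either a newsgroup or a message-id.
    # Fold case for newsgroup names only; a message-id (contains "@") is
    # left exactly as written.
    return rest if "@" in rest else rest.lower()


# Scheme-specific knowledge, added one scheme at a time.
NORMALIZERS = {"news": _fold_news}


def known_equivalent(a, b):
    """True if a and b are provably equivalent under the rules registered.

    False means "no rule here shows them equivalent", not "different".
    """
    if a == b:
        return True  # string equality is always sufficient
    scheme_a, sep_a, rest_a = a.partition(":")
    scheme_b, sep_b, rest_b = b.partition(":")
    if not sep_a or not sep_b or scheme_a.lower() != scheme_b.lower():
        return False
    rule = NORMALIZERS.get(scheme_a.lower())
    return rule is not None and rule(rest_a) == rule(rest_b)
```

Under these assumptions, known_equivalent("news:comp.infosystems.www",
"news:COMP.infosystems.WWW") is True, while a pair of 'file' URIs differing
in case yields False simply because no 'file' rule has been registered.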

Al

>Dan writes> 
>> URIs are character sequences; they're equal when they
>> have the same characters and unequal when they don't.
>> i.e. strcmp() is necessary and sufficient for comparing URIs.
>> 
>> There are cases when the URI spec guarantees that two URIs
>> point to the same thing; e.g.
>> http://www.w3.org/
>> and
>> http://WWW.W3.ORG/
>> 
>> but that doesn't make the two URIs equal. If you want
>> to be sure that consumers realize you mean the same thing,
>> I recommend writing it the same way, rather than relying
>> on consumers to do scheme-specific equivalence processing.
>
>This seems to contradict RFC 2396, sections 2.1 and 3.
>Section 3, in particular, says:
>
>   The URI syntax does not require that the scheme-specific-part have
>   any general structure or set of semantics which is common among all
>   URI.  
>
>I read that to mean that http://www.bar.org/ and http://www.BAR.org/
>may be equivalent where file://123abcdef and file://123ABCDEF might
>not be, and that a reference to the definition of the http and file
>URI schemes would be required to determine which semantics need be
>applied.
>
>In other words, the URI spec seems to say that semantic equivalence is
>scheme specific and string comparison alone is not enough.
>
> regards,
> Ted Hardie
>  

Received on Friday, 22 February 2002 08:58:13 UTC