- From: Simon St.Laurent <simonstl@simonstl.com>
- Date: Tue, 20 Jun 2000 02:09:43 -0400
- To: <XML-uri@w3.org>
In the section below, Henrik Frystyk Nielsen cites section 3.2.3 of the HTTP 1.1 specification, which has one of the clearest explanations I've found on how to compare URIs.

At the same time, it raises a lot of questions about URI comparison in the context of XML parsing, and how much understanding of URIs an XML parser needs before its URI comparison algorithm is even close to reliable.

The process described below requires:

- an understanding of protocol port numbers
- an understanding of URI encoding
- an understanding of which part of the URI is the hostname, and therefore case-insensitive

Does all of this additional information really belong in an XML parser? I don't think so, though others seem to.

At 09:41 PM 6/17/00 -0700, Henrik Frystyk Nielsen wrote:
>You bring up a good point. For historic reasons, the comparison
>algorithm is mentioned in the HTTP/1.1 spec, section 3.2.3 [1] where it
>says
>
>*****
>
>When comparing two URIs to decide if they match or not, a client SHOULD
>use a case-sensitive octet-by-octet comparison of the entire URIs, with
>these exceptions:
>
>   - A port that is empty or not given is equivalent to the default
>     port for that URI-reference;
>
>   - Comparisons of host names MUST be case-insensitive;
>
>   - Comparisons of scheme names MUST be case-insensitive;
>
>   - An empty abs_path is equivalent to an abs_path of "/".
>
>Characters other than those in the "reserved" and "unsafe" sets (see RFC
>2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
>
>For example, the following three URIs are equivalent:
>
>   http://abc.com:80/~smith/home.html
>   http://ABC.com/%7Esmith/home.html
>   http://ABC.com:/%7esmith/home.html
>
>*****
>
>The reason being that HTTP caching depends on being able to compare URIs
>and it wasn't clear whether RFC 2396 would move forward in time for the
>HTTP to move forward and so it was put in the HTTP spec. I have no
>problem with it being moved to the URI spec but moving it to the
>namespace spec I think makes no more sense than having it in the HTTP
>spec.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
http://www.simonstl.com - XML essays and books
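
To give a sense of how much URI knowledge the quoted rules demand, here is a rough Python sketch of the comparison. It is only an illustration under stated assumptions, not anything taken from the HTTP spec or from an XML parser: the names DEFAULT_PORTS, normalize and uris_match are hypothetical, the default-port table covers only two schemes, and the RFC 2396 "unreserved" set stands in for "characters other than reserved and unsafe".

```python
import re
import string
from urllib.parse import urlsplit

# Assumption: a real comparer would need a default port for every scheme it meets.
DEFAULT_PORTS = {"http": 80, "https": 443}

# Assumption: RFC 2396 "unreserved" characters (alphanumerics plus marks)
# approximate "characters other than reserved and unsafe".
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def _decode_unreserved(text):
    """Replace %HH escapes of unreserved characters with the character itself."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        # Escapes of any other character are left exactly as written.
        return char if char in UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def normalize(uri):
    """Reduce a URI to a tuple that can be compared with ==."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()            # scheme names: case-insensitive
    host = (parts.hostname or "").lower()    # host names: case-insensitive
    port = parts.port                        # None when empty or not given
    if port is None:
        port = DEFAULT_PORTS.get(scheme)     # fall back to the scheme default
    path = _decode_unreserved(parts.path) or "/"   # empty abs_path equals "/"
    query = _decode_unreserved(parts.query)
    # Userinfo is ignored in this sketch; everything else stays case-sensitive.
    return (scheme, host, port, path, query, parts.fragment)


def uris_match(a, b):
    return normalize(a) == normalize(b)
```

With these assumptions, the three example URIs from the quoted text compare as equal, while two URIs that differ only in the case of a path segment do not. Even this small version needs a port table, an escaping table, and a URI parser that knows where the hostname ends.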
Received on Tuesday, 20 June 2000 02:07:23 UTC