Re: [URN] URI documents -- "# fragment"

Dan Connolly (connolly@w3.org)
Thu, 08 Jan 1998 03:07:03 -0600


Message-ID: <34B49737.7DEC@w3.org>
Date: Thu, 08 Jan 1998 03:07:03 -0600
From: Dan Connolly <connolly@w3.org>
To: "Sam X. Sun" <ssun@CNRI.Reston.VA.US>
CC: uri@bunyip.com, urn-ietf@bunyip.com
Subject: Re: [URN] URI documents -- "# fragment"

Sam X. Sun wrote:
> 
> > Sam Sun wrote:
> > > In the case of URL, The " [ "#" fragment ] " is only used or
> > > useful by some URL schemes. So my question is: is it acceptable
> > > to say that the fragment is scheme dependent, and don't bring it
> > > up in the URI definition?
> 
> Dan Said:
> >
> > No; that is, to say that is not consistent with current
> > implementations, and I would find it unacceptable.
> 
> The current implementation (eg. Netscape browser) append the
> "#fragment" to whatever the base URI is. I don't quite understand
> on where it would be inconsistent?

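Uh... you said it yourself: "whatever the base URI is", i.e. regardless
of scheme. If clients append the fragment without looking at the
scheme, then saying the fragment is scheme dependent doesn't describe
what they do.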

Anyway... you report some interesting test results...

> Here is an example which I think doesn't honor the current '#' URI syntax:
> 
> If I define my password as "password_with_#_character", and use "ftp" URL:
> 
> ftp://my_user_id:password_with_#_character@myhost/my_file_path
> 
> Netscape browser implementation will pass the entire password (with #
> character in it) to the server, instead of sending only
> "ftp://user_id:password_with_" to the server.

Hmm... That's certainly different from what Roy's spec[1]
describes.

[1]
http://www.ics.uci.edu/~fielding/url/draft-fielding-uri-syntax-00.txt

According to the regexp in the spec, it parses as:

connolly@beach ../connolly[1005] perl uri.pl 
ftp://my_user_id:password_with_#_character@myhost/my_file_path
[ftp:] [ftp] [//my_user_id:password_with_] [my_user_id:password_with_]
[] [] [] [#_character@myhost/my_file_path]
[_character@myhost/my_file_path]

where uri.pl=
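# Read URIs, one per line, and print the nine groups of the draft's regexp.
while(<>){
    m,^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?,;
    # $2 = scheme, $4 = the part after "//", $5 = path, $7 = query, $9 = fragment
    print "[$1] [$2] [$3] [$4] [$5] [$6] [$7] [$8] [$9]\n";
}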
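
For anyone without perl handy, here is a rough Python rendering of the
same split. It is only a sketch: the regexp is copied from uri.pl above,
but the names URI_RE and split_uri are made up for illustration and come
from neither the draft nor any library.

```python
import re

# Same regexp as in uri.pl above (taken from the draft).
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri(uri):
    """Return the nine regexp groups; groups that did not match become ''."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return [g or "" for g in m.groups()]

parts = split_uri(
    "ftp://my_user_id:password_with_#_character@myhost/my_file_path"
)
# Print in the same bracketed style as uri.pl.
print(" ".join("[%s]" % p for p in parts))
```

The fourth group stops at the '#', so by this regexp the password is cut
short and everything after the '#' lands in the fragment.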


It's also different from the original implementation:

-------
http://www.w3.org/Library/src/HTParse.c

    /* Look for fragment identifier */
    if ((p = strchr(name, '#')) != NULL) {
	*p++ = '\0';
	parts->fragment = p;
    }
-------
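
The HTParse.c fragment above amounts to "cut at the first '#', no
matter what comes before it". A minimal sketch of that rule in Python,
with split_fragment as a hypothetical name of my own choosing:

```python
def split_fragment(name):
    """Cut at the first '#'; return (rest, fragment), fragment None if absent."""
    rest, sep, fragment = name.partition("#")
    return rest, (fragment if sep else None)

rest, fragment = split_fragment(
    "ftp://my_user_id:password_with_#_character@myhost/my_file_path"
)
print(rest)      # what is left once the fragment is removed
print(fragment)  # everything after the first '#'
```

Applied to the ftp example, this leaves "ftp://my_user_id:password_with_"
as the part before the fragment, which agrees with the regexp and not
with what Netscape is reported to do.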


> In fact, using %25 to replace
> the '#' character will fail.

That seems like a bug to me (though the escape for '#' is %23;
%25 is '%' itself). But I suppose Draft Standard
is the time to describe what happens rather than prescribe
something else.

Hmm... the ftp URL spec[2] doesn't say that passwords
get %xx encoded. Seems to me it should; else there's
no way to express '/' in a password. I suppose that's
not a fatal limitation...

[2] http://ds.internic.net/internet-drafts/draft-casey-url-ftp-00.txt
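
To make the point concrete, here is a sketch of what %xx encoding a
password would look like. It assumes passwords in ftp URLs are meant to
be %xx decoded before use, which is exactly what the ftp draft does not
say; escape_password and ftp_url are hypothetical helpers, and only
quote and unquote are real (Python's urllib.parse).

```python
from urllib.parse import quote, unquote

def escape_password(password):
    """%xx encode everything except letters, digits and '_.-~'."""
    # safe="" so that '/' is escaped too (quote leaves it alone by default).
    return quote(password, safe="")

def ftp_url(user, password, host, path):
    """Build an ftp URL with the password escaped (an assumption, see above)."""
    return "ftp://%s:%s@%s/%s" % (user, escape_password(password), host, path)

url = ftp_url("my_user_id", "password_with_#_character", "myhost", "my_file_path")
print(url)

# A client that decodes the password gets the original string back.
print(unquote(escape_password("pass/word#1")))
```

With the '#' escaped, nothing in the URL looks like a fragment any more,
and a '/' in a password becomes expressible as well.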

> Dan Said:
> > For example, consider:
> >
> >       <p>...<a href="#foo">tail</a>
> >
> >       ...
> >
> >       <p><a name="foo">head</a>
> >
> > I can tell you where the link from tail goes (i.e. to head)
> > without knowing what URI scheme was used to access the document. So
> > can lots of implemented web clients (and maybe even some servers).
> >
> 
> The example will fail from the current Netscape implementation if no BASE
> URI is defined.

Wow... we worked really hard on this part of the HTML 2.0
spec:

========
Network Working Group                                    T. Berners-Lee
Request for Comments: 1866                                      MIT/W3C
Category: Standards Track                                   D. Connolly
http://www.w3.org/MarkUp/html-spec/html-spec_7.html#SEC7.4

Fragment Identifiers

Any characters following a `#' character in a hypertext address
constitute a fragment identifier. In particular, an address
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^
of the form `#fragment' refers to an anchor in the same document. 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
========

But I guess rules were made to be broken. :-{
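
For what it's worth, the rule quoted above takes only a few lines to
sketch, and nothing in it looks at the scheme. This is an illustration
under my own assumptions, not code from any browser:
resolve_fragment_ref is a made-up name, and the example addresses are
placeholders.

```python
def resolve_fragment_ref(base, ref):
    """Resolve a '#fragment' reference against base, whatever its scheme."""
    if not ref.startswith("#"):
        raise ValueError("only '#fragment' references are handled here")
    # Drop any fragment the base already carries, then attach the new one.
    return base.split("#", 1)[0] + ref

# The same reference resolves the same way under different schemes.
for base in ("http://example.org/doc.html",
             "ftp://example.org/pub/doc.html",
             "file:/tmp/doc.html#old"):
    print(resolve_fragment_ref(base, "#foo"))
```

The link from tail goes to the anchor named foo in the same document in
every case, which is the behaviour the HTML 2.0 text describes.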

> (Refer the following URL for an example:
> http://ssun.cnri.reston.va.us/ietf/uri/nobase.htm and
> http://ssun.cnri.reston.va.us/ietf/uri/fragment.htm).

nobase.htm is illegal, per

=========
http://www.w3.org/MarkUp/html-spec/html-spec_5.html#SEC5.2.2

The optional BASE element provides a base address for interpreting
relative URLs when the document is read out of context (see section
Hyperlinks). The value of the HREF attribute must be an absolute URI. 

=========

so the behaviour of HTML user agents is unspecified.

> Regards,

Interesting stuff.

-- 
Dan
http://www.w3.org/People/Connolly/