Re: [URN] URI documents -- "# fragment" (2)

Sam X. Sun (ssun@CNRI.Reston.VA.US)
Fri, 23 Jan 1998 01:46:02 -0500

Message-Id: <199801230648.BAA02655@newcnri.CNRI.Reston.Va.US>
From: "Sam X. Sun" <ssun@CNRI.Reston.VA.US>
To: "Roy T. Fielding" <>
Cc: <uri@Bunyip.Com>, <urn-ietf@Bunyip.Com>
Date: Fri, 23 Jan 1998 01:46:02 -0500
Subject: Re: [URN] URI documents -- "# fragment" (2)

[modified and reposted, since it bounced last week]

Hello, Roy.

> >The point I wanted to show you is that "# fragment" doesn't work by
> >itself. It actually works as a relative URL. And the generic URI 
> >parser may never get the "# fragment" alone. (ie, in your 
> >example, <a href="#foo">.... is a relative URL, not just a "#
> I seem to be having a hard time getting this point across.  The generic
> URI parser *is* the thing that takes a string and does the handling
> and interpretation needed to
>    1) determine whether it is absolute or relative
>    2) convert it to absolute form if needed
>    3) give the resulting URI to the scheme-specific handler
> There is no purpose for a generic URI syntax beyond that.  Likewise,
> it is only that syntax which is needed by other protocols as a
> Draft Standard reference.
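Roy's three steps can be sketched roughly as below. This is a minimal illustration only, assuming Python's modern urllib.parse module (nothing in this thread uses it) and a hypothetical HANDLERS table. Following Roy's statement that the "#fragment" is removed whether or not the scheme uses it, the fragment is split off generically before dispatch. Note that <a href="#foo"> comes out as a relative reference that only gets its meaning from the base URI.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def http_handler(uri, fragment):
    # Hypothetical scheme handler: just reports what it was given.
    return ("http", uri, fragment)


# Hypothetical table mapping scheme names to scheme-specific handlers.
HANDLERS = {"http": http_handler}


def is_absolute(ref):
    # Step 1: a reference that begins with a scheme is absolute;
    # "#foo" has no scheme, so it is relative.
    return bool(urlsplit(ref).scheme)


def parse_reference(ref, base):
    # Step 2: convert a relative reference to absolute form.
    absolute = ref if is_absolute(ref) else urljoin(base, ref)
    # The fragment is removed generically, whatever the scheme.
    uri, fragment = urldefrag(absolute)
    # Step 3: give the result to the scheme-specific handler.
    return HANDLERS[urlsplit(uri).scheme](uri, fragment)
```

With a placeholder base of http://example.org/doc.html, parse_reference("#foo", ...) hands the http handler the URI "http://example.org/doc.html" and, separately, the fragment "foo".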

I believe we are in agreement here...

> >On the other hand, I don't see any usage of "# fragment" for "mailto" or
> >"ldap" URLs as defined in the HTML document. So, if "# fragment" is not
> >needed for all of the URI schemes, I wonder if we could drop it from the
> >overall URI definition? 
> Because you cannot do so and produce an interoperable parser.

I'm not sure I understand the whole issue here. But would the following
be OK for the generic URI parser? It basically allows "#..." to be
handled by the individual URI scheme handlers as each sees fit:

1) determine whether it is absolute or relative

2) convert it to absolute form if needed

3) give the entire resulting URI (i.e., including the trailing "#..."),
to the corresponding URI scheme-specific handler, which may then decide
whether to use the "#fragment" or not.
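A rough sketch of this variant, again in Python and purely illustrative: dispatch() only resolves relative references and picks a handler by scheme name, then hands over the whole URI with any "#..." still attached. The handler names, and the policy that ftp ignores '#', are hypothetical. Resolving a relative reference such as "#foo" still relies on urljoin's own notion of a fragment; only the interpretation of the absolute URI is delegated.

```python
import re
from urllib.parse import urljoin

# Scheme name per the generic syntax: a letter, then letters/digits/+/-/.
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def http_handler(uri):
    # http opts in to fragments: everything after the first '#'.
    resource, sep, fragment = uri.partition("#")
    return ("http", resource, fragment if sep else None)


def ftp_handler(uri):
    # Hypothetical policy: ftp opts out, so '#' stays part of the URI.
    return ("ftp", uri, None)


HANDLERS = {"http": http_handler, "ftp": ftp_handler}


def dispatch(ref, base):
    # Step 1: the reference is absolute if it starts with a scheme name.
    match = SCHEME_RE.match(ref)
    # Step 2: otherwise resolve it against the base URI.
    absolute = ref if match else urljoin(base, ref)
    scheme = SCHEME_RE.match(absolute).group(1).lower()
    # Step 3: hand over the whole URI, "#..." included.
    return HANDLERS[scheme](absolute)
```

Under this arrangement an ftp URI such as ftp://usera:se#cret@ftp.example.org/ reaches ftp_handler untouched, while http still gets its fragment split off, by the http handler rather than by the generic parser.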

> >Lastly, I'm wondering if the "# fragment" requirement is inherited from 
> >the earlier URL standards, when there were few URL schemes defined. If 
> >we drop the requirement of "# fragment" from URI as a whole, it can 
> >still be defined by those URL schemes that need it, in their respective 
> >RFCs. And the only thing I see broken is that the generic URI parser 
> >cannot catch the "#fragment" and decide what to do, which is not 
> >happening now and I think really doesn't have to.
> The "#fragment" is removed from the URI whether the URI is defined
> to use it or not.  

Why does it have to do this?

> Other applications allow the user to pass unknown URI schemes
>  to a proxy for resolution, and on those systems you will find that 
>  the "#fragment" is stripped before being sent to the proxy.  

Yes, indeed. I would conclude, then, that in the current situation there
are browsers that pass the URI with the '#fragment' to the URI
scheme-specific handlers, which then decide what to do with it. And there
are also browsers that strip off the '#fragment' regardless of the URI
scheme, before passing the URI to the scheme-specific handlers.
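For browsers of the second kind, the proxy case Roy mentions can be pictured like this: when talking to a proxy, HTTP/1.0 puts the absolute URI in the request line, and the "#fragment" is cut off first, even for a scheme the browser knows nothing about. The function name and the 'pdi' URI are illustrative only.

```python
def proxy_request_line(uri, method="GET", version="HTTP/1.0"):
    # The '#fragment' is stripped before the URI leaves the browser,
    # regardless of scheme, so the proxy never sees it.
    without_fragment = uri.split("#", 1)[0]
    # Requests to a proxy carry the absolute URI in the request line.
    return f"{method} {without_fragment} {version}"
```

So a request for pdi://example.org/doc#part2 reaches the proxy as "GET pdi://example.org/doc HTTP/1.0", and nothing downstream can recover "part2".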

> It is therefore IMPOSSIBLE for "#" to be used as anything else in the
> URI syntax and still retain interoperability between new and deployed
> systems.
> There is very little room for discussion of what is being defined by
> the specification and in the syntax itself, since that is governed by
> the most interoperable subset of what is implemented.  

I think you are saying, then, that this is a backward-compatibility issue.

The question would then be: when we define a standard, do we have to
make sure it works with all the current implementations (hence taking
the subset), or should we define the standard with more emphasis on
usability and future extensibility? I understand the former is very
important, and breaking it will affect usability. But the pace at which
the major browsers put out new releases makes me think that the latter
might weigh more, since any software can be patched or updated
relatively easily, but a standard tends to stay around longer and have
a far greater impact on the future.

The way current URI parsers cut off the '#fragment' regardless of the
URI scheme is not very user-friendly: if user A has a userid or
password containing the '#' character, he has to write it as '%23'
when he reaches his ftp server from a web browser, even though a
'#fragment' doesn't make sense in an 'ftp' URL. The same issue holds
for 'telnet' and 'mailto' as well. Another example: when we work with
publishers, there are existing naming schemes, like SICI, that use '#'
extensively, and it's just not very practical to force every SICI name
to be hex-encoded.
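The ftp case can be demonstrated with Python's urllib.parse, which, like the deployed parsers described here, splits on '#' no matter what the scheme is. The user name, password and host below are made up. Since '#' is 0x23 in ASCII, its escaped form is '%23'.

```python
from urllib.parse import quote, unquote, urlsplit

password = "se#cret"  # made-up password containing '#'

# Unescaped: the generic parser treats everything after '#' as a fragment,
# so the authority is cut short and the host name is lost.
raw = "ftp://usera:" + password + "@ftp.example.org/pub/"
raw_parts = urlsplit(raw)
print(raw_parts.netloc)    # usera:se
print(raw_parts.fragment)  # cret@ftp.example.org/pub/

# Escaped: '#' becomes '%23' and the URI parses as intended.
encoded = "ftp://usera:" + quote(password, safe="") + "@ftp.example.org/pub/"
encoded_parts = urlsplit(encoded)
print(encoded)                          # ftp://usera:se%23cret@ftp.example.org/pub/
print(encoded_parts.hostname)           # ftp.example.org
print(unquote(encoded_parts.password))  # se#cret
```

The burden of escaping falls on the user even though ftp itself has no use for a fragment.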

John pointed out earlier an example of how cutting off the '#fragment'
regardless of the URI scheme makes the URI parser less extensible.
When a new scheme like 'pdi' is defined, it cannot use '#' to define
its own fragment and have it processed differently from an "http" URI,
simply because the URI parser assumes the 'http' behavior, chops the
'#fragment' off, and doesn't pass the '#fragment' on to the server.

In summary, all I'm suggesting is that the '#fragment' should be processed
by the individual scheme parsers, not by the generic URI parser. In terms of
libwww, it should be handled by the individual 'plug-in' modules (e.g., http),
not in the core portion (i.e., HTParse.c).
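The split between core and plug-ins, together with John's 'pdi' example, can be modeled in a few lines of Python. This is not libwww's actual C API; core_stripping, core_delegating and the plug-in functions are hypothetical names, and each plug-in simply returns the string it would send to the server.

```python
def core_stripping(uri, plugins):
    # Current behavior: the core cuts off '#...' before any plug-in runs.
    scheme = uri.split(":", 1)[0].lower()
    return plugins[scheme](uri.split("#", 1)[0])


def core_delegating(uri, plugins):
    # Proposed behavior: the core only chooses the plug-in by scheme;
    # what '#' means is left to that plug-in.
    scheme = uri.split(":", 1)[0].lower()
    return plugins[scheme](uri)


def http_plugin(uri):
    # http keeps the fragment on the client side and sends the rest.
    return uri.split("#", 1)[0]


def pdi_plugin(uri):
    # Hypothetical 'pdi' scheme: sends whatever it receives, '#' included.
    return uri


PLUGINS = {"http": http_plugin, "pdi": pdi_plugin}
```

With core_stripping, pdi_plugin never sees "#sub" in pdi://example.org/name#sub; with core_delegating it does, while http behaves the same under both.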

I assume that we all agree that each URI scheme can decide for itself
whether to use, or do anything with, the '#fragment'...

> The only question still to be determined is whether we call these
> things URI or URL, and thus whether or not a URN should be referred 
> to as a URI or a URL when it is used by HTTP, HTML, XML, etc.

I have some questions on these issues too. But I think it would help
me a lot to settle the '#fragment' question first, and make sure I'm
in the same boat as you are...