- From: Sam X. Sun <ssun@CNRI.Reston.VA.US>
- Date: Fri, 23 Jan 1998 01:46:02 -0500
- To: "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>
- Cc: <uri@Bunyip.Com>, <urn-ietf@Bunyip.Com>
[modified and reposted, since it bounced last week,]

Hello, Roy.

> >The point I wanted to show you is that "# fragment" doesn't work by
> >itself. It's actually worked as a relative URL. And the generic URI
> >parser may never get the "# fragment" alone. (ie, in your
> >example, <a href="#foo">.... is a relative URL, not just a "# fragment".)
>
> I seem to be having a hard time getting this point across. The generic
> URI parser *is* the thing that takes a string and does the handling
> and interpretation needed to
>
>    1) determine whether it is absolute or relative
>    2) convert it to absolute form if needed
>    3) give the resulting URI to the scheme-specific handler
>
> There is no purpose for a generic URI syntax beyond that. Likewise,
> it is only that syntax which is needed by other protocols as a
> Draft Standard reference.
>

I believe we are in agreement here...

> >On the other hand, I don't see any usage of "# fragment" for "mailto" or
> >"ldap" URLs as defined in the HTML document. So, if "# fragment" is not
> >needed for all of the URI schemes, I wonder if we could drop it from the
> >overall URI definition?
>
> Because you cannot do so and produce an interoperable parser.
>

I doubt that I understand the whole issue here. But would the following be acceptable for the generic URI parser? It basically allows "#...." to be handled by the individual URI scheme handlers as they see fit:

   1) determine whether it is absolute or relative
   2) convert it to absolute form if needed
   3) give the resulting URI in its entirety (i.e., including the trailing "#......."), to the corresponding scheme-specific handler, which may then decide whether to use the "#fragment" or not.

> >Lastly, I'm wondering if the "# fragment" requirement is inherited from
> >the earlier URL standards when there're few URL schemes defined. If
> >we drop the requirement of "# fragment" from URI as a whole, it can
> >still be defined by those URL schemes that need it, in their respective
> >RFCs.
> >And the only thing I see broken is that the generic URI parser
> >can not catch the "#fragment", and decide what to do, which is not
> >happening and I think really doesn't have to.
>
> The "#fragment" is removed from the URI whether the URI is defined
> to use it or not.

Why does it have to do this?

> Other applications allow the user to pass unknown URI schemes
> to a proxy for resolution, and on those systems you will find that
> the "#fragment" is stripped before being sent to the proxy.

Yes, indeed. I would conclude, then, that in the current situation there are browsers that pass the URI with the '#fragment' to the scheme-specific handlers, which then decide what to do with it, and there are also browsers that strip off the '#fragment' regardless of the URI scheme before passing the URI to the scheme-specific handlers.

> It is therefore IMPOSSIBLE for "#" to be used as anything else in the
> URI syntax and still retain interoperability between new and deployed
> systems.
>
> There is very little room for discussion of what is being defined by
> the specification and in the syntax itself, since that is governed by
> the most interoperable subset of what is implemented.

I think you are saying, then, that this is a backward compatibility issue. The question would then be: when we define a standard, do we have to make sure it works with all current implementations (hence taking the subset), or should we define the standard with more emphasis on usability and future extensibility? I understand the former is very important, and breaking it will affect usability. But the way the major browsers are making their new releases makes me think the latter might weigh more, since any software can be patched or updated relatively easily, while a standard tends to stay around longer and has far more impact on the future.
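The two parser behaviours discussed so far can be contrasted in a short sketch. It is written in Python and uses the standard urllib.parse module as a stand-in for the generic parser; note that urllib.parse follows the later RFC 3986 rules, not any 1998 implementation such as libwww's HTParse.c, and the HANDLERS registry and generic_parse function are hypothetical names for illustration only. With strip_fragment=True the parser removes the '#fragment' before dispatching, as deployed browsers and proxies do; with strip_fragment=False it hands the whole URI, fragment included, to the scheme handler, as in the proposed step 3. The last lines show why a '#' in an ftp password has to be percent-encoded as %23: left unescaped, it is read as the start of a fragment.

```python
from urllib.parse import unquote, urldefrag, urljoin, urlsplit

# Hypothetical scheme-specific handlers. These stand-ins simply return the
# URI they were given, so the output shows exactly what each one receives.
HANDLERS = {
    "http": lambda uri: uri,
    "ftp": lambda uri: uri,
}


def generic_parse(ref, base, strip_fragment=True):
    """Hypothetical generic parser following the three quoted steps.

    strip_fragment=True:  remove the '#fragment' before dispatch, regardless
                          of scheme (the deployed behaviour).
    strip_fragment=False: hand the whole URI, fragment included, to the
                          scheme handler (the proposed behaviour).
    Returns (what the handler received, fragment kept by the generic parser).
    """
    # Steps 1 and 2: resolve a relative reference against the base;
    # an absolute reference comes back unchanged.
    absolute = urljoin(base, ref)
    fragment = ""
    if strip_fragment:
        absolute, fragment = urldefrag(absolute)
    # Step 3: dispatch on the scheme of the absolute URI.
    handler = HANDLERS[urlsplit(absolute).scheme]
    return handler(absolute), fragment


if __name__ == "__main__":
    base = "http://example.com/doc.html"
    print(generic_parse("#foo", base))
    # ('http://example.com/doc.html', 'foo')
    print(generic_parse("#foo", base, strip_fragment=False))
    # ('http://example.com/doc.html#foo', '')

    # An unescaped '#' in an ftp password is taken as the fragment delimiter,
    # so the rest of the password and the host end up in the fragment.
    raw = urlsplit("ftp://user:pa#ss@ftp.example.com/file")
    print(raw.netloc, "|", raw.fragment)
    # user:pa | ss@ftp.example.com/file

    # Percent-encoding '#' as %23 keeps it inside the authority.
    escaped = urlsplit("ftp://user:pa%23ss@ftp.example.com/file")
    print(unquote(escaped.password), escaped.hostname)
    # pa#ss ftp.example.com
```

The only difference between the two modes is where the fragment ends up: in the generic layer's return value, or still attached to the URI the scheme handler sees.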
The way the current URI parser cuts off the '#fragment' regardless of the URI scheme makes it not very user friendly. If user A has a userid or password containing the '#' character, he will have to write it as '%23' when he reaches his ftp server from a web browser, even though a '#fragment' doesn't make sense in an 'ftp' URL. A similar issue holds for 'telnet' and 'mailto' as well. Another example: when we work with publishers, there are existing naming schemes, such as SICI, that use '#' extensively, and it's just not very practical to require every SICI name to be hex encoded.

John pointed out earlier an example of how cutting off the '#fragment' regardless of the URI scheme makes the URI parser less extensible. When a new scheme like 'pdi' is defined, it cannot use '#' to define its own fragment and have it processed differently from an "http" URI, simply because the URI parser assumes the 'http' behavior, chops the '#fragment' off, and doesn't pass the '#fragment' on to the server.

In summary, all I'm suggesting is that the '#fragment' should be processed by the individual scheme parser, not by the generic URI parser. In terms of libwww, it should be handled by the individual 'plug-in' module (e.g., http), not in the core portion (i.e., HTParse.c).

PS. I assume that we are all in agreement that each URI scheme can decide for itself whether to use, or do anything with, the '#fragment'...

> The only question still to be determined is whether we call these
> things URI or URL, and thus whether or not a URN should be referred
> to as a URI or a URL when it is used by HTTP, HTML, XML, etc.

I also have some questions on these issues. But I think it would help me a lot to settle the '#fragment' question first, and make sure I'm in the same boat as you are...

Regards,
Sam
Received on Friday, 23 January 1998 01:59:14 UTC