Re: initial "relative-looking" elements.

Larry Masinter <masinter@parc.xerox.com> wrote:
>The enormous flap over internationalization left me with little
>time to deal with other issues. I don't quite understand where
>we want to go with some of the other issues.
>
>>         The rules for resolving partial/relative URLs since the
>> beginning of URL time have been such that if relative symbolic
>> elements end up at the beginning of paths they should be retained,
>> e.g., you can end up with something like:
>> 
>>         http://host/../foo/blah.html
>> 
>> but Netscape's parsing ends up stripping lead relative symbolic
>> elements yielding:
>> 
>>         http://host/foo/blah.html
>> 
>> with the consequence that many people are putting HREFs and SRCs
>> in their markup which by "valid" parsing rules yield lead
>> relative symbolic elements, and sending of "false bug reports"
>> to non-Netscape browser developers with one or another variant
>> of:
>> 
>>          "It works fine with Netscape."
>> 
>>         I can see retaining the lead relative symbolic elements
>> in ftp URLs for personal accounts (would generally fail for
>> anonymous accounts), but to my knowledge no http or https server
>> would accept such paths, so there's that kind of justification
>> for what Netscape is doing.
>> 
>>         I would appreciate your and others' opinions on whether
>> it would be good or bad for other browsers to reverse engineer
>> for that Netscape URL resolving.
>> 
>>                                 Fote
>
>Was there any resolution of this issue?

	Since posting that question, I've received feedback that the
current versions of most browsers, not just Netscape and the current
version of MSIE, trim a lead relative symbolic element in the paths
for http/https requests.  My own predisposition, though, is to leave
the generic parsing rules as they presently stand in the draft, and
to treat the http/https problem as an implementation issue, or as a
special case for http/https (analogous to how things now stand for
the lead slash in ftp URLs).
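
	To make the generic rule concrete, here is a minimal Python
sketch of RFC1808-style path resolution in which a ".." that has no
preceding segment to cancel is simply retained.  It is an illustration
only, not the Lynx code: the function names are made up for this
example, parameters (";params") are not handled separately, and it is
hand-rolled because current versions of Python's own
urllib.parse.urljoin drop such lead elements.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments_rfc1808(path):
    """Collapse "." and "<segment>/.." but keep any ".." left at the front."""
    absolute = path.startswith("/")
    segments = (path[1:] if absolute else path).split("/")
    out = []
    last = len(segments) - 1
    for i, seg in enumerate(segments):
        if seg == ".":
            if i == last:
                out.append("")      # "a/." becomes "a/"
            continue
        if seg == ".." and out and out[-1] not in ("..", ""):
            out.pop()               # "<segment>/.." cancels out
            if i == last:
                out.append("")      # "a/b/.." becomes "a/"
            continue
        out.append(seg)             # includes a ".." with nothing to cancel
    return ("/" if absolute else "") + "/".join(out)


def resolve_rfc1808(base, ref):
    """Resolve a partial URL against a base, retaining lead ".." elements."""
    b, r = urlsplit(base), urlsplit(ref)
    if r.scheme:                    # already an absolute URL
        return ref
    if r.netloc:                    # "//host/path" form
        return urlunsplit((b.scheme, r.netloc, r.path, r.query, r.fragment))
    if not r.path:                  # query/fragment only: inherit base path
        return urlunsplit((b.scheme, b.netloc, b.path,
                           r.query or b.query, r.fragment))
    if r.path.startswith("/"):      # absolute path: used exactly as given
        path = r.path
    else:
        base_path = b.path or "/"   # treat a missing path as the default "/"
        merged = base_path[:base_path.rfind("/") + 1] + r.path
        path = remove_dot_segments_rfc1808(merged)
    return urlunsplit((b.scheme, b.netloc, path, r.query, r.fragment))


print(resolve_rfc1808("http://host/", "../foo/blah.html"))
# -> http://host/../foo/blah.html
print(resolve_rfc1808("http://host/../foo/blah.html", "bar.html"))
# -> http://host/../foo/bar.html
```

The second call shows that once a lead ".." is in a base URL, it is
carried along into everything subsequently resolved against that base.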

	I tried leaving the formal parsing functions in Lynx compliant
with RFC1808 and/or the draft, but adding a hack of 3 or so lines to
the stream parser.  When a partial HREF or SRC value is "..", "../",
or "../whatever", the hack checks whether the base has an http or
https scheme and lacks a path (i.e., has just the default '/'), and if
so turns the value into an absolute, "doctored" path ("/" or
"/whatever") before passing the base and HREF or SRC value to the
formal parsing functions.  In
conjunction with that "doctoring", Lynx issues a statusline message
about the bad partial reference, so that there is immediate feedback
about it which might lead to it being corrected.  HREF or SRC values
such as "./../whatever" or "../../whatever", and values with absolute
paths or URLs that include lead relative symbolic elements, would still
end up with them after formal parsing/resolving.  This selective
"doctoring" seems to deal with all the bad partial references which
have become common in documents during the past year.  I suspect, as
someone else suggested, that they are due to a bug in some authoring
tool, and thus it seems best simply to track down that tool and
suggest that the bug be corrected.  I don't know if this "doctoring"
hack will be incorporated into a formal Lynx release, or if we'll just
keep letting the resolved http/https URLs fail, but the formal parsing
functions won't be changed unless the draft's rules are changed before
it becomes an RFC.
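
	The "doctoring" step described above can be sketched roughly
as follows.  This is Python for illustration rather than the actual
Lynx C code; the function name and the wording of the statusline
message are assumptions made up for the example.

```python
from urllib.parse import urlsplit


def doctor_partial_ref(base, ref):
    """Return (ref, warning); ref is "doctored" only in the narrow case.

    Hypothetical helper: it would run before the base and ref are handed
    to the formal (unchanged) parsing/resolving functions.
    """
    b = urlsplit(base)
    if b.scheme.lower() not in ("http", "https"):
        return ref, None            # only http/https bases are touched
    if b.path not in ("", "/"):
        return ref, None            # base has a real path: leave ref alone
    if ref == ".." or ref.startswith("../"):
        doctored = "/" + ref[3:]    # ".." and "../" -> "/", "../x" -> "/x"
        # Message text is invented for this sketch.
        return doctored, "Bad partial reference!  Stripping lead dots."
    return ref, None                # e.g. "./../x" is deliberately not matched


print(doctor_partial_ref("http://host/", "../whatever")[0])
# -> /whatever
print(doctor_partial_ref("http://host/", "../../whatever")[0])
# -> /../whatever
```

Only one lead "../" is removed, so "../../whatever" still ends up with
a lead relative symbolic element after formal resolving, and values
such as "./../whatever", or any reference against a base with a real
path or a non-http scheme, pass through untouched.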

	It's been made clear the new URL-WG is only for discussions
about the PROCESS of approving drafts for URLs, but no one has yet
answered the recurring question, most recently from Dan Connolly,
about which is(are) the proper forum(s) for discussing the SUBSTANCE
of the drafts.  It can be inferred that it's this forum of the
disbanded URI-WG, but could you or someone please answer the latter
question explicitly?

	Note that the www.alis.com server does accept /../foo.html
and /foo.html requests as equivalent, i.e., as if a "map /../* /*"
rule had been added to its configuration file, perhaps to "help"
browsers which don't strip the lead "/.." themselves.  It's hard to
imagine that the server is sending such a path in the browser's
request directly to the filesystem, because, for most servers, the
"root" of the "data tree" need not be a filesystem root, and this
would pose a serious security problem (as would /~user/../whatever).
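
	The difference between the two possibilities can be sketched
as below.  This is only an illustration of the reasoning, not a
description of how the www.alis.com server actually works: the
function names and the "/srv/www" data tree root are hypothetical.

```python
import posixpath


def map_lead_dotdot(request_path):
    """Emulate a 'map /../* /*' style rule: drop one lead "/..". """
    prefix = "/../"
    if request_path.startswith(prefix):
        return "/" + request_path[len(prefix):]
    return request_path


def naive_filesystem_path(doc_root, request_path):
    """The file a server would open if it passed the path straight through."""
    return posixpath.normpath(doc_root + request_path)


def escapes_root(doc_root, request_path):
    """True if the naive filesystem path lands outside the data tree."""
    fs_path = naive_filesystem_path(doc_root, request_path)
    return not (fs_path == doc_root or fs_path.startswith(doc_root + "/"))


DOC_ROOT = "/srv/www"               # hypothetical data tree root

print(naive_filesystem_path(DOC_ROOT, "/../foo.html"))
# -> /srv/foo.html
print(escapes_root(DOC_ROOT, "/../foo.html"))
# -> True
print(escapes_root(DOC_ROOT, map_lead_dotdot("/../foo.html")))
# -> False
```

Passed straight through, the request reaches a file above the data
tree; rewritten by the map-style rule first, it is served as the
ordinary /foo.html.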

	The consequence of this "help", if that's what it is,
unfortunately is that the lead relative symbolic element ends up
retained in all the subsequently resolved URLs, and is thus likely to
be seen by the user in some window for displaying URLs, to end up in
bookmarks, etc., and in the long run to exacerbate the problem rather
than "helping", IMHO.

				Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 MACRIDES@SCI.WFBR.EDU         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================

Received on Monday, 28 April 1997 11:23:39 UTC