Re: Relative URL draft 04 from Roy T. Fielding on 1995-01-22 (uri@w3.org from January 1995)

From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Sun, 22 Jan 1995 08:40:05 -0800
To: uri@bunyip.com
Message-Id: <9501220840.aa06168@paris.ics.uci.edu>

Larry Masinter <masinter@parc.xerox.com> writes:

> I'm wondering if section 3 ("establishing a Base") should be an
> appendix along with section 10. "Any protocol that wishes to allow
> relative URLs must define the method by which the base is determined;
> here's how to do it for X, Y, and Z" would characterize the appendix.

No.  The method of establishing a Base must be part of the standard,
and thus be in the body of the spec.  I could add a note about non-RFC 822
protocols in section 3.2.

> I don't think the method you outlined for multipart messages in
> section 3.5 is clear, or at least, I didn't understand it, or maybe if
> I have
>    multipart/mixed
>      part1: html
>      part2: multipart/mixed
>          part2a: html
>          part2b: text
>          part2c: gif
>     
> should I be able to say in part2a.html "../part1"? ...

There is no easy way to describe it without repeating the entire
MIME syntax for multipart and message.  Basically, the "retrieval context"
of a component part is the base URL of whatever entity encapsulates it.
Thus, the base URL of part2a is, in order:

     a) the embedded BASE in the body-part of part2a, or
     b) the Base: header value in the header-part of part2a, or
     c) the Base: header value in the header-part of part2, or
     d) the Base: header value in the header-part of top "multipart/mixed", or
     e) the URL used to retrieve the topmost "multipart/mixed".
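
The lookup order (a) through (e) can be sketched roughly as below. This
is only an illustration: the Part class, its field names, and the
base_url function are hypothetical and are not taken from the draft,
and no real MIME parsing is attempted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:
    """One entity in a multipart tree (hypothetical model, not a MIME parser)."""
    embedded_base: Optional[str] = None   # BASE found inside the body-part
    header_base: Optional[str] = None     # value of the Base: header, if any
    parent: Optional["Part"] = None       # enclosing entity, None at the top


def base_url(part: Part, retrieval_url: str) -> str:
    """Return the base URL of `part`, searching outward through its ancestors."""
    # (a) An embedded BASE is consulted only for the part's own body.
    if part.embedded_base:
        return part.embedded_base
    # (b)-(d) Otherwise use the nearest Base: header, innermost entity first.
    node: Optional[Part] = part
    while node is not None:
        if node.header_base:
            return node.header_base
        node = node.parent
    # (e) Fall back to the URL used to retrieve the topmost entity.
    return retrieval_url
```

With a tree shaped like the example quoted above, part2a would inherit
the Base: header of part2 when it defines nothing itself, and part1
would fall through to the retrieval URL of the topmost entity.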

> ... Or do you just use
> the content-IDs? I actually think that if we want to support hypertext
> in multipart mail we probably should reintroduce "cid:xxxx" as a URL
> and translate all of the references to refer to content-ID headers in
> the multipart itself, and not try to handle relative URLs in that
> situation. 

Content-IDs are a separate issue.  Yes, I think they should be defined.
However, they are no substitute for relative URLs, since the enclosed
objects are often independent of the MIME packing mechanism.  Replacing
all relative references with cid:'s is not an option.

Another issue is whether or not it would be a "good idea" to define a
standard URL parameter for referencing a single part within a multipart
entity.  I can see many uses for such a parameter, e.g. "X;part=2.3" to
refer to the third body-part of the second multipart of X.  In fact,
one company has already asked me to define it for relative URLs, but
I think the issue applies to URLs in general (relative URLs would pick
it up automatically).  This issue is orthogonal to Content-IDs -- there
is no reason why we can't define both.
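
To make the idea concrete, here is a rough sketch of how a value such
as "2.3" might be interpreted. The parameter is only a proposal at
this point, so everything here is an assumption: the select_part
function is hypothetical, the path is read as dotted 1-based indices,
and Python's standard email package stands in for a MIME parser.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def select_part(entity: Message, part_param: str) -> Message:
    """Follow a dotted, 1-based index path such as "2.3" into a multipart."""
    node = entity
    for index in (int(token) for token in part_param.split(".")):
        if not node.is_multipart():
            raise ValueError("path descends into a non-multipart entity")
        children = node.get_payload()  # a list of sub-messages for multiparts
        if not 1 <= index <= len(children):
            raise IndexError(f"no body-part {index} at this level")
        node = children[index - 1]
    return node


# Build a small tree similar to the multipart/mixed example quoted earlier.
inner = MIMEMultipart("mixed")
inner.attach(MIMEText("<p>part2a</p>", "html"))
inner.attach(MIMEText("part2b", "plain"))
inner.attach(MIMEText("part2c", "plain"))

outer = MIMEMultipart("mixed")
outer.attach(MIMEText("<p>part1</p>", "html"))
outer.attach(inner)

third_of_second = select_part(outer, "2.3")
```

Nothing in this sketch depends on Content-IDs, which is consistent
with treating the two mechanisms as independent.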

> -----
> A style preference: when you say "have a defined URL syntax in [2]"
> I'd rather see "have a defined URL syntax in RFC 1738 [2]" if the
> referenced document has a well-known descriptor like that.

okay

> -----
> 
>>   Relative URLs can be used with these schemes whenever the applicable
>>   base URL follows the generic syntax.
> 
>>      gopher     Gopher and Gopher+ Protocols
>>      news       USENET news
>>      nntp       USENET news using NNTP access
>>      prospero   Prospero Directory Service
>>      wais       Wide Area Information Servers Protocol
> 
> 
> I don't think news and nntp belong in there, do they?

Yes.  There is no reason why they can't use relative URLs when listing
the available groups and/or articles.

> And you might
> want a footnote about gopher URLs being broken because of the prefix
> type code 00.

okay

> I'm not sure about them being employed in wais (maybe
> they're useful if documents in a wais set refer to other documents at
> the same level). I don't know about prospero at all.

It does not matter.  The reason we switched to the generic syntax
is so that we don't have to restrict rURLs by scheme.  If they are
not useful, they simply will not be used.

> I still think "the generic syntax" is too 'generic' a phrase for what
> you're talking about, and will confuse the reader who picks up an RFC
> and scans a section of it rather than reading the whole thing. I'd
> really like something that clearly denotes a 'syntax defined in this
> document'.  You didn't like "relative-compatible syntax", I don't
> think, but perhaps there's some other terminology that meets our
> requirements here?

I have changed the term to "generic-RL syntax" and made the references
more explicit.
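
For readers who want to see the six components of that syntax pulled
apart, the following is a small illustration. It uses Python's
standard urllib.parse module as a stand-in parser, which is an
assumption of this example; its behavior is not claimed to match the
draft in every detail, and the sample URLs are made up.

```python
from urllib.parse import urljoin, urlparse

# <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
url = "http://host/a/b;type=a?q=1#frag"
parts = urlparse(url)

# Print each of the six components under the library's field names.
for name in ("scheme", "netloc", "path", "params", "query", "fragment"):
    print(f"{name:9}{getattr(parts, name)}")

# Resolve a relative reference against a base URL that follows the syntax.
resolved = urljoin("http://a/b/c/d", "../g")
print(resolved)
```

The point of the example is only that a single parse covers any
scheme whose URLs fit the syntax; no per-scheme rules are needed.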

The diffs below show what I intend to change for the next draft. I do not
want to generate more than one additional draft before making the last call.
So, if anyone wants additional changes before the last call, ask for them
now!

......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                                     <fielding@ics.uci.edu>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
============================================================================
*** draft-ietf-uri-relative-url-04.txt  Wed Jan 18 08:58:44 1995
--- rurl-spec.txt       Sun Jan 22 08:00:57 1995
***************
*** 101,112 ****
     Although this document does not seek to define the overall URL
     syntax, some discussion of it is necessary in order to describe the
     parsing of relative URLs.  In particular, base documents can only
!    make use of relative URLs when their base URL fits within the generic
!    syntax described below.  Although some URL schemes do not require
!    this generic syntax, it is assumed that any document which contains
!    a relative reference does have a base URL that obeys the syntax.
!    In other words, relative URLs cannot be used within documents that
!    have unsuitable base URLs.
  
  2.1.  URL Syntactic Components
  
--- 101,112 ----
     Although this document does not seek to define the overall URL
     syntax, some discussion of it is necessary in order to describe the
     parsing of relative URLs.  In particular, base documents can only
!    make use of relative URLs when their base URL fits within the
!    generic-RL syntax described below.  Although some URL schemes do not
!    require this generic-RL syntax, it is assumed that any document which
!    contains a relative reference does have a base URL that obeys the
!    syntax.  In other words, relative URLs cannot be used within
!    documents that have unsuitable base URLs.
  
  2.1.  URL Syntactic Components
  
***************
*** 114,121 ****
     reserved characters like "?" and ";" to indicate special components,
     while others just consider them to be part of the path.  However,
     there is enough uniformity in the use of URLs to allow a parser
!    to resolve relative URLs based upon a single, generic syntax.
!    This generic syntax consists of six components:
  
        <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
  
--- 114,121 ----
     reserved characters like "?" and ";" to indicate special components,
     while others just consider them to be part of the path.  However,
     there is enough uniformity in the use of URLs to allow a parser
!    to resolve relative URLs based upon a single, generic-RL syntax.
!    This generic-RL syntax consists of six components:
  
        <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
  
***************
*** 123,139 ****
     These components are defined as follows (a complete BNF is provided
     in Section 2.2):
  
!       scheme ":"   ::= scheme name, as per Section 2.1 of [2].
  
        "//" net_loc ::= network location and login information, as per
!                        Section 3.1 of [2].
  
!       "/" path     ::= URL path, as per Section 3.1 of [2].
  
        ";" params   ::= object parameters (e.g. ";type=a" as in 
!                        Section 3.2.2 of [2]).
  
!       "?" query    ::= query information, as per Section 3.3 of [2].
  
        "#" fragment ::= fragment identifier.
  
--- 123,140 ----
     These components are defined as follows (a complete BNF is provided
     in Section 2.2):
  
!       scheme ":"   ::= scheme name, as per Section 2.1 of RFC 1738 [2].
  
        "//" net_loc ::= network location and login information, as per
!                        Section 3.1 of RFC 1738 [2].
  
!       "/" path     ::= URL path, as per Section 3.1 of RFC 1738 [2].
  
        ";" params   ::= object parameters (e.g. ";type=a" as in 
!                        Section 3.2.2 of RFC 1738 [2]).
  
!       "?" query    ::= query information, as per Section 3.3 of
!                        RFC 1738 [2].
  
        "#" fragment ::= fragment identifier.
  
***************
*** 159,166 ****
  
     URL         = ( absoluteURL | relativeURL ) [ "#" fragment ]
  
!    absoluteURL = scheme ":" *( uchar | reserved )
  
     relativeURL = net_path | abs_path | rel_path
  
     net_path    = "//" net_loc [ abs_path ]
--- 160,169 ----
  
     URL         = ( absoluteURL | relativeURL ) [ "#" fragment ]
  
!    absoluteURL = generic-RL | ( scheme ":" *( uchar | reserved ) )
  
+    generic-RL  = scheme ":" [ relativeURL ]
+ 
     relativeURL = net_path | abs_path | rel_path
  
     net_path    = "//" net_loc [ abs_path ]
***************
*** 208,214 ****
  2.3.  Specific Schemes and their Syntactic Categories
  
     Each URL scheme has its own rules regarding the presence or absence
!    of the syntactic components described in Section 2.1 and 2.2.
     In addition, some schemes are never appropriate for use with relative
     URLs.  However, since relative URLs will only be used within contexts
     in which they are useful, these scheme-specific differences can be
--- 211,217 ----
  2.3.  Specific Schemes and their Syntactic Categories
  
     Each URL scheme has its own rules regarding the presence or absence
!    of the syntactic components described in Sections 2.1 and 2.2.
     In addition, some schemes are never appropriate for use with relative
     URLs.  However, since relative URLs will only be used within contexts
     in which they are useful, these scheme-specific differences can be
***************
*** 215,230 ****
     ignored by the resolution process.
  
     Within this section, we include as examples only those schemes that
!    have a defined URL syntax in [2].  The following schemes are never
!    used with relative URLs:
  
        mailto     Electronic Mail
        telnet     TELNET Protocol for Interactive Sessions
  
     Some URL schemes allow the use of reserved characters for purposes
!    outside the generic grammar given above.  However, such use is rare.
!    Relative URLs can be used with these schemes whenever the applicable
!    base URL follows the generic syntax.
  
        gopher     Gopher and Gopher+ Protocols
        news       USENET news
--- 218,233 ----
     ignored by the resolution process.
  
     Within this section, we include as examples only those schemes that
!    have a defined URL syntax in RFC 1738 [2].  The following schemes are
!    never used with relative URLs:
  
        mailto     Electronic Mail
        telnet     TELNET Protocol for Interactive Sessions
  
     Some URL schemes allow the use of reserved characters for purposes
!    outside the generic-RL syntax given above.  However, such use is
!    rare.  Relative URLs can be used with these schemes whenever the
!    applicable base URL follows the generic-RL syntax.
  
        gopher     Gopher and Gopher+ Protocols
        news       USENET news
***************
*** 232,253 ****
        prospero   Prospero Directory Service
        wais       Wide Area Information Servers Protocol
  
!    Finally, the following schemes can always be parsed using the generic
!    syntax.
  
        file       Host-specific Files
        ftp        File Transfer Protocol
        http       Hypertext Transfer Protocol
  
     It is recommended that new schemes be designed to be parsable via
!    the generic syntax if they are intended to be used with relative
     URLs.  A description of the allowed relative forms should be included
!    when a new scheme is registered, as per Section 4 of [2].
  
  2.4.  Parsing a URL
  
!    An accepted method for parsing URLs is necessary to disambiguate the
!    generic URL syntax of Section 2.2 and to describe the algorithm for
     resolving relative URLs presented in Section 4.  This section
     describes the parsing rules for breaking down a URL (relative or
     absolute) into the component parts described in Section 2.1.  The
--- 235,261 ----
        prospero   Prospero Directory Service
        wais       Wide Area Information Servers Protocol
  
!    Users of gopher URLs should note that gopher-type information is
!    often included at the beginning of what would be the generic-RL path.
!    If present, this type information prevents relative-path references
!    to documents with differing gopher-types.
  
+    Finally, the following schemes can always be parsed using the
+    generic-RL syntax.
+ 
        file       Host-specific Files
        ftp        File Transfer Protocol
        http       Hypertext Transfer Protocol
  
     It is recommended that new schemes be designed to be parsable via
!    the generic-RL syntax if they are intended to be used with relative
     URLs.  A description of the allowed relative forms should be included
!    when a new scheme is registered, as per Section 4 of RFC 1738 [2].
  
  2.4.  Parsing a URL
  
!    An accepted method for parsing URLs is useful to clarify the
!    generic-RL syntax of Section 2.2 and to describe the algorithm for
     resolving relative URLs presented in Section 4.  This section
     describes the parsing rules for breaking down a URL (relative or
     absolute) into the component parts described in Section 2.1.  The
***************
*** 356,361 ****
--- 364,377 ----
     Any whitespace (including that used for line folding) inside the
     angle brackets should be ignored.
  
+    Protocols which do not use the RFC 822 message header syntax, but
+    which do allow some form of tagged metainformation to be included
+    within messages, may define their own syntax for passing the base URL
+    as part of a message.  Defining the syntax for all possible protocols
+    is beyond the scope of this document.  It is assumed that user agents
+    using such a protocol will be able to obtain the appropriate syntax
+    from that protocol's specification.
+ 
     In situations where both an embedded base URL (as described in
     Section 3.1) and a "Base" message header are present, the embedded
     base URL takes precedence.
***************
*** 388,394 ****
     For these types, the base URL of the composite entity should be
     determined first; this base is then considered the default for any
     component part that does not define its own base via one of the
!    methods described in Sections 3.1 and 3.2.
  
  4.  Resolving Relative URLs
  
--- 404,412 ----
     For these types, the base URL of the composite entity should be
     determined first; this base is then considered the default for any
     component part that does not define its own base via one of the
!    methods described in Sections 3.1 and 3.2.  Thus, a multipart entity
!    defines a hierarchy of retrieval context from which the base URL of
!    each part can be obtained.
  
  4.  Resolving Relative URLs
Received on Sunday, 22 January 1995 11:47:52 UTC