
Re: News and nntp URI schemes

From: Charles Lindsey <chl@clerew.man.ac.uk>
Date: Thu, 17 Feb 2005 16:59:48 -0000
To: "Frank Ellermann" <nobody@xyzzy.claranet.de>
Message-ID: <opsmcphylf6hl8nm@clerew.man.ac.uk>
On Tue, 04 Jan 2005 21:33:27 +0100, Frank Ellermann
<nobody@xyzzy.claranet.de> wrote:

> Charles Lindsey wrote:
>
>> the 'official' "@" in the <message-id> MUST NOT be %encoded
>> (because it is a delimiter, and should be declared to be
>> reserved
>
> Yes, that makes sense, so for the news: URL scheme we have two
> reserved characters "/" and "@", and for nntp only "/", ready.

I have been reading, and re-reading, and re-re-reading RFC 3986, and I
think I have finally sussed it out. Essentially, a URI consists of a
<scheme>, an <authority>, a <path>, a <query> and a <fragment>. What we
are doing is to provide something which satisfies the syntax for the RFC
3986 definition of a <path>. Although our scheme makes no provision for a
<query> or a <fragment>, future extensions might do so, so we should
reserve '?' and '#' just in case (anyway, the syntax of <path> requires it,
as does the RE in Appendix B, which should be capable of dissecting ANY URI
into its five basic components).
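
To make the point about the Appendix B RE concrete, here is a minimal Python sketch that applies it to a couple of news URIs. The regular expression is the one printed in RFC 3986, Appendix B; the function name split_uri and the sample URIs are only illustrative.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Return the five generic components; absent ones are None."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

# A group on a named server: everything after the authority is <path>.
print(split_uri("news://news.example.com/comp.lang.perl.modules"))

# An unencoded "?" ends the <path> early and starts a <query>.
print(split_uri("news:a?b@example.com"))
```

In the second call the <path> comes out as just "a" and the rest lands in the <query>, which is why '?' and '#' have to be %-encoded when they occur inside a <message-id> or <newsgroup-name>.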

So what I have now written is:

     Within a <printable-ascii> and a <newsgroup-name>, the characters
     '%', '@', '/', '?' and '#' are reserved and MUST be %-encoded if they
     occur. All other characters MAY be used freely to represent
     themselves. It is not precluded that future extensions to the Netnews
     standard may permit octets outside of the given ranges, in which case
     they too MUST be %-encoded (except perhaps when used in an IRI [RFC
     3987]).
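
As an illustration of that rule, here is a rough Python sketch that %-encodes the five reserved characters (and anything outside %d33-126) and leaves every other printable character alone. The helper names encode_component and news_uri_for_article are hypothetical, and the sketch does not validate its input against the grammar (for instance it does not reject ">" in a <message-id>).

```python
from urllib.parse import quote, unquote

# The five characters the text above declares reserved.
RESERVED = "%@/?#"

# Every other character in %d33-126 may represent itself.
SAFE = "".join(chr(c) for c in range(33, 127) if chr(c) not in RESERVED)

def encode_component(text):
    """%-encode reserved characters and octets outside %d33-126."""
    return quote(text, safe=SAFE)

def news_uri_for_article(local, domain, server=None):
    """Build a news URI; the "@" between the two halves stays unencoded."""
    prefix = f"//{server}/" if server else ""
    return f"news:{prefix}{encode_component(local)}@{encode_component(domain)}"

print(news_uri_for_article("abc/123?x", "example.com"))
# news:abc%2F123%3Fx@example.com

print(unquote(encode_component("100%#")))
# 100%#
```

Only the delimiting "@" is written literally; an "@" that turned up inside either half would come out as %40.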

Note that RFC 3986 also attempts to reserve '[' and ']' within a <path>,
although they never have any delimiting meaning after the <authority>, and
they are not forbidden by the RE in Appendix B. I think that is a bug in
RFC 3986, and so I have not reserved them for our case.

> The ftp draft says:
>
> | Within a name or CWD component, the characters "/" and ";"
> | are reserved and must be encoded

I suspect it should have reserved '?' and '#' too, for RFC 3986 compliance.

>
>>> | Note that user agents may extend the ability to refer to
>>> | groups by use of "*" as a string wild-card.
>
>> Then you would be allowing "wildmats" as defined in the NNTP
>> draft. That might be workable, but does anyone anywhere
>> implement that?
>
> No idea, it's just an elegant way to keep the similar RFC 1738
> oddity somewhere without explicitly saying that it's dead.

I think we either have to deprecate it entirely, or go the whole hog and
turn it into a <wildmat>. Currently, none of the possibilities for
specifying <all-groups> works in all current servers, and each of them
fails in at least one, so we are damned whatever we do. What is for sure
is that _nobody_ implements <wildmat>s currently.

So I have two alternative texts:

2.3  The newsURI contains an <all-groups>

     If the newsURI is of one of the following forms:
        <URI:news:*>
        <URI:news://news.example.com/*>
        <URI:news://news.example.com/>
        <URI:news://news.example.com>
     it refers to "all available news groups".  The resource retrieved by
     this URI is some means to gain access to all the newsgroups that are
     available from the given <authority> (usually by invoking a suitable
     news reading agent).

[Issue: Do we really want all those forms? Only the first was in [RFC
1738], but many agents currently accept the others. Moreover, some
agents are known to barf on anything with '*' in it. Maybe the '*' part
of the notation should be dispensed with. I therefore offer two
alternative formulations.]

[1st alternative]

        all-groups  = news-server [ "/" ] / <empty>

     The possibility for <all-groups> to consist of a "*", which was
     present in [RFC 1738], is now obsoleted, and its continued use is
     deprecated. It was, in any case, only patchily implemented.

[That allows the following forms:
        <URI:news:>
        <URI:news://news.example.com/>
        <URI:news://news.example.com>
of which the first may or may not already work on current
implementations (but that is true of the others also).]

[2nd alternative]

        newsURI     = "news:" ( article / group )
        article     = [ news-server "/" ] message-id
        group       = [ news-server "/" ] wildmat

[where <wildmat> is defined in draft-ietf-nntpext-base-*.txt and would
allow the following forms:
        <URI:news:*>
        <URI:news:comp.*>
        <URI:news:*.test>
        <URI:news://news.example.com/*>
        <URI:news://news.example.com/comp.*>
        <URI:news://news.example.com/*.test>

this is an enhancement of draft-gilman-news-url-02.txt and preserves the
"*". It would be readily implemented, but it is quite certain that
nowhere is it implemented currently. It would also be possible to
preserve the <empty> from alternative 1 as well.]
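
To give a feel for what an agent would have to do under the 2nd alternative, here is a deliberately simplified Python sketch. It handles only "*" as a string wild-card and makes no attempt at the rest of the <wildmat> syntax in the NNTP draft; the function names and the list of groups are made up for the example.

```python
import re

def star_pattern_to_regex(pattern):
    """Compile a pattern in which "*" matches any (possibly empty) string."""
    # Escape the literal pieces so that "." in a group name stays literal.
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(parts))

def select_groups(pattern, groups):
    """Return the group names that match the whole pattern."""
    rx = star_pattern_to_regex(pattern)
    return [g for g in groups if rx.fullmatch(g)]

# Hypothetical list of groups carried by some server.
groups = ["comp.lang.perl.modules", "comp.test", "misc.test", "news.answers"]

print(select_groups("comp.*", groups))  # the two comp.* groups
print(select_groups("*.test", groups))  # comp.test and misc.test
print(select_groups("*", groups))       # everything
```

With this reading, <URI:news:*> is just the special case where the pattern matches every group.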

Personally, I think <wildmat>s are a step too far, and I would recommend
alternative 1. But we need to discuss this.

I have attached my complete text, as it now stands.


-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

2.  The News URI Scheme

   The news URI scheme is used to refer to either newsgroups
   or individual Netnews articles, as defined in [RFC 1036].

   The news URI takes the form:

      newsURI     = "news:" ( article / group / all-groups )
      article     = [ news-server "/" ] message-id
      group       = [ news-server "/" ] newsgroup-name
      all-groups  = news-server [ "/" [ "*" ] ] / "*"
      news-server = "//" authority
      message-id  = printable-ascii "@" printable-ascii
      newsgroup-name  = 1*%d33-126
      printable-ascii = 1*( %d33-61 / %d63-126 ) ; excludes ">"

   <authority> is defined in [RFC 3986], and provides for a <host>, a
   <port> (defaulting to 119 in this scheme) and possibly a <userinfo>.

   Within a <printable-ascii> and a <newsgroup-name>, the characters
   '%', '@', '/', '?' and '#' are reserved and MUST be %-encoded if they
   occur. All other characters MAY be used freely to represent
   themselves. It is not precluded that future extensions to the Netnews
   standard may permit octets outside of the given ranges, in which case
   they too MUST be %-encoded (except perhaps when used in an IRI [RFC
   3987]).

   If no <news-server> is specified, the resources are to be retrieved
   from whatever server has been configured for local use.

2.1  The newsURI contains an <article>

   A <message-id> corresponds to the <msg-id> of [RFC 2822] and to the
   Message-ID of section 2.1.5 of [RFC 1036], but without the enclosing
   "<" and ">". It MUST be the message identifier of an actual Netnews
   article and hence will in practice conform to the syntax defined in
   [RFC 1036] or in any subsequent standard for Netnews articles. Thus not
   every <message-id> as defined above is valid.

   Observe the delimiter "@" which enables an <article> to be
   distinguished from a <newsgroup-name>.
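
   The role of the "@" delimiter can be sketched in Python as follows.
   This is only an illustration, assuming the grammar given at the start
   of section 2; the function name classify_news_uri and its return
   shape are hypothetical, and the sketch does no validation beyond
   what is needed to tell the three cases apart.

```python
from urllib.parse import unquote

def classify_news_uri(uri):
    """Return (kind, server, target) for a news URI.

    kind is 'article', 'group' or 'all-groups'; server is None when no
    <news-server> is given; target is None for all-groups.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news URI")
    server = None
    if rest.startswith("//"):
        # <news-server> present: the authority runs up to the next "/".
        server, _, rest = rest[2:].partition("/")
    if rest == "*" or (rest == "" and server is not None):
        return ("all-groups", server, None)
    if rest == "":
        raise ValueError("empty news URI")
    if "@" in rest:
        # An unencoded "@" is the delimiter that marks a <message-id>.
        local, _, domain = rest.partition("@")
        return ("article", server, unquote(local) + "@" + unquote(domain))
    return ("group", server, unquote(rest))

print(classify_news_uri("news:12345@example.com"))
print(classify_news_uri("news://news.example.com/comp.lang.perl.modules"))
print(classify_news_uri("news://news.example.com/"))
```

   The authority is split off before looking for "@", so a <userinfo>
   in the <news-server> is not mistaken for part of a <message-id>.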

   The resource retrieved by this URI is the Netnews article with the
   given <message-id>.  In a properly working Netnews system, the same
   article will be obtained whatever server is accessed for the purpose
   (assuming the server in question carried that article in the first
   place and that it has not expired).

2.2  The newsURI contains a <group>

   According to [RFC 1036], the <newsgroup-name> will in practice be a
   period-delimited hierarchical name, such as "comp.lang.perl.modules".

   The resource retrieved by this URI is some means to gain access to
   the articles in the given <newsgroup-name> that are available from the
   given <authority> (usually by invoking a suitable news reading agent
   initialized to access that group).

2.3  The newsURI contains an <all-groups>

   If the newsURI is of one of the following forms:
      <URI:news:*>
      <URI:news://news.example.com/*>
      <URI:news://news.example.com/>
      <URI:news://news.example.com>
   it refers to "all available news groups".  The resource retrieved by
   this URI is some means to gain access to all the newsgroups that are
   available from the given <authority> (usually by invoking a suitable news
   reading agent).

[Issue: Do we really want all those forms? Only the first was in [RFC
1738], but many agents currently accept the others. Moreover, some
agents are known to barf on anything with '*' in it. Maybe the '*' part
of the notation should be dispensed with. I therefore offer two
alternative formulations.]

[1st alternative]

      all-groups  = news-server [ "/" ] / <empty>

   The possibility for <all-groups> to consist of a "*", which was
   present in [RFC 1738], is now obsoleted, and its continued use is
   deprecated. It was, in any case, only patchily implemented.

[That allows the following forms:
      <URI:news:>
      <URI:news://news.example.com/>
      <URI:news://news.example.com>
of which the first may or may not already work on current
implementations (but that is true of the others also).]

[2nd alternative]

      newsURI     = "news:" ( article / group )
      article     = [ news-server "/" ] message-id
      group       = [ news-server "/" ] wildmat

[where <wildmat> is defined in draft-ietf-nntpext-base-*.txt and would
allow the following forms:
      <URI:news:*>
      <URI:news:comp.*>
      <URI:news:*.test>
      <URI:news://news.example.com/*>
      <URI:news://news.example.com/comp.*>
      <URI:news://news.example.com/*.test>

this is an enhancement of draft-gilman-news-url-02.txt and preserves the
"*". It would be readily implemented, but it is quite certain that
nowhere is it implemented currently. It would also be possible to
preserve the <empty> from alternative 1 as well.]

3.  The nntp URI scheme

   The nntp URI scheme is used to refer to individual Netnews articles,
   as defined in [RFC 1036].

   The nntp URI takes the form:

      nntpURI     = "nntp"  ":" news-server "/" newsgroup-name "/" range
      news-server =  "//" authority
      range       = article-number ["-" [article-number]]
      article-number = 1*DIGIT

   Observe, in contradistinction to the news scheme, that the
   <news-server> is not optional here, because the mapping from
   <article-number>s to actual articles is established independently by
   each server.

3.1  The range is a single <article-number>

   The resource retrieved by this URI is the Netnews article numbered
   by the given <article-number> in the given <newsgroup-name> from the
   given <authority>.

3.2  The range encompasses more than a single <article-number>

   The resource retrieved by this URI is some means to gain access to
   the articles numbered within the given <range> of
   <article-number>s in the given <newsgroup-name> from the given
   <authority> (usually by invoking a suitable news reading agent
   initialized to access that range). A <range> of the form "nnnn-"
   provides access to all articles numbered "nnnn" and above.
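
   The three shapes of <range> can be illustrated with a short Python
   sketch. It assumes the nntpURI grammar given above, additionally
   requires a non-empty <authority>, and uses a hypothetical function
   name parse_nntp_uri; an open-ended range is reported with None as
   its upper bound.

```python
import re
from urllib.parse import unquote

# Loose transcription of the nntpURI rule; <authority> is not validated.
NNTP_RE = re.compile(
    r"nntp://(?P<authority>[^/?#]+)/(?P<group>[^/?#]+)/"
    r"(?P<first>[0-9]+)(?P<dash>-(?P<last>[0-9]+)?)?",
    re.IGNORECASE,
)

def parse_nntp_uri(uri):
    """Return (authority, newsgroup_name, first, last) for an nntp URI."""
    m = NNTP_RE.fullmatch(uri)
    if m is None:
        raise ValueError("not an nntp URI of the form given above")
    first = int(m.group("first"))
    if m.group("dash") is None:
        last = first      # single <article-number>
    elif m.group("last") is None:
        last = None       # "nnnn-": article nnnn and above
    else:
        last = int(m.group("last"))
    return (m.group("authority"), unquote(m.group("group")), first, last)

print(parse_nntp_uri("nntp://news.example.com/comp.lang.perl.modules/1234"))
print(parse_nntp_uri("nntp://news.example.com/comp.lang.perl.modules/1234-"))
print(parse_nntp_uri("nntp://news.example.com/comp.lang.perl.modules/1234-1300"))
```

   A URI without a <news-server>, such as "nntp:comp.test/1", is
   rejected, since article numbers only have meaning on a given server.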
Received on Friday, 18 February 2005 12:13:07 UTC
