- From: Joseph Anthony Pasquale Holsten <joseph@josephholsten.com>
- Date: Mon, 4 Jan 2010 15:12:01 -0600
- To: uri@w3.org
"Charles Lindsey" <chl@clerew.man.ac.uk> said:
> Draft-ellermann-news-nntp-uri-11.txt is currently going through AUTH48
> and, since Frank Ellermann seems not to have been heard from for more
> than a year, and cannot be contacted, I am getting the job of seeing
> what needs to be done (most notably changes necessitated by the AUTH48
> changes in RFC 5536).
Sorry to hear Ellermann hasn't turned up. I'm glad you're pushing forward.
> I find the question of just what needs to be percent-encoded is hard to
> deduce from RFC 3986. Clearly, anything in <gen-delims> MUST be
> percent-encoded except when used as delimiters, so that agents can
> divide a URI into scheme, authority, path, query, and fragment
> components even before they recognise that it is a news or nntp URI.
> But is it REQUIRED for the <sub-delims> if the particular scheme does
> not use any of them as delimiters? RFC 3986 seems to imply not, so I
> would expect that in
> news:foo@bar.!#$%&'*+/=?^`{|}.example
> (yes, "bar.!#$%&'*+/=?^`{|}.example" is a valid <dot-atom-text> and
> hence can occur in a Message-ID) I would have to percent-encode the
> '#', '/' and '?', but not the others. Frank seems to have taken the
> view that all <sub-delims> need to be encoded, though he does at one
> point permit '*' to appear unencoded (and it was indeed explicitly
> allowed in RFC 1738), which appears to be inconsistent with his stance
> elsewhere.
>
> And he also includes an example
> news://news.gmane.org/p0624081dc30b8699bf9b@%5B10.20.30.108%5D
> where I would have thought he could have shown
> news://news.gmane.org/p0624081dc30b8699bf9b@[10.20.30.108]
>
> So exactly what latitude does RFC 3986 permit in these situations?
If you never need to express any of the reserved characters in your
segments, you have no need to allow percent-encoding in the definitions
of those segments. Sub-delims don't need to be percent-encoded unless
you are using them as delimiters.
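
To make that concrete, here is a rough Python sketch of a producer that
leaves the sub-delims (plus ":" and "@") literal and percent-encodes
everything else that is not unreserved. The function name news_uri_for
and the exact set of characters left literal are my assumptions for
illustration, not anything the draft defines.

```python
from urllib.parse import quote

# Characters left literal inside the message-id: the RFC 3986 sub-delims
# plus ":" and "@". Unreserved characters are never encoded by quote().
# This set is an assumption for illustration, not the draft's rule.
LITERAL = "!$&'()*+,;=:@"


def news_uri_for(message_id: str) -> str:
    """Build a news: URI from a Message-ID given without angle brackets."""
    return "news:" + quote(message_id, safe=LITERAL)


print(news_uri_for("foo@bar.!#$%&'*+/=?^`{|}.example"))
# news:foo@bar.!%23$%25&'*+%2F=%3F%5E%60%7B%7C%7D.example
```

Note that besides '#', '/' and '?', the sketch also encodes '%' and the
characters '^', '`', '{', '|' and '}', since none of those is in the
reserved or unreserved sets of RFC 3986.
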
But practically, you need to write the definitions to allow
percent-encoding in all your segments. Looking at your section 4, your
news: syntax is quite busted. At the moment, it does not allow
percent-encoding for characters that don't have to be encoded. I can
appreciate not wanting to allow "." to be percent-encoded in mid-left,
but forbidding it in mid-atext is just asking for naive implementors to
build invalid news URIs. RFC 3986 §2.4 explicitly notes: 'For example,
the octet corresponding to the tilde ("~") character is often encoded as
"%7E" by older URI processing implementations; the "%7E" can be
replaced by "~" without changing its interpretation.'
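
A consumer that wants to be tolerant of such input could normalize it
along these lines. This is only a sketch in Python; the helper name
decode_unreserved is hypothetical. It decodes a percent-encoded triplet
only when the octet is an unreserved character, and otherwise just
uppercases the hex digits.

```python
import re
import string

# RFC 3986 unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def decode_unreserved(uri: str) -> str:
    """Decode %XX only where XX is an unreserved character."""

    def repl(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        # Leave reserved and other octets encoded, with uppercase hex.
        return match.group(0).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)


print(decode_unreserved("news:a%7Eb%2fc@example.com"))
# news:a~b%2Fc@example.com
```
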
IMHO, it's often not worth defining these things quite so formally at
the URI level. I'd rather you just say that an article must be a (not
necessarily completely) percent-encoded representation of a usefor
msg-id-core than jump through the hoops you are now. Few people
actually write their parsers to the grammars in these specs, so they'll
usually be catching this error later in processing. I figure you're
dealing with existing implementations, so it's a better use of time to
point them at usefor and mention any caveats.
If you're going to be rigorous, then you'll need to define segments
like article and newsgroups with the exact same syntax as usefor, being
liberal in which delimiters are allowed. They should also include the
entire range of allowable percent-encoded triplets. Then list all the
things that they SHOULD NOT put into URIs, like percent-encoding
something in ALPHA.
Which brings us to a higher-level critique of the operational semantics
defined by this spec. Are these URIs just for identifying articles,
like a urn:uuid: or urn:sha1:? Should a user agent be able to retrieve
arbitrary articles? What happens when I try to access the mythical
<news:foo@bar.!#$%&'*+/=?^`{|}.example>? Does this refer to NNTP
messages being sent? What are the error conditions that may be caused
by the impedance mismatch between URIs and NNTP? I see some mention of failure
in the security considerations, so that's good. Thinking about how the
user agents actually handle URIs is the best guidance for writing these
specs.
As for your two examples, both open fine in my newsreader. I'd hate for
the spec to disagree without just cause.
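
For what it's worth, a client that simply percent-decodes the path ends
up with the same Message-ID from either form. A quick Python sketch (the
helper name message_id is mine, and real newsreaders may well do this
differently):

```python
from urllib.parse import unquote, urlsplit


def message_id(uri: str) -> str:
    """Return the percent-decoded path of a news: URI, minus the leading '/'."""
    return unquote(urlsplit(uri).path.lstrip("/"))


encoded = "news://news.gmane.org/p0624081dc30b8699bf9b@%5B10.20.30.108%5D"
literal = "news://news.gmane.org/p0624081dc30b8699bf9b@[10.20.30.108]"

print(message_id(encoded) == message_id(literal))
# True
```

That only shows the two forms decode identically; it says nothing about
which of them the grammar ought to permit.
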
--
Joseph Holsten
http://josephholsten.com
mailto:joseph@josephholsten.com
tel:+1-918-948-6747
Received on Monday, 4 January 2010 21:25:36 UTC