Re: Scope question from Roy T. Fielding on 2010-05-06 (public-iri@w3.org from May 2010)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Wed, 5 May 2010 17:09:50 -0700
To: Adam Barth <ietf@adambarth.com>
Cc: "Phillips, Addison" <addison@lab126.com>, "public-iri@w3.org" <public-iri@w3.org>
Message-Id: <39519A74-D48F-4A59-A206-64AB60A03C31@gbiv.com>

On May 5, 2010, at 11:11 AM, Adam Barth wrote:

> RFC 3986 Section 3.1 is helpful w.r.t. the casing of the scheme.
> However, it's not as clear as it could be.  For example, it says:
> 
> "documents that specify schemes must do so with lowercase letters"
> 
> It's unclear whether that's a requirement for folks who produce
> documents or for folks who consume documents.

That is a requirement for IETF specifications of URI schemes.  It has
nothing to do with processing.

>  Later it says:
> 
> "An implementation should accept uppercase letters as equivalent to
> lowercase in scheme names"
> 
> Leading me to believe the first requirement is for folks who produce
> documents, assuming "implementation" above refers to document
> consumers.

RFC 3986 defines how to parse URIs (for recipients) and provides
many rules for scheme-specific specs to define how to generate URIs
of a given scheme (for producers) within the overall constraint of
matching the URI syntax (the formal ABNF).
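
The recipient-side rule quoted above from Section 3.1 (uppercase
accepted as equivalent to lowercase in scheme names) can be sketched
in Python as follows.  This is only an illustration: the helper names
scheme_of and same_scheme are hypothetical and appear in neither RFC;
the character classes follow the scheme production in Section 3.1.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, Section 3.1)
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def scheme_of(ref):
    """Return the scheme of ref in lowercase, or None if ref has no scheme.

    Hypothetical helper: a recipient lowercases the scheme so that
    "HTTP" and "http" compare as equivalent.
    """
    m = _SCHEME.match(ref)
    return m.group(1).lower() if m else None


def same_scheme(a, b):
    """True when both references carry a scheme and the schemes match,
    ignoring case."""
    sa, sb = scheme_of(a), scheme_of(b)
    return sa is not None and sa == sb
```

Under these assumptions, scheme_of("HTTP://example.com/") and
scheme_of("http://example.com/") both yield "http", while a relative
reference such as "/relative/path" yields None.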

A URI is the most constrained form of address for maximum
interoperability across both machine and non-machine transports.
It is like the postal addressing standard -- there exists one
form that is known to be the most readable and efficient postal
handling format of an address.  That does not prevent readers
of an envelope from handling an unbounded number of additional
addressing forms, with partial automation, and then relying
on the postal carriers to interpret the nonstandard bits.

> As I read the charter, we're not supposed to address issues in RFC
> 3986, which might place this document out of scope depending on the
> division of responsibilities between RFC 3986 and RFC 3987.

Please understand that browsers almost never parse URI or IRI or
anything in between.  Browsers have input strings that contain one
or more references, usually in the document encoding, and so there
is a sequence of context-specific and charset-specific and
media-type-specific processing that occurs before you even get to
the individual URI-reference or IRI-reference that are defined by
3986/3987.

Some people have proposed that most of that pre-processing be added
to the IRIbis spec, but I have seen no evidence to suggest that
such pre-processing is even remotely standardizable (it seems to
be different for every input context).  If you can demonstrate or
get agreement on a single way to preprocess an input string, or at
least a few named processes (like single-ref and multi-ref), then
that would be useful.
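
To make the idea of "named processes" concrete, here is a deliberately
naive Python sketch.  The definitions are assumptions made up for
illustration only: nothing in this thread or in 3986/3987 defines what
single-ref or multi-ref would actually do, and real input contexts
would likely need context-specific rules.

```python
def single_ref(raw):
    """Hypothetical 'single-ref' process: the input holds exactly one
    reference, so only the surrounding whitespace is removed."""
    return raw.strip()


def multi_ref(raw):
    """Hypothetical 'multi-ref' process: the input holds a
    whitespace-separated list of references."""
    return raw.split()
```

Either function hands back strings that would then be parsed as an
individual URI-reference or IRI-reference.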

Such a preprocessing step would have no effect on RFC 3986.  The only
thing that would impact 3986 is a change to the allowed characters or
major components in the wire syntax of the URI standard, which is
simply not going to happen because that would break a majority of
implementations (of which browsers make up less than 1%).
As far as 3986 is concerned, your algorithm is in Appendix B.
Note that the algorithm will work with any superset of ASCII.
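
For reference, the Appendix B algorithm is a single regular expression
that splits a reference into its five components.  A minimal Python
sketch follows; the regular expression is the one given in Appendix B,
while the function name split_reference and the dictionary keys are
hypothetical.  Python strings are sequences of Unicode code points
rather than wire octets, so this shows the splitting logic, not the
wire encoding.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Split a reference into its components (hypothetical helper).

    Group numbers follow Appendix B: 2 = scheme, 4 = authority,
    5 = path, 7 = query, 9 = fragment.  Components that are absent
    come back as None; the path is always a string, possibly empty.
    """
    m = URI_REFERENCE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

Applied to the example from Appendix B,
"http://www.ics.uci.edu/pub/ietf/uri/#Related", this yields scheme
"http", authority "www.ics.uci.edu", path "/pub/ietf/uri/", no query,
and fragment "Related".  Because the expression only tests for the
ASCII delimiters ":", "/", "?" and "#", non-ASCII characters pass
through into whichever component they occur in.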

IRI (3987) is more flexible because there are no wire implementations
that depend on its constraints -- it could just as easily have
been defined as an "any string" conversion/presentation process,
which would have satisfied the scope you are looking for if there
is sufficient agreement among implementations.

....Roy

Received on Thursday, 6 May 2010 00:10:20 UTC