- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Wed, 5 May 2010 17:09:50 -0700
- To: Adam Barth <ietf@adambarth.com>
- Cc: "Phillips, Addison" <addison@lab126.com>, "public-iri@w3.org" <public-iri@w3.org>
On May 5, 2010, at 11:11 AM, Adam Barth wrote:

> RFC 3986 Section 3.1 is helpful w.r.t. the casing of the scheme.
> However, it's not as clear as it could be. For example, it says:
>
> "documents that specify schemes must do so with lowercase letters"
>
> It's unclear whether that's a requirement for folks who produce
> documents or for folks who consume documents.

That is a requirement for IETF specifications of URI schemes. It has nothing to do with processing.

> Later it says:
>
> "An implementation should accept uppercase letters as equivalent to
> lowercase in scheme names"
>
> Leading me to believe the first requirement is for folks who produce
> documents, assuming "implementation" above refers to document
> consumers.

RFC 3986 defines how to parse URIs (for recipients) and provides many rules for scheme-specific specs to define how to generate URIs of a given scheme (for producers) within the overall constraint of matching the URI syntax (the formal ABNF).

A URI is the most constrained form of address for maximum interoperability across both machine and non-machine transports. It is like the postal addressing standard -- there exists one form that is known to be the most readable and efficient postal handling format of an address. That does not prevent readers of an envelope from handling an unbounded number of additional addressing forms, with partial automation, and then relying on the postal carriers to interpret the nonstandard bits.

> As I read the charter, we're not supposed to address issues in RFC
> 3986, which might place this document out of scope depending on the
> division of responsibilities between RFC 3986 and RFC 3987.

Please understand that browsers almost never parse URI or IRI or anything in between.
Browsers have input strings that contain one or more references, usually in the document encoding, and so there is a sequence of context-specific, charset-specific, and media-type-specific processing that occurs before you even get to the individual URI-references or IRI-references defined by 3986/3987.

Some people have proposed that most of that pre-processing be added to the IRIbis spec, but I have seen no evidence to suggest that such pre-processing is even remotely standardizable (it seems to be different for every input context). If you can demonstrate or get agreement on a single way to preprocess an input string, or at least a few named processes (like single-ref and multi-ref), then that would be useful. It would have no effect on RFC 3986.

The only thing that would impact 3986 is a change to the allowed characters or major components in the wire syntax of the URI standard, which is simply not going to happen because that would break a majority of implementations (of which browsers make up less than 1%).

As far as 3986 is concerned, your algorithm is in Appendix B. Note that the algorithm will work with any superset of ASCII.

IRI (3987) is more flexible because there are no wire implementations that depend on its constraints -- it could just as easily have been defined as an "any string" conversion/presentation process, which would have satisfied the scope you are looking for if there is sufficient agreement among implementations.

....Roy
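For illustration, RFC 3986 Appendix B gives a single regular expression that splits any URI-reference into scheme, authority, path, query, and fragment without validating it. The following is a minimal Python sketch that applies that expression verbatim. The function name `split_uri_reference` and its dict return shape are hypothetical, and lowercasing the scheme is an assumption drawn from the Section 3.1 advice quoted in the message. Because Python matches over Unicode strings, non-ASCII characters simply pass through the negated character classes, which illustrates the point that the algorithm works with any superset of ASCII.

```python
import re

# Regular expression copied from RFC 3986, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_REFERENCE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri_reference(ref: str) -> dict:
    """Hypothetical helper: split a URI-reference into its five components.

    Absent components are None; the path is always a (possibly empty) string.
    The scheme is lowercased (an assumption based on RFC 3986 Section 3.1,
    which says uppercase scheme letters should be accepted as equivalent).
    No validation against the ABNF is performed.
    """
    m = URI_REFERENCE.match(ref)
    # Every group in the pattern is optional, so a match always succeeds.
    scheme = m.group(2)
    return {
        "scheme": scheme.lower() if scheme is not None else None,
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    print(split_uri_reference("HTTP://www.ics.uci.edu/pub/ietf/uri/#Related"))
    # Non-ASCII input is split the same way.
    print(split_uri_reference("http://例え.jp/パス?q=値#frag"))
    # A relative reference has no scheme or authority.
    print(split_uri_reference("../a?b"))
```

The expression never fails to match, so it only locates components; checking them against the ABNF, or any browser-style pre-processing of the original input string, would be separate steps.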
Received on Thursday, 6 May 2010 00:10:20 UTC