- From: Adam Barth <ietf@adambarth.com>
- Date: Wed, 5 May 2010 17:31:46 -0700
- To: "Roy T. Fielding" <fielding@gbiv.com>
- Cc: "Phillips, Addison" <addison@lab126.com>, "public-iri@w3.org" <public-iri@w3.org>
On Wed, May 5, 2010 at 5:09 PM, Roy T. Fielding <fielding@gbiv.com> wrote:
> On May 5, 2010, at 11:11 AM, Adam Barth wrote:
>> RFC 3986 Section 3.1 is helpful w.r.t. the casing of the scheme.
>> However, it's not as clear as it could be. For example, it says:
>>
>> "documents that specify schemes must do so with lowercase letters"
>>
>> It's unclear whether that's a requirement for folks who produce
>> documents or for folks who consume documents.
>
> That is a requirement for IETF specifications of URI schemes. It has
> nothing to do with processing.

Ah, I see. That reading makes more sense.

>> Later it says:
>>
>> "An implementation should accept uppercase letters as equivalent to
>> lowercase in scheme names"
>>
>> Leading me to believe the first requirement is for folks who produce
>> documents, assuming "implementation" above refers to document
>> consumers.
>
> RFC 3986 defines how to parse URIs (for recipients) and provides
> many rules for scheme-specific specs to define how to generate URIs
> of a given scheme (for producers) within the overall constraint of
> matching the URI syntax (the formal ABNF).
>
> A URI is the most constrained form of address for maximum
> interoperability across both machine and non-machine transports.
> It is like the postal addressing standard -- there exists one
> form that is known to be the most readable and efficient postal
> handling format of an address. That does not prevent readers
> of an envelope from handling an unbounded number of additional
> addressing forms, with partial automation, and then relying
> on the postal carriers to interpret the nonstandard bits.
>
>> As I read the charter, we're not supposed to address issues in RFC
>> 3986, which might place this document out of scope depending on the
>> division of responsibilities between RFC 3986 and RFC 3987.
>
> Please understand that browsers almost never parse URI or IRI or
> anything in between. Browsers have input strings that contain one
> or more references, usually in the document encoding, and so there
> is a sequence of context-specific and charset-specific and
> media-type-specific processing that occurs before you even get to
> the individual URI-reference or IRI-reference that are defined by
> 3986/3987.

Where are those rules defined (e.g., for HTML documents)? I suspect
that's the layer that interests me at the moment.

> Some people have proposed that most of that pre-processing be added
> to the IRIbis spec, but I have seen no evidence to suggest that
> such pre-processing is even remotely standardizable (it seems to
> be different for every input context). If you can demonstrate or
> get agreement on a single way to preprocess an input string, or at
> least a few named processes (like single-ref and multi-ref), then
> that would be useful.

It seems likely that this would be possible and valuable for at least
some widely used contexts (e.g., UTF8-encoded HTML documents).

> It would have no effect on RFC 3986. The only things that would
> impact 3986 is if the allowed characters or major components
> changed in the wire syntax of the URI standard, which is simply
> not going to happen because that would break a majority of
> implementations (of which browsers make up less than 1%).
> As far as 3986 is concerned, your algorithm is in Appendix B.
> Note that the algorithm will work with any superset of ASCII.

I don't have an algorithm yet, but, according to my understanding of
your email, the algorithm in Appendix B appears to be a constraint on
the *output* of the media/context-specific transformation that
interests me.

> IRI (3987) is more flexible because there are no wire implementations
> that depend on its constraints -- it could just as easily have
> been defined as an "any string" conversion/presentation process,
> which would have satisfied the scope you are looking for if there
> is sufficient agreement among implementations.

I didn't understand this paragraph, but I'm not sure it's essential to
our discussion.

Thanks,
Adam
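
For reference, the Appendix B algorithm discussed above is a single regular expression that splits any reference into scheme, authority, path, query, and fragment. The following is only a minimal Python sketch of that step. The regular expression is the one given in RFC 3986 Appendix B; the helper names `split_reference` and `normalize_scheme` are hypothetical and do not come from either RFC. It also assumes the input string has already been extracted from its surrounding document, so none of the context-specific pre-processing is covered.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every component is
# optional, so the pattern matches any input string.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Split a reference into its five components (hypothetical helper).

    Components that are absent come back as None; the path is always a
    string, possibly empty.
    """
    match = URI_REFERENCE.match(ref)
    scheme, authority, path, query, fragment = match.group(2, 4, 5, 7, 9)
    return {
        "scheme": scheme,
        "authority": authority,
        "path": path,
        "query": query,
        "fragment": fragment,
    }


def normalize_scheme(scheme):
    """Accept any letter case in a scheme name, produce lowercase.

    This follows the Section 3.1 advice quoted above: uppercase is
    accepted as equivalent to lowercase on input.
    """
    return None if scheme is None else scheme.lower()


parts = split_reference("HTTP://Example.com/a?b#c")
print(parts)
print(normalize_scheme(parts["scheme"]))
```

Because the character classes only exclude a few ASCII delimiters, the same expression also splits strings that contain non-ASCII characters, which fits the remark that the algorithm works with any superset of ASCII.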
Received on Thursday, 6 May 2010 00:32:58 UTC