- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Mon, 28 Dec 2009 16:33:46 -0800
- To: Larry Masinter <masinter@adobe.com>
- Cc: "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "public-iri@w3.org" <public-iri@w3.org>
On Dec 28, 2009, at 2:56 PM, Larry Masinter wrote:

>> This is still confusing IRIs with the arbitrary contents of an
>> href (or other) attribute.
>
> To try to bring two things into alignment isn't "confusing"
> them. Why shouldn't the most widely deployed implementations
> be used as a guideline?

Because they are two different things. I don't want them aligned
any more than I want "child" and "adult" aligned into "person".
The distinction is necessary for some standards and some
implementations. The fact that it doesn't matter for some specific
browser contexts does not imply that it doesn't matter for everyone
else.

>> The fact is that HTML5 (and others) needs a definition of reference
>> and the rules for converting a reference to an IRI or URI.
>
> Yes. I'm willing to admit that it may be necessary to retain some
> elements as "preprocessing", although I'm not convinced.
>
>> Trying to pretend that a reference is always an IRI is doomed
>> to fail -- you might as well obsolete the RFC and say that
>> an IRI is anyString.
>
> I'm not trying to _pretend_ anything, I'm trying to make
> it so. And issuing a new version of an RFC *does* make
> the old version "obsolete". If the standard doesn't match
> what implementations do, obsoleting the standard and
> making a new version isn't bad.

The current standard reads on the output of the algorithm and
you want to redefine the same term as defining the input of the
algorithm. Such a change is bizarre. Just as bizarre as the
similarly nonsensical way that HTML5 redefines URL.

I told you before, the solution to this problem is to not use
the same term for both the input and output of the algorithm.
That is the whole point of differentiating references from the
final result: an interoperable identifier in absolute form.
Both IRI and URI define the final result, not the data-entry
input, because there was no uniformity in how one gets from
an arbitrary string to a URI.
We might be able to get that kind of uniformity within a single
implementation space, such as HTML attributes, but it would be at
best a proposal that has not yet been implemented in practice.

> I'm not convinced that it is inappropriate to define
> a syntax which parses into components, and yet any
> string *has* a parse, and that validity is determined
> after the parse rather than before. (Especially since
> the restrictions on character ranges may be different
> from one parsed field to another.)

I already did that. See RFC 3986 appendix B

   http://tools.ietf.org/html/rfc3986#appendix-B

>> Thus making all current references to the standard wrong
>> and useless.
>
> If current references to the IRI Proposed Standard don't
> match what implementations actually do, then perhaps they
> ARE _wrong_, and fixing the specifications to match the
> widely deployed and interoperable implementations is
> actually the right thing to do.

Browser data entry forms (search boxes) are not implementations
of IRI. HTML href is not an implementation of IRI. The output
of a browser's reference parser, just before it sends an address
on the wire for HTTP, is an implementation of URI.

>> Julian is right.
>
> I didn't read a specific position in Julian's post, but
> rather just pointing out there were some existing
> specifications that would have to be reworded if
> the "no internal spaces" restriction might be required
> for those applications.

What Julian meant, I think, is that other protocols currently
reference the term IRI expecting that the grammar disallows
spaces, in the same way that the HTTP protocol assumes that
a valid request target cannot contain a space.

>> What you should be doing
>> is defining an algorithm from anyString to the current
>> definition of IRI,
>
> That's what
> http://tools.ietf.org/html/draft-duerst-iri-bis-07#section-7.2
> section 7.2 " Web Address processing" already attempts.
> Do you think it accomplishes that?

No.
I cannot even conceive of implementing that since the ABNF is
invalid and the preprocessing steps occur after the grammar is
defined. It makes no sense. Why not just take anyString, split
it into separate references by whitespace if that is how the
context is defined, preprocess that string to remove embedded
linefeeds and transform disallowed into allowed characters, and
then apply the regular expression in RFC 3986?

>> and then change HTML5 so that it uses
>> anyString (or whatever you want to call it) as the attribute
>> definition.
>
> That's what was intended by:
> http://lists.w3.org/Archives/Public/public-html/2009Nov/att-0670/iri-rewrite-draft.html
> Do you think this is the right direction, then?

I think it would be easier to simply define how to process a
Web reference (not an address yet) into a Web address in the
form of an IRI or URI.

> Some of those definitions are useful outside of the context
> of HTML; do you agree with moving some of them into the
> IRI-BIS document?

No. Some of those definitions aren't even useful inside HTML5
because the attribute string has to be parsed for whitespace
issues based on the definition of that attribute -- there is no
single attribute parser algorithm for HTML. Furthermore, what
do we do then for documents that are not Unicode based, do not
have references that are Unicode based, and will not work with
IRI conversion to UTF-8? Should those be called IRIs as well?

>> My suggested name is "Web reference".
>
> I used "Web address" rather than "Web reference", since
> that was the term used before.
>
>> Just be
>> aware that some HTML5 attributes require a list of
>> space-separated references, whereas others require a
>> single reference that expects space to be auto-encoded
>> by the parser.
>
> I looked through the HTML5 specification for any specific reference
> to WEBADDRESS or HTML5 section 2.5, and saw no such attributes;
> could you give an example of an HTML5 attribute which requires a
> list of space-separated references?

rel="", itemprop="", and potentially any attribute that consists
of an undefined set of space-separated tokens (token syntax is
only restricted to exclude space).

....Roy
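
For illustration only, the pipeline suggested above (split on whitespace
where the attribute is defined that way, remove embedded linefeeds,
transform disallowed characters into allowed ones, then apply the
regular expression from RFC 3986 appendix B) might be sketched in
Python roughly as follows. The function names, the ALLOWED character
set, and the choice to percent-encode as UTF-8 (yielding a URI rather
than an IRI) are assumptions made for this sketch, not taken from any
specification.

```python
import re

# Regular expression from RFC 3986, appendix B. Every part is optional,
# so any string has a parse; validity is judged after parsing.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Assumption: characters passed through unchanged (RFC 3986 unreserved
# and reserved characters, plus "%"). Anything else is percent-encoded.
ALLOWED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~:/?#[]@!$&'()*+,;=%"
)


def preprocess(reference: str) -> str:
    """Hypothetical preprocessing step: drop embedded CR/LF/TAB, trim
    surrounding whitespace, percent-encode disallowed characters."""
    cleaned = re.sub(r"[\r\n\t]", "", reference).strip()
    return "".join(
        ch if ch in ALLOWED
        else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in cleaned
    )


def parse(reference: str) -> dict:
    """Split a preprocessed reference into the five generic components."""
    m = URI_REFERENCE.match(reference)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def references(attribute_value: str, space_separated: bool) -> list:
    """Turn an attribute value into a list of parsed references.

    space_separated=True models an attribute holding a list of
    references; False models a single reference whose internal
    spaces are encoded instead of acting as separators.
    """
    parts = attribute_value.split() if space_separated else [attribute_value]
    return [parse(preprocess(p)) for p in parts]
```

The space_separated flag is the point of the sketch: the same input
string yields either several references or one reference with encoded
spaces, depending on how the attribute is defined, so that decision has
to be made before the reference grammar is applied.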
Received on Tuesday, 29 December 2009 00:34:17 UTC