- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Mon, 25 Jun 2012 18:22:23 +0900
- To: Peter Saint-Andre <stpeter@stpeter.im>
- CC: "public-iri@w3.org" <public-iri@w3.org>
Hello Peter,

I think Björn already gave very good answers to your questions.

On 2012/06/22 3:28, Peter Saint-Andre wrote:
> <hat type='individual'/>
>
> I've been thinking about IRIs, and I'm wondering: why would a protocol
> "upgrade" from URIs to IRIs?

As Björn said, it's really more about new protocols than about upgrades. Also, different protocols (and formats) can upgrade in different ways. Sometimes this can be done formally with extensions; at other times it happens gradually and sooner or later gets accepted in a spec. In other cases, of course, it may never happen.

> (If it really is an "upgrade" -- a topic
> for another time.)
>
> Consider HTTP. It has always used URIs for retrieving documents and
> linking and such.

[There are some reports of clients just sending UTF-8, which I think would amount to using IRIs. But that has never made it into the spec.]

> Why would it change to use IRIs? Section 1.2 of
> 3987bis describes some necessary conditions for such a change, but
> doesn't really motivate why the HTTP community would want to do so. Yes,
> there is text in Section 1.1 about representing the words of natural
> languages, but URIs can be used to represent those words right now. I
> grant that the current mechanism for such representation isn't pretty,
> but do the addressing elements of a protocol like HTTP need to be
> pretty, or can we simply depend on the presentation software (e.g., web
> browsers) to make things look nice for the user?

I think the real motivation would be people looking at HTTP traces and preferring to see Unicode rather than lots of %HH strings. Of course, the number of people looking at HTTP traces is small, and they are not end users. In general, the motivation to use IRIs is strongest close to end users and content-oriented people such as document authors, and it weakens the further down one goes in the protocol stack.

Another motivation may be compression.
http://ja.wikipedia.org/wiki/青山学院大学 is quite a bit shorter than http://ja.wikipedia.org/wiki/%E9%9D%92%E5%B1%B1%E5%AD%A6%E9%99%A2%E5%A4%A7%E5%AD%A6. So maybe we can sell that to HTTP 2.0. But I'm somewhat skeptical. Only a tiny bit of creative thinking would have been needed to transition various header fields in HTTP from the hopelessly outdated iso-8859-1 (Latin-1) to UTF-8, but it didn't happen :-(.

The best motivation would be streamlining. EAI does a lot of streamlining for e-mail; if it weren't for all the legacy baggage, it would be a joy to implement. For HTTP, if browsers use Unicode internally and servers use it internally, what's the need for this weird %HH stuff anyway? (It's still needed to escape reserved characters, though.)

> (Certainly we do that
> with structural elements like the HTML document format, why not also
> with addressing elements like URIs?) I realize that these questions get
> back to the matter of "protocol element" vs. "presentation", but I guess
> what I'm saying is that I don't yet think we've really explained why we
> need to make IRIs a first-class protocol element (or why a given
> protocol would want to make the switch from URI-only to IRI).
>
> Furthermore, 3987bis doesn't really explain what would be involved in
> the change from URI-only to IRI in any given protocol. I suppose spec
> writers in a technology community like HTTP would need to figure it out,
> but IMHO some guidelines would be helpful.

As I said at the start of this mail, I think it depends a lot on the specific protocol. The conditions we give in Section 1.2 are general considerations that apply to any protocol or format. Protocol-specific considerations should do the rest, and I'm not sure it makes sense to write much about this.

But when looking at Section 1.2, I realized that its first sentence might have been the motivation for your mail. This sentence says:

   IRIs are designed to allow protocols and software that deal with URIs
   to be updated to handle IRIs.

I think that this puts too much emphasis on "update", but I'm not yet sure how to fix that.

Regards,
Martin.
Received on Monday, 25 June 2012 09:23:12 UTC