Re: why use IRIs?

Hello Peter,

I think Björn already gave very good answers to your questions.

On 2012/06/22 3:28, Peter Saint-Andre wrote:
> <hat type='individual'/>
>
> I've been thinking about IRIs, and I'm wondering: why would a protocol
> "upgrade" from URIs to IRIs?

As Björn said, it's really more about new protocols than about upgrades. 
Also, different protocols (and formats) can upgrade in different ways. 
Sometimes this can be done formally with extensions; at other times 
it's done gradually, and sooner or later gets accepted in a spec. In 
other cases, of course, it may never happen.

> (If it really is an "upgrade" -- a topic
> for another time.)
>
> Consider HTTP. It has always used URIs for retrieving documents and
> linking and such.

[There are some reports of clients just sending UTF-8, which I think 
would mean using IRIs. But that has never reached the spec.]


> Why would it change to use IRIs? Section 1.2 of
> 3987bis describes some necessary conditions for such a change, but
> doesn't really motivate why the HTTP community would want to do so. Yes,
> there is text in Section 1.1 about representing the words of natural
> languages, but URIs can be used to represent those words right now. I
> grant that the current mechanism for such representation isn't pretty,
> but do the addressing elements of a protocol like HTTP need to be
> pretty, or can we simply depend on the presentation software (e.g., web
> browsers) to make things look nice for the user?

I think the real motivation would be people looking at HTTP traces and 
preferring to see Unicode rather than lots of %HH strings. Of course the 
number of people looking at HTTP traces is low, and they are not end users.

In general, the motivation to use IRIs is highest closer to end users 
and content-oriented people such as document authors, and gets lower the 
lower one gets in the protocol stack.

Another motivation may be compression.
http://ja.wikipedia.org/wiki/青山学院大学 is quite a bit shorter than
http://ja.wikipedia.org/wiki/%E9%9D%92%E5%B1%B1%E5%AD%A6%E9%99%A2%E5%A4%A7%E5%AD%A6. 
So maybe we can sell that to HTTP 2.0. But I'm somewhat skeptical. Only 
a tiny bit of creative thinking would have been needed to transition 
various header fields in HTTP from the hopelessly outdated iso-8859-1 
(Latin-1) to UTF-8, but it didn't happen :-(.
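
To put rough numbers on the size difference, here is a small sketch in 
Python. It uses only the standard urllib.parse module; the variable 
names are mine, and it compares nothing but the raw byte counts of the 
two forms, not what any HTTP compression scheme would actually do with 
them.

```python
from urllib.parse import quote, unquote

BASE = "http://ja.wikipedia.org"
path_iri = "/wiki/青山学院大学"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character
# and leaves "/" alone by default.
path_uri = quote(path_iri)

iri = BASE + path_iri
uri = BASE + path_uri

# Bytes on the wire if the IRI were sent as UTF-8 vs. the URI as ASCII.
iri_bytes = len(iri.encode("utf-8"))
uri_bytes = len(uri.encode("ascii"))

print(uri)
print(f"IRI as UTF-8: {iri_bytes} bytes, URI: {uri_bytes} bytes")

# Decoding the %HH form gets the original path back.
assert unquote(path_uri) == path_iri
```

Each of the six characters takes 3 bytes in UTF-8 but 9 bytes as 
%HH%HH%HH, which is where the difference comes from.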

The best motivation would be streamlining. EAI does a lot of 
streamlining for e-mail; if it weren't for all the legacy baggage, it 
would be a joy to implement. For HTTP, if browsers use Unicode 
internally, and servers use it internally, what's the need for this 
weird %HH stuff anyway? (It's still needed to escape reserved 
characters, though.)
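
To illustrate that last point, here is a simplified sketch of mapping 
an IRI to a URI in Python. The function name iri_to_uri is made up for 
this example, and the sketch deliberately ignores special treatment of 
the host part (IDNs) and any normalization; it only percent-encodes 
non-ASCII characters as UTF-8 and leaves all ASCII, including existing 
%HH escapes, untouched.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8; keep ASCII as-is.

    Simplified illustration only: no IDN handling for the host and no
    normalization step.
    """
    out = []
    for ch in iri:
        if ord(ch) < 128:
            # ASCII, including reserved characters and existing %HH
            # escapes, is passed through unchanged.
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


# "%26" is an escaped "&" inside a query value. It has to stay escaped
# in both the IRI and the URI, or it would be read as a separator.
example = "/wiki/青山?q=a%26b&lang=ja"
print(iri_to_uri(example))
```

Only the non-ASCII characters change between the two forms; the 
escaping of reserved characters is the same in both.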


> (Certainly we do that
> with structural elements like the HTML document format, why not also
> with addressing elements like URIs?) I realize that these questions get
> back to the matter of "protocol element" vs. "presentation", but I guess
> what I'm saying is that I don't yet think we've really explained why we
> need to make IRIs a first-class protocol element (or why a given
> protocol would want to make the switch from URI-only to IRI).
>
> Furthermore, 3987bis doesn't really explain what would be involved in
> the change from URI-only to IRI in any given protocol. I suppose spec
> writers in a technology community like HTTP would need to figure it out,
> but IMHO some guidelines would be helpful.

As I said at the start of this mail, I think it depends a lot on the 
specific protocol. The conditions we give in Section 1.2 are general 
considerations that apply to any protocol/format. Protocol-specific 
considerations should do the rest, and I'm not sure it makes sense to 
write much about this.

But when looking at Section 1.2, I realized that the first sentence 
might have been the motivation for your mail. This sentence says:
    IRIs are designed to allow protocols and software that deal with URIs
    to be updated to handle IRIs.
I think that this puts too much emphasis on "update", but I'm not yet 
sure how to fix that.

Regards,   Martin.

Received on Monday, 25 June 2012 09:23:12 UTC