Re: why use IRIs?

Hi Martin, thanks for the clarification. I have a few comments inline.

On 6/25/12 3:22 AM, "Martin J. Dürst" wrote:
> Hello Peter,
> 
> I think Björn already gave very good answers to your questions.
> 
> On 2012/06/22 3:28, Peter Saint-Andre wrote:
>> <hat type='individual'/>
>>
>> I've been thinking about IRIs, and I'm wondering: why would a protocol
>> "upgrade" from URIs to IRIs?
> 
> As Björn said, it's really more about new protocols than about upgrades.
> Also, different protocols (and formats) can upgrade in different ways.
> Sometimes, this can be done formally with extensions, at other times
> it's done gradually and sooner or later gets accepted in a spec. For
> other cases, of course, it may never happen.
> 
>> (If it really is an "upgrade" -- a topic
>> for another time.)
>>
>> Consider HTTP. It has always used URIs for retrieving documents and
>> linking and such.
> 
> [There are some reports of clients just sending UTF-8, which I think
> would mean using IRIs. But that has never reached the spec.]

Do you think it should reach the spec?

>> Why would it change to use IRIs? Section 1.2 of
>> 3987bis describes some necessary conditions for such a change, but
>> doesn't really motivate why the HTTP community would want to do so. Yes,
>> there is text in Section 1.1 about representing the words of natural
>> languages, but URIs can be used to represent those words right now. I
>> grant that the current mechanism for such representation isn't pretty,
>> but do the addressing elements of a protocol like HTTP need to be
>> pretty, or can we simply depend on the presentation software (e.g., web
>> browsers) to make things look nice for the user?
> 
> I think the real motivation would be people looking at HTTP traces and
> preferring to see Unicode rather than lots of %HH strings. Of course the
> number of people looking at HTTP traces is low, and they are not end users.
> 
> In general, the motivation to use IRIs is highest closer to end users
> and content-oriented people such as document authors, and gets lower the
> lower one gets in the protocol stack.

It seems to me that end users can be shielded from what you call "this
weird %HH stuff" (after all, we don't show them "this weird
angle-bracket stuff" either), but what you say about document authors
and operations people makes sense. Perhaps it would be good to capture
that in the spec.

> Another motivation may be compression.
> http://ja.wikipedia.org/wiki/青山学院大学 is quite a bit shorter than
> http://ja.wikipedia.org/wiki/%E9%9D%92%E5%B1%B1%E5%AD%A6%E9%99%A2%E5%A4%A7%E5%AD%A6.
> So maybe we can sell that to HTTP 2.0. But I'm somewhat skeptical. Only
> a tiny bit of creative thinking would have been needed to transition
> various header fields in HTTP from the hopelessly outdated iso-8859-1
> (Latin-1) to UTF-8, but it didn't happen :-(.
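
To put rough numbers on the size difference, here is a minimal sketch using Python's standard urllib.parse. It assumes the %HH escapes are UTF-8 (as they are in the Wikipedia example) and only compares lengths; it says nothing about what HTTP actually permits on the wire. The variable names are just for illustration.

```python
from urllib.parse import quote, unquote

# Percent-encoded URI as quoted above (sentence-ending period dropped).
uri = ("http://ja.wikipedia.org/wiki/"
       "%E9%9D%92%E5%B1%B1%E5%AD%A6%E9%99%A2%E5%A4%A7%E5%AD%A6")

# Decoding the %HH escapes as UTF-8 recovers the IRI form.
iri = unquote(uri)

# Going the other way: keep the ":" and "/" delimiters,
# percent-encode every non-ASCII character as UTF-8 bytes.
round_trip = quote(iri, safe=":/")

print(len(uri))                  # characters (and bytes) in the URI form
print(len(iri))                  # characters in the IRI form
print(len(iri.encode("utf-8")))  # bytes if the IRI were sent as raw UTF-8
```

Each of these CJK characters takes three bytes in UTF-8 but nine once percent-encoded, so the saving applies to the non-ASCII part of the path only.
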
> 
> The best motivation would be streamlining. EAI does a lot of
> streamlining for e-mail; if it weren't for all the legacy baggage, it
> would be a joy to implement. For HTTP, if browsers use Unicode
> internally, and servers use it internally, what's the need for this
> weird %HH stuff anyway? (It's still needed to escape reserved
> characters, though.)
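
The parenthetical about reserved characters can be illustrated with a small sketch. The host example.org, the page title, and the helper escape_segment are all hypothetical, and the helper covers only a handful of ASCII delimiters; it is not a full RFC 3987 implementation.

```python
from urllib.parse import urlsplit

def escape_segment(text):
    """Hypothetical helper: escape only the ASCII characters that would
    change how a path segment is parsed; leave non-ASCII text as-is."""
    reserved = "/?#%"
    return "".join("%{:02X}".format(ord(c)) if c in reserved else c
                   for c in text)

title = "C#入門"  # made-up page title containing a reserved "#"

# Unescaped, the "#" is read as the start of a fragment.
naive = urlsplit("http://example.org/wiki/" + title)

# With only the reserved character escaped, the whole title stays in the path.
safe = urlsplit("http://example.org/wiki/" + escape_segment(title))

print(naive.path, naive.fragment)
print(safe.path, safe.fragment)
```

So even where non-ASCII characters are carried directly, the %HH mechanism stays in use for characters that have a syntactic role.
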
> 
> 
>> (Certainly we do that
>> with structural elements like the HTML document format, why not also
>> with addressing elements like URIs?) I realize that these questions get
>> back to the matter of "protocol element" vs. "presentation", but I guess
>> what I'm saying is that I don't yet think we've really explained why we
>> need to make IRIs a first-class protocol element (or why a given
>> protocol would want to make the switch from URI-only to IRI).
>>
>> Furthermore, 3987bis doesn't really explain what would be involved in
>> the change from URI-only to IRI in any given protocol. I suppose spec
>> writers in a technology community like HTTP would need to figure it out,
>> but IMHO some guidelines would be helpful.
> 
> As I said at the start of this mail, I think it depends a lot on the
> specific protocol. The conditions we give in Section 1.2 are general
> considerations that apply to any protocol/format. Protocol-specific
> considerations should do the rest, and I'm not sure it makes sense to
> write much about this.
> 
> But when looking at Section 1.2, I realized that the first sentence
> might have been the motivation for your mail. This sentence says:
>    IRIs are designed to allow protocols and software that deal with URIs
>    to be updated to handle IRIs.
> I think that this puts too much emphasis on "update", but I'm not yet
> sure how to fix that.

Well, "update" is not "upgrade", so perhaps I have read too much into
the text. However, I think we could change it to read:

   IRIs are designed to allow protocols and software that deal with URIs
   to also handle IRIs if desired.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/

Received on Friday, 29 June 2012 18:28:13 UTC