RE: URI schemes and IRI deployment

Hello Martin,

> -----Original Message-----
> From: public-iri-request@w3.org 
> [mailto:public-iri-request@w3.org] On Behalf Of Martin Duerst
> Sent: 29 June 2004 14:32
> To: Williams, Stuart
> Cc: public-iri@w3.org
> Subject: Re: URI schemes and IRI deployment
> 
> 
> Hello Stuart,
> 
> Just a quick answer, because I'm traveling.
> 
> At 11:56 04/06/29 +0100, Williams, Stuart wrote:
> 
> >Martin,
> >
> >I'd like to understand expectations wrt IANA registered URI schemes 
> >following adoption of the IRI spec as an RFC and during deployment of
> >IRIs.
> >
> >Do they 'instantly' become IRI schemes too?
> 
> Yes, all of them in the sense that every URI is an IRI, at least.
> And most of them to the extent that they allow %-encoding and 
> either require (e.g. urn, imap,...) or allow (e.g. http) the 
> %-encoding to be based on UTF-8.
> 
> 
> >Will they require maintenance to allow the use of the expanded 
> >character set allowed by the generic IRI syntax?
> 
> Some of them will. The most prominent example: mailto:, which 
> is very restrictive in where it allows %-encoding, if at all.
> 
> 
> >The IRI spec gives a generic syntax that allows a broader range of 
> >characters to be used in identifiers, but each currently registered scheme 
> >is written from a URI perspective with the potential to narrow rather 
> >than broaden the range of characters used in an identifier from those 
> >permissible in the URI spec.
> >
> >The IRI spec. has a section on upgrade strategy (7.8) which speaks of 
> >upgrading of applications to handle IRI, but it does not appear to say 
> >anything about upgrading of URI scheme registrations.
> 
> What URI schemes might need upgrade or not can be deduced 
> from the exact definition of what's an IRI, which explicitly 
> requires that the result of the IRI->URI conversion has to be 
> a legal URI.

Ok... I see... Section "3.1 Mapping IRIs to URIs"

<quote>
   This mapping has two purposes:

   a) Syntactical:  Many URI schemes and components define additional
      syntactical restrictions not captured in Section 2.2.
      Scheme-specific restrictions are applied to IRIs by converting
      IRIs to URIs and checking the URIs against the scheme-specific
      restrictions.
</quote>

So... an IRI is only a valid IRI *if* 1) it meets the constraints of the
generic IRI syntax, and 2) when mapped to a URI using the mapping specified
in section 3.1, it results in a valid URI, i.e. one that meets the
constraints of the generic URI syntax and any other scheme-specific
syntactic constraints.

This retains a central role for URI in determining the validity of IRI - and
any operationalised use of the relevant URI scheme to access the resource
can *only* be applied once the IRI has been transformed to the corresponding
URI.
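
To make the two-step reading above concrete, here is a minimal sketch in
Python. It is an illustration under stated assumptions, not the algorithm
from the draft: the names iri_to_uri, is_valid_iri and no_percent_escapes are
made up for this example, the generic-IRI-syntax check (step 1) is omitted,
any special treatment of internationalised host names is ignored, and the
"generic URI" test is only a coarse character-level check. The
no_percent_escapes rule stands in for a restrictive scheme; it is not the
actual mailto: grammar.

```python
import re
from typing import Callable

# Coarse character-level approximation of "looks like a generic URI".
# Assumption: this is NOT a full URI grammar, only an allowed-character test.
_URI_CHARS = re.compile(r"^[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*$")
# A "%" that is not followed by two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by %-encoding each non-ASCII character
    as the octets of its UTF-8 encoding. ASCII is passed through."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def is_valid_iri(iri: str, scheme_check: Callable[[str], bool]) -> bool:
    """Step 2 of the reading above: map to a URI, then test the URI.
    scheme_check is a caller-supplied (hypothetical) scheme-specific rule."""
    uri = iri_to_uri(iri)
    if not _URI_CHARS.match(uri) or _BAD_ESCAPE.search(uri):
        return False
    return scheme_check(uri)


def no_percent_escapes(uri: str) -> bool:
    """Hypothetical stand-in for a scheme that allows no %-encoding at all."""
    return "%" not in uri


example = "http://www.w3.org/People/d\u00fcrst"
print(iri_to_uri(example))
print(is_valid_iri(example, lambda uri: True))
print(is_valid_iri("mailto:d\u00fcrst@example.org", no_percent_escapes))
```

The point of the sketch is only that the scheme-specific test is applied to
the mapped URI, never to the IRI itself.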

>
> >The identifier http://www.w3.org/People/d?st
> 
> [sorry, my Japanese mailer will have garbled that]
> 
> 
> >may be admissible under the
> >generic IRI syntax, but is it a valid HTTP scheme IRI? And if so... 
> >what specification makes it admissible as an HTTP scheme IRI?
> 
> The IRI spec. If you take the above, and convert it to a 
> URI, you will get http://www.w3.org/People/d%C3%BCrst. The 
> HTTP URI spec says that this is a legal URI, so the one above is a legal
> IRI.
> In this case, it's not only legal, it's actually 
> dereferenceable, although the content at that location isn't 
> terribly up to date, and the exact URI would be 
> http://www.w3.org/People/D%C3%BCrst,
> but the server takes care of the casing issue.
> 
> You will also observe that if you put the URI in the address 
> bar in Opera, you'll get back the IRI. Other browsers may do 
> something similar, or may at least allow you to put in the 
> IRI and get to the actual page. (you have to be careful in 
> the above example because I also put in some redirects, e.g. 
> for http://www.w3.org/People/d%FCrst, the Latin-1-encoded 
> version, but you'll see when that happens because the 
> redirects are explicitly taking time).
> 
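
As a small illustration of the two %-encoded forms mentioned above, the
following Python sketch decodes both with the standard library's
urllib.parse.unquote. The variable names are made up for this example; the
only claim is that d%C3%BCrst is the UTF-8-based form and d%FCrst the
Latin-1-based form of the same path segment.

```python
from urllib.parse import unquote

utf8_form = "d%C3%BCrst"    # "ü" encoded as the UTF-8 octets C3 BC
latin1_form = "d%FCrst"     # "ü" encoded as the single Latin-1 octet FC

# Decoding each form with the encoding it was produced from
# recovers the same character string.
print(unquote(utf8_form) == "d\u00fcrst")
print(unquote(latin1_form, encoding="latin-1") == "d\u00fcrst")

# Decoding the Latin-1 form as UTF-8 does not: FC alone is not valid UTF-8.
print(unquote(latin1_form) == "d\u00fcrst")
```

Only the UTF-8 form is what the IRI->URI mapping produces, which is
presumably why the Latin-1 form needs a redirect on the server side.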
> 
> >Simply, my question is... what is the transition plan for scheme 
> >registrations wrt IRI deployment?
> 
> For many if not most, there is no need for a plan. 

Because any operationalised use of the IRI to access a resource requires
mapping to a valid URI and then the use of that URI for resource access - so
the sense of scheme is only really applicable to the resulting URI.

> For some, 
> such as http, it's mostly an issue of how people set up their 
> servers. For some, such as mailto:, some work may be 
> appropriate, but in that specific case, there have been quite 
> a few discussions about whether and how to internationalize 
> the left hand side of an email address, and that discussion 
> hasn't yet been conclusive.
> It seemed better to wait to see where that would lead before 
> upgrading mailto:.
> For newly created URIs, if they follow the guidelines for new 
> URI schemes, they'll work with IRIs automatically.
> 
> 
> >Apologies if you have answered this before... I have looked, but did 
> >not find anything relevant.
> 
> In the IRI spec, please look at 'applicability' very early 
> on, and then at the prose in the sections on syntax and on 
> IRI->URI mapping. There is no such thing as e.g. a 'catalog 
> of schemes that need upgrading', because after all, the IRI 
> draft is a generic document.

On the basis that the (existing) schemes themselves are only really
applicable to URI arising from the IRI->URI mapping then I think I agree
that the adoption of the IRI spec induces no maintenance burden on those
existing schemes.

Also, the advice in RFC2718 is focussed on new *URI* schemes (well, "URL
schemes" by virtue of its title), presumably to make it more likely that
the IRI->URI mapping generates a valid URI (hence validating the IRI). So,
as Ted Hardie points out, there is no notion of an IRI scheme (my fault for
not understanding); there are only URI schemes.

FWIW: I think that it would be worth explaining the lack of need to
maintain/transition URI schemes for use with IRIs in the section on upgrade
strategy - I think that the argument is fairly subtle and not called out
elsewhere in the draft.

BTW I suspect that there will be quite a few invalid IRIs in common usage,
because people may omit the validity check of mapping to a URI and checking
that any scheme-induced constraints have also been met.
 
> 
> 
> Regards,    Martin.
> 

Thanks

Stuart.

Received on Wednesday, 30 June 2004 11:42:53 UTC