
RE: URI schemes and IRI deployment (issue schemes-iri-38)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 07 Jul 2004 17:06:22 +0900
Message-Id: <4.2.0.58.J.20040707163822.03aa63c0@localhost>
To: "Williams, Stuart" <skw@hp.com>
Cc: public-iri@w3.org

Hello Stuart,

I have listed this as issue schemes-iri-38 at
http://www.w3.org/International/iri-edit/#schemes-iri-38.

I won't do any actual edits before getting an idea from Ted
about procedural issues, now that the document is officially
in the IESG's hands. But I'll mention it when I think some
changes might be helpful, and when I think they are unnecessary.

At 16:42 04/06/30 +0100, Williams, Stuart wrote:

>Hello Martin,
>
> > -----Original Message-----
> > From: public-iri-request@w3.org
> > [mailto:public-iri-request@w3.org] On Behalf Of Martin Duerst
> > Sent: 29 June 2004 14:32
> > To: Williams, Stuart
> > Cc: public-iri@w3.org
> > Subject: Re: URI schemes and IRI deployment
> >
> >
> > Hello Stuart,
> >
> > Just a quick answer, because I'm traveling.
> >
> > At 11:56 04/06/29 +0100, Williams, Stuart wrote:
> >
> > >Martin,
> > >
> > >I'd like to understand expectations wrt IANA registered URI schemes
> > >following adoption of the IRI spec as an RFC and during deployment of
> > >IRIs.
> > >
> > >Do they 'instantly' become IRI schemes too?
> >
> > Yes, all of them in the sense that every URI is an IRI, at least.
> > And most of them to the extent that they allow %-encoding and
> > either require (e.g. urn, imap,...) or allow (e.g. http) the
> > %-encoding to be based on UTF-8.
> >
> >
> > >Will they require maintenance to allow the use of the expanded
> > >character set allowed by the generic IRI syntax?
> >
> > Some of them will. The most prominent example: mailto:, which
> > is very restrictive in where it allows %-encoding, if at all.
> >
> >
> > >The IRI spec gives a generic syntax that allows a broader range of
> > >characters to be used in identifiers, but each currently registered scheme
> > >is written from a URI perspective with the potential to narrow rather
> > >than broaden the range of characters used in an identifier from those
> > >permissible in the URI spec.
> > >
> > >The IRI spec. has section on upgrade strategy (7.8) which speaks of
> > >upgrading of applications to handle IRI, but it does not appear to say
> > >anything about upgrading of URI scheme registrations.
> >
> > What URI schemes might need upgrade or not can be deduced
> > from the exact definition of what's an IRI, which explicitly
> > requires that the result of the IRI->URI conversion has to be
> > a legal URI.
>
>Ok... I see... Section "3.1 Mapping IRIs to URIs"
>
><quote>
>    This mapping has two purposes:
>
>    a) Syntactical:  Many URI schemes and components define additional
>       syntactical restrictions not captured in Section 2.2.
>       Scheme-specific restrictions are applied to IRIs by converting
>       IRIs to URIs and checking the URIs against the scheme-specific
>       restrictions.
></quote>
>
>So... an IRI is only a valid IRI *if* it 1) meets the constraints of the
>generic IRI syntax, and 2) when mapped to a URI using the mapping specified
>in section 3.1 it results in a URI that meets the constraints of the
>generic URI syntax and meets any other scheme specific syntactic
>constraints.
>
>This retains a central role for URI in determining the validity of IRI -

Yes. But the intent of this is only to restrict IRI syntax to
scheme-specific URI syntax as far as URI syntax is restricted by
scheme-specific syntax. There are many instances of URI processing
where there is no check whatsoever to make sure that the URI syntax
conforms to scheme-specific restrictions. In such cases, I don't
expect the equivalent IRI implementations to check against scheme-
specific syntax. If the current spec gives a different impression,
then that would have to be fixed. But I hope there is no need;
'scheme-specific restrictions are applied' implies that this is
done if and when somebody wants to apply scheme-specific restrictions.
[and of course implementations can also do the equivalent of
translating scheme-specific URI syntax into scheme-specific IRI syntax
if they want to and that doesn't change external behavior].
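The two-step check discussed above can be sketched as follows. This is only a minimal illustration, not the algorithm from the draft: iri_to_uri, http_check and SCHEME_CHECKS are hypothetical names, the generic-syntax check is left out, host names are not treated specially, and the "http needs a host" rule merely stands in for a real scheme-specific restriction. The example path is the one that appears later in this thread.

```python
from urllib.parse import quote, urlsplit

def iri_to_uri(iri: str) -> str:
    """Percent-encode each non-ASCII character as its UTF-8 bytes.

    Simplification: ASCII characters pass through untouched and the
    host part gets no special treatment.
    """
    return "".join(
        ch if ord(ch) < 128 else quote(ch, safe="")
        for ch in iri
    )

def http_check(uri: str) -> bool:
    # Stand-in scheme-specific restriction (an assumption for this
    # sketch): an http URI must carry a non-empty host.
    parts = urlsplit(uri)
    return parts.scheme == "http" and bool(parts.hostname)

# Hypothetical table of per-scheme checks; schemes without an entry
# are simply not checked.
SCHEME_CHECKS = {"http": http_check}

def is_valid_iri(iri: str, apply_scheme_check: bool = True) -> bool:
    """Map the IRI to a URI, then optionally apply the scheme check.

    The check runs only "if and when somebody wants" it, mirroring
    URI processing that never looks at scheme-specific syntax.
    """
    uri = iri_to_uri(iri)
    check = SCHEME_CHECKS.get(urlsplit(uri).scheme)
    if apply_scheme_check and check is not None:
        return check(uri)
    return True

# "\u00fc" is the letter u with diaeresis.
print(iri_to_uri("http://www.w3.org/People/d\u00fcrst"))
```

With apply_scheme_check=False the function accepts anything that survives the mapping, which corresponds to the many implementations that never test scheme-specific syntax.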


>and
>any operationalised use of the relevant URI scheme to access the resource
>can *only* be applied once the IRI has been transformed to the corresponding
>URI..

No, that's not true. It's true for most if not all protocols at the
moment, but it's not true for future protocols. Although this of course
may never happen, it is very easy to imagine something like HTTP/2.0
or HTTP/1.2, where URIs can use UTF-8. There would be absolutely
no need for a transformation to URIs for the protocol, and there
would be only a small amount of work, if at all, to upgrade
some servers (e.g. Apache; for others, that may or may not apply).
Servers would then have to take care of the requirement that
%-encoding is only syntactic sugar, but they deal with that
anyway today already for the US-ASCII range.
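To illustrate the "syntactic sugar" point: a server that received raw UTF-8 in a request path would have to treat it the same as the %-encoded form. A rough sketch, assuming UTF-8 based escapes; same_path is a hypothetical helper, and a real server would have to keep reserved characters such as an escaped "/" apart, which this sketch does not do.

```python
from urllib.parse import unquote

def same_path(a: str, b: str) -> bool:
    """Compare two paths after decoding %-escapes as UTF-8.

    Caveat: decoding everything also turns "%2F" into "/", which a
    real server must not do; this sketch ignores reserved characters.
    Escapes that are not valid UTF-8 decode to U+FFFD and therefore
    do not match the intended character.
    """
    return unquote(a) == unquote(b)

# The escaped and unescaped forms name the same path.
print(same_path("/People/d%C3%BCrst", "/People/d\u00fcrst"))
# A Latin-1 based escape is a different byte sequence and does not match.
print(same_path("/People/d%FCrst", "/People/d\u00fcrst"))
```

Case differences are not touched by this comparison, so any case-insensitivity would remain a matter of server configuration.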


> >
> > >The identifier http://www.w3.org/People/d?st
> >
> > [sorry, my Japanese mailer will have garbled that]
> >
> >
> > >may be admissible under the
> > >generic IRI syntax, but is it a valid HTTP scheme IRI? And if so...
> > >what specification makes it admissible as an HTTP scheme IRI?
> >
> > The IRI spec. If you take the above, and convert it to a
> > URI, you will get http://www.w3.org/People/d%C3%BCrst. The
> > HTTP URI spec says that this is a legal URI, so the one above is a legal
> > IRI.
> > In this case, it's not only legal, it's actually
> > dereferenceable, although the content at that location isn't
> > terribly up to date, and the exact URI would be
> > http://www.w3.org/People/D%C3%BCrst,
> > but the server takes care of the casing issue.
> >
> > You will also observe that if you put the URI in the address
> > bar in Opera, you'll get back the IRI. Other browsers may do
> > something similar, or may at least allow you to put in the
> > IRI and get to the actual page. (you have to be careful in
> > the above example because I also put in some redirects, e.g.
> > for http://www.w3.org/People/d%FCrst,

Sorry, that should have been http://www.w3.org/People/D%FCrst;
redirects and case-insensitivity don't combine in our setup.

> > the Latin-1-encoded
> > version, but you'll see when that happens because the
> > redirects are explicitly taking time.)
> >
> >
> > >Simply, my question is... what is the transition plan for scheme
> > >registrations wrt IRI deployment?
> >
> > For many if not most, there is no need for a plan.
>
>Because any operationalised use of the IRI to access a resource requires
>mapping to a valid URI and then the use of that URI for resource access - so
>the sense of scheme is only really applicable to the resulting URI.

There is no actual mapping necessary, as I explained above,
but you are right in the sense that there are no independent
IRI schemes. Of course, as Ted has pointed out, that doesn't
mean that suddenly, IRIs will be allowed everywhere by some
magic.


> > For some,
> > such as http, it's mostly an issue of how people set up their
> > servers. For some, such as mailto:, some work may be
> > appropriate, but in that specific case, there have been quite
> > a few discussions about whether and how to internationalize
> > the left hand side of an email address, and that discussion
> > hasn't yet been conclusive.
> > It seemed better to wait to see where that would lead before
> > upgrading mailto:.
> > For newly created URIs, if they follow the guidelines for new
> > URI schemes, they'll work with IRIs automatically.
> >
> >
> > >Apologies if you have answered this before... I have looked, but did
> > >not find anything relevant.
> >
> > In the IRI spec, please look at 'applicability' very early
> > on, and then at the prose in the sections on syntax and on
> > IRI->URI mapping. There is no such thing as e.g. a 'catalog
> > of schemes that need upgrading', because after all, the IRI
> > draft is a generic document.
>
>On the basis that the (existing) schemes themselves are only really
>applicable to URI arising from the IRI->URI mapping then I think I agree
>that the adoption of the IRI spec induces no maintenance burden on those
>existing schemes.
>
>Also, the advice in RFC2718 is focussed on new *URI* schemes (well "URL
>schemes" by virtue of its title), presumably to make it more likely that
>the IRI->URI mapping generates a valid URI (hence validating the IRI). So,
>as Ted Hardie points out, there is no notion of an IRI scheme (my fault for
>not understanding); there are only URI schemes.
>
>FWIW: I think that it would be worth explaining the lack of need to
>maintain/transition URI schemes for use with IRIs in the section on upgrade
>strategy - I think that the argument is fairly subtle and not called out
>elsewhere in the draft.

I agree that I could add something there.


>BTW I suspect that there will be quite a few invalid IRIs in common usage,
>because people may omit the validity check of mapping to a URI and checking
>that any scheme induced constraints have also been met.

As I explained above, this is similar to what happens now with URIs.
There are quite a few invalid URIs out there, aren't there? And in
some cases, they even work (in the sense of dereferencing), despite
what the spec says.
Also, no XSLT implementation I know of, for example, checks namespace URIs
for validity against scheme definitions. I would expect the same to be
true for RDF implementations.

Regards,    Martin.
Received on Wednesday, 7 July 2004 04:13:23 GMT
