
RE: URI schemes and IRI deployment (issue schemes-iri-38)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 18 Aug 2004 16:02:55 +0900
Message-Id: <4.2.0.58.J.20040818153338.03a68a58@localhost>
To: "Williams, Stuart" <skw@hp.com>
Cc: public-iri@w3.org, Ted Hardie <hardie@qualcomm.com>

Hello Stuart,

At 13:48 04/08/10 +0100, Williams, Stuart wrote:

>Hello Martin,
>
>Apologies for the delay in responding and thank you for your patience, I
>have been away on vacation.

No problem. I knew.

[slightly reordered]
> > I have tentatively closed this issue. I will submit this as
> > draft -09 before the deadline today.

Based on this mail, I have moved this issue back to pending.


>I guess I am way past your deadline, so apologies.

The deadline was the draft submission cutoff before the IETF
meeting. That's over now.



> > This means that for most URI schemes, there is no need to
> > upgrade their scheme definition in order for them to work
> > with IRIs. The main case where upgrading a scheme definition
> > may make sense is when a scheme definition is limited to the
> > use of US-ASCII characters with no provision to include
> > non-ASCII characters/octets but a desire to include such
> > characters, or only with provisions that are highly
> > scheme-specific. An example of such a scheme might be the
> > mailto: scheme [RFC2368].
>
>But is it not the case that, since all URI schemes have a narrowing effect on
>generic *URI* syntax, all URI schemes are "limited
>to the use of [a subset of] US-ASCII characters/octets", the use of
>%encoding allowing the 'indirect' use of non-US-ASCII in what then become
>visually unpleasant URI?
>
>ie. stated this way round... the "main case" is in fact the case for all URI
>schemes.

Okay, it looks like I wasn't precise enough. Let me try a proposal
for rewording, for the middle sentence in the paragraph above:

"The main case where upgrading a scheme definition
makes sense is when a scheme definition is strictly limited
to the use of US-ASCII characters with no provision to include
non-ASCII characters/octets via percent-encoding, or if a scheme
definition currently uses highly scheme-specific provisions for
the encoding of non-ASCII characters."
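
To make "via percent-encoding" concrete, here is a minimal Python sketch.
It assumes the only step needed is to replace each non-ASCII character
with the percent-encoded octets of its UTF-8 form. The function name
percent_encode_non_ascii is made up for illustration, and the sketch
leaves out normalization and any special handling of host names.

```python
def percent_encode_non_ascii(text: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 octets.

    Illustrative only: a simplified stand-in for the IRI-to-URI mapping.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # US-ASCII characters pass through unchanged
        else:
            # One "%HH" triplet per UTF-8 octet of the character
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode_non_ascii("http://www.w3.org/People/dürst"))
```

For the "dürst" input this prints http://www.w3.org/People/d%C3%BCrst,
the URI form quoted further down in this thread.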

Would that be better? Would the changes below still be necessary?
I wouldn't want to replace the above with your text below, because
your text below says nothing about schemes that may or may not have
to be upgraded.


>I would prefer to see a sentence or two that gave a more direct account of
>the applicability of URI schemes to IRI and have drafted the following
>replacement:
>
>"In general URI schemes can impose narrowing restrictions on the syntax of
>scheme-specific URI, ie. URI that are admissible under the generic URI
>syntax [RFC2396bis] may not be admissible due to narrower syntactic
>constraints imposed by a URI scheme specification. In general URI scheme
>definitions cannot broaden the syntactic restrictions imposed by the generic
>URI syntax, otherwise it would be possible to generate URI that satisfied
>the scheme specific syntactic constraints without satisfying the syntactic
>constraints of the generic URI syntax. However, additional syntactic
>constraints imposed by URI scheme specifications are *indirectly* applicable
>to IRI since the corresponding URI resulting from the mapping defined in
>Section 3.1 MUST be valid a URI under the syntactic restrictions of generic
>URI syntax and any narrower restrictions imposed by the corresponding URI
>scheme specification."

There is a bit too much 'in general' for my taste, and some other
minor wording issues (plural of URI is URIs). So what about:

"URI schemes can impose restrictions on the syntax of
scheme-specific URIs, i.e. URIs that are admissible under the generic URI
syntax [RFCYYYY] may not be admissible due to narrower syntactic
constraints imposed by a URI scheme specification. URI scheme
definitions cannot broaden the syntactic restrictions of the generic
URI syntax; otherwise it would be possible to generate URIs that satisfied
the scheme-specific syntactic constraints without satisfying the syntactic
constraints of the generic URI syntax. However, additional syntactic
constraints imposed by URI scheme specifications are *indirectly* applicable
to IRIs, since the corresponding URI resulting from the mapping defined in
Section 3.1 MUST be a valid URI under the syntactic restrictions of generic
URI syntax and any narrower restrictions imposed by the corresponding URI
scheme specification."
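
The "indirect" applicability described in that paragraph can be sketched
in Python as follows. This is an illustration under stated assumptions,
not the spec's algorithm: the mapping is reduced to UTF-8 percent-encoding
of non-ASCII characters, the generic URI syntax is approximated by a
coarse character check, and the "strict" scheme with its rule is
entirely hypothetical.

```python
import re

# Coarse stand-in for the generic URI syntax: a scheme, a colon, then only
# characters the generic syntax allows (unreserved, reserved, and "%").
GENERIC_URI = re.compile(
    r"^[A-Za-z][A-Za-z0-9+.-]*:[A-Za-z0-9._~:/?#\[\]@!$&'()*+,;=%-]*$"
)

# Hypothetical scheme-specific rules, invented for this example only.
# The made-up "strict" scheme forbids percent-encoding altogether.
SCHEME_RULES = {
    "strict": re.compile(r"^strict:[A-Za-z0-9/]*$"),
}


def iri_to_uri(iri: str) -> str:
    """Simplified mapping: percent-encode non-ASCII characters as UTF-8 octets."""
    return "".join(
        ch if ord(ch) < 128 else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in iri
    )


def is_valid_iri(iri: str) -> bool:
    """Judge an IRI by the URI it maps to."""
    uri = iri_to_uri(iri)
    if not GENERIC_URI.match(uri):
        return False  # fails the (approximated) generic URI syntax
    scheme = uri.split(":", 1)[0].lower()
    rule = SCHEME_RULES.get(scheme)
    # Schemes without a rule in the table add no further restriction here.
    return rule is None or rule.match(uri) is not None


print(is_valid_iri("http://www.w3.org/People/dürst"))  # generic check only
print(is_valid_iri("strict:dürst"))  # also checked against the made-up rule
```

The scheme rule is only ever applied to the mapped URI, never to the IRI
itself, which is the sense in which the restriction is indirect.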


> > This specification does not upgrade any scheme specifications
> > in any way, this has to be done separately. Also, it should
> > be noted that there is no such thing as an "IRI scheme"; all
> > IRIs use URI schemes, and all URI schemes can be used with
> > IRIs, even though in some cases only by using URIs directly
> > as IRIs, without any conversion.
>
>"The publication of this specification has no effect on existing (or future)
>URI scheme specifications."

Are you proposing to replace the above paragraph with your sentence?
Or adding your sentence? In any case, I would not like the '(or future)'
part. While future scheme definitions won't suddenly add actual non-ASCII
characters to their syntax, they may very much be designed to accommodate
these characters through UTF-8 and percent-encoding, and so saying that
this spec doesn't affect future specs would be wrong.


>I think that it is important to note (as in the above) that it is not
>possible to write URI scheme specifications that are less restrictive than
>generic URI syntax.

That's a good point. But wouldn't that belong in the URI spec, rather
than the IRI spec? Actually, in my understanding, it's already in the URI
spec, and the IRI spec does nothing to change that (nor could it).


Regards,     Martin.


>If you're willing/able to
>make changes along the lines I suggest above I think it would more clearly
>state the applicability of URI schemes in an IRI context.
>
>
>Many thanks,
>
>Stuart
>--
>
>
> > -----Original Message-----
> > From: Martin Duerst [mailto:duerst@w3.org]
> > Sent: 19 July 2004 11:00
> > To: Williams, Stuart
> > Cc: public-iri@w3.org
> > Subject: RE: URI schemes and IRI deployment (issue schemes-iri-38)
> >
> > Hello Stuart, others,
> >
> > I have addressed this issue as follows:
> >
> > Because Stuart was expecting to find information about this
> > in section 7.8, "Upgrading Strategy", I have added a pointer there.
> >
> > However, I have decided that this best goes into section 6.4,
> > "Use of UTF-8 for Encoding Original Characters", because it
> > is intimately related to that discussion.
> >
> > I have added the following two paragraphs:
> >
> > This means that for most URI schemes, there is no need to
> > upgrade their scheme definition in order for them to work
> > with IRIs. The main case where upgrading a scheme definition
> > may make sense is when a scheme definition is limited to the
> > use of US-ASCII characters with no provision to include
> > non-ASCII characters/octets but a desire to include such
> > characters, or only with provisions that are highly
> > scheme-specific. An example of such a scheme might be the
> > mailto: scheme [RFC2368].
> >
> > This specification does not upgrade any scheme specifications
> > in any way, this has to be done separately. Also, it should
> > be noted that there is no such thing as an "IRI scheme"; all
> > IRIs use URI schemes, and all URI schemes can be used with
> > IRIs, even though in some cases only by using URIs directly
> > as IRIs, without any conversion.
> >
> > I have tentatively closed this issue. I will submit this as
> > draft -09 before the deadline today.
> >
> > Regards,    Martin.
> >
> >
> > At 17:06 04/07/07 +0900, Martin Duerst wrote:
> >
> > >Hello Stuart,
> > >
> > >I have listed this as issue schemes-iri-38 at
> > >http://www.w3.org/International/iri-edit/#schemes-iri-38.
> > >
> > >I won't do any actual edits before getting an idea from Ted about
> > >procedural issues, now that the document is officially in the IESG's
> > >hands. But I'll mention it when I think some changes might be helpful,
> > >and when I think they are unnecessary.
> > >
> > >At 16:42 04/06/30 +0100, Williams, Stuart wrote:
> > >
> > >>Hello Martin,
> > >>
> > >> > -----Original Message-----
> > >> > From: public-iri-request@w3.org
> > >> > [mailto:public-iri-request@w3.org] On Behalf Of Martin Duerst
> > >> > Sent: 29 June 2004 14:32
> > >> > To: Williams, Stuart
> > >> > Cc: public-iri@w3.org
> > >> > Subject: Re: URI schemes and IRI deployment
> > >> >
> > >> >
> > >> > Hello Stuart,
> > >> >
> > >> > Just a quick answer, because I'm traveling.
> > >> >
> > >> > At 11:56 04/06/29 +0100, Williams, Stuart wrote:
> > >> >
> > >> > >Martin,
> > >> > >
> > >> > >I'd like to understand expectations with respect to IANA-registered
> > >> > >URI schemes following adoption of the IRI spec as an RFC and during
> > >> > >deployment of IRIs.
> > >> > >
> > >> > >Do they 'instantly' become IRI schemes too?
> > >> >
> > >> > Yes, all of them in the sense that every URI is an IRI, at least.
> > >> > And most of them to the extent that they allow %-encoding and
> > >> > either require (e.g. urn, imap,...) or allow (e.g. http) the
> > >> > %-encoding to be based on UTF-8.
> > >> >
> > >> >
> > >> > >Will they require maintenance to allow the use of the expanded
> > >> > >character set allowed by the generic IRI syntax?
> > >> >
> > >> > Some of them will. The most prominent example: mailto:, which is
> > >> > very restrictive in where it allows %-encoding, if at all.
> > >> >
> > >> >
> > >> > >The IRI spec gives a generic syntax that allows a broader range of
> > >> > >characters to be used in identifiers, but each currently registered
> > >> > >scheme is written from a URI perspective with the potential to
> > >> > >narrow rather than broaden the range of characters used in an
> > >> > >identifier from those permissible in the URI spec.
> > >> > >
> > >> > >The IRI spec has a section on upgrade strategy (7.8) which speaks
> > >> > >of upgrading of applications to handle IRI, but it does not appear
> > >> > >to say anything about upgrading of URI scheme registrations.
> > >> >
> > >> > What URI schemes might need upgrade or not can be deduced from the
> > >> > exact definition of what's an IRI, which explicitly requires that
> > >> > the result of the IRI->URI conversion has to be a legal URI.
> > >>
> > >>Ok... I see... Section "3.1 Mapping IRIs to URIs"
> > >>
> > >><quote>
> > >>    This mapping has two purposes:
> > >>
> > >>    a) Syntactical:  Many URI schemes and components define additional
> > >>       syntactical restrictions not captured in Section 2.2.
> > >>       Scheme-specific restrictions are applied to IRIs by converting
> > >>       IRIs to URIs and checking the URIs against the scheme-specific
> > >>       restrictions.
> > >></quote>
> > >>
> > >>So... an IRI is only a valid IRI *if* it 1) meets the constraints of
> > >>the generic IRI syntax, and 2) when mapped to a URI using the mapping
> > >>specified in section 3.1, results in a valid URI that meets the
> > >>constraints of the generic URI syntax and meets any other
> > >>scheme-specific syntactic constraints.
> > >>
> > >>This retains a central role for URI in determining the validity of IRI
> > >>-
> > >
> > >Yes. But the intent of this is only to restrict IRI syntax to
> > >scheme-specific URI syntax as far as URI syntax is restricted by
> > >scheme-specific syntax. There are many instances of URI processing
> > >where there is no check whatsoever to make sure that the URI syntax
> > >conforms to scheme-specific restrictions. In such cases, I don't expect
> > >the equivalent IRI implementations to check against scheme-specific
> > >syntax. If the current spec gives a different impression, then that
> > >would have to be fixed. But I hope there is no need; 'scheme-specific
> > >restrictions are applied' implies that this is done if and when
> > >somebody wants to apply scheme-specific restrictions.
> > >[and of course implementations can also do the equivalent of
> > >translating scheme-specific URI syntax into scheme-specific IRI syntax
> > >if they want to and that doesn't change external behavior].
> > >
> > >
> > >>and
> > >>any operationalised use of the relevant URI scheme to access the
> > >>resource can *only* be applied once the IRI has been transformed to
> > >>the corresponding URI..
> > >
> > >No, that's not true. It's true for most if not all protocols at the
> > >moment, but it's not true for future protocols. Although this of course
> > >may never happen, it is very easy to imagine something like HTTP/2.0
> > >or HTTP/1.2, where URIs can use UTF-8. There would be absolutely no
> > >need for a transformation to URIs for the protocol, and there would be
> > >only a small amount of work, if at all, to upgrade some servers (e.g.
> > >Apache; for others, that may or may not apply).
> > >Servers would then have to take care of the requirement that %-encoding
> > >is only syntactic sugar, but they deal with that anyway today already
> > >for the US-ASCII range.
> > >
> > >
> > >> >
> > >> > >The identifier http://www.w3.org/People/d?st
> > >> >
> > >> > [sorry, my Japanese mailer will have garbled that]
> > >> >
> > >> >
> > >> > >may be admissible under the
> > >> > >generic IRI syntax, but is it a valid HTTP scheme IRI? And if so...
> > >> > >what specification makes it admissible as an HTTP scheme IRI?
> > >> >
> > >> > The IRI spec. If you take the above, and convert it to a URI, you
> > >> > will get http://www.w3.org/People/d%C3%BCrst. The HTTP URI spec
> > >> > says that this is a legal URI, so the one above is a legal IRI.
> > >> > In this case, it's not only legal, it's actually dereferenceable,
> > >> > although the content at that location isn't terribly up to date,
> > >> > and the exact URI would be http://www.w3.org/People/D%C3%BCrst,
> > >> > but the server takes care of the casing issue.
> > >> >
> > >> > You will also observe that if you put the URI in the address bar in
> > >> > Opera, you'll get back the IRI. Other browsers may do something
> > >> > similar, or may at least allow you to put in the IRI and get to the
> > >> > actual page. (you have to be careful in the above example because I
> > >> > also put in some redirects, e.g.
> > >> > for http://www.w3.org/People/d%FCrst,
> > >
> > >Sorry, that should have been http://www.w3.org/People/D%FCrst;
> > >redirects and case-insensitivity don't combine in our setup.
> > >
> > >> > the Latin-1-encoded
> > >> > version, but you'll see when that happens because the redirects are
> > >> > explicitly taking time.)
> > >> >
> > >> >
> > >> > >Simply, my question is... what is the transition plan for scheme
> > >> > >registrations with respect to IRI deployment?
> > >> >
> > >> > For many if not most, there is no need for a plan.
> > >>
> > >>Because any operationalised use of the IRI to access a resource
> > >>requires mapping to a valid URI and then the use of that URI for
> > >>resource access - so the sense of scheme is only really applicable
> > >>to the resulting URI.
> > >
> > >There is no actual mapping necessary, as I explained above, but you are
> > >right in the sense that there are no independent IRI schemes. Of
> > >course, as Ted has pointed out, that doesn't mean that suddenly, IRIs
> > >will be allowed everywhere by some magic.
> > >
> > >
> > >> > For some,
> > >> > such as http, it's mostly an issue of how people set up their
> > >> > servers. For some, such as mailto:, some work may be appropriate,
> > >> > but in that specific case, there have been quite a few discussions
> > >> > about whether and how to internationalize the left hand side of an
> > >> > email address, and that discussion hasn't yet been conclusive.
> > >> > It seemed better to wait to see where that would lead before
> > >> > upgrading mailto:.
> > >> > For newly created URIs, if they follow the guidelines for new URI
> > >> > schemes, they'll work with IRIs automatically.
> > >> >
> > >> >
> > >> > >Apologies if you have answered this before... I have looked, but
> > >> > >did not find anything relevant.
> > >> >
> > >> > In the IRI spec, please look at 'applicability' very early on, and
> > >> > then at the prose in the sections on syntax and on
> > >> > IRI->URI mapping. There is no such thing as e.g. a 'catalog
> > >> > of schemes that need upgrading', because after all, the IRI draft
> > >> > is a generic document.
> > >>
> > >>On the basis that the (existing) schemes themselves are only really
> > >>applicable to URI arising from the IRI->URI mapping then I think I
> > >>agree that the adoption of the IRI spec induces no maintenance burden
> > >>on those existing schemes.
> > >>
> > >>Also, the advice in RFC2718 is focussed on new *URI* schemes (well
> > >>"URL schemes" by virtue of its title) presumably to make it more
> > >>likely that the IRI->URI mapping generates a valid URI (hence
> > >>validating the IRI). So, as Ted Hardie points out, there is no notion
> > >>of an IRI scheme (my fault for not understanding); there are only
> > >>URI schemes.
> > >>
> > >>FWIW: I think that it would be worth explaining the lack of need to
> > >>maintain/transition URI schemes for use with IRIs in the section on
> > >>upgrade strategy - I think that the argument is fairly subtle and not
> > >>called out elsewhere in the draft.
> > >
> > >I agree that I could add something there.
> > >
> > >
> > >>BTW I suspect that there will be quite a few invalid IRI in common
> > >>usage, because people may omit the validity check of mapping to a URI
> > >>and checking that any scheme induced constraints have also been met.
> > >
> > >As I explained above, this is similar to what happens now with URIs.
> > >There are quite a few invalid URIs out there, aren't there? And in some
> > >cases, they even work (in the sense of dereference), despite what
> > >the spec says.
> > >Also, no XSLT implementation I know for example checks namespace URIs
> > >for validity against scheme definitions. This may be the same for RDF
> > >implementations, I would expect.
> > >
> > >Regards,    Martin.
> >
Received on Wednesday, 18 August 2004 07:03:21 GMT
