RE: URI schemes and IRI deployment (issue schemes-iri-38)

From: Williams, Stuart <skw@hp.com>
Date: Thu, 19 Aug 2004 10:49:48 +0100
Message-ID: <E864E95CB35C1C46B72FEA0626A2E8080190054A@0-mail-br1.hpl.hp.com>
To: "'Martin Duerst'" <duerst@w3.org>
Cc: public-iri@w3.org, Ted Hardie <hardie@qualcomm.com>

Hello Martin,

> -----Original Message-----
> From: public-iri-request@w3.org 
> [mailto:public-iri-request@w3.org] On Behalf Of Martin Duerst
> Sent: 18 August 2004 08:03
> To: Williams, Stuart
> Cc: public-iri@w3.org; Ted Hardie
> Subject: RE: URI schemes and IRI deployment (issue schemes-iri-38)
> 
> 

<snip/>

> > > This means that for most URI schemes, there is no need to upgrade 
> > > their scheme definition in order for them to work with IRIs. The 
> > > main case where upgrading a scheme definition may make sense is when 
> > > a scheme definition is limited to the use of US-ASCII characters 
> > > with no provision to include non-ASCII characters/octets but a 
> > > desire to include such characters, or only with provisions that are 
> > > highly scheme-specific. An example of such a scheme might be the
> > > mailto: scheme [RFC2368].
> >
> >But is it not the case that, since all URI schemes have a narrowing 
> >effect on generic *URI* syntax, all URI schemes are "limited to the 
> >use of [a subset of] US-ASCII characters/octets", with %encoding 
> >allowing the 'indirect' use of non-US-ASCII characters in what then 
> >become visually unpleasant URI?
> >
> >I.e., stated this way round, the "main case" is in fact the case for 
> >all URI schemes.
> 
> Okay, it looks like I wasn't precise enough. Let me try a 
> proposal for rewording, for the middle sentence in the 
> paragraph above:
> 
> "The main case where upgrading a scheme definition makes 
> sense is when a scheme definition is strictly limited to the 
> use of US-ASCII characters with no provision to include 
> non-ASCII characters/octets via percent-encoding, or if a 
> scheme definition currently uses highly scheme-specific 
> provisions for the encoding of non-ASCII characters."
> 
> Would that be better? Would the changes below still be necessary?
> I wouldn't want to replace the above with your text below, 
> because your text below says nothing about schemes that may 
> or may not have to be upgraded.

Hmmm... if you were to include the reworded paragraph below (I agree to the
rewording, with fewer 'in general's), I think you could simply delete this
paragraph. On the surface it is OK, but if I think of, say, upgrading an
existing scheme and declaring that, going forward, %-encoded characters
should be interpreted as UTF-8, I find myself wondering about backward
compatibility: %-encoding may already have been used in identifiers
without that intended interpretation. I'm not at all sure how possible it is
to 'upgrade' any URI scheme.
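
To make the backward compatibility worry concrete, here is a minimal
sketch in Python. It assumes a hypothetical legacy identifier whose
%-encoded octets were produced from ISO 8859-1 rather than UTF-8; the
function name and the sample strings are illustrative only and come from
no scheme specification.

```python
from urllib.parse import unquote_to_bytes


def decode_as_utf8(component: str):
    """Return the text behind the %-encoded octets if they are valid UTF-8,
    or None if they are not (e.g. a legacy encoding was used)."""
    raw = unquote_to_bytes(component)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Hypothetical identifiers: the same word encoded two different ways.
utf8_style = "caf%C3%A9"    # octets of U+00E9 in UTF-8
legacy_style = "caf%E9"     # octet of U+00E9 in ISO 8859-1

print(decode_as_utf8(utf8_style))
print(decode_as_utf8(legacy_style))
```

The second identifier is perfectly good under generic URI syntax, yet it
has no reading as UTF-8, which is the kind of existing usage an 'upgrade'
would have to account for.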

> >I would prefer to see a sentence or two that gave a more direct account 
> >of the applicability of URI schemes to IRI and have drafted the following
> >replacement:
> >
> >"In general URI schemes can impose narrowing restrictions on the syntax 
> >of scheme-specific URI, i.e. URI that are admissible under the generic 
> >URI syntax [RFC2396bis] may not be admissible due to narrower 
> >syntactic constraints imposed by a URI scheme specification.  In general 
> >URI scheme definitions cannot broaden the syntactic restrictions 
> >imposed by the generic URI syntax, otherwise it would be possible to 
> >generate URI that satisfied the scheme specific syntactic constraints 
> >without satisfying the syntactic constraints of the generic URI syntax. 
> >However, additional syntactic constraints imposed by URI scheme 
> >specifications are *indirectly* applicable to IRI since the 
> >corresponding URI resulting from the mapping defined in Section 3.1 
> >MUST be a valid URI under the syntactic restrictions of generic URI 
> >syntax and any narrower restrictions imposed by the 
> >corresponding URI scheme specification."
> 
> There is a bit too much 'in general' for my taste, and some 
> other minor wording issues (plural of URI is URIs). So what about:
> 
> "URI schemes can impose restrictions on the syntax of 
> scheme-specific URIs, i.e. URIs that are admissible under the 
> generic URI syntax [RFCYYYY] may not be admissible due to 
> narrower syntactic constraints imposed by a URI scheme 
> specification. URI scheme definitions cannot broaden the 
> syntactic restrictions of the generic URI syntax, otherwise 
> it would be possible to generate URIs that satisfied the 
> scheme specific syntactic constraints without satisfying the 
> syntactic constraints of the generic URI syntax. However, 
> additional syntactic constraints imposed by URI scheme 
> specifications are *indirectly* applicable to IRI since the 
> corresponding URI resulting from the mapping defined in 
> Section 3.1 MUST be a valid URI under the syntactic 
> restrictions of generic URI syntax and any narrower 
> restrictions imposed by the corresponding URI scheme 
> specification."

Inclusion of this paragraph, as reworded above, would address my concern.
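
As an illustration of what "indirectly applicable" means, here is a
minimal sketch in Python. It assumes the Section 3.1 mapping can be
approximated by %-encoding the UTF-8 octets of each non-ASCII character
(any normalization or host-name handling is left out), and it invents a
hypothetical "example:" scheme whose specification permits only lowercase
letters, digits and %-encoded octets. Neither the scheme nor the function
names come from any specification.

```python
import re


def iri_to_uri(iri: str) -> str:
    """Simplified mapping: %-encode the UTF-8 octets of non-ASCII characters."""
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII characters pass through unchanged
        else:
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


# Hypothetical scheme-specific syntax, narrower than generic URI syntax.
HYPOTHETICAL_SCHEME = re.compile(r"example:(?:[a-z0-9]|%[0-9A-F]{2})+")


def iri_acceptable(iri: str) -> bool:
    """An IRI is acceptable only if the URI it maps to satisfies the scheme."""
    return HYPOTHETICAL_SCHEME.fullmatch(iri_to_uri(iri)) is not None


print(iri_to_uri("example:caf\u00e9"))
print(iri_acceptable("example:caf\u00e9"))
print(iri_acceptable("example:Caf\u00e9"))
```

The scheme's rule is never applied to the IRI itself; it is applied to the
URI that results from the mapping, so the uppercase letter in the last
example is rejected while the non-ASCII character is not.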

> > > This specification does not upgrade any scheme specifications in any 
> > > way, this has to be done separately. Also, it should be noted that 
> > > there is no such thing as an "IRI scheme"; all IRIs use URI schemes, 
> > > and all URI schemes can be used with IRIs, even though in some cases 
> > > only by using URIs directly as IRIs, without any conversion.
> >
> >"The publication of this specification has no effect on existing (or 
> >future) URI scheme specifications."
> 
> Are you proposing to replace the above paragraph with your sentence?

Sorry for not being clear, yes that was what I was proposing.

> Or adding your sentence? 

No... I was proposing replacement.

> In any case, I would not like the '(or future)'
> part. While future scheme definitions won't suddenly add 
> actual non-ASCII characters to their syntax, they may very 
> much be designed to accommodate these characters through UTF-8 
> and percent-encoding, and so saying that this spec doesn't 
> affect future specs would be wrong.

Ok... the essential point that I was trying to capture is that it is
not possible for a URI scheme specification to allow the *direct* inclusion
of a broader range of characters than is admissible under generic URI syntax
(hence the "or future").
 
> >I think that it is important to note (as in the above) that it is not 
> >possible to write URI scheme specifications that are less restrictive 
> >than generic URI syntax.
> 
> That's a good point. But wouldn't that belong in the URI 
> spec, rather than the IRI spec? Actually, in my 
> understanding, it's already in the URI spec, and the IRI spec 
> does nothing to change that (nor could it).

Well... I think it needs to be clear to readers of the IRI spec that no
magic happens that automatically enables them to create schemes that allow
the *direct* inclusion of a wider range of characters in scheme definitions.
I made my initial comment after a discussion with Tim Kindberg about the
tag: URI scheme, currently in draft. He was confused about what he could and
could not do with respect to internationalisation in defining that scheme.
For his purposes he would (I believe) like to be able to allow both the
direct use of internationalized characters and %encoding. If tag:
identifiers are passed around as IRIs, Tim would get what he wants
(provided he makes appropriate statements/references about %encoding and
UTF-8).

> Regards,     Martin.
> 
> 
> >If you're willing/able to
> >make changes along the lines I suggest above I think it would more 
> >clearly state the applicability of URI schemes in an IRI context
> >
> >
> >Many thanks,
> >
> >Stuart
> >--

Thanks,

Stuart
--
<snip/>
Received on Thursday, 19 August 2004 09:50:21 GMT