RE: URI schemes and IRI deployment (issue schemes-iri-38) from Williams, Stuart (HP Labs, Bristol) on 2004-09-16 (public-iri@w3.org from September 2004)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Thu, 16 Sep 2004 18:22:37 +0900
To: public-iri@w3.org
Message-Id: <4.2.0.58.J.20040916182229.05404860@localhost>
Hello Martin,

 > I have tentatively closed this issue. Please see
 > http://www.w3.org/International/iri-edit/diff-duerst-iri-last-draft.html
 > for the overall changes, and tell me whether you are okay, as
 > soon as possible.

Thank you for making the changes, they do address my concern.

A few short comments below on the URI upgrading para, but no requests
for further change.

By all means close the issue.

Best regards

Stuart
--

 > -----Original Message-----
 > From: Martin Duerst [mailto:duerst@w3.org]
 > Sent: 16 September 2004 08:01
 >
 > Hello Stuart,
 >
 > Sorry for the delay in responding to your mail.

Not a problem...
 >
 > At 10:49 04/08/19 +0100, Williams, Stuart wrote:
 >
 > >Hello Martin,
 > >
 > > > -----Original Message-----
 > > > From: public-iri-request@w3.org
 > > > [mailto:public-iri-request@w3.org] On Behalf Of Martin Duerst
 > > > Sent: 18 August 2004 08:03
 > > > To: Williams, Stuart
 > > > Cc: public-iri@w3.org; Ted Hardie
 > > > Subject: RE: URI schemes and IRI deployment (issue schemes-iri-38)
 > > >
 > > >
 > >
 > ><snip/>
 >
 > > > Okay, it looks like I wasn't precise enough. Let me try a proposal
 > > > for rewording, for the middle sentence in the paragraph above:
 > > >
 > > > "The main case where upgrading a scheme definition makes sense is
 > > > when a scheme definition is strictly limited to the use of US-ASCII
 > > > characters with no provision to include non-ASCII characters/octets
 > > > via percent-encoding, or if a scheme definition currently uses
 > > > highly scheme-specific provisions for the encoding of non-ASCII
 > > > characters."
 > > >
 > > > Would that be better? Would the changes below still be necessary?
 > > > I wouldn't want to replace the above with your text below, because
 > > > your text below says nothing about schemes that may or may not have
 > > > to be upgraded.
 > >
 > >Hmmm... if you were to include the reworded para below (I've agreed to
 > >the rewording - fewer 'generally's) I think you could simply delete
 > >this paragraph.
 >
 > I have thought about that. I think the current paragraph,
 > talking about upgrades, is valuable in its own right,
 > although it is not the issue you have raised. So I'll leave that in.

Ok... some comments below... but I can live with the para.

 > >On the surface it is ok, but if I were to say, think of upgrading an
 > >existing scheme and saying that going forward, %-encoded characters
 > >should be interpreted as UTF-8 I find myself wondering about backward
 > >compatibility issues, where %-encoding may have been used in
 > >identifiers without that intended interpretation. I'm not at all sure
 > >how possible it is to 'upgrade' any URI scheme.
 >
 > If it was the case that %-encoding was used with a fixed
 > character semantics that is different from UTF-8 (let's take
 > iso-8859-1 as an example), then you are right [I don't know
 > of such a scheme, but that doesn't mean that it might not
 > exist.]. In practice, it may still be possible to add such
 > semantics for newly created URIs because there are very good
 > heuristics for UTF-8.

I think it's more common that the character set/encoding is just not
known.

 > Also, if %-encoding was used without any defined character
 > semantics (typical example: HTTP), then it would be
 > impossible to force UTF-8 character semantics on %-encoding.
 > Again, in practice, a scheme definition may be updated to say
 > something like 'if it looks like UTF-8, assume it's UTF-8'.

Personally I'd be conservative on both these counts... if the scheme
doesn't give you a mechanism to know the character set/encoding, don't
guess.
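
For illustration only (this is not text from the draft), the 'if it looks
like UTF-8' heuristic discussed above could be sketched in Python roughly as
follows. The function name looks_like_utf8 is hypothetical. A True result
says only that the decoded octets are consistent with UTF-8, not that the
producer of the URI intended UTF-8, which is the reason for caution.

```python
from urllib.parse import unquote_to_bytes


def looks_like_utf8(component: str) -> bool:
    """Return True if the percent-decoded octets are valid UTF-8 and
    contain at least one non-ASCII octet (hypothetical helper)."""
    octets = unquote_to_bytes(component)
    if all(b < 0x80 for b in octets):
        # Pure ASCII gives no evidence about the intended encoding.
        return False
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not well-formed UTF-8, e.g. a lone iso-8859-1 octet such as %E9.
        return False
    return True
```

Here "caf%C3%A9" passes the check, while "caf%E9" (an iso-8859-1 e-acute)
and plain "cafe" do not.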

 > Anyway, that's why the text in the draft is very careful to
 > limit this to the case where a scheme (or a part thereof)
 > does not allow %-encoding, or uses other conventions for encoding
 > non-ASCII characters.
 > In these cases, %-encoding is essentially added as new syntax
 > to the scheme. The benefits of extending the syntax of a
 > scheme have to be judged carefully, but it's not something
 > that is a priori impossible.

Ok, I can now see the intent behind the

	"...strictly limited to the use of US-ASCII
	characters with no provision to include non-ASCII characters/octets
	via percent-encoding,.."

That seems to me likely to be a small set of schemes too. Do any actually
prohibit the use of %-encoding?

 > > > "URI schemes can impose restrictions on the syntax of
 > > > scheme-specific URIs, i.e. URIs that are admissible under the generic
 > > > URI syntax [RFCYYYY] may not be admissible due to narrower syntactic
 > > > constraints imposed by a URI scheme specification. URI scheme
 > > > definitions cannot broaden the syntactic restrictions of the generic
 > > > URI syntax, otherwise it would be possible to generate URIs that
 > > > satisfied the scheme specific syntactic constraints without
 > > > satisfying the syntactic constraints of the generic URI syntax.
 > > > However, additional syntactic constraints imposed by URI scheme
 > > > specifications are *indirectly* applicable to IRI since the
 > > > corresponding URI resulting from the mapping defined in Section 3.1
 > > > MUST be a valid URI under the syntactic restrictions of generic URI
 > > > syntax and any narrower restrictions imposed by the corresponding
 > > > URI scheme specification."
 > >
 > >Inclusion of this paragraph, as reworded above, would address my
 > >concern.
 >
 > I have included this paragraph. I think this is material that
 > should end up in the 'guidelines for new URI schemes'
 > or whatever it will be called, and once it ends up there, we
 > may be able to remove it from here, but for the moment, it
 > doesn't hurt.
 >
 >
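
For illustration only (not text from the draft), the *indirect* constraint
described in the quoted paragraph can be sketched in Python: map the IRI to
a URI by percent-encoding the UTF-8 octets of each non-ASCII character, then
check the result against the scheme's own, narrower syntax. This is a
simplified sketch that ignores normalization and any special handling of
host names. The "example:" scheme, its pattern, and both function names are
hypothetical.

```python
import re

# Hypothetical scheme-specific syntax: lowercase letters, digits, "/",
# and percent-encoded octets only.
EXAMPLE_SCHEME = re.compile(r"example:(?:[a-z0-9/]|%[0-9A-F]{2})+")


def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters pass through unchanged
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def iri_valid_for_scheme(iri: str) -> bool:
    """An IRI is acceptable only if its mapped URI satisfies the scheme."""
    return EXAMPLE_SCHEME.fullmatch(iri_to_uri(iri)) is not None
```

Under these assumptions "example:caf\u00e9" maps to "example:caf%C3%A9" and
is accepted, while "example:Caf\u00e9" is rejected because the hypothetical
scheme does not allow uppercase letters, whatever the IRI syntax permits.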
 > >Well... I think it needs to be clear to readers of the IRI spec that no
 > >magic happens that automatically enables them to create schemes that
 > >allow the *direct* inclusion of a wider range of characters in scheme
 > >definitions.
 > >I made my initial comment after a discussion with Tim Kindberg wrt the
 > >tag: URI scheme in draft. He was confused about what he could/could not
 > >do wrt internationalisation on defining that scheme. For his
 > >purposes he would (I believe) like to be able to allow the direct use
 > >of internationalized characters, and the %encoding. Passed around as
 > >IRI Tim would get what he wants (provided he makes appropriate
 > >statements/references about %encoding and UTF-8).
 >
 > I agree. Your pointer to Tim's draft helped me a lot
 > understanding what you were looking for.
 >
 > I have tentatively closed this issue. Please see
 > http://www.w3.org/International/iri-edit/diff-duerst-iri-last-draft.html
 > for the overall changes, and tell me whether you are okay, as
 > soon as possible.
 >
 > Regards,     Martin.
 >
 >
Received on Thursday, 16 September 2004 09:22:56 UTC