Re: revised "generic syntax" internet draft

Martin J. Duerst
Mon, 14 Apr 1997 16:21:18 +0200 (MET DST)

Date: Mon, 14 Apr 1997 16:21:18 +0200 (MET DST)
From: "Martin J. Duerst" <>
To: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <>
Message-Id: <Pine.SUN.3.96.970414141259.245C-100000@enoshima>

On Fri, 11 Apr 1997, Roy T. Fielding wrote:

> >I reiterate that there is consensus on integrating text
> >for UTF-8 as the recommended character encoding into the
> >draft.
> That is a lie.

Thanks for being explicit. Whether we have rough consensus
to include the text for UTF-8 or not may be an open question
in the absence of a group chair, but it should be very
clear that we definitely have no consensus to exclude
UTF-8 from the draft, as has been claimed.

> >We have a proposed wording, two paragraphs which
> >I don't think I need to repeat.
> That is true.  I wrote that wording because your prior wording was
> too confusing, not because I agreed with it.

Again, thanks for being clear. From your communication up to now,
I had to assume that you wrote that wording because my pointing
out to you that UTF-8 was only *recommended*, not required, had
turned your disagreement into agreement.

Immediately after I received that wording, I sent a mail to the
list saying how much I liked your wording and that we should go
with it. This was 5 weeks ago. You should have received two
copies of that mail, and it should have been rather obvious that
I was assuming you agreed with your own wording. This is the first
time I have seen anything to the contrary.

> >I have only heard very
> >general arguments against this wording, arguments which
> >I have shown to be untrue or irrelevant.
> That is a lie.  You have an opinion, Martin, and Larry has an opinion,
> and I have an opinion.

We all have our opinions and tastes. I respect your opinions,
and I respect Larry's opinions. But there is a difference between
an opinion and an argument.

You hold the opinion that URLs shouldn't contain anything other
than ASCII. As your argument, you gave typability.
I hold a different opinion. I showed that with current technology,
typability is no longer an argument (a missing local keyboard
resource can be replaced by a Java applet), that many actual
resource names (e.g. numbers on car number plates) contain
characters beyond ASCII anyway, so that restricting URLs
to ASCII for typability doesn't help anybody, and that
in the end requiring URLs to be ASCII-only is rather
similar to asking the web to give up GIFs and other
images just because there are some people who can't see.

> You did not show any of my arguments to be
> untrue or irrelevant,

See above. If you have any arguments to the above, I would
be very interested to hear them.

> and the only thing you have demonstrated is that
> you think URLs should be treated as filenames.  Well, I disagree.

I never demonstrated that I think URLs should be treated as
filenames, because I couldn't possibly do so. I do NOT think
URLs should be treated as file names.

It is true that I have in many cases used filenames as examples
of URLs. In particular, I have spoken about filenames when you
raised the concern that implementing the UTF-8 recommendation
would not be easy to do on certain file systems.

> The only question that matters is whether or not the draft as it
> currently exists is a valid representation of what the existing
> practice is and what the vendor community agrees is needed in the
> future to support interoperability.

As long as we are at the IETF, what matters is the discussion here.
If we decide that we have to restart at Draft Standard because
otherwise some problems cannot be solved, then the criteria
of course become different.

As for "vendor community", we have heard clearly positive
voices on this list from people from Sun and from Alis for
the UTF-8 proposal. I did not see any negative voices from
vendors. And I know from many other vendors that they would
be happy to know how to encode all kinds of characters into
URLs and how to decode the characters from the URLs, and
that they look forward to the UTF-8 proposal being accepted.
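
To make the encoding and decoding step concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions,
not text from the proposal: the helper names encode_segment and
decode_segment and the sample name "Zürich" are hypothetical. The
characters are converted to UTF-8 octets, every octet outside the
unreserved ASCII set is %HH-escaped, and decoding reverses both steps.

```python
from urllib.parse import quote, unquote


def encode_segment(name: str) -> str:
    """Percent-encode one path segment, using UTF-8 for the octets.

    Hypothetical helper for illustration; safe="" means that "/" and
    other reserved characters are escaped as well.
    """
    return quote(name, safe="", encoding="utf-8")


def decode_segment(segment: str) -> str:
    """Undo the %HH escapes and interpret the octets as UTF-8.

    errors="strict" raises an exception if the octets are not valid
    UTF-8 instead of silently substituting characters.
    """
    return unquote(segment, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    original = "Zürich"  # sample resource name, an assumption
    encoded = encode_segment(original)
    print(encoded)                              # Z%C3%BCrich
    print(decode_segment(encoded) == original)  # True
```

The URL on the wire stays pure ASCII in this sketch; only the
interpretation of the escaped octets is agreed upon.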

> I have yet to hear *any* support
> for your additional requirements from the vendor community,

Francois Yergeau, from Alis (a browser vendor), has been
very explicit about this.

> and I
> know for a fact that they do not correspond to any existing plans
> of the Apache Group.

The Apache Group is a group of volunteers doing very nice work.
And you are a core member of this group, so you will know.
Up to now, nobody in that group seems much concerned with
internationalization work, although Dirk van Gulik has given
an excellent presentation on content negotiation for document
encoding ("Accept-Charset") and document language ("Accept-Language").

In the last few days, I have had a closer look at the current
Apache sources. My discovery of the rewrite module and of the
concept of sub-requests has made me more and more inclined to
volunteer to write a module that can handle various configuration
cases for UTF-8 (per-server and per-directory native resource
names, various upgrading strategies,...). My main question at
the moment is not whether this is technically feasible, but
whether I will get some advice from experienced Apache people.

> Since it is my opinion that it is NEVER desirable
> to show a URL in the unencoded form given in Francois' examples,
> you cannot claim to hold anything even remotely like consensus. 
> In fact, the "rough consensus" of the HTTP development
> community is that the URL namespace belongs to the origin of the name,
> and no client has the right or need to reinterpret that name for
> the purpose of display. That is what the current draft says,

URLs don't belong to the HTTP development community. As for "right",
it more or less boils down to the question of what you call a URL
and what you call a presentation of a URL. As for "need", it's
not the technical community that is deciding this.

> and it
> does so in a way that DOES NOT PREVENT any future use of URLs to be
> of a single character set encoding.

We are not discussing "NOT PREVENT"ing here. We are discussing

> IF you can persuade the creators of URLs to always use UTF-8,
	Do you forget so quickly that it is only a *recommendation*?

> which
> is definitely not the case today (Apache, NCSA, and CERN servers all
> use whatever charset is used by the underlying filesystem, which on
> most Unix-based systems is iso-8859-1 or iso-2022-*),

	The overwhelming majority of filesystems in places that could
	use iso-2022-* (Japan,...) don't use that; they use EUC.
	Encodings such as iso-2022-jp are used only in email.
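
	The practical consequence of servers using the native
	filesystem charset can be sketched as follows (Python,
	illustration only; the helper percent_encode and the sample
	name are hypothetical). The same two characters yield
	different octets, and therefore different %HH sequences,
	depending on the charset, and nothing in the URL itself says
	which charset was used:

```python
from urllib.parse import quote


def percent_encode(name: str, charset: str) -> str:
    """Encode the name in the given charset, then %HH-escape the octets.

    Hypothetical helper; safe="" escapes every octet outside the
    unreserved ASCII set.
    """
    return quote(name.encode(charset), safe="")


if __name__ == "__main__":
    name = "日本"  # sample file name, an assumption
    # Python codec names for EUC-JP, ISO-2022-JP and UTF-8.
    for charset in ("euc_jp", "iso2022_jp", "utf-8"):
        print(f"{charset:12} {percent_encode(name, charset)}")
```

	A client receiving one of these URLs cannot turn the escapes
	back into characters for display without knowing the charset
	out of band.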

> then you can
> make claims of consensus.  Until then, your opinions have been answered
> to the best extent possible by the editors, and with far more
> civility than in your responses.

Please answer my arguments. My opinions are irrelevant.

Civility is a (secondary) issue. You seem to consider it civil
to shout "That is a lie." when you should know better.
I have the tendency to be less direct, which is considered more
civil in many societies. This is internationalization at work :-).

Regards,	Martin.