Re: RFC 2396 revision issue: Query definition from Hrvoje Simic on 2002-11-14 (uri@w3.org from November 2002)

From: Hrvoje Simic <hrvoje.simic@zg.hinet.hr>
Date: Fri, 15 Nov 2002 00:13:04 +0100
To: <uri@w3.org>
Message-ID: <000001c28c33$62f7e870$03000a0a@selo>

I agree with most of the things said so far. I would like to break up
the discussion into several issues.


1) Should the query component be redefined, and how?

Yes, but it's hard to think up a good definition. In the "classic" Web,
the query held the parameters you passed to a program found in a file
on a computer, reached using a protocol. Now these concepts of protocol,
computer, file path and parameters are much more abstract. Should it be
"http://about.example.org" or "http://example.org/about"?
"/messages/1-10" or "/messages?from=1&to=10"? Are there any "hard
semantic" reasons for preferring one solution over the other, or just
guidelines? The evolution of the URI towards an abstract identifier has
blurred the differences between its components. The path is effectively
defined for URIs that are "hierarchical in nature", which sounds like a
guideline.

The query may be left opaque and abstract, something like: "a URI
component of arbitrary syntax left for server-specific purposes". Or we
may crack it open, which brings us to the next issue:


2) Should the definition include details about the query structure (like
it did for the path)?

I see that almost every message in this thread mentions query structure.
But RFC 2396 and RFC 2616 (which defines the http URI) don't include
such details. My name for the parts of the query (separated with
ampersands or semicolons) is "query segments" - just to make the query
sound more like the path.

I agree that the query should preserve the order of its segments. The
order may matter to the specific server. Anyway, the segments must be
listed in _some_ order, and I see no advantage in allowing the network
to shuffle them. What I really meant was: path segments must be parsed
in a fixed order, from left to right. If you have "a/b/c", you parse
"a" to identify the branch in the next level of the hierarchy and you
hand over "b/c" to it. But if you have "?a;b;c", you can look for "b"
first and then continue to parse the remaining "?a;c". This allows
clients to communicate information about a resource's identity that
isn't naturally placed in the hierarchy, i.e., that doesn't fit nicely
into a sequence of steps through the hierarchy.
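
A minimal sketch of that difference, in Python. The helper names
step_path and take_segment are hypothetical and come from no RFC; the
semicolon separator is assumed only because it matches the "?a;b;c"
example above.

```python
def step_path(path):
    """Consume the leftmost path segment and return (head, rest).

    Path segments are handled strictly left to right: the head picks
    the branch, and the rest is handed over to that branch.
    """
    head, _, rest = path.partition("/")
    return head, rest


def take_segment(query, wanted):
    """Pick one query segment by value, wherever it appears.

    Returns (segment, remaining_query). The remaining segments keep
    their original order. If the segment is absent, returns
    (None, query) with the query unchanged.
    """
    segments = query.split(";") if query else []
    if wanted not in segments:
        return None, query
    segments.remove(wanted)  # removes the first matching segment only
    return wanted, ";".join(segments)


print(step_path("a/b/c"))           # ('a', 'b/c')
print(take_segment("a;b;c", "b"))   # ('b', 'a;c')
```

The path helper can only ever take the leftmost segment, while the
query helper may take any segment and still leaves the others in the
order the client sent them.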


There are several more related issues bugging me that I'll just mention:

3) Should the semicolon be preferred as the query segment separator
instead of the popular ampersand? Semicolons are more natural and they
make the URI more readable. And HTML 4 said so in [1]. Ironic, really,
since it raised the ampersand to the throne with its form submission
procedure [2].

4) Why not relative queries? I really think they could be useful.

5) Why bother with the query anyway? I ask this after reading articles
like Jeff Bone's "Query Strings Considered Harmful" [3].
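
On issue 3, a server can accept both separators at once, as [1]
recommends. A minimal Python sketch follows; the function name
query_segments is hypothetical, and percent-decoding and name=value
splitting are deliberately left out.

```python
import re


def query_segments(query):
    """Split a query on either ';' or '&', preserving segment order.

    Empty segments (for example from a trailing separator) are dropped.
    """
    return [segment for segment in re.split(r"[;&]", query) if segment]


print(query_segments("from=1&to=10"))  # ['from=1', 'to=10']
print(query_segments("from=1;to=10"))  # ['from=1', 'to=10']
```

Both spellings yield the same ordered list of segments, so the choice
of separator is invisible to the code that interprets them.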

Hrvoje Simic
FER, University of Zagreb, Croatia
mailto:hrvoje.simic@fer.hr
mailto:hrvoje.simic@zg.hinet.hr


[1] http://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2
[2] http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4
[3] http://conveyor.com/RESTwiki/moin.cgi/QueryStringsConsideredHarmful
Received on Thursday, 14 November 2002 18:11:48 UTC