Choosing and comparing URI

Meta: TAG underutilizing presentation medium?

arch principles must be taught, not just written/read


<h1>Choosing and comparing URI</h1>

1. If you mean the same thing, say it the same way.

2. When choosing names for distinct things, choose clearly distinct names

3. Absolute URIs* are the basis of comparison

* w/optional fragments

4. Clients/consumers should not usurp servers'/providers' naming rights

<h1>If you mean the same thing, say it the same way.</h1>

Be aware that other names may refer to the same
thing, but don't rely on that unnecessarily.

Yes, http://example.net:80/ is specified to mean
the same thing as http://example.net/ but don't
rely on all consumers being that smart.
e.g. XPath, RDF, other stuff based on namespaces.

<h1>Providers: choose clearly distinct names for distinct things</h1>

Case disappears in DNS: you cannot serve different
responses to GETs on http://EXAMPLE.net/foo and http://EXAMPLE.net/ .

It's unwise to rely on the distinction between
http://example.net/FOO and http://example.net/foo:

* you may want to port your content to a server infrastructure
where those are synonymous.

* Distinguishing them when reading over the phone is
error-prone.

Likewise, be ware of letters l and O versus digits 1 and 0.

<h1>Comparison Relationships</h1>

Premise: compare u1 and u2, and use the result as
information about what they identify, i.e. r1 and r2.

false negative: If r1 = r2 but not R(u1, u2) we
say R gives some false negatives.

false positive: If R(u1, u2) but r1 <> r2, we
say R gives some false positives.

<h1>Partial Knowledge is the norm</h1>

false negatives are a normal consequence of the
ordinary situation where local knowledge is incomplete

<h1>Absolute URIs* are the basis of comparison</h1>

Axiom: u1 = u2 => r1 = r2.

for absolute URIs (optionally with fragids) u1 and u2,
where u1 = u2 is strcmp(b1, b2) where b1 is the
US-ASCII encoding of u1 and b2 likewise for u2.

For IRIs, analagously: ecmascript/java/C# string
compare[@@double-check]
[@@what is this called in the charmod spec?].

<h1>Absolute URIs* are the basis of comparison (2)</h1>

Providers: consider relative references
 In http://example/abc , abbreviate
    http://example/def by a relative
 URI reference, "def".

* Allows references within copies of the
content to maintain locality of reference

* allows to compose in file: space, export to http: space

<h1>Absolute URIs* are the basis of comparison (3)</h1>

Consumers: expand references to absolute form
before comparing**.

otherwise, both false positives:
  "foo" in http://example/dir1/ refers to r1
  "foo" in http://example/dir2/ refers to r2
   "foo" = "foo" but r1 <> r2.
  needed to compare http://example/dir1/foo with http://example/dir2/foo

and false negatives:
  "../foo" in http://example/dir1/dir3/ refers to r1
  "foo" in http://example/dir1/ refers to r1
   "../foo" <> "foo" but r1 = r1.
  needed to compare http://example/dir1/foo with http://example/dir1/foo
This false negative is avoidable using only local knowledge.

** or at least act like you did.

<h1>Information hiding: Clients should not usurp server's naming rights</h1>

A server <b>may</b> use /FOO/ and /foo/ as pathnames for
distinct resources. Likewise /foo/ and /foo/index.html.

While many servers use /foo/ and /foo/index.html to name the same*
resource, and hence treating them as synonyms is often a useful
optimization, servers may use them to name distinct resources, and the
responsibility of the false positive lies with the optimizing client.

*or at least: indistinguishable

<h1>Well-known pathnames are counter to information-hiding</h1>

/robots.txt constrains providers choice of names.
This constraint is largely accepted as fair,
deployed with due process.
(@@but is it underdocumented? Does it belong
in the HTTP spec as notice that it's a reserved pathname?)

what about /w3c/p3p.xml?

what about favico?

Let's be careful about others.

<h1>layered conventions reveal information</h1>

A form reveals information about a whole set
of names of the form aaa?n1=v1&n2=v2...
where aaa is the action URI and n1... are
the form field names.

Usually, this is voluntary by the info provider,
and assumed accurate by consumers/clients.

There is an issue of trust: the cost of
forgery is that clients will send requests
with expectations that the server hasn't
agreed to meet. For GET, no big deal;
but for POST, could be serious.
(@@elaborate? is this security risk documented?)