Meta: TAG underutilizing presentation medium? arch principles must be taught, not just written/read

Choosing and comparing URI

1. If you mean the same thing, say it the same way. 2. When choosing names for distinct things, choose clearly distinct names 3. Absolute URIs* are the basis of comparison * w/optional fragments 4. Clients/consumers should not usurp servers'/providers' naming rights

If you mean the same thing, say it the same way.

Be aware that other names may refer to the same thing, but don't rely on that unnecessarily. Yes, http://example.net:80/ is specified to mean the same thing as http://example.net/ but don't rely on all consumers being that smart. e.g. XPath, RDF, other stuff based on namespaces.

Providers: choose clearly distinct names for distinct things

Case disappears in DNS: you cannot serve different responses to GETs on http://EXAMPLE.net/foo and http://EXAMPLE.net/ . It's unwise to rely on the distinction between http://example.net/FOO and http://example.net/foo: * you may want to port your content to a server infrastructure where those are synonymous. * Distinguishing them when reading over the phone is error-prone. Likewise, be ware of letters l and O versus digits 1 and 0.

Comparison Relationships

Premise: compare u1 and u2, and use the result as information about what they identify, i.e. r1 and r2. false negative: If r1 = r2 but not R(u1, u2) we say R gives some false negatives. false positive: If R(u1, u2) but r1 <> r2, we say R gives some false positives.

Partial Knowledge is the norm

false negatives are a normal consequence of the ordinary situation where local knowledge is incomplete

Absolute URIs* are the basis of comparison

Axiom: u1 = u2 => r1 = r2. for absolute URIs (optionally with fragids) u1 and u2, where u1 = u2 is strcmp(b1, b2) where b1 is the US-ASCII encoding of u1 and b2 likewise for u2. For IRIs, analagously: ecmascript/java/C# string compare[@@double-check] [@@what is this called in the charmod spec?].

Absolute URIs* are the basis of comparison (2)

Providers: consider relative references In http://example/abc , abbreviate http://example/def by a relative URI reference, "def". * Allows references within copies of the content to maintain locality of reference * allows to compose in file: space, export to http: space

Absolute URIs* are the basis of comparison (3)

Consumers: expand references to absolute form before comparing**. otherwise, both false positives: "foo" in http://example/dir1/ refers to r1 "foo" in http://example/dir2/ refers to r2 "foo" = "foo" but r1 <> r2. needed to compare http://example/dir1/foo with http://example/dir2/foo and false negatives: "../foo" in http://example/dir1/dir3/ refers to r1 "foo" in http://example/dir1/ refers to r1 "../foo" <> "foo" but r1 = r1. needed to compare http://example/dir1/foo with http://example/dir1/foo This false negative is avoidable using only local knowledge. ** or at least act like you did.

Information hiding: Clients should not usurp server's naming rights

A server may use /FOO/ and /foo/ as pathnames for distinct resources. Likewise /foo/ and /foo/index.html. While many servers use /foo/ and /foo/index.html to name the same* resource, and hence treating them as synonyms is often a useful optimization, servers may use them to name distinct resources, and the responsibility of the false positive lies with the optimizing client. *or at least: indistinguishable

Well-known pathnames are counter to information-hiding

/robots.txt constrains providers choice of names. This constraint is largely accepted as fair, deployed with due process. (@@but is it underdocumented? Does it belong in the HTTP spec as notice that it's a reserved pathname?) what about /w3c/p3p.xml? what about favico? Let's be careful about others.

layered conventions reveal information

A form reveals information about a whole set of names of the form aaa?n1=v1&n2=v2... where aaa is the action URI and n1... are the form field names. Usually, this is voluntary by the info provider, and assumed accurate by consumers/clients. There is an issue of trust: the cost of forgery is that clients will send requests with expectations that the server hasn't agreed to meet. For GET, no big deal; but for POST, could be serious. (@@elaborate? is this security risk documented?)