- From: Jim Gettys <jg@pa.dec.com>
- Date: Mon, 9 Feb 1998 11:22:56 -0800
- To: uri@Bunyip.Com
I hesitate to tread into this quagmire, but as I much prefer referencing
a URI document to having to duplicate material into the HTTP spec, which
I will have to do if the URI document debate is not settled quickly, I
now join the fray. IETF rules require that draft standards make normative
references only to other draft standards, so I cannot use the proposed
standard URL documents in the HTTP document; unless this is settled quickly,
I won't have any option....
I did not have a strong opinion in the past, but I've been forced to
form an opinion, and somewhat to my surprise, I now have a strong opinion.
I hope the material below makes the case well; I am very pressed for time
to complete the HTTP spec soon, and have a deadline (or maybe liveline
is a better term) looming that will limit the debating time I have
available. I will try to get a detailed critique of what I believe should
change in Roy's draft in the next day or two.
- Jim Gettys
Introduction
------------
The importance of sharing particular pieces of URI syntax has never been
well understood or documented. Most URI design has been based on existing
practice, and usually shares most of the generic syntax; there have been
exceptions. Both intuition and reduction of code required to implement
new URI schemes have encouraged significant uniformity of design, but
recent understanding, and I hope this document, show that it is vital
to share as much syntax as possible between URI schemes.
Without understanding of the consequences of particular choices, however,
it has been unclear to designers of new URI schemes if a particular piece
of syntax is appropriate to their application, and the consequences of
not sharing a particular piece of syntax have not been clear, so the
resulting URI syntax design has sometimes been poor.
Both from a review of discussions on the URI mailing list as part of being
asked about the proposed URI syntax and semantics specification, and
as part of a meeting I recently attended, I've come to realize that there
are two quite subtle consequences to (not) sharing components of URI
syntax that have profound impact on the future evolutionary flexibility for
the World Wide Web.
Note: NOTHING I am saying is different for any URx.
Documenting these issues to guide those involved in URI scheme design
has become vital to future World Wide Web evolution.
These have generically to do with:
o Constraints imposed by Content on the Web
o Constraints imposed by need for information hiding, to enable
software in the Web to remain modular and extensible.
There have been two views of URI syntax:
1) more or less ``anything goes'' after the colon
2) (more) ``uniform'' sharing of URI syntax to the extent it may make
sense
The fundamental problem has been to distinguish the merits of each
approach. The strongest arguments for "anything goes" have been
o the constraints of the syntax can make things difficult for scheme
designers
o existing syntax of identifiers can be adopted without further
thought, which are already in widespread use and familiar to those
who use them
The strongest arguments for the "uniform" approach that I had previously
seen were:
o general simplicity
o fewer parsers to build
o and a general design intuition that uniformity is better than chaos
While I have generally preferred the "uniform" approach, I did not have
a strong opinion. If this document succeeds in its intent, however, you
will decide that the uniform approach is not only desirable, but vital
for long term Web architecture.
So what are the consequences of following each path?
View of URI syntax as a Class Hierarchy
---------------------------------------
One way of framing the discussion is to view URI syntax as though it
were an object hierarchy. Then there is a set of methods that can be
applied to a URI string:
Scheme(URI),
Fragment(URI),
Relpath(URI), etc.
Note that not all methods might necessarily apply to a particular scheme
(analogous to an unimplemented method), and some schemes might
define additional methods (subclass).
In these terms, the debate can be framed as:
o whether different URx's inherit from Object (``anything goes''),
o or if they inherit from a basic ``uniform'' URI syntax.
Class hierarchy design is known to be difficult! How do we evaluate the
choice?
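
As a rough illustration of this framing (a sketch only, not taken from any
URI specification), the "uniform" side of the debate might look like the
following. The class names GenericURI, HttpURI and MailtoURI and the helper
describe() are hypothetical; the parsing is delegated to Python's standard
urllib.parse.urlsplit.

```python
from urllib.parse import urlsplit


class GenericURI:
    """Hypothetical base class: the shared ("uniform") URI syntax."""

    def __init__(self, text):
        self.text = text

    def scheme(self):
        # Everything before the first colon.
        return self.text.partition(":")[0]

    def fragment(self):
        return urlsplit(self.text).fragment

    def relpath(self):
        return urlsplit(self.text).path


class HttpURI(GenericURI):
    """A scheme that defines an additional method (a subclass)."""

    def query(self):
        return urlsplit(self.text).query


class MailtoURI(GenericURI):
    """A scheme to which one inherited method does not apply."""

    def relpath(self):
        # Analogous to an unimplemented method.
        raise NotImplementedError("mailto has no hierarchical path")


def describe(uri):
    """Works for any GenericURI without knowing the concrete scheme."""
    return uri.scheme(), uri.fragment()
```

Under "anything goes", describe() could rely on nothing beyond scheme();
under the uniform view, code written against GenericURI keeps working when
a new subclass appears.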
Consequences of URI's Being Embedded in Content
-----------------------------------------------
The utility of embedding links into documents is certainly now clear to
the world. But the fact that links are internally embedded into many
data types (e.g. HTML, XML, Microsoft Word, Adobe PDF, etc.) has
consequences. Note that below I mean "naming authority" to be the
scheme-specific delegatee of part of a name space; for example, the
www.w3.org in the URL http://www.w3.org/foo/bar/baz.html.
o If fragment syntax (to the extent of understanding that the URI is a
fragment) isn't shared between two schemes (e.g. ``<a
href="#foo">''), you can't move individual, completely
self-referential documents between schemes without rewriting the
document. In the Web, the fragment syntax is a property of the
media type, and evaluated by the client.
o If fragment syntax is not shared between different media types of
the same capability (e.g. HTML, XML, Word, or image types
such as GIF, JPEG, PNG) then you can't have a URI reference
that can evolve to superior media types as they become available,
or that is even likely to work properly today with content negotiation.
o If relative syntax (to the extent of understanding that the URI is
relative, and what part of the URI string is relative) isn't shared
between two schemes (e.g. ``<a href="foo">''), you can't
move sets of documents that are internally self-referential between
schemes without rewriting.
o If ".." syntax as a path component in relative URI's isn't shared
between schemes, you can't easily have multiple document sets that
refer to each other across schemes without rewriting.
o If / syntax (to the extent of understanding that the URI refers to a
path relative to the current naming authority) isn't shared, you
can't easily move multiple sets of documents up or down in a
relative hierarchy of names and share a common set of documents
between them without rewriting the content, whether the sharing is
within that scheme or between schemes. The best example is a
site that has a common set of GIF, JPEG and PNG images, where
you want to reorganize the site by changing the depth of a subtree,
or by moving it from one directory to another where the depth
isn't the same.
o If naming authority syntax (e.g. what comes after "//" in most URL
schemes) and relative path syntax (to the extent of
understanding that the URI has a naming authority, and what part
of the URI string is the naming authority vs. path) aren't shared
between two schemes, you can't share identical name spaces and
serve them up via different schemes. (The naming authority
syntax is a property of the scheme.) The fact that HTTP and FTP
have the same syntax, for example, has often been exploited by
sites transitioning from ftp archive service to HTTP archive
service so that the URL's can be identical between schemes
except for the scheme; the same content can be served via two
schemes simultaneously.
o If query syntax (to the extent of understanding the URI has a
query, and what part of the URI string is the query) isn't shared
between two schemes (the syntax is a property of the server,
rather than the client).
o There are a few other pieces of URI path syntax for which this
document does not explore the consequences, but I think you can
work it out for yourself, given these examples.
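
The fragment, relative path, ".." and "/" cases above can be sketched with
Python's standard urllib.parse, which implements the shared relative syntax
for both http and ftp. This is only an illustration: example.org, the paths
in LINKS and the helpers resolve_all() and swap_scheme() are made up for the
example.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Links as they might appear inside one unmodified document.
LINKS = ["#foo", "foo.html", "../images/logo.gif", "/images/logo.gif"]


def resolve_all(base, links):
    """Resolve each embedded reference against the document's base URI."""
    return [urljoin(base, link) for link in links]


def swap_scheme(uri, new_scheme):
    """Name the same resource under another scheme, changing nothing else."""
    parts = urlsplit(uri)
    return urlunsplit((new_scheme,) + tuple(parts)[1:])


http_base = "http://example.org/docs/guide/index.html"
ftp_base = swap_scheme(http_base, "ftp")

http_links = resolve_all(http_base, LINKS)
ftp_links = resolve_all(ftp_base, LINKS)

# Because both schemes share the syntax, the resolved targets differ
# only in the scheme; the document itself was never rewritten.
assert [swap_scheme(u, "ftp") for u in http_links] == ftp_links
```

If the second scheme used a different notation for any of these four forms,
the list LINKS (that is, the document content) would have to change before
the final check could hold.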
Digital Signatures
------------------
Digital signatures on content will increase even further the importance
of maintaining bit-for-bit integrity of content. Original signatures may
require a private key only available at the time of signing, and may or
may not be embedded into content in the same fashion as URI's. Therefore
as signature technology is deployed, if syntax differs gratuitously between
schemes, it will strongly discourage old content from being made available
via new schemes that might be deployed.
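
A small sketch of why rewriting matters here. It uses a SHA-256 digest as a
stand-in for the bytes a signature would cover (a real signature would
additionally involve the signer's private key), and the "x-new:" link
notation is invented purely to represent a scheme without ".." syntax.

```python
import hashlib

# Content as originally signed, with an embedded relative link.
original = b'<a href="../images/logo.gif">logo</a>'

# Hypothetical rewrite forced by a scheme that lacks the ".." syntax.
rewritten = original.replace(b"../images/logo.gif", b"x-new:images:logo.gif")


def digest(content):
    """Digest over the exact bytes; any change to them changes the result."""
    return hashlib.sha256(content).hexdigest()


# The rewritten content no longer matches what was signed, so the
# original signature could not be verified against it.
assert digest(original) != digest(rewritten)
```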
Impact on Opacity of Interfaces
-------------------------------
o If fragment syntax is not solely media type dependent, (e.g.
depends on the scheme), then introducing a new scheme would
(potentially) require that each media viewer be updated for that
scheme. This is likely to be a prohibitive amount of work.
o Similarly, to be able to introduce new schemes into the web,
without having to modify all URI access code in applications, the
URI parsing code in applications must be able to remove the
fragment from the base URI, or it will have to be updated for
each scheme.
o Relative URI parsing and following of links also cannot be
independent of scheme unless relative URI syntax is shared;
otherwise, user agents and other programs that follow relative links
would have to be updated for each new scheme introduced.
These examples show that unless syntax is shared, new schemes will be
very hard to introduce into the Web.
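
A minimal sketch of the information hiding described above, assuming the
fragment delimiter "#" is shared by every scheme. The FETCHERS table, the
follow() function and the "x-new" scheme are hypothetical stand-ins for an
application's URI access code.

```python
def split_fragment(uri_reference):
    """Split off the fragment without looking at the scheme at all."""
    base, _, fragment = uri_reference.partition("#")
    return base, fragment


# Hypothetical per-scheme access code; each entry only ever sees the
# base URI, never the fragment.
FETCHERS = {
    "http": lambda uri: "fetched " + uri,
    "ftp": lambda uri: "fetched " + uri,
}


def follow(uri_reference):
    """Fetch the document, then hand the fragment to the media viewer."""
    base, fragment = split_fragment(uri_reference)
    scheme = base.partition(":")[0].lower()
    document = FETCHERS[scheme](base)
    return document, fragment


# Introducing a new scheme is one registration; split_fragment(),
# follow() and the viewers that interpret fragments are untouched.
FETCHERS["x-new"] = lambda uri: "fetched " + uri
```

If fragment syntax varied by scheme, split_fragment() itself would need a
case for every scheme, and so would every application that embeds it.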
Conclusions
-----------
The sections above show that the more sharing of basic URI syntax there
is, the more likely (a set of) complex objects can be transported unmodified
between different schemes (e.g. FTP to HTTP to HTTP/NG to URN, and to
other schemes). Similarly, content can evolve to more useful types without
breaking URI references if fragment syntax is shared among related content
types (e.g. named anchors in documents). Digital signatures on content
will further increase the importance of maintaining bit-for-bit integrity
of content.
Some naming systems lack the semantic meanings covered by the commonly
used URI syntax, and sometimes those naming systems provide additional
semantic meaning for those systems. For those naming systems in which
parts of the URI syntax do not apply, it is clearly acceptable in my view
to ignore that part of the syntax. I hope this document convinces you,
however, that where the semantic meanings of name components are identical,
mapping them into a common URI syntax in fact has major medium
and long term benefits to the World Wide Web. For those who are working
on facilities which add new semantic meanings that might be shared between
schemes, I hope this document convinces you it is worth working on defining
what that common syntax should be.
If the same content cannot be served up under alternate schemes, or moved
to future schemes used in the Web, it will greatly inhibit introduction
of new schemes into the Web. If Web software cannot be written without
intimate intertwining of knowledge between components, and therefore must
be updated to introduce new schemes or content types, it will greatly inhibit
introduction of new schemes and software into the Web.
If URI syntax, therefore, is gratuitously different for the same semantic
meaning, it will strongly discourage future innovation in the World Wide
Web. The more random URI syntax is between schemes, the more Web evolution
will be inhibited, and the more programmers and protocol designers we'll keep
employed kludging around... (job security!). But since there is enough
work to go around in the Web, I believe it is clear that unity of URI
syntax for semantically equivalent constructions is essential for the
future health of the World Wide Web.
A single URI specification that covers general URI syntax, along with
guidance on how to design new URI schemes (and the consequences of different
design decisions), probably as a separate new document, is preferable to
splitting the URI spec into several specifications (e.g. scheme, vs.
independent URL and URN specs). Each URI scheme should be able to reference
this single syntax and semantics specification, and it should be able
to do so and make clear which components of the generic URI syntax apply
for that scheme (and which components do not!). The November draft submitted
by Fielding is closest to this model, but does need some further work;
e.g. the host part of the document needs clear delineation from the rest
of the URI spec, so that it is clear that this is additional syntax which
is common in a number of schemes, but not at all inherent in URI syntax.
Jim Gettys
Digital Equipment Corporation
Visiting Scientist, World Wide Web Consortium
Received on Monday, 9 February 1998 14:40:58 UTC