From: Brian Behlendorf <brian@organic.com>
Date: Sun, 7 Jan 1996 23:45:23 -0800 (PST)
To: www-talk@w3.org
In the content negotiation subgroup we are attacking the problem of analysing
the con-neg proposals in HTTP/1.1. We can all probably agree that it's 90%
of the way there with regard to accomplishing its mission, which is
negotiation at the granularity of the content-type. We're still undecided
about how far that granularity goes - what, for example, do parameters on
Accept: content-types mean?
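It's not clear, say, what a server should infer from a header like this
(values illustrative; "level" is the accept-extension parameter used as
an example in the HTTP/1.1 draft):

    Accept: text/html; level=3; q=1.0, text/html; q=0.5, */*; q=0.1

Does "level=3" mean the client merely prefers level-3 HTML but can
degrade, or that it must never be sent anything beyond level 3?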
However, there still looms the issue of HTML variants, "towards graceful
deployment of new features". We haven't talked much about it, simply
because we don't have any proposals to give feedback on. We agreed in
the phone conversation that it was an "interesting" topic.
However, the solution set for the problem also crosses heavily into the
HTML arena, so it seemed like bringing this solution set to a larger
audience would be a Good Thing.
Here are the goals I've seen stated or implied:
1) The solution must be a reasonable replacement for user-agent
negotiation, for a majority of the cases. Specifically not required to
be handled are bugs (e.g., the AOL browser's forms implementation is
broken).
2) The solution must require a minimal amount of work for browser
authors, a less-than-moderate amount of work for server authors
(given that there are many more browser authors than server authors :)
and allow for a range of efforts on the document-author side - so that
the base case is easy to handle, yet document authors can selectively
apply effort to making their documents more "portable".
3) The solution must address caching, with the goal of reducing the
amount of bandwidth and the number of cached items per resource.
There are basically three routes to go, as I see it:
I. The client indicates to the server a list of HTML extensions it
understands. This vocabulary of extensions is registered by some
impartial body, say IANA, in the same way SMTP EHLO extensions are. The
server is responsible for delivering content that the browser can
understand. The response must declare which extensions the page
depended upon, so caches can know when a response can be served locally.
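A hypothetical exchange (these header names are invented purely for
illustration, not taken from any current draft) might look like:

    GET /paper.html HTTP/1.0
    Accept: text/html
    Accept-Features: tables, forms, client-side-image-maps

    HTTP/1.0 200 OK
    Content-Type: text/html
    Features-Used: tables

A cache could then safely serve this entity to any later client whose
feature list also includes "tables".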
II. Introduce conditional constructs to HTML. Basically, create a new
content-type - say, text/cond-html - and have it use either marked sections
or PIs to implement an IF (feature | NOT feature), THEN (block), ELSE
(block) construct. The "feature" would again be a registered keyword, which
browsers would be responsible for setting appropriately. Browsers which
supported cond-html would of course indicate so in their Accept headers,
so there's still a big role for content negotiation.
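A sketch of what this might look like with marked sections (syntax
purely illustrative - a PI-based form would work equally well):

    <![ %TABLES; [
      <TABLE> ... the tabular version ... </TABLE>
    ]]>
    <![ %NO-TABLES; [
      <PRE> ... a preformatted fallback ... </PRE>
    ]]>

where the browser sets the TABLES keyword to INCLUDE or IGNORE according
to its own capabilities.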
III. Recommend that all (or as many as possible) non-backwards-compatible
extensions (like, say, math) are implemented not as new HTML extensions, but
as new unique data types which are INSERTed into documents, allowing
content-type-granularity content negotiation to work. For extensions
which are essential to HTML, use the "level" or "version" parameter to
distinguish, with the idea that this would be an infrequent occurrence.
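A math-heavy page under this model might carry something like the
following (attribute names loosely modeled on the INSERT draft; the math
content-type here is made up):

    <INSERT DATA="eq1" TYPE="application/x-math">
      <IMG SRC="eq1.gif" ALT="x = (-b +/- sqrt(b^2 - 4ac)) / 2a">
    </INSERT>

A browser that could render the inserted type would advertise
"application/x-math" in its Accept header; any other browser would fall
back to the element's content.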
Analysis of each:
I - Positives:
Browsers don't have to deal with unknown constructs
A "smarter" version of user-agent negotiation, since caches have
a chance of being able to cache something correctly.
Provides an easy way for browser authors to introduce new
features without being labelled as a destabilizing force, and
for "second to market" browsers to be able to implement new
features without having to lie about their user-agent.
Negatives:
Browsers *could* be incorrect - e.g., something which said it
could handle "tables" might not be able to handle
tables-within-tables, or inlined objects in tables (e.g. XMosaic 2.7).
Document authors need to be able to express (easily) which feature
sets the document uses. Either that, or servers need to parse
documents as they are served, staying aware of what tags map to
which IANA-registered feature sets.
Header bloat is an issue - the feature-set vocabulary could be huge,
as is already a problem with Accept: headers.
II - Positives:
Reduces the processing requirements on servers - servers can
be "dumb", and thus more scalable.
Proxy caches don't have to worry about feature-set negotiation
either - one document, not 2^(# feature-sets) possible documents.
Relatively easy to implement in browsers. (conjecture)
If a browser thinks it can handle a particular feature but ends up
hitting an error, it can easily fall back to the alternative block.
Negatives:
File bloat - conceivably documents could be 2-3 times as large as
normal, since the client will be throwing away what it doesn't
understand.
III - Positives:
Allows regular content-type negotiation to handle varying
capabilities in user-agents.
Partly psychological - gets users and developers to stop looking at
HTML as a "kitchen sink".
No special requirements on caches.
Bandwidth is minimized.
Negatives:
Requires implementation of EMBED/INSERT in browsers, which
is not trivial.
Compound document authoring tools not pervasive - instead of one
file, we will have many files combined into one, possibly making
management more difficult for the average user.
We've had a fair amount of discussion on www-talk and the various
IETF lists about each of these three. What benefits/drawbacks did I miss?
Obviously, since none of these is implemented on a wide scale, some of the
benefits and drawbacks may be speculative, and I'd rather argue from
experience than from speculation...
One problem, of course, is that we're essentially trying to decide where
a problem gets solved - at the document level or at the transmission
level. Thus it may seem odd for an HTTP subgroup to say "solve this in
HTML", or an HTML subgroup to say "solve this in HTTP". We need
leadership from members of both communities, who understand the strengths
and limitations of both systems, to decide where this problem gets
solved.
My personal analysis is this: #2 represents the best choice because it
makes available to the user-agent *all* the information it needs to
handle the rendering, giving it the power to make decisions as to which
features to parse and which to ignore. This is even a solution to the
problem of buggy browsers: XMosaic 2.7 right now can handle text in
tables (barely), but not hyperlinks or inlined images. It could have the
TABLES variable set to INCLUDE, and if it ran into a construct it
couldn't handle, it could go back and turn it off, handling the
non-tables version of the negotiated module. The most common way to
implement #1 will probably be through the use of server-parsed
conditional HTML anyway, so doing conditional HTML in #2 does not
represent more difficulty for the document author. #2 has already been
implemented piecemeal - note the NOEMBED tag inside EMBED (sketched
below). There are also
legal reasons why giving all the information to the clients is a Good
Thing. #2 is also much friendlier to caches, as it requires no changes
and keeps the number of possible variants per resource low.
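For reference, the EMBED/NOEMBED pairing works as a hard-wired
conditional: a browser that understands EMBED renders the object and
skips the NOEMBED block, while an older browser ignores both unknown
tags and so renders the fallback content:

    <EMBED SRC="movie.avi" WIDTH=320 HEIGHT=240>
    <NOEMBED>
      <IMG SRC="still.gif" ALT="a frame from the movie">
    </NOEMBED>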
We can still recommend that #3 be pursued, but requiring it in the absence of
#2 has sounded politically impossible. I'm hopeful about the progress made
on stylesheets and the INSERT proposal to enable that, but I don't want to
chill enhancements to HTML either. Setting up the infrastructure to support #1
may be too costly to simply say "sure, let's do that too". #1 and #2
aren't strictly exclusive, but politically/psychologically they might be.
I hope this post ties together some previously disparate threads, and
lets us compare. Which do you prefer? Which do you find more elegant?
I'd specifically like to hear from browser authors and people who author
large collections of documents.
Brian
--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com brian@hyperreal.com http://www.[hyperreal,organic].com/