Re: Rewrite of feature tag syntax rules from Koen Holtman on 1997-05-19 (ietf-http-wg@w3.org from April to June 1997)

From: Koen Holtman <koen@win.tue.nl>
Date: Mon, 19 May 1997 16:42:13 +0200 (MET DST)
To: Larry Masinter <masinter@parc.xerox.com>
Cc: koen@win.tue.nl, http-wg@cuckoo.hpl.hp.com
Message-Id: <199705191442.QAA17230@wsooti08.win.tue.nl>

Larry Masinter:
>
># You need to spell out why decoding is worse than the other
># alternatives.
>
>Koen, spelling this out is painful. It's part of the fundamentals
>of how network protocols are designed and implemented. I
>don't claim that network protocol design is my specialization,
>but rather that the reasons why decoding is worse than
>other alternatives is so commonplace that it *should* go
>without saying.
>---
>It is common in many network protocols to have a value which
>is taken from an enumerated set of alternatives. There
>are a fixed set of choices, the sender designates a choice,
>and the recipient recognizes the choice as one established
>by the protocol. It is also common to have the enumerated
>set be extensible, either by revision of the protocol,
>a registration authority for new values, or a distributed
>registration method.
>
>In classic network protocols, enumerated values are often
>represented by incrementing bit patterns (e.g., "0" means
>'turn device on' and "1" means 'turn device off'), or hierarchical
>ones (such as ISO object identifiers). However, in many Internet
>application protocols, enumerated values are written out as a 
>sequence of ASCII characters, in order to simplify debugging
>(watching packet traces) and programming, e.g.,
>printf("GET %s HTTP/1.1\n", url).

Yes.  But feature tags are not ASCII strings just because it is easier
for `debugging and programming'.  The authors of negotiable content
will have to type in feature tags when making variant lists, and these
authors will generally not be programmers.  Even worse, end users will
in some cases look at feature tags in variant lists when deciding
which variant to get (though one would hope that variant list authors
use {description xx} a lot).

So feature tags have a very wide human interface.  People need to
remember them.  One would expect people to scribble feature tags on the
back of an envelope.

Starts to sound familiar?

Whether we like it or not, because of this wide human interface,
feature tags are a lot like URIs.  They are much more like URIs than
they are like packet type numbers in some binary protocol, or like
method names in HTTP.

We want something that is as easy as possible to use, and pass around,
by humans.  Compared to this, efficiency of comparison and passing
around by computers is not that big an issue.

>Feature tags and PEP extension tags are instances of the
>general class of "extensible set of enumerated values". The
>idea that we might use URI space as a way of generating
>new elements of the extensible set of enumerated values
>in order to distribute the name space assignment is cute,

I agree it is cute, maybe too cute, but the W3C seems to be firmly set
on this course (not just in PEP, it also pops up in some Cougar stuff,
for example).  I feel we will need to have _very_ good reasons to go
out of sync with the W3C's approach.

>but it doesn't change the fundamental nature of the tags
>as enumerated values and not general strings.

URIs are fundamentally enumerated values too.  I believe that in Tim's
original design, end users (including authors) were not even supposed
to see them.  Look what we have now.  We need to be careful to learn
from what happened with URIs.

This brings me to a point about limiting the syntax of feature tags
which was brought up by Martin J. Duerst:

|[...] but it can very well
|be expected that a *limitation* of values will create flamage.
| 
|This probably won't happen soon, because it usually takes some
|time for people to use new technology in localized contexts.
|Then after they use it localized, it takes some more time
|for people to realize that the various localizing solutions
|don't really fit together well. Thinking ahead will pay off!

So the question is: if we define feature tags as simple 7-bit tokens,
will we be thinking ahead enough?  Will this cause people to have flame
wars about the correct way to localise/internationalise feature tags
in future?

Will the lack of a standard escaping mechanism cause people to invent
multiple incompatible escaping mechanisms when mapping feature
negotiation onto other protocols?

My tentative answers to these questions are: no for the first one, and
no for the second one, if we keep the set of legal tag characters
small enough.  Still, by standardising on % escapes and UTF-8, we
could eliminate these risks entirely.
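
To make the "% escapes and UTF-8" option concrete, here is a small
Python sketch.  It is an illustration only: the example tag is made up,
the function names are hypothetical, and the choice of which characters
to leave unescaped is an assumption, not something any draft specifies.

```python
from urllib.parse import quote, unquote

# Assumption: these punctuation characters are legal in a tag and are
# therefore left unescaped.  The real set would be fixed by the spec.
UNESCAPED_PUNCTUATION = "./:-_"


def escape_tag(tag: str) -> str:
    """Encode a tag as UTF-8 and %-escape every octet that is not an
    ASCII letter, a digit, or one of the punctuation characters above."""
    return quote(tag, safe=UNESCAPED_PUNCTUATION, encoding="utf-8")


def unescape_tag(escaped: str) -> str:
    """Reverse escape_tag: undo the % escapes and decode as UTF-8."""
    return unquote(escaped, encoding="utf-8")


if __name__ == "__main__":
    wire_form = escape_tag("größe")   # hypothetical localised tag
    print(wire_form)                  # gr%C3%B6%C3%9Fe
    print(unescape_tag(wire_form))    # größe
```

A plain 7-bit tag such as "tables" passes through unchanged, so under
this scheme only localised tags would pay the cost of escaping.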

[...]
>In the
>case of feature tags, the feature tag itself is
>chosen from an enumerated set, but the associated value
>may (for some feature tags) indeed be text, and indeed
>require some amount of text normalization. This is
>similar to email headers, where the values of "To:"
>and "From:" might contain textual representations.

Yes.

Reading this thread again, here is what I have come up with as a
synthesis:

- feature tags are strings of legal tag characters; legal characters are
  letters, digits, and a small number of others such as . / : - _

- tag comparison is case-insensitive.  There is no escape mechanism.

- all tags without : in them are in the IANA-maintained feature tag
  namespace

- prefixes in this namespace are registered/allocated by IANA

- all tags with : in them SHOULD be valid representations of 
  members of the URI namespace, and MAY be dereferenced by a browser
  capable of doing so.  Creation of these tags must follow the rules
  (if any) of the corresponding scheme.

- tag values are octet sequences with % HEX HEX encoding.  Comparison is
  case-sensitive octet-by-octet after decoding.

- [we may also do this, more input requested:] the ftag: scheme
  mirrors the IANA-managed namespace of tags without : in them.
  Before comparing feature tags, any ftag: prefix must be deleted.
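
The comparison rules in the list above could be sketched in Python
roughly as follows.  This is a rough illustration under stated
assumptions, not a reference implementation: all function names are
hypothetical, the exact set of legal punctuation is taken literally from
the "like . / : - _" wording, and the optional ftag: rule is treated as
if it had been adopted.

```python
import re
from urllib.parse import unquote_to_bytes

# Assumption: letters, digits, and exactly the punctuation . / : - _
LEGAL_TAG = re.compile(r"[A-Za-z0-9./:_-]+")
FTAG_PREFIX = "ftag:"


def is_legal_tag(tag: str) -> bool:
    """True if the tag consists only of legal tag characters."""
    return LEGAL_TAG.fullmatch(tag) is not None


def normalize_tag(tag: str) -> str:
    """Fold case (comparison is case-insensitive) and delete a leading
    ftag: prefix.  There is no escape mechanism, so nothing is decoded."""
    folded = tag.lower()
    if folded.startswith(FTAG_PREFIX):
        folded = folded[len(FTAG_PREFIX):]
    return folded


def tags_equal(a: str, b: str) -> bool:
    """Compare two feature tags after normalization."""
    return normalize_tag(a) == normalize_tag(b)


def is_iana_tag(tag: str) -> bool:
    """Tags without ':' (after ftag: removal) are in the IANA namespace;
    tags with ':' are expected to be URIs."""
    return ":" not in normalize_tag(tag)


def values_equal(a: str, b: str) -> bool:
    """Tag values: decode % HEX HEX first, then compare the resulting
    octets case-sensitively."""
    return unquote_to_bytes(a) == unquote_to_bytes(b)
```

With these definitions, "ftag:Tables" and "tables" compare equal as
tags, while the values "A" and "a" do not; the values "a%20b" and "a b"
compare equal because decoding happens before the octet comparison.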


>Larry

Koen.
Received on Monday, 19 May 1997 07:45:43 UTC