MIME, Extensibility, Registries

Larry Masinter, 12/10/2011, still early draft for discussion

This document is circulated as part of TAG

Please discuss this document on www-tag@w3.org (archived).

Introduction

This document explores three interrelated topics, and proposes some potential TAG findings for them. It is still very sketchy.

Extensibility:
Technical specifications for languages, formats and protocols make use of identifiers -- names chosen from a set of values. The "Extensibility" section lays out a framework for methods of extensibility.
Registries:
One important method of extensibility is the use of a registry. The "Registries" section lays out a framework and best practices for the establishment, use and operation of registries.
MIME:
Two important identifier value spaces in the web architecture come from MIME (the Multipurpose Internet Mail Extensions set of specifications): the "Internet Media Type" registry and the "charset" registry. The "MIME" section lays out some issues and considerations for the use of MIME in the web.

Terminology

language, format, protocol, protocol element, language term


Extensibility

Introduction (Extensibility)

Technical specifications for languages, formats and protocols make use of identifiers--names chosen from a set of values. In many cases, there are parameters or values which are allowed as extensibility points, where the interpretation of the value cannot be directly determined by the specification and the value itself; instead the meaning is to be discovered by some other process.

The web architecture relies on many extensibility points; for example, content-types, URI schemes, color names, host names, HTML attributes to a given element, country codes, HTTP headers, CSS rounded corners [refs].

Technical specifications intended as long-lived standards often provide extensibility points that allow new identifiers and values to be assigned after the language, protocol, or format has been defined and deployed. New values can then be added without updating the specification, and without creating a new version of it and incurring the attendant cost of protocol/format version management.

There are a variety of ways of managing extensibility of sets of enumerated values, and establishing a mechanism for introducing private and/or public extensions.

Considerations

Things to consider in any extensibility mechanism:

Update:
Allow extensibility without revising the specification (careful negotiation over what extensibility points are allowed).
Matching reality:
Extension points can be registered without commitment to implementation, giving implementors little practical guidance.
Discovery:
How may implementors discover which extensions are meaningful, important?
Timeliness:
There is a tension in establishing a long-lived meaning for an extensibility point: the value can be cemented too soon (before consensus is reached) or too late (after widespread deployment). The two can overlap, when extensions are widely deployed before consensus is reached.
Transition:
How can extensibility points be managed, updated, obsoleted? Can registered values be "poached"?
Lifetime:
Are the documentation and value of the registry entries as long-lived as the base document? How does this impact?
Fairness:
The process for extending a standard needs to have characteristics similar to those of the standard itself, in terms of being "fair" and "transparent".

Findings

Finding.defineExtensibility:
Extensibility must be planned and implemented; specifications that allow enumerated values and define their meanings should also meaningfully determine the behavior of compliant implementations when confronted with unrecognized values; for example, to distinguish between "must understand" and "must ignore" for unrecognized values. In the design of protocols with extensibility points, the guidelines for dealing with "unrecognized values" are essential for controlling extensibility. It *is* useful to provide some way of communicating with agents which do not understand extensibility names by giving explicit rules for how unrecognized names are to be dealt with (ignored, warned, looked up, etc.).
Finding.prefer-URIs:
The simplest and most effective way of providing extensibility is to use the space of URIs as the way of naming individual extensibility points, and the "meaning" of the URI as a way of discovering what the value means.
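
As an illustration of Finding.defineExtensibility, the distinction between "must understand" and "must ignore" can be sketched as follows. This is only a sketch under assumptions: the Extension record, its must_understand flag, and the set of known names are hypothetical and are not taken from any particular specification.

```python
from dataclasses import dataclass


@dataclass
class Extension:
    name: str
    value: str
    must_understand: bool = False  # hypothetical per-extension flag


# Hypothetical set of extension names this implementation recognizes.
KNOWN_EXTENSIONS = {"charset", "boundary"}


def process(extensions):
    """Return (accepted, ignored) lists of extension names.

    Raises ValueError for an unrecognized extension that the sender
    marked as must-understand; other unrecognized names are ignored.
    """
    accepted, ignored = [], []
    for ext in extensions:
        if ext.name in KNOWN_EXTENSIONS:
            accepted.append(ext.name)
        elif ext.must_understand:
            raise ValueError(f"unrecognized must-understand extension: {ext.name}")
        else:
            ignored.append(ext.name)
    return accepted, ignored
```

The point of the sketch is that the rule for unrecognized names is written down once, in the processing model, rather than left to each implementation.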

Using URIs to name extensibility

httpRange14 update/matching reality/discovery/transition/lifetime/availability. Preferred method, modulo longevity of URIs. Note that URN allows naming a registry as a URI.

Discuss each of the considerations.
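
One possible reading of URI-based naming, sketched under assumptions: an extension name is either a short built-in token defined by the specification, or an absolute URI that anyone can mint without a central registry. The function name and the built-in names below are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical short names; a real specification would enumerate its own.
BUILT_IN_NAMES = {"stylesheet", "alternate"}


def classify(name):
    """Classify an extension name as 'uri', 'built-in', or 'unrecognized'."""
    parts = urlsplit(name)
    # An absolute URI has a scheme plus an authority or a path.
    if parts.scheme and (parts.netloc or parts.path):
        return "uri"
    if name in BUILT_IN_NAMES:
        return "built-in"
    return "unrecognized"
```

With this shape, a name such as http://example.org/ext/rounded needs no registration step, and its meaning can be sought by following the URI itself.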


Using Vendor Prefixes

(example from CSS, analysis of transition path difficulties) Do we make any recommendations over URIs vs. vendor prefixes?

Discuss each of the considerations.
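
A small sketch of the transition difficulty with vendor prefixes, using the CSS prefixes -webkit-, -moz-, -ms- and -o-. The helper names are hypothetical, and the sketch makes no claim about how any browser actually processes prefixed properties.

```python
VENDOR_PREFIXES = ("-webkit-", "-moz-", "-ms-", "-o-")


def split_vendor_prefix(prop):
    """Split a property name into (prefix, unprefixed name)."""
    for prefix in VENDOR_PREFIXES:
        if prop.startswith(prefix):
            return prefix, prop[len(prefix):]
    return "", prop


def names_to_emit(standard_name, legacy_prefixes):
    """Names an author must write to reach both old and new implementations."""
    return [prefix + standard_name for prefix in legacy_prefixes] + [standard_name]
```

Once a property is standardized, content written with the prefixed name does not go away, so authors and implementations end up carrying every spelling for as long as old content and old implementations remain in use.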


Registries

Introduction (Registries)

A registry consists of the documentation for a set of registered values and their meaning, where the registry is maintained by an organization (the registrar) with a commitment to maintain the registry and make it publicly available. To ensure that quantities have consistent values and interpretations across all implementations, their assignment must be administered by an "authority": an organization or consortium which manages the values and ensures proper administration.

For example, the Internet Assigned Numbers Authority (IANA)[ref] is the primary organization whose charter and purpose is to maintain registries of values needed for Internet protocols and languages as defined by the IETF.[ref BCP from which this was quoted] IANA administers the registries for many parameters in the core of the Web architecture: the space of URI (and IRI) scheme names, the space of media type identifiers ("MIME types"), HTTP header fields, HTTP status codes, names of character sets and character encoding schemes (charsets), and so forth. The architecture of the World Wide Web relies on extension points using "registration", even in W3C-specified protocols, languages, and formats which are not reviewed or published within the IETF.

Update:
A registry has a specific update policy.
Matching reality:
Registries tend to go out of step with reality unless costs of registration or registry update are low and benefits are high to at least one of the parties authorized to make a registration or update. (See "Divergence from Reality" below).
Discovery:
Manual discovery is hindered when there are many alternative places to look for a registry, including unofficial ones (Wikipedia for MIME types, for example).
Timeliness:
In particular for registries, there is a tension in establishing a long-lived meaning for an extensibility point: the value can be cemented too soon (before consensus is reached) or too late (after widespread deployment), and the two can overlap when extensions are widely deployed before consensus is reached. The policy needs to move toward "registration before deployment", regardless of where the specification is in the standards cycle. If the standard differs from early deployment, the registry should be updated to point not only to the standard but also to the facts about what one might encounter "in the wild".
Transition:
If registries encode status in registered names (as the MIME registry does), transition and grandfathering are issues.
Lifetime:
The documents pointed to by an IANA registry are not as long-lived as the registry itself, and much of the information is obsolete. See "Registry stability" below.
Fairness:
IANA is capable of providing fairness and review in the case of disputes. Wikis and other methods for maintaining a registry have more (or at least different) potential for abuse.

Finding.use-IANA:
W3C specifications SHOULD use IANA registration methods for those extensibility points which are shared with other (IETF-managed) application protocols, rather than inventing their own registries.
Finding.explicit:
Any extensibility points in a W3C specification MUST be explicit about the method and management of the registration of new values in a public, fair, and transparent way.

The "best current practice" specification in [BCP 26][RFC 5226] gives guidelines to protocol designers for establishing the registry rules associated with an IANA registry. Note that IANA acts as the operator of each registry but does not itself evaluate registry requests; it merely administers a process in which the organizations or individuals authorized to review or approve registry entries make that decision. These guidelines apply to IANA namespaces established or requested by W3C working groups or task forces.
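
The separation between the operator of a registry and the party that reviews requests can be sketched as follows. This is a minimal sketch under assumptions: the Registry class and the reviewer callback are hypothetical and do not describe IANA's actual procedures.

```python
class Registry:
    """Operator role: stores entries and prevents conflicts.

    The decision to accept a request is delegated to a reviewer
    (for example a designated expert), supplied as a callable.
    """

    def __init__(self, reviewer):
        self._reviewer = reviewer
        self._entries = {}

    def register(self, name, reference):
        """Return True if registered, False if the reviewer declined."""
        if name in self._entries:
            raise KeyError(f"{name!r} is already registered")
        if not self._reviewer(name, reference):
            return False
        self._entries[name] = reference
        return True

    def lookup(self, name):
        """Return the reference for a registered name, or None."""
        return self._entries.get(name)
```

A registry whose rule is "first come, first served" would pass a reviewer that always accepts; a stricter policy supplies a reviewer that checks the reference.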

Divergence from Reality

Sniffing, "Willful Violations", Incomplete Inaccurate Registries

In some cases, community practice has evolved and the registries have not followed: the registries have not tracked the use of extensibility parameters, or extensibility values are often ignored. In some cases, the registry is perceived as a bottleneck. A registry is only useful if values are registered; one which does not match actual use (as is currently the case with URI schemes and media types) is not very useful.

Sniffing: the registry has not tracked actual use, or the right extensibility parameter is not used. [ref mime-sniff]

"Willful violations": the registry values have been misused, and the technical specification contains new values that do not agree with the registry.

Finding.newRegistry:
Technical specifications that wish to override an existing registry for some values, while continuing to use it for others, should (a) attempt to correct the existing registry; where that is not possible, the group should (b) establish a new "override" registry with the new values, and the specification should point to the new registry.
Finding.evolveTowardSimplicity:
The specifications and standards process should be managed so that over the long term sniffing is minimized and deployment of further misconfigured values is discouraged.
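
One way to picture the "override" registry of Finding.newRegistry is a lookup that consults the override table before the base registry. The table contents and function name below are hypothetical, apart from the iso-8859-1 entry, which mirrors HTML5's treatment of that label as windows-1252.

```python
# Hypothetical excerpt of a base registry (label -> registered meaning).
BASE_REGISTRY = {"iso-8859-1": "ISO-8859-1", "utf-8": "UTF-8"}

# Values that the specification deliberately interprets differently.
OVERRIDES = {"iso-8859-1": "windows-1252"}


def resolve(label):
    """Return (meaning, source) for a label, preferring overrides."""
    key = label.strip().lower()
    if key in OVERRIDES:
        return OVERRIDES[key], "override"
    if key in BASE_REGISTRY:
        return BASE_REGISTRY[key], "base"
    return None, "unregistered"
```

Keeping the overrides in a separate, published table makes the divergence from the base registry explicit instead of burying it in prose.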

Registry stability

Often, a registry does not contain the actual definition of the meaning of a term or value, but rather contains a pointer to a document or document series which defines that value. For example, the Internet Media Type registry defining file formats and languages often contains a pointer to the document or specification. However, specifications are themselves updated, and sometimes they "fork" -- there can be multiple competing definitions. (In some cases, "forking" is "poaching").

Requiring the documentation to be stable is another reason why registrations diverge from reality.

Finding.Series:
Registries should allow updates, and note warnings. In particular, documents rarely change without making a change which is incompatible in at least one direction (old content is invalid under the new definition, or new content is invalid or not processed interoperably under the old definition).
Finding.Forking:
If specifications are "forked" in incompatible ways, then use separate names for the forks.
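
The two findings above can be sketched as a registry entry that points to a series of documents, records compatibility warnings with each update, and requires a distinct name for a fork. All class, function, and entry names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    document: str
    warning: str = ""  # note on incompatibility with earlier revisions


@dataclass
class Entry:
    name: str
    revisions: list = field(default_factory=list)

    def update(self, document, warning=""):
        """Append a new defining document instead of replacing the old one."""
        self.revisions.append(Revision(document, warning))

    def current(self):
        return self.revisions[-1].document

    def warnings(self):
        return [r.warning for r in self.revisions if r.warning]


def fork(entry, new_name, document):
    """Register an incompatible fork under its own name."""
    if new_name == entry.name:
        raise ValueError("a fork needs its own name")
    forked = Entry(new_name)
    forked.update(document)
    return forked
```

Keeping earlier revisions in the entry lets a reader see which definition content was written against, which a single mutable pointer cannot show.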

Status of Registry Entry in Registry Name

Registry values typically go through a life-cycle, where a parameter is introduced experimentally, deployed in a limited or vendor-specific context, and then adopted more broadly.

Frequently, groups with registries or registered values attempt to convey the status of a registered value in the name chosen within the registry, e.g., using an "x-" prefix for experimental names, "vnd." prefixes in Internet media types, etc. In practice, these conventions are counter-productive failures, because there is no simple deployment path when status changes, e.g., when vendor-proposed extensions become public standards or experiments succeed.

Finding.noStatusInName:
Do NOT attempt to encode parameter status in the name; do not use "vnd.", or "x-".
Finding.registrationEase:
There is a tradeoff between requiring that registry entries contain complete information and getting more things registered. In general, the cost of using unregistered values must be non-negligible to the organizations allowed or encouraged to register a value, if a distributed development community is to use the registry.
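
To make Finding.noStatusInName concrete, the sketch below contrasts status kept as a field of the registration with status encoded in the name. The record layout, function names, and status labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    name: str
    status: str  # e.g. "experimental", "vendor", or "standard"


def promote(registration, new_status):
    """Change status in place; the deployed name is untouched."""
    registration.status = new_status
    return registration


def renamed_on_promotion(name):
    """What a status-in-name convention forces on promotion: a new name."""
    for prefix in ("x-", "vnd."):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name
```

In the first case deployed content keeps working when the status changes; in the second, every producer and consumer of the old name has to migrate or support both names indefinitely.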

Organizational support:

W3C staff and working group participants must manage the registration information, and the process itself needs revision. Other registrations have their own administrative procedures. A regular "have obligations related to registration been met" check should be added to the W3C document publication/advancement procedure.


MIME and the Web

In particular, two IANA registries are essential to the web: Internet Media Types and Charsets.

Both have "willful violations" and "sniffing" in the HTML5 specification.
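
As a rough illustration of what "sniffing" means for media types: when no type is declared, the receiver guesses one from the leading bytes of the content. This is a deliberately simplified sketch and not the algorithm of the mime-sniff draft; the function name and the fallback rule are assumptions. The byte signatures for PNG and GIF are the well-known ones.

```python
# Leading byte signatures for two well-known image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def effective_type(declared, body):
    """Trust the declared media type when present; otherwise sniff."""
    if declared:
        return declared
    for magic, media_type in SIGNATURES:
        if body.startswith(magic):
            return media_type
    # Nothing matched: fall back to a generic binary type.
    return "application/octet-stream"
```

Deployed sniffing also overrides some declared types, and that part is where behavior diverges from the registry.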

Fragment identifiers are defined in web architecture but not required enough in [MediaRegUpdate].


References

[BCP26] Guidelines for Writing an IANA Considerations Section in RFCs, BCP 26, RFC ...

[IABext] Design Considerations for Protocol Extensions, work in progress, Internet Draft

[Friendly] Friendly Registries, work in progress, Wiki Page, requirements and a place to gather explicit proposals

[HappyIana] https://www.ietf.org/mailman/listinfo/happiana

[LinkRelation] http://lists.w3.org/Archives/Public/www-tag/2011May/0006.html

[sniff] http://tools.ietf.org/html/draft-ietf-websec-mime-sniff

[MediaTypeFinding] Internet Media Type registration, consistency of use TAG Finding 3 June 2002 (Revised 4 September 2002)

[MIMEGuidelines] Register an Internet Media Type for a W3C Spec (W3C guidelines on registering types)

[MediaRegUpdate] Media Type Specifications and Registration Procedures, Internet Draft, work in progress

[NoX] X- parameters harmful (Peter Saint-Andre)

[SpecUpdate] Best Practice for Referring to Specifications Which May Update [email draft, H. Thompson, C.M. Sperberg-McQueen]

[VendorFlap]


Left Over bits

Here are some notes from discussion not yet incorporated:

Reasons for a "registry":

  1. to avoid conflict (main purpose for all of the methods)
  2. to set a bar and require review - you want to ensure the quality of anything introduced
  3. to provide look-up
  4. limit the number because there is a cost of introducing each one

For example, some protocol designers thought a new URI scheme could cause a lot of extra work. For HTML tags, when you introduce a new section, everyone who implements browsers needs to understand it.

But if you add metadata, it's no skin off anyone's nose. So you have two situations: one in which you need the whole community to get involved, and one which anyone outside a sub-community can ignore.

Only tangentially related to registry-based solutions, Mark Nottingham quotes (http://lists.w3.org/Archives/Public/www-tag/2011Dec/0049.html) Roy Fielding as calling mustUnderstand-based approaches "socially reprehensible". We need a decision tree: questions to answer to understand what kind of extension you're doing and which of these techniques you should use.

Compound extensibility points: when a new version of an extensibility point defines a new context in which old extensibility points are interpreted. (This is "willful violation" territory, if not also "sniffing" territory).

see discussion following http://lists.w3.org/Archives/Public/www-archive/2011Nov/0009.html