Change Proposal: Gloss standard terminology for representation

Summary

The HTML 5 draft uses terms in ways that differ from previously ratified specifications:

I propose that it similarly explain its use of "resource" with respect to "representation" as defined in previously ratified specifications.

Proposal Details

In section 2.1 Terminology, somewhere before "A resource's critical subresources ..." I propose to add something like:

The term resource is used to refer to what is sometimes called a representation in protocol literature, such as section 1.2.2 of [RFC3986]. Where that specification speaks of a URI that identifies a resource whose state is communicated via a typed byte sequence called a representation, we simply say that a URL identifies a resource, which is a typed byte sequence; the indirection is not mentioned in this specification.

Rationale

Consistency with URI and HTTP specifications

While a simplified view of the web where URLs identify resources which consist mainly of byte sequences may suffice for casual readers of the HTML 5 specification, more diligent readers will need to understand the normatively cited specifications, especially the URI and HTTP specifications. RFC3986 was ratified in January 2005 as an Internet Standard for URIs, so there is a large audience that is familiar with that document, and it has substantial overlap with the HTML 5 specification readership. In section 1.2.2. "Separating Identification from Interaction," it says:

When URIs are used within information retrieval systems to identify sources of information, the most common form of URI dereference is "retrieval": making use of a URI in order to retrieve a representation of its associated resource. A "representation" is a sequence of octets, along with representation metadata describing those octets, that constitutes a record of the state of the resource at the time when the representation is generated.

Consider a reader familiar with that model of URIs, resources, and representations who comes upon section 5.6.2 Application caches of the HTML 5 specification:

An application cache is a set of cached resources consisting of:

This reader would expect to see "is a set of cached representations ... ." Some explanation of the difference in terminology is in order.
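
The two vocabularies can be contrasted in a small sketch. It is illustrative only: the class names Resource and Representation, the example URI, and the app_cache dictionary are hypothetical and come from neither specification. It assumes the RFC 3986 reading in which a resource has state that changes over time and each retrieval yields a snapshot of that state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """RFC 3986 sense: a sequence of octets plus metadata describing them."""
    media_type: str
    octets: bytes


class Resource:
    """RFC 3986 sense: what the URI identifies; its state may change."""

    def __init__(self, uri: str, state: str) -> None:
        self.uri = uri
        self.state = state

    def represent(self) -> Representation:
        # Retrieval returns a record of the state at this moment,
        # not the resource itself.
        return Representation("text/plain", self.state.encode("utf-8"))


# Hypothetical resource whose state changes between two retrievals.
weather = Resource("http://example.org/weather", "sunny")
first = weather.represent()
weather.state = "rainy"
second = weather.represent()

# A cache keyed by URI stores the typed byte sequence. RFC 3986 would call
# each stored value a representation; the HTML 5 draft calls it a resource.
app_cache = {weather.uri: first}
```

Here one URI yields two different byte sequences over time. Readers used to RFC 3986 would say the cache holds a representation of the resource, while the HTML 5 draft's wording applies the word "resource" to the cached bytes themselves.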

Consistency with Localized User Interfaces

A 28 Sep 2009 message from Leif Halvard Silli reports user interface localization experience involving separation of the concept of what a URI identifies (resources) from pieces of content that represent them (representations). It's not clear whether a gloss alone goes far enough to make the HTML 5 specification easier to understand for the users of those user interfaces; perhaps only a wholesale terminology update would suffice. But a gloss should provide clarification that could be elaborated in other tutorial material.

Impact

This is an editorial change; it has no language design impact; no conformance classes will have to change.

While readers who come to the HTML 5 specification with separate notions of resource and representation will still have to learn the HTML 5 terminology, there will at least be a part of the spec that explains the relationship and that can be quoted when questions arise or in tutorial material.

One risk of adopting this proposal is that it may reduce motivation for a wholesale adoption of the previously standardized terminology. We have little data to show which terminology is most likely to produce common understanding that leads to interoperability, and if the previously standardized terminology is actually better, this proposal may lead to a suboptimal result.

References

Dan Connolly