Proposal: HTML use of remote alt text; matching extension/formalization for HTTP Content-Disposition & MIME

# Recipient list

WHATWG: html, html-aam, html-aria
W3C WGs: html, apa, aria, webapps
IETF WGs: httpbis, 822ext

CC authors of (current) prior RFCs: Nathaniel Borenstein, Steve Dorner, Ned
Freed, Ed Levinson, Keith Moore, Julian Reschke, Rens Troost

Cross-posted by email to W3C & IETF groups, and by GitHub issue to W3C & WHATWG at:
* main: https://github.com/w3c/html-aam/issues/309
* crossposts by reference:
  - https://github.com/whatwg/html/issues/6061
  - https://github.com/w3c/html-aria/issues/248

The content is equivalent, modulo small formatting changes for Markdown vs
email, and the addition of section deep links in the Markdown version.



# Background

## Objective

Humans with disabilities, and machines, should have fully equal access to
the textual content of images and other files.


## Problems with the current specs

1. In current practice, the embedder of content often fails to add alt
attributes, making it inaccessible to people with disabilities and to computers.
2. It is literally impossible for the embedder to describe some content,
e.g. dynamic images; in such situations, the current specs cannot fulfill
the goal of accessibility.
3. An image's embedders must describe its content, even though its source
is better able to do so, both practically and authoritatively.
4. Human effort is wasted by requiring every embedder of a single source
file to write its own description of it.
5. Updates to the HTTP Content-Disposition header spec (RFC 6266) omitted
Content-Description.
6. MIME/HTTP Content-Description is equivalent to HTML LONGDESC (narrative
description). There's no current field equivalent to ALT (verbatim content
in text form).



## Relevant prior RFCs

### HTTP/1.1 Content-Disposition header & Content-Description field

* RFC 2616 [obsolete] Hypertext Transfer Protocol — HTTP/1.1
  - https://tools.ietf.org/html/rfc2616
  - § 15.5 Content-Disposition Issues (security)
  - § 19.5.1 Content-Disposition
* RFC 7231 [current, no updates] Hypertext Transfer Protocol (HTTP/1.1):
Semantics and Content
  - https://tools.ietf.org/html/rfc7231
  - Appendix B Changes from RFC 2616
    "The Content-Disposition header field has been removed since it is now
defined by [RFC6266]."

* RFC 1806 [obsolete] Communicating Presentation Information in Internet
Messages: The Content-Disposition Header
  - https://tools.ietf.org/html/rfc1806
  - § 3 (Content-Description only in examples)
* RFC 2183 [current, no relevant updates] Communicating Presentation
Information in Internet Messages: The Content-Disposition Header Field
  - https://tools.ietf.org/html/rfc2183
  - § 2 The Content-Disposition Header Field
  - § 2.8 Future Extensions and Unrecognized Disposition Types
  - § 3 Examples (only section mentioning Content-Description)
* RFC 6266 [current, no updates] Use of the Content-Disposition Header
Field in the Hypertext Transfer Protocol (HTTP)
  - https://tools.ietf.org/html/rfc6266
  - Note: has no mention of Content-Description


### HTML

* RFC 1866 [obsolete] Hypertext Markup Language - 2.0
  - https://tools.ietf.org/html/rfc1866
  - § 5.10 Image: IMG (ALT attribute)
* RFC 2854 [current, informational] The 'text/html' Media Type
  - https://tools.ietf.org/html/rfc2854
  - (standard transferred from IETF to W3C)

* HTML 4.01
  - https://www.w3.org/TR/html401/struct/objects.html
  - § 13 Objects, Images, and Applets
  - § 13.2 Including an image: the IMG element (longdesc URI)
  - § 13.8 How to specify alternate text (alt text)

* HTML 5
  - https://html.spec.whatwg.org/multipage/embedded-content.html#the-img-element
  - https://html.spec.whatwg.org/multipage/images.html
  - https://html.spec.whatwg.org/multipage/input.html
  - https://html.spec.whatwg.org/multipage/rendering.html
  - § 4.8.3 The img element
  - § 4.8.4 Images
  - § 4.8.4.4 Requirements for providing text to act as an alternative for
images
  - § 4.10.5 The input element
  - § 4.10.5.1.19 Image Button state (type=image)
  - § 14 Rendering
  - § 14.4.2 Images

* HTML Accessibility API Mappings (AAM)
  - https://w3c.github.io/html-aam/
  - § img Element Accessible Name Computation
  - § input type="image" Accessible Name Computation


### MIME Content-Description header

* RFC 1341 [obsolete] MIME (Multipurpose Internet Mail Extensions)
  - https://tools.ietf.org/html/rfc1341
  - § 6.2 Optional Content-Description Header Field
* RFC 1521 [obsolete] MIME (Multipurpose Internet Mail Extensions) Part
One: Mechanisms for Specifying and Describing the Format of Internet
Message Bodies
  - https://tools.ietf.org/html/rfc1521
  - § 6.2 Optional Content-Description Header Field
* RFC 2045 [current, no relevant updates] Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message Bodies
  - https://tools.ietf.org/html/rfc2045
  - § 8 Content-Description Header Field
  - § 9 Additional MIME Header Fields

* RFC 1872 [obsolete] The MIME Multipart/Related Content-type
  - https://tools.ietf.org/html/rfc1872
  - § 4 Examples (mentions Content-Description)
* RFC 2112 [obsolete] The MIME Multipart/Related Content-type
  - https://tools.ietf.org/html/rfc2112
  - § 4 Handling Content-Disposition Headers
  - § 5 Examples (mentions Content-Description)
  - § 5.3 Content-Disposition
* RFC 2387 [current] The MIME Multipart/Related Content-type
  - https://tools.ietf.org/html/rfc2387
  - § 4 Handling Content-Disposition Headers
  - § 5 Examples (mentions Content-Description)
  - § 5.3 Content-Disposition


### EXIF

* CIPA DC-008-2012 Exchangeable image file format for digital still
cameras: Exif Version 2.3
  - http://www.cipa.jp/std/documents/e/DC-008-2012_E.pdf
  - § 4.6.4 TIFF Rev. 6.0 Attribute Information (ImageDescription tag)



## Discussion

1. In current practice, the image's source server will often have all the
necessary metadata to describe the image.

It could put this metadata in its HTTP headers. However, in practice this
information is either not transmitted or not used.

Some image formats provide fields for the necessary metadata (e.g. Exif
ImageDescription). However, these fields are rarely used, because the
metadata is typically stored separately, and not all image formats support
them.
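For formats that do carry such metadata, reading it is cheap. The following is a minimal Python sketch, not a production Exif parser: it assumes its input is the TIFF-structured Exif payload (for JPEG, the APP1 segment contents after the "Exif\0\0" prefix) and only looks for the ImageDescription tag (270, 0x010E) in IFD0. The function name read_image_description is hypothetical.

```python
import struct
from typing import Optional

IMAGE_DESCRIPTION_TAG = 0x010E  # TIFF/Exif ImageDescription (tag 270)
ASCII_TYPE = 2                  # TIFF field type ASCII, 1 byte per count


def read_image_description(tiff: bytes) -> Optional[str]:
    """Return IFD0's ImageDescription from a TIFF/Exif payload, or None."""
    if len(tiff) < 8:
        return None
    # Byte-order mark: "II" = little-endian, "MM" = big-endian.
    if tiff[:2] == b"II":
        endian = "<"
    elif tiff[:2] == b"MM":
        endian = ">"
    else:
        return None
    magic, ifd0_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42 or ifd0_offset + 2 > len(tiff):
        return None
    (entry_count,) = struct.unpack(endian + "H", tiff[ifd0_offset:ifd0_offset + 2])
    for i in range(entry_count):
        entry = ifd0_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        if entry + 12 > len(tiff):
            return None
        tag, field_type, count = struct.unpack(endian + "HHI", tiff[entry:entry + 8])
        if tag != IMAGE_DESCRIPTION_TAG or field_type != ASCII_TYPE:
            continue
        # Values of 4 bytes or fewer are stored inline; longer ones via an offset.
        if count <= 4:
            raw = tiff[entry + 8:entry + 8 + count]
        else:
            (value_offset,) = struct.unpack(endian + "I", tiff[entry + 8:entry + 12])
            raw = tiff[value_offset:value_offset + count]
        # ASCII fields are NUL-terminated; drop the terminator and any padding.
        return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return None


# Build a tiny little-endian TIFF with one ImageDescription entry.
desc = b"Azure Pipelines succeeded\x00"
sample = (
    b"II" + struct.pack("<HI", 42, 8)  # header; IFD0 starts at offset 8
    + struct.pack("<H", 1)             # one IFD entry
    + struct.pack("<HHII", IMAGE_DESCRIPTION_TAG, ASCII_TYPE, len(desc), 26)
    + struct.pack("<I", 0)             # no next IFD
    + desc                             # value data at offset 26
)
print(read_image_description(sample))  # Azure Pipelines succeeded
```

A real user agent would also need to locate the APP1 segment in the JPEG stream and handle malformed or non-ASCII data more carefully; both are out of scope for this sketch.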


2. Some images used in HTML are deliberately dynamic.

Consider the various dependency/test status images used on GitHub.

Only the remote image server knows, at display time, what the image
represents. This is because it runs test suites on the most recent version
of the codebase, checks the current status of servers, monitors third-party
published vulnerabilities or library updates, etc.

Example: https://github.com/atom/atom

The first 3 images in the README section are:
a. Azure Pipelines build/test/integration status
b. David Dependency Manager dependencies update status
c. Heroku/Slack server status [this image currently doesn't load]

The correct alt text for these, at the time of writing, would be:
a. Azure Pipelines succeeded
b. dependencies up to date
c. Heroku is offline for maintenance

It's impossible for the author of README.md, or GitHub itself, to know any
of this before the user agent actually fetches the image.

As a result, people using a screen reader get no information from these
images, whereas sighted users see the live statuses.

(Pedantic caveat: actually, GitHub runs a caching proxy server on such
images; they aren't fetched by the user agent directly from the
authoritative server. However, this is functionally transparent.)


3. Content-Description is not defined equivalently to ALT text.

Content-Description is defined as "some descriptive information" (RFC 2045
§ 8). All examples in the RFCs are either narrative, e.g. "just a small
picture of me" (RFC 2183 § 3), or useless, e.g. "jpeg-1" (id.).

By contrast, ALT text is meant to be the nearest textual equivalent of the
image, which, for a simple image of short text, is that text verbatim.

HTML 4.01 (§ 13.8) describes it as "alternate text to serve as content when
the element cannot be rendered normally".

HTML 5 describes it as "equivalent content for those who cannot process
images or who have image loading disabled (i.e. it is the img element's
fallback content)" (§ 4.8.3). It "should never contain text that could be
considered the image's caption, title, or legend. It is supposed to contain
replacement text that could be used by users instead of the image; it is
not meant to supplement the image" (§ 4.8.4.4.1).

AFAICT, there is no equivalent field in either HTTP or MIME. There could
and should be.



# Proposals

## MIME — Content-Text

Update RFC 2045 to add the header Content-Text, defined as follows.

Content-Text should contain the text that the file contains, or that is
its text equivalent, following the specifications for alt text in
WHATWG HTML 5 § 4.8.4.4.

All files should include this header if:
1. the file is not of a text/* MIME type, and
2. the file semantically (if not digitally)
  a. contains text, or
  b. has a text equivalent
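As a sketch of how a sender might apply this rule, the following Python uses the standard email package to build an image part carrying the proposed Content-Text header. Content-Text is not a registered header, and the helper names should_include_content_text and image_part are hypothetical; the caller is assumed to already know the file's text equivalent (criterion 2), since detecting it automatically is outside the proposal.

```python
from email.message import EmailMessage
from typing import Optional


def should_include_content_text(mime_type: str,
                                text_equivalent: Optional[str]) -> bool:
    """Proposed rule: include Content-Text for non-text/* files that contain
    text or have a text equivalent (supplied here by the caller)."""
    top_level_type = mime_type.split("/", 1)[0].strip().lower()
    return top_level_type != "text" and bool(text_equivalent)


def image_part(data: bytes, subtype: str,
               text_equivalent: Optional[str]) -> EmailMessage:
    """Build an image/<subtype> MIME part, adding Content-Text when the rule applies."""
    part = EmailMessage()
    part.set_content(data, maintype="image", subtype=subtype)
    if should_include_content_text(f"image/{subtype}", text_equivalent):
        # Content-Text is the header proposed above; it is not yet registered.
        part["Content-Text"] = text_equivalent
    return part


part = image_part(b"\x89PNG\r\n\x1a\n", "png", "dependencies up to date")
print(part.get_content_type())  # image/png
print(part["Content-Text"])     # dependencies up to date
```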


## HTTP — Content-Disposition

Update RFC 2183 and RFC 6266 to change the Content-Disposition header as
follows:

1. formalization of Content-Description
   Re-add the Content-Description field, as defined in RFC 1521 § 6.2.
2. addition of Content-Text
   Add the field Content-Text, defined by reference to the RFC 2045 update
proposed above.
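The proposal does not fix a wire syntax for these fields. Assuming they are carried as Content-Disposition parameters (content-text and content-description, both hypothetical parameter names) using RFC 6266 quoted strings, a server could emit them and a user agent could read them back as in this Python sketch. It relies on email.message.Message.get_param, which accepts a header argument, to do the parameter parsing; non-ASCII values would need RFC 5987 ext-value encoding (as RFC 6266 uses for filename*), which the sketch omits.

```python
from email.message import Message
from typing import Dict, Optional

# Hypothetical parameter names; the proposal does not fix a wire syntax.
FIELDS = ("content-text", "content-description")


def quote(value: str) -> str:
    """Quote a value as an HTTP quoted-string (ASCII only in this sketch)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_disposition(text: Optional[str] = None,
                      description: Optional[str] = None) -> str:
    """Server side: build a Content-Disposition value for an inline image."""
    parts = ["inline"]
    if text:
        parts.append(f"content-text={quote(text)}")
    if description:
        parts.append(f"content-description={quote(description)}")
    return "; ".join(parts)


def disposition_fields(header_value: str) -> Dict[str, str]:
    """User-agent side: extract the hypothetical parameters, if present."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    found = {}
    for name in FIELDS:
        value = msg.get_param(name, header="content-disposition")
        if isinstance(value, str):  # RFC 2231-encoded values come back as tuples
            found[name] = value
    return found


header = build_disposition(text="Azure Pipelines succeeded")
print(header)                      # inline; content-text="Azure Pipelines succeeded"
print(disposition_fields(header))  # {'content-text': 'Azure Pipelines succeeded'}
```

Because a badge server generates the header at response time, the text can track the live status, which is exactly the information the README author cannot supply.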


## HTML-AAM — IMG and INPUT type=image

Insert the following before the "none of the above" option in the HTML-AAM
accessible name computation instructions:

When an ALT or TITLE attribute is not available, use the first available of
the following:
1. equivalent metadata in the image file, e.g. the
Exif.Image.ImageDescription field
2. image's HTTP Content-Disposition header's Content-Text field
3. image's HTTP Content-Disposition header's Content-Description field
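The fallback order above can be sketched in Python as follows. This is an illustration under stated assumptions, not HTML-AAM text: the ImageSources record is hypothetical, the aria-labelledby and aria-label steps that HTML-AAM checks first are omitted, and an explicitly empty alt (HTML's marker for a decorative image) is treated as available, so it never falls through to the remote sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageSources:
    """Hypothetical bundle of everything a user agent knows about one image."""
    alt: Optional[str] = None                  # HTML alt attribute, if present
    title: Optional[str] = None                # HTML title attribute, if present
    exif_description: Optional[str] = None     # e.g. Exif.Image.ImageDescription
    content_text: Optional[str] = None         # proposed Content-Text field
    content_description: Optional[str] = None  # Content-Description field


def accessible_name(img: ImageSources) -> Optional[str]:
    """Return the accessible name, or None when no source provides one."""
    # (aria-labelledby / aria-label steps omitted from this sketch)
    if img.alt is not None:
        return img.alt  # alt="" marks a decorative image: no fallback
    if img.title:
        return img.title
    # Proposed steps, inserted before "none of the above", in priority order.
    for candidate in (img.exif_description, img.content_text,
                      img.content_description):
        if candidate and candidate.strip():
            return candidate.strip()
    return None  # none of the above: no accessible name


badge = ImageSources(content_text="Azure Pipelines succeeded",
                     content_description="Build status badge")
print(accessible_name(badge))  # Azure Pipelines succeeded
```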


## HTML — no change

There is deliberately no change proposed to the HTML spec itself.

The purpose of this proposal is to address situations where the HTML author
does not, or cannot, add the relevant information. Therefore, the changes
are to user agent behavior, and to the data accessible to user agents from
sources other than the HTML, i.e. server and file headers.



# Intellectual property release

All original IP in this proposal is owned jointly by Sai and Fiat Fiendum.

We freely license it as follows:
1. Copyright: CC-by (attribution-only)
https://creativecommons.org/licenses/by/4.0/
2. Patentable material: public domain where possible, otherwise CC Public
Patent License
https://wiki.creativecommons.org/wiki/CC_Public_Patent_License

Sincerely,
Sai
President, Fiat Fiendum, Inc., a 501(c)(3)

PS Non-gendered pronouns please. I'm a US citizen.

Received on Thursday, 15 October 2020 16:16:23 UTC