
"In the wild" use case for Accept-Charset with parameters; musings on a modern alternative

From: Nicholas Shanks <nickshanks@nickshanks.com>
Date: Tue, 12 Mar 2013 10:52:13 +0000
Message-ID: <CA+hEJVX8DyA5REhOrsLmQ+XHz=3+=chGRfpZvqJKfcWZ82aOGw@mail.gmail.com>
To: IETF HTTP Working Group <ietf-http-wg@w3.org>
I just came across this page:
http://www.utexas.edu/cola/centers/lrc/ielex/

The first bulleted list on the page demonstrates that real-world needs
for Accept-Charset are not met by the existing specification of this
header, nor by other related TCN (RFC 2295) headers or UA behaviour.
I am aware that Google are presently applying a patch to remove
explicit support for Accept-Charset from Chromium. They are the last
of the major browser vendors to do so.
This has led me to ponder what could be done in a post-Accept-Charset
world to automate variant selection for the above use case (negotiate
between representations based upon installed fonts, rather than UA
support for charsets).

I think something like the following would make sense (n.b. I just made up the
unicode-range values for demonstration purposes):

-> GET //www.utexas.edu/cola/centers/lrc/ielex/PokornyMaster-X.html

<- 200 OK
Content-Type: text/html; charset=utf-8; unicode-range="U+40-7F, U+2000-207F, U+10000-103FF"
Alternates: {"PokornyMaster.html" 0.8 {unicode-range "U+40-7F, U+2000-21FF"}}, {"PokornyMaster-R.html" 0.4 {charset iso-8859-1}}

(UA determines [via the unicode-range Content-Type parameter, or while
parsing the response body] that it would have to use a last-resort font
or .notdef/.null glyphs to display some characters, so it issues a second
request for the variant with the highest qs value that it knows it can
support)

-> GET //www.utexas.edu/cola/centers/lrc/ielex/PokornyMaster.html

<- 200 OK
Content-Type: text/html; charset=utf-8; unicode-range="U+40-7F, U+2000-21FF"
Alternates: {"PokornyMaster-X.html" 1.0 {unicode-range "U+40-7F, U+2000-207F, U+10000-103FF"}}, {"PokornyMaster-R.html" 0.4 {charset iso-8859-1}}
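The check the UA performs in the parenthetical step above could be sketched in Python as follows. This is a minimal illustration, not part of the proposal: it assumes unicode-range values use the U+first-last notation shown in the example (CSS-style wildcards such as U+4?? are not handled), that the UA can summarise its installed fonts as a list of code point ranges, and that a {charset} attribute implies that charset's whole repertoire. The font coverage, the CHARSET_REPERTOIRE table, the source quality of 1.0 for the already-received PokornyMaster-X.html, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code points

# Assumption: repertoire implied by a {charset ...} attribute.
# Only ISO-8859-1 is listed here; a real UA would know many more.
CHARSET_REPERTOIRE: Dict[str, List[Range]] = {"iso-8859-1": [(0x00, 0xFF)]}

_ITEM = re.compile(r"[Uu]\+([0-9A-Fa-f]{1,6})(?:-([0-9A-Fa-f]{1,6}))?")


def parse_unicode_range(value: str) -> List[Range]:
    """Parse e.g. "U+40-7F, U+2000-21FF" into [(0x40, 0x7F), (0x2000, 0x21FF)]."""
    ranges = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        m = _ITEM.fullmatch(item)
        if m is None:
            raise ValueError(f"unsupported unicode-range item: {item!r}")
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last))
    return ranges


def merge(ranges: List[Range]) -> List[Range]:
    """Sort and coalesce overlapping or adjacent ranges."""
    merged: List[Range] = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def is_covered(needed: List[Range], available: List[Range]) -> bool:
    """True if every code point in `needed` falls inside `available`."""
    pool = merge(available)
    return all(any(lo <= first and last <= hi for lo, hi in pool)
               for first, last in needed)


def choose_variant(variants: List[Tuple[str, float, str, str]],
                   fonts: List[Range]) -> Optional[str]:
    """Return the URI of the highest-quality variant the installed fonts can display.

    Each variant is (uri, source_quality, attribute_name, attribute_value).
    """
    best: Optional[Tuple[str, float]] = None
    for uri, qs, attr, value in variants:
        if attr == "unicode-range":
            needed = parse_unicode_range(value)
        elif attr == "charset" and value.lower() in CHARSET_REPERTOIRE:
            needed = CHARSET_REPERTOIRE[value.lower()]
        else:
            continue  # cannot judge this variant, so never pick it automatically
        if is_covered(needed, fonts) and (best is None or qs > best[1]):
            best = (uri, qs)
    return best[0] if best else None


# Hypothetical coverage of the UA's installed fonts: no glyphs for U+10000-103FF.
fonts = parse_unicode_range("U+0-24F, U+2000-21FF")
variants = [
    ("PokornyMaster-X.html", 1.0, "unicode-range", "U+40-7F, U+2000-207F, U+10000-103FF"),
    ("PokornyMaster.html", 0.8, "unicode-range", "U+40-7F, U+2000-21FF"),
    ("PokornyMaster-R.html", 0.4, "charset", "iso-8859-1"),
]
print(choose_variant(variants, fonts))  # PokornyMaster.html
```

A variant whose attributes the UA cannot evaluate is skipped rather than guessed at; if nothing qualifies, the function returns None and the UA keeps the representation it already has.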

This introduces a new, optional Content-Type parameter,
"unicode-range", valid for text/* types.
It also adds one TCN extension attribute, using the RFC 2295 section 5.1
syntax:
extension-name = "unicode-range"
extension-value = quoted-string
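Because the proposed extension-value is a quoted-string that itself contains commas, an Alternates header cannot simply be split on commas. The following is a minimal Python sketch of a parser for the subset of RFC 2295 variant-list syntax used in the example above; it assumes each extension-value is a single token or quoted-string, and it does not handle fallback-variant entries or list directives. The function names are illustrative only.

```python
import re
from typing import Dict, List, Optional, Tuple

# Tokens: braces, commas, quoted-strings (with backslash escapes) and bare words.
_TOKEN = re.compile(r'\s*(\{|\}|,|"(?:[^"\\]|\\.)*"|[^\s{},"]+)')


def tokenize(header: str) -> List[str]:
    header = header.strip()
    tokens, pos = [], 0
    while pos < len(header):
        m = _TOKEN.match(header, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}: {header[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def unquote(token: str) -> str:
    """Strip surrounding quotes and undo backslash escapes; bare tokens pass through."""
    if token.startswith('"'):
        return re.sub(r"\\(.)", r"\1", token[1:-1])
    return token


def parse_alternates(header: str) -> List[Tuple[str, float, Dict[str, str]]]:
    """Parse comma-separated variant descriptions like {"uri" qs {name value} ...}."""
    toks = tokenize(header)
    i = 0

    def peek() -> Optional[str]:
        return toks[i] if i < len(toks) else None

    def take() -> str:
        nonlocal i
        if i >= len(toks):
            raise ValueError("unexpected end of header")
        i += 1
        return toks[i - 1]

    def expect(tok: str) -> None:
        got = take()
        if got != tok:
            raise ValueError(f"expected {tok!r}, got {got!r}")

    variants = []
    while peek() is not None:
        expect("{")
        uri = unquote(take())
        qs = float(take())
        attrs: Dict[str, str] = {}
        while peek() == "{":  # zero or more {name value} attributes
            expect("{")
            name = take()
            attrs[name] = unquote(take())
            expect("}")
        expect("}")
        variants.append((uri, qs, attrs))
        if peek() is not None:
            expect(",")
    return variants


header = ('{"PokornyMaster.html" 0.8 {unicode-range "U+40-7F, U+2000-21FF"}}, '
          '{"PokornyMaster-R.html" 0.4 {charset iso-8859-1}}')
for variant in parse_alternates(header):
    print(variant)
# ('PokornyMaster.html', 0.8, {'unicode-range': 'U+40-7F, U+2000-21FF'})
# ('PokornyMaster-R.html', 0.4, {'charset': 'iso-8859-1'})
```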


This way, we get all the usual benefits from Alternates-based negotiation:
• Only negotiable resources being viewed on sub-par devices are
subject to a second round-trip
• There is no Accept-* overhead on the initial request
• Each representation has its own URI and does not use Vary, so
caching is optimal. The Alternates header is always sent, to support
stateless proxies, or in case requests go via different routes.
• Variant selection is done by the UA, so no configuration or
user-identifying information is leaked
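On the server side, the "always send Alternates, never Vary" arrangement amounts to emitting the same variant list with every representation, minus the one being served. A minimal Python sketch, assuming three static files and the example's unicode-range values; the Variant class, the source quality of 1.0 for PokornyMaster-X.html, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variant:
    uri: str
    source_quality: float                # RFC 2295 source-quality (a qvalue, 0 to 1)
    charset: str
    unicode_range: Optional[str] = None  # proposed parameter; None if not declared


def content_type(v: Variant) -> str:
    value = f"text/html; charset={v.charset}"
    if v.unicode_range is not None:
        value += f'; unicode-range="{v.unicode_range}"'
    return value


def describe(v: Variant) -> str:
    """One variant-description for the Alternates header."""
    if v.unicode_range is not None:
        attr = f'{{unicode-range "{v.unicode_range}"}}'
    else:
        attr = f"{{charset {v.charset}}}"
    return f'{{"{v.uri}" {v.source_quality} {attr}}}'


def response_headers(served: Variant, variants: List[Variant]) -> Dict[str, str]:
    """Headers for `served`: Alternates lists every other variant, and no Vary is
    emitted because each variant lives at its own URI."""
    others = [describe(v) for v in variants if v.uri != served.uri]
    return {"Content-Type": content_type(served), "Alternates": ", ".join(others)}


variants = [
    Variant("PokornyMaster-X.html", 1.0, "utf-8", "U+40-7F, U+2000-207F, U+10000-103FF"),
    Variant("PokornyMaster.html", 0.8, "utf-8", "U+40-7F, U+2000-21FF"),
    Variant("PokornyMaster-R.html", 0.4, "iso-8859-1"),
]
for name, value in response_headers(variants[0], variants).items():
    print(f"{name}: {value}")
# Content-Type: text/html; charset=utf-8; unicode-range="U+40-7F, U+2000-207F, U+10000-103FF"
# Alternates: {"PokornyMaster.html" 0.8 {unicode-range "U+40-7F, U+2000-21FF"}}, {"PokornyMaster-R.html" 0.4 {charset iso-8859-1}}
```

Since every response is keyed only by its URI, a shared cache can store all three without any Vary-based secondary keys.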

Downsides:
• First response may be downloaded unnecessarily. Authors should link
to/serve the highest source-quality representation available by
default. UAs on insufficient devices may need to use heuristics based
on data flow rate and Content-Length to choose whether to close the
connection and open a new one, or wait for entire body of response to
download and re-use the same connection for the subsequent request.
• The onus falls on UAs to be smarter about automatic variant
selection. They could even display a dialog if automatic selection is
not desired, e.g. "This document cannot be displayed correctly due to
lack of fonts supporting characters used. A lower-quality, but
supported alternative is available. Do you wish to continue using this
document or request the alternative?"
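The connection-reuse heuristic mentioned in the first downside could look something like this minimal Python sketch. It assumes the response carries a Content-Length (a chunked body gives no remaining-size estimate) and that the UA has some estimate of the cost of opening a new connection; the threshold rule and all names are illustrative assumptions, not something the proposal specifies.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    content_length: int  # from the Content-Length response header, in bytes
    received: int        # body bytes received so far
    elapsed: float       # seconds since the first body byte arrived


def should_reconnect(t: Transfer, reconnect_cost: float) -> bool:
    """True if closing this connection and opening a new one for the preferred
    variant is expected to be quicker than draining the unwanted body.

    reconnect_cost is a hypothetical estimate, in seconds, of the extra delay a
    fresh connection adds (TCP handshake, plus TLS if used).
    """
    remaining = t.content_length - t.received
    if remaining <= 0:
        return False  # body already complete: reuse the connection
    if t.received == 0 or t.elapsed <= 0:
        return True   # no throughput sample yet; assume draining is the slower option
    rate = t.received / t.elapsed             # observed bytes per second
    return remaining / rate > reconnect_cost  # estimated drain time vs. reconnect


# 1.9 MB left at 200 kB/s (about 9.5 s to drain) versus a 0.3 s reconnect.
print(should_reconnect(Transfer(2_000_000, 100_000, 0.5), 0.3))  # True
# 20 kB left at the same rate (about 0.1 s): cheaper to wait and reuse.
print(should_reconnect(Transfer(40_000, 20_000, 0.1), 0.3))      # False
```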

-- 
Nicholas.
Received on Tuesday, 12 March 2013 10:53:26 GMT
