Re: New Version Notification for draft-tbray-http-legally-restricted-status-00.txt from Nicolas Mailhot on 2012-06-14 (ietf-http-wg@w3.org from April to June 2012)

From: Nicolas Mailhot <nicolas.mailhot@laposte.net>
Date: Thu, 14 Jun 2012 14:45:47 +0200
To: "Mark Nottingham" <mnot@mnot.net>
Cc: "Nicolas Mailhot" <nicolas.mailhot@laposte.net>, ietf-http-wg@w3.org
Message-ID: <de7b47c0761aea95987a3f66f9c549bd.squirrel@arekh.dyndns.org>
On Thu, 14 June 2012 12:14, Mark Nottingham wrote:
>
> On 14/06/2012, at 7:04 PM, Nicolas Mailhot wrote:
>
>> Mark Nottingham <mnot@...> writes:
>>>
>>> So, again -- what's the use case for a machine consuming these? I haven't
>>> seen one yet, unless I've missed something.
>>
>> 1. the block cause must be reflected in headers to be available in (legal
>> or debugging) logs. With today's spaghetti web sites that shard their
>> content over multiple servers and delegate more and more bits to nebulous
>> cloud platforms with fuzzy limits having to read all the html pages
>> exchanged to debug a block is increasingly un-practical
>
> The device doing the blocking can log its own actions without making it
> machine-readable on the wire.

That's pretty much useless for the web app or web client developer the users
will complain to, since he won't have access to the device logs and often
won't even know where to ask for them.

Also, as soon as you have a multi-tiered network architecture, or one
organisation making some web resources available to another through two
separately-managed networks, that's the road to log correlation and
multi-organisation coordination hell.

>> 2. the block cause must be reflected in headers so all the web clients that
>> use http as transport via some other middleware can be notified of the
>> block cause (ie the not-html-capable http middleware needs something it can
>> easily parse and pass up the stack so the web client presentation layer
>> gets a chance to do something with it and inform the user)
>
> Yes; this is the most compelling case I can see for a separate status code,
> similar to the one we had for 511. However, I don't yet see where the client
> will actually *do* something with the response that might require making this
> distinction. It was necessary in 511 because sites were interpreting the
> redirect as coming from the origin server, which has all sorts of nasty side
> effects for non-browser agents; what are the corresponding side effects here?

To be honest I don't think the problem-space is any different from error 511.

In my opinion the current discussion just shows that error 511 didn't go far
enough to gracefully handle all real-world blocking scenarios. It assumed
(wrongly) that a blocking error with human-readable content, which would not
necessarily be displayed to the user, was sufficient because no one cared why
the blocking occurred. In fact the only reason Tim Bray launched this
discussion was that he wanted to convey the blocking rationale.

After spending quite a lot of time thinking about the problems error 511 does
not solve in our context (which admittedly is a lot more complex than the home
office case), getting varied feedback from the developers of some of the web
clients I asked to implement error 511, and having to solve a few more
problems caused by current web clients' inability to interact gracefully with
blocking gateways, I'd really love to see a design such as the following:

%<-----

1. a generic 'blocked by intermediary' error code web clients are forbidden to
retry on (same rationale as error 511)

2. a header containing a free-form short sentence describing the blocking
rationale ('Blocked due to law 666', 'Upgrade your access', 'Too young for
this', whatever). If you want to get fancy, this header should be localizable
by suffixing or prefixing it with an ISO 639 code (we don't need this but I'm
pretty sure it will be a requirement in countries with multiple official
languages where it is forbidden to discriminate between them, or on
transnational networks; that makes it a candidate for UTF-8 encoding by
default BTW)

The only purpose of this header is to give dumb web clients something to
display or log in case of blocking, and to leave no room for javascript or
complex html those clients cannot understand (though for logging you'll also
want the next two headers)

I don't think there is any point in trying to define a set of standard values
for this header as humans are very good at inventing new
political/social/organizational/religious reasons to block some accesses.

3. an optional header containing the location of the full explanation of the
blockage (can be a pdf with terms of service, a web site with lots of fancy
chrome, a video statement of the local leader, some cartoons to explain to
kids why they don't have access to adult content, the local three-strikes law
excerpt, a helpdesk wiki page, whatever)

IMHO it is a huge mistake to try to put the pretty explanation in the error
page itself. Not all networking equipment is able to host the rich content
needed to explain a situation to many users, and current computers access
stuff right and left, so any user on a network subject to blocking is likely
to receive the error multiple times in a row. Better to offload it somewhere
the user will only consult once or twice in a full browser, and that is able
to withstand the load if the blocking is too extensive. The gateway always has
the option to host this page itself and transmit its own IP as location.

4. an optional header containing the location of a web portal the user can use
to elevate his access (when it is possible). Can be a full auth portal, a
payment portal, a form where a visitor can enter the token printed on the card
given to him by the hotel personnel, a web page with a button stating 'I agree
that pirating Hollywood films is the ultimate sin', whatever.

Some organisations will set a cookie (meaning only the web client used to
access the elevating access portal will work), others will authorize all
accesses from the IP that was used to elevate the access (for a time). There
are many possible variations and this scheme is not restrictive at all.

This way any web client can display at least 1. without requiring an html
engine, it is possible to provide users a more complex rationale
representation, and the web client is informed on how to elevate its access
when it is possible.

And it also works fine when you stack multiple levels of blocking gateways
that obey different rationales and have different ways to trigger
un-blocking.

Finally, that permits the writing of password managers and other browser
chrome extensions that auto-supply credentials when they recognize a trusted
portal location in 4. (and 4. also permits the use of https auth portals whose
certificate the password manager can check before injecting credentials).

5. To be complete you also need to define a status code that the portal in 4.
will emit when it is satisfied by the token presented by the user, so browsers
can implement the obvious logic of:
“after receiving a blocking code on any access, if the portal location was
loaded and returned the ok code, retry the access that was previously
blocked”

%<-----
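
To make points 1. to 4. concrete, here is a rough sketch of what a
not-html-capable client could do with such a response. Everything named in it
is made up for illustration: the design above assigns neither a status code
number nor header names, so 599, Block-Reason, Block-Reason-<lang>, Block-Info
and Block-Portal are placeholders only.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

# Placeholders invented for this sketch; nothing here is standardized.
BLOCKED_STATUS = 599            # hypothetical 'blocked by intermediary' code
REASON_HEADER = "block-reason"  # hypothetical short free-form rationale
INFO_HEADER = "block-info"      # hypothetical location of the full explanation
PORTAL_HEADER = "block-portal"  # hypothetical location of the elevation portal


@dataclass
class BlockNotice:
    # Language code -> short rationale; "" holds the unlocalized header.
    reasons: Dict[str, str] = field(default_factory=dict)
    info_url: Optional[str] = None
    portal_url: Optional[str] = None

    def reason_for(self, lang: str) -> Optional[str]:
        """Prefer the requested language, else the unlocalized rationale."""
        return self.reasons.get(lang, self.reasons.get(""))


def parse_block(status: int, headers: Mapping[str, str]) -> Optional[BlockNotice]:
    """Return a BlockNotice for a blocked response, None for anything else."""
    if status != BLOCKED_STATUS:
        return None
    notice = BlockNotice()
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if key == REASON_HEADER:
            notice.reasons[""] = value
        elif key.startswith(REASON_HEADER + "-"):
            # Localized variant, assumed here to be suffixed with a language code.
            notice.reasons[key[len(REASON_HEADER) + 1:]] = value
        elif key == INFO_HEADER:
            notice.info_url = value
        elif key == PORTAL_HEADER:
            notice.portal_url = value
    return notice
```

The middleware only needs string comparisons to pass a short rationale and two
optional locations up the stack; no html parsing is involved.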

The main side effect is the same as for error 511: web client writers will
have to acknowledge intermediaries exist, and not posit that errors are
necessarily emitted from the endpoint. And they'll have to sandbox the error
appropriately so there is no leakage between the endpoint dialog and the
intermediary dialog, especially if it occurs during https browsing.

As with error 511, that's not a reality web client writers will like, and as
with error 511 the intermediary has always had the option of blocking the
traffic with no explanation anyway. So there's no point in making it hard for
it to supply information that is likely to help the user.

>> 3. the info must absolutely be relayed to the human user so he gets a chance
>> to do something about it.
>
> That really sounds like HTML; most browsers don't expose the raw status code.

That's why I suggest dual reporting: short rationale in http header, full
rationale at a location specified in another header.

>> 4. The info must be complete enough the human user can act on it. Going to
>> the hotel lobby to buy some more creds is not the same thing as being
>> invited to the local police station to explain oneself
>
> Same as #3.

ditto

>> 5. There must be a way to inform the user ways to lift the block when the
>> possibility exists instead of having him hang in the dark (hotel payment
>> gateway 'credit-exhausted pay-here-to-continue', corporate authentication
>> portal 'are-you-an-employee fill-in-your-corp-login-there',
>> 'content-was-blocked-because-we-believe-it-is-objectionnable,
>> here-is-the-html-form-to-contest-the-evaluation-if-you-believe-it-is-wrong',
>> etc, etc)
>
> How is that accomplished by a status code, and NOT by HTML?

It is possible if you complement a generic status code with the kinds of
headers I described above.
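
As a rough illustration of how a client with no html engine might act on this,
including the retry rule from point 5. of the design above, consider the sketch
below. The numbers 599 and 299 and the class name are hypothetical; the design
does not pick values for the blocking code or for the portal's ok code. It also
tracks a single portal only, so stacked gateways are not handled.

```python
from typing import List, Optional

# Placeholder values invented for this sketch; not assigned anywhere.
BLOCKED_STATUS = 599    # hypothetical 'blocked by intermediary' code
PORTAL_OK_STATUS = 299  # hypothetical 'portal is satisfied' code


class BlockTracker:
    """Remembers blocked accesses and releases them once the portal says ok."""

    def __init__(self) -> None:
        self._pending: List[str] = []
        self._portal: Optional[str] = None

    def on_response(self, url: str, status: int,
                    portal: Optional[str] = None) -> List[str]:
        """Record one response; return the URLs the client may retry now."""
        if status == BLOCKED_STATUS:
            # Never retry blindly: park the request until the portal is done.
            if url not in self._pending:
                self._pending.append(url)
            if portal:
                self._portal = portal
            return []
        if status == PORTAL_OK_STATUS and url == self._portal:
            # The advertised portal accepted the user: release parked requests.
            retry, self._pending = self._pending, []
            self._portal = None
            return retry
        return []
```

The ok code is only honoured when it comes from the portal location that was
advertised in the blocking response, so an unrelated server returning the same
code does not trigger retries.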

>> None of those work if you rely on an html soup error page that may or may
>> not be parsed by the web client, or that you let the web client ignore at
>> will even when it is capable of parsing (current browsers refuse to display
>> such pages when a block occurs over https)
>
> None of these? Really?

None well enough for users to be happy and for the helpdesk to be left alone.
The current situation is painful for everyone: there are too many ways
blocking can occur without the user being given the means to understand the
blocking, or support being given the means to easily diagnose what's wrong.

In fact, it is so painful right now that if being painful was sufficient to
stop humans from implementing blocking, no one would do it today.

Thank you for your attention!

-- 
Nicolas Mailhot
Received on Thursday, 14 June 2012 12:46:30 UTC