Re: Httpdir early review of draft-ietf-alto-new-transport-07 from Lachlan Keller on 2023-04-10 (ietf-http-wg@w3.org from April to June 2023)

From: Lachlan Keller <lachlan.keller@yale.edu>
Date: Mon, 10 Apr 2023 15:37:48 -0400
To: Martin Thomson <mt@lowentropy.net>
Cc: ietf-http-wg@w3.org, alto@ietf.org, draft-ietf-alto-new-transport.all@ietf.org
Message-ID: <CAOj3RjafL5SFjY+foJHBXptMY+JqPjg2SsVpabzLqZo9NPt28Q@mail.gmail.com>
Hi Martin,

Thank you again so much for the comments! Echoing Kai, apologies for the
delayed response, but in the meantime, we've hopefully addressed your
concerns. Based on your comments and the other reviews, the opening of the
document has been restructured and edited, with the updated editor's
version in the github repo:
https://github.com/ietf-wg-alto/draft-ietf-alto-new-transport. The README
contains links to compare the current editor's copy to -07 which you
reviewed, which might be more helpful than some of the larger inline
comments which can be found below.

Thank you again for the time you put into this and helping us make this
document the best it can be. Looking forward to your response.

Best,
Lachlan

On Wed, Mar 22, 2023 at 10:24 PM Martin Thomson via Datatracker <
noreply@ietf.org> wrote:

> Reviewer: Martin Thomson
> Review result: Not Ready
>
> I'm going to level-set here from the outset.  I have not given this
> document as
> thorough a review as it might need to be sure, but only because I was
> unable to
> understand it in the hour or so that I spent.  That's clearly not enough
> time
> for something this complex, so adjust the finding of "Not Ready"
> accordingly.
>
> # Introduction
>
> This document aims to describe an ALTO-specific means of providing clients
> with
> updates to the transport and network information.  It expands on previous
> efforts that use SSE, which are generic, but maybe don't take advantage of
> the
> newer HTTP server push feature.
>
> ## This is ALTO
>
> I like that this doesn't shy away from making its design very specific to
> the
> application.  Lots of people get grandiose ideas that their design is
> wonderful
> and generally applicable and try to build something very generic, in the
> process
> losing touch with both the needs of their application.  They might convince
> (maybe deluding) themselves that others will take on their wonderful ideas
> and
> apply them to completely different applications.
>
> This document suffers from no such delusions, which is great because this
> work
> is really very difficult to understand for an outsider.  I was involved
> peripherally with the initial ALTO HTTP design, so have some familiarity
> with
> its goals and structure, but I found this document very difficult to
> process.
> Maybe more time would help, but I really can't justify spending that time..
>
> So I want to focus on the really high-level stuff.
>
> ## High Level
>
> The first of my issues makes me wonder if this has been implemented at
> all.  And
> as I went through this, I found myself asking that same question again
> multiple
> times.  Has it?
>

LACHLAN: It has been implemented with client pull functionality in
https://github.com/openalto/alto which is an ALTO server using HTTP/1.1.
Server push functionality has not been implemented.

Finally, I feel obligated to point out that expending effort on HTTP server
> push
> is perhaps unwise given its relative adoption and success.  That is, it
> has not
> really succeeded in browsers and is being removed.  Much of the benefit
> that a
> protocol like ALTO gains is from using common infrastructure, design
> paradigms,
> tooling, and so forth.  ALTO is a small user of HTTP and so benefits from
> the
> millenia of effort put in to improve HTTP by larger users. Those benefits
> don't
> extend to server push, so there is a real risk of this work becoming hard
> or
> impossible to deploy.
>

LACHLAN: As a group, we authors have had many discussions about this, due
to server push’s lack-of-adoption, but as it is a feature of HTTP that does
look to be extremely useful in the case of ALTO, we feel it makes sense to
include support for this feature in this domain specific case. If indeed it
is the case that server push becomes obsoleted, Section 6 (which defines
Client Pull of incremental updates) provides an equally valid alternative
to Server Push defined in Section 7.

## Caveat, Reprise
>
> To soften this a little, it is entirely possible that some of my criticism
> is
> rooted in not understanding the details well enough.  This is a tower
> built on
> top of a tower build on top of a tower build on top of a protocol that I
> know
> reasonably well, but it is a long way from where I stand to the top of that
> topmost tower.  And it is fair to say that a good review of this document
> (what
> I an not claiming to provide) would demand that I gain familiarity with the
> entire stack.  However, I think that there are several aspects of this
> document
> that could do with some dedicated editorial effort in order to improve
> this sort
> of accessibility.  I've highlighted a few, but I almost certainly missed
> others
> because my focus was on HTTP usage primarily (for instance, I did not
> consider
> whether the security considerations were reasonable or even approximately
> comprehensive).
>
>
> # Issues
>
> ## Server Push Usage
>
> Section 7.1 says "A client can add itself explicitly to the receiver set
> or add
> itself to the receiver set when requesting the TIPS view."  It describes
> two
> methods for doing this, but neither indicates which request will remain
> open so
> that the client can receive push promises.
>

LACHLAN: There is a single persistent connection between a client and a
server, which is the same one that the client uses to request the TIPS view
(and hence the one used to subscribe implicitly). For the explicit case,
the request to explicitly add the client to the receiver set MUST be sent
over the persistent connection.

To make this clear, the TIPS Workflow section will now read: “an ALTO
client opens a persistent HTTP connection with an ALTO server.”


> HTTP server push requires that the server send pushes alongside an
> outstanding
> request, but aside from discussion of streams in Section 7.3.1, I can't
> work out
> how the client would do that.  Section 2.4 also fails to make this clear.
>
> Consequently, I cannot convince myself that the primary feature of this
> document
> will work.
>
> ## Use of Undefined and Poorly Defined Terms
>
> I'm raising this to the level of a serious issue because this draft is made
> extraordinarily difficult to understand as a result of this.  Take Section
> 2.3,
> which introduces some terms.  That same section then includes completely
> different terms in Figure 1; terms that turn out to be critical concepts.
>

LACHLAN: This is a very good point. To solve this overload of information
(and some inconsistencies in Section 2), we have significantly restructured
Section 2.

   - Section 2.2 has been relegated (based on another review) to the
   Appendix A, but this is important to note because it previously cluttered
   this section.
   - Moved Sections 2.1 and 2.1.1 to be their own new section (Section 3),
   titled “TIPS Updates Graph,” and moved the updates graph invariants from
   Section 5.5 to right after these sections
   - Removed “iREST” generic terminology because it distracts from the
   specific TIPS models
   - Moved Sections 2.4 and 2.5 to be their own new section (Section 4),
   titled “TIPS High-level Workflow”
   - Added terms from the former Section 2.1 to the terms list in 2.3
   (which now comes first in the document) to provide one unified place to
   introduce terms. This section now defines the following:


   1. *Network information resource:* Is a piece of retrievable information
   about network state, per <xref target="RFC7285" format="default"/>.
   2. *TIPS view (tv):* Is defined in this document to be the container of
   incremental transport information about the network information resource..
   Though the TIPS view may include other transport information, it has two
   basic components: updates graph (ug) and receiver set (rs). See <xref
   target="dir-resp" format="default"/> for the complete contents of a TIPS
   view.
   3. *Updates graph (ug):* Is a directed, acyclic graph that contains a
   sequence of incremental updates and snapshots (collectively called update
   items) of a network information resource, based on the TIPS data model
   defined in <xref target="basic-data-model" format="default"/>.  Each
   incremental update is assigned a sequence number, and a URI can be
   constructed using the sequence numbers.  A static network information
   resource (e.g., Cost Map, Network Map) may need only a single updates
   graph.  A dynamic network information resource (e.g., Filtered Cost Map)
   may create an updates graph (within a new TIPS view) for each unique filter
   request.
   4. *Receiver set (rs):* Contains the set of clients who have requested
   to receive server push updates.
   5. *Snapshot:* Is a full replacement of a resource and is contained
   within an updates graph.
   6. *Incremental update:* Is a partial replacement of a resource
   contained within an updates graph, codified in this document as a JSON
   Merge Patch or JSON Patch.
   7. *Update item:* Refers to the content on an edge of the updates graph,
   which can be either a snapshot or incremental update.
   8. *ID#i-#j:* Denotes the update item (op, data) on a specific edge in
   the updates graph to transition from version at node i to version at node
   j, where i and j are the respective sequence numbers of the nodes the edge
   connects.
   9. *Information Resource Directory (IRD):* Contains a list of available
   information resources provided by an ALTO server that can be requested by
   ALTO clients, per <xref target="RFC7285" format="default"/>.  Each entry in
   an IRD indicates a URI at which an ALTO server accepts requests, and
   returns either an information resource or an information resource directory
   that references additional information resources.  An ALTO server's IRD MAY
   define one or more transport information publication services, which ALTO
   clients use to request new TIPS instances. See <xref target="tips-ird"
   format="default"/> for more details.

I'll also note, though you might treat this as a separate issue, that while
> the
> use of template-like URLs as a convention is a powerful explanatory tool,
> the
> draft doesn't make this clear enough.  The use of i and j for instance, are
> introduced in an example, which is easy to skip, only to find that the
> rest of
> the document critically depends on understanding what those mean.
>

LACHLAN: With the new changes above, this should be much clearer.
Specifically, ID#i-#j is now defined as “the update item (op, data) on a
specific edge in the updates graph to transition from version at node i to
version at node j, where i and j are the respective sequence numbers of the
nodes the edge connects.” Additionally, the idea of template URIs is now
made explicit following what was Figure 2 in draft -07 with the new
proposed added text:

NEW
TIPS uses this file structure schema to generate template URIs which allow
clients to construct the location of incremental updates after receiving
the tips-view-uri path from the server. The generic template for the
location of the update item on the edge from node i to node j in the
updates graph is:
    <tips-view-uri>/ug/<i>/<j>

## DELETE, but not
>
> Section 4.4 describes a use for HTTP's DELETE verb that is novel to say the
> least.  If the goal is to use DELETE to remove something and that
> something is a
> client's membership in a group (a receiver set here), then you should
> provide
> each client with a URL of their own to delete.  Whether you provide a
> resource
> for the collection (which might be useful for adding clients) or not is up
> to
> you, but this approach is not consistent with how HTTP is expected to
> operate
> and will result in surprises.
>
> Of course, the use of one request (here, the DELETE) to stop server push
> that
> might be happening on another request, is also not how server push is
> expected
> to work.
>

LACHLAN: Good point. We failed to make clear that <tips-view-uri> is unique
per connection, and is de-aliased by the server to refer to a TIPS view of
a resource that may or may not be shared by other clients. Propose adding
clarification to Section 4.2 where we define tips-view-uri: “Relative URI
to the TIPS view of a network resource, which MUST be unique per
connection, and is de-aliased by the server to refer to the actual location
of the TIPS view which may be shared by other clients.”


> ## Connections and Clients
>
> I can't pin this one down, but there seems to be some sort of assumption
> that
> there is a 1:1 correspondence between connections and clients.  That is
> not how
> HTTP works.  In HTTP, every request stands on its own.  Though there might
> be
> linkages between requests, those linkages should not affect how HTTP itself
> operates, including server push.  (You might detect a common theme here.)
>

LACHLAN:
We do assume that there is a persistent connection for monitoring one or
more resources, as each TIPS view provided to a client is session-based.
Per RFC 9112, persistent connection is widely supported and HTTP/1.1 makes
it the default state of a connection. In HTTP/1.x implementations, a client
would have to open a new connection for each resource for which it desires
to receive long-poll incremental updates. In HTTP/2 and /3, because of
substreams, it is possible that a single connection can monitor multiple
resources (based on another review, this idea has been clarified in the
draft in a section entitled “TIPS With Different HTTP Versions”.

The only linkage between requests is that a client must send a request to
first discover where a resource will be served. This is consistent with
“Building Protocols with HTTP” (RFC 9205) which says “Generally, a client
will begin interacting with a given application server by requesting an
initial document that contains information about that particular
deployment, potentially including links to other relevant resources.  Doing
so ensures that the deployment is as flexible as possible (potentially
spanning multiple servers), allows evolution, and also gives the
application the opportunity to tailor the "discovery document" to the
client.” (Section 4.4.1)


>
> As I noted, I'm not completely certain about raising this issue because of
> a
> lack of clarity about how the protocol is supposed to operate.
>
>
> ## Specification by Example
>
> I found that this document leaned a little too heavily on examples, to the
> point
> that it sometimes does not concretely specify expected behaviour at all.
> The
> content of examples were used to show the general shape of what is being
> considered.  As I noted before, examples can be a powerful explanatory
> tool, but
> it means that the true interoperability requirements are not always
> directly
> written.  Implementers need to infer normative requirements in some cases..
>

LACHLAN: Could you give us an example where we don’t specify concrete
behavior first so we can generalize? Our goal was to provide the definition
and semantics of each part and then provide examples. The issue might have
been with Section 2, however, in which case the proposed restructuring and
clarifications might fix that.


> See the use of /<tips-view-uri>/... everywhere it appears, Section 2.1.1
> (schema, where the figures are critical to understanding; I also found
> Figure 2
> very hard to understand, so I had to ignore it, even if it still seems
> crucial),
> Figure 4, the figures in Section 2.4, Section 8.3.
>

LACHLAN: Figure 4 and those in Section 2.4 are hopefully made clearer by
the newly expanded terms section from the reorganization of Section 2. For
the figure in Section 8.3, propose adding new text around it to clarify its
purpose.

NEW (Section 8.3 in -07)

Dependent ALTO resources result in cross-resource dependencies in TIPS.
Consider the following pair of resources, where my-cost-map (C) is
dependent on my-network-map (N). The updates graph for each resource is
shown, along with links in between the respective updates graphs to show
dependency:
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#section-10.3-1>

                       +---+   +---+   +---+   +---+   +---+
  my-network-map (N)   | 0 |-->|89 |-->|90 |-->|91 |-->|92 |
                       +---+   +---+   +---+   +---+   +---+
                                 |   \       \       \
                                 |    \       \       \
                       +---+   +---+   +---+   +---+   +---+
  my-cost-map (C)      | 0 |-->|101|-->|102|-->|103|-->|104|
                       +---+   +---+   +---+   +---+   +---+
                        |_______________________|

Figure 6
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#figure-6>
: Example Dependency Model
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#name-example-dependency-model>

In Figure 6, the cost-map version at node 101 and 102 is dependent on the
network-map version at node 89. The cost-map version at node 103 is
dependent on the network-map version at node 90, and so on.

LACHLAN: Figure 2 is definitely important to understand - we were trying to
show a simple file system that makes it easy to conceptualize how fetching
the updates works. Based on your previous comments about the use of poorly
defined terms, this section has been changed and is hopefully now more
clear, including a forward link to the section that defines
<tips-view-uri>.

NEW (Section 2.1.1 in -07)

To access each individual update in an updates graph, consider the model
represented as a "virtual" file system (adjacency list), contained within
the root of a TIPS view URI (see Section 6.2 for tips-view-uri definition).
For example:
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#section-3.2-1>

      <tips-view-uri>  // relative URI to TIPS view
          ug    // updates graph
              0
                  101    // full 101 snapshot
                  103
                  105
              101
                  102    // 101 -> 102 incremental update
              102
                  103
              103
                  104
                  105    // optional shortcut 103 -> 105 incr. update
              104
                  105
              105
                  106
          push           // server push metadata
            ...
          meta           // TIPS view meta
            ...


Figure 3
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#figure-3>
: Location Schema for Figure 2
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#name-location-schema-for-figure->

TIPS uses this file structure schema to generate template URIs which allow
clients to construct the location of incremental updates after receiving
the tips-view-uri path from the server. The generic template for the
location of the update item on the edge from node i to node j in the
updates graph is:
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#section-3.2-3>

    <tips-view-uri>/ug/<i>/<j>

<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#section-3.2-4>

Due to the sequential nature of the update item IDs, a client can long poll
a future update that does not yet exist (e.g., the incremental update from
106 to 107) by constructing the URI for the next edge that will be added,
starting from the sequence number of the current last node in the graph to
the next sequential node (last node sequence number + 1):
<https://author-tools.ietf.org/api/export/51c34914-b8d8-46f4-8978-d87e1f6d4d2a/draft-ietf-alto-new-transport.html#section-3.2-5>

    GET /<tips-view-uri>/ug/<end-seq>/<end-seq + 1>



>
> Probably the biggest hole here is j+1 in examples.  I couldn't find a
> statement
> anywhere that says that increments need to strictly increment by 1 each
> time
> (separately, why 101 and not just 1?).  There seems to be an assumption
> about
> that, but the directory resources all seem to indicate that the server is
> responsible for numbering increments.  Fixing that seems important.
>

LACHLAN: 101 is just an example. The queue can be compacted so we decided
not to start with 1 to try to show that the starting number didn’t matter.
What does matter is that the increments need to strictly increment by one
(so that the client can construct proactive URIs for long polling). Section
5.5 specifies the invariants of the updates graph which includes this idea
of continuity. Based on your final nit comment below, Section 5.5 has been
clarified a bit and moved to right after where the Updates Graph data model
is defined. But for this section specifically, to front this increment-by-1
requirement, propose appending “(i.e., incremented by 1 each time)” to
Section 2.1:

OLD
For each resource (e.g., a cost map, network map), the incremental updates
and snapshots can be represented using the following directed acyclic graph
model, where the server maps base resource IDs to incremental update IDs
that are assigned sequentially:

NEW
For each resource (e.g., a cost map, network map), the incremental updates
and snapshots can be represented using the following directed acyclic graph
model, where the server maps base resource IDs to incremental update IDs
that are assigned sequentially (i.e., incremented by 1 each time):


>
> ## Complicated
>
> This design is very complex.  Some of the details in the document probably
> do
> not need to be specified in the level of detail provided.  Section 5 for
> instance describes resources that provide clients with data about other
> resources, which isn't really consistent with HTTP principles, but they
> probably
> aren't necessary either.
>

LACHLAN: Section 5 details how metadata for a resource's TIPS view (e.g.,
details about the updates graph, etc.) can be retrieved. Since we are
dealing with dynamic resources that are constantly updating, a client may
desire to determine how they are changing. Can you elaborate why this
principle would be hurtful?

As long as servers adhere to the invariants (S5.5), clients can ask for
> incremental resources based on what they know and either get a snapshot or
> increments based on what the server is willing to serve them and what they
> are
> willing to process.  The design here requires additional round trips to
> gather
> information about the information the client really wants when it could
> probably
> ask for a resource, use etags to indicate what it has, use Accept to
> indicate
> what it can handle, and the server could then work out what best to serve
> up.
>

LACHLAN: What you described in that last sentence is how this protocol
works if the client already has a version of the resource. Otherwise, the
client must send just 1 request to the server to understand at what URI the
server is serving resources (the server also provides the client with a
starting edge to consume), then the client can construct all future URIs
from there, which we felt was quite simple and effective. Could you clarify
where you see the client needing to perform unnecessary round trips?

A specific problem that this design creates is strong coupling.  The client
> needs to know about URI structure in order to use the information in the
> TIPS
> view.
>

LACHLAN: Yes, the client does need to send one request to get the URI
structure. But once it does that, knowing the URI structure allows the
client to fetch updates in advance without having to wait. For server push,
the client does not need to know the URI structure, only how to add itself
to the receiver set of a resource. In earlier versions we did try out a
design where info would be provided by the server explicitly but that
produced a lot of complexity in terms of structure because the client had
to constantly fetch the URIs it must use instead of being able to construct
them. Though this would have the benefit of adding one more layer of
indirection, we felt it added too much complexity to be worth the tradeoff.


Similar comments might apply to managing the set of clients that might want
> server push (though again, see the first issue).
>
> # Nits
>
> These are just the ones that really jumped out.
>
> I suggest checking for typos, I saw many.
>

LACHLAN: Fixed a number of them that I could find


> Please use proper section references when linking.  Links in the form
> `<xref
> target="RFC1234" section="4.5"/>` will ensure that your HTML is properly
> generated.
>

LACHLAN: Will make sure this is fixed in the next version.

Please submit a bug report for whatever is going on with the caption on
> Figure
> 2.  It looks like you have a cross reference in there that xml2rfc is
> mangling.
>

LACHLAN: Took out the reference to fix this

"long pull" is not a thing (Section 2.1)


LACHLAN: Changed to “long poll”


> "Connection: Closed" is not a thing (Section 4.2)
>

LACHLAN: Great catch. Changed to “Connection: close”


> Please use real lists (1) especially when the content of the item is long
> and
> (2) because it makes (a) reading and (b) citing the document easier.
>

LACHLAN: Fixed in Introduction and Sec 7.1.3, which should be all instances
of that.

I can't work out why "next-edge" (Section 5.2) is null when server push is
> disabled.  Why would a client need to know that only when push is
> enabled?  If
> push is enabled, won't the client be pushed the next increment?  On the
> other
> hand, if push is disabled, won't the client need to know what to request
> next?
>

LACHLAN: Actually, since the “next-edge” will be sent anyway in the push
promise frame during server push, it isn’t necessary at all, so we will
remove this from the next version. In regards to that last question: if not
using server push, a client only needs a recommended starting edge which it
gets from initially requesting the TIPS view summary. It then is able to
construct the rest of the URIs for the updates without the server needing
to tell it. Therefore, “next-edge” is null when server push is disabled
because it is irrelevant. The client has become responsible for deciding
what to pull.


> ## Section 5.5 is Complicated
>
> Section 5.5 says:
>
> > Continuity: ns -> ne, anything in between ns and ne also exists (implies
> ni ->
>   ni + 1 patch exists), where ns is start-seq and ne is end-seq
>
> This section might be reduced to saying:
>
> > A server needs to ensure that any resource state that it makes available
> MUST
> be reachable by clients, either directly via a snapshot (that is, relative
> to 0)
> or indirectly by requesting an earlier snapshot and a contiguous set of
> incremental updates.
>

LACHLAN: Propose adding that paragraph to the beginning of the section to
clarify (along with another sentence of clarification), but keeping the
Continuity part because we are defining it as separate from Feasibility.
Continuity, or the idea that each update increments by 1, is important for
proactive long polling by the client because they can fetch the next update
by constructing the URI it will be at.

OLD
 A server may change its updates graph (to compact, etc.), but it must
satisfy the following invariants:

NEW
A server MAY change its updates graph (to compact, to add nodes, etc.), but
it MUST ensure that any resource state that it makes available is reachable
by clients, either directly via a snapshot (that is, relative to 0) or
indirectly by requesting an earlier snapshot and a contiguous set of
incremental updates. Additionally, to allow clients to proactively
construct URIs for future update items, the ID of each added node in the
updates graph must increment contiguously by 1. More specifically, the
updates graph MUST satisfy the following invariants:
Received on Friday, 14 April 2023 17:11:32 UTC