Pages vs. chunks from Pierre Thierry on 2025-07-19 (public-hydra@w3.org from July 2025)

From: Pierre Thierry <pierre@nothos.net>
Date: Sat, 19 Jul 2025 14:37:14 +0200
To: public-hydra@w3.org
Message-ID: <98476064-7ae1-423b-bac6-9cfa538e3a38@nothos.net>
Hi,

the recent discussion about paginated collections reminded me of a 
distinction I'd like to see in APIs…


  Pages are for… well, web pages!

When you're serving a large collection as HTML pages, it makes sense to 
present its items to the user with the most recent ones first, and in 
pages of consistent size. We expect that a smaller page means we have 
reached the end of the collection.

If your collection has items [0..112] in chronological order, you would 
usually present them in three pages:

  * page 1: [112..63]
  * page 2: [62..13]
  * page 3: [12..0]

It is convenient for the viewer and conventional, but as soon as the 
collection is modified, all pages change, which breaks caching. For 
instance, after 11 items are added:

  * page 1: [123..74]
  * page 2: [73..24]
  * page 3: [23..0]
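
The sliding effect can be reproduced in a few lines of Python. This is 
only an illustrative sketch: the function name `pages_newest_first` and 
the page size of 50 are assumptions read off the example above, not part 
of any API.

```python
def pages_newest_first(last_index, page_size=50):
    """Split items [0..last_index] into pages, newest first.

    Returns a list of (newest, oldest) index pairs, one per page.
    """
    pages = []
    newest = last_index
    while newest >= 0:
        oldest = max(newest - page_size + 1, 0)
        pages.append((newest, oldest))
        newest = oldest - 1
    return pages


before = pages_newest_first(112)  # collection holds items [0..112]
after = pages_newest_first(123)   # 11 items were added

# Pages whose content is identical before and after the additions.
unchanged = [page for page in before if page in after]
print(before, after, unchanged)
```

With these inputs `unchanged` is empty: every page boundary moved, so no 
cached page can be reused.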


  We can do better with APIs

But when we're serving an API, even if it's meant to display pages to an 
end-user, we could do better, and instead of serving ever-sliding pages, 
we could serve chunks that can acquire some degree of immutability in 
their lifecycle:

  * chunk 3: [112..100]
  * chunk 2: [99..50]
  * chunk 1: [49..0]

When the collection changes, some chunks keep the same URI and the same 
content:

  * chunk 3: [123..100]
  * chunk 2: [99..50]
  * chunk 1: [49..0]
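
A minimal sketch of how a server might compute such chunks, assuming 
boundaries are anchored at the oldest item and a chunk size of 50 as in 
the example (the function name `chunks_oldest_anchored` is hypothetical):

```python
def chunks_oldest_anchored(last_index, chunk_size=50):
    """Split items [0..last_index] into chunks with fixed boundaries.

    Chunk n (1-based) always starts at item (n - 1) * chunk_size, so only
    the newest chunk grows when items are appended.
    Returns {chunk_number: (newest, oldest)}.
    """
    chunks = {}
    number = 1
    oldest = 0
    while oldest <= last_index:
        newest = min(oldest + chunk_size - 1, last_index)
        chunks[number] = (newest, oldest)
        oldest = newest + 1
        number += 1
    return chunks


before = chunks_oldest_anchored(112)
after = chunks_oldest_anchored(123)

# Chunk numbers whose content did not change when items were appended.
stable = sorted(n for n in before if after.get(n) == before[n])
print(before, after, stable)
```

Here chunks 1 and 2 are stable and only chunk 3 changes, which is what 
makes the older chunks cacheable.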

Many APIs serve large collections in which large portions don't change, 
and the server can organize the chunks to maximize their caching. A 
single API call can get the client the list of chunks, with metadata that 
makes it unnecessary to even send requests to validate the cache 
(like YouTube's API including ETags in its responses).

When the goal is to display pages to the user, the client could still 
present conventional pages, and still read data through chunks, with the 
benefits of caching:

  * request chunk list => get [chunk5 [238..200], chunk4 [199..150],
    chunk3 [149..100], chunk2 [99..50], chunk1 [49..0]]
  * request chunk5
  * request chunk4
  * display [238..189]
  * request chunk3 in the background
  * user asks for next page
  * display [188..139]
  * request chunk2 in the background
  * user asks for next page
  * display [138..89]
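
The walk-through above can be sketched on the client side as follows. 
The helper `chunks_for_range` and the in-memory `cache` are hypothetical, 
and for brevity the sketch fetches missing chunks on demand rather than 
prefetching them in the background:

```python
def chunks_for_range(chunk_list, newest, oldest):
    """Return the numbers of the chunks overlapping items [newest..oldest].

    chunk_list maps chunk number -> (newest, oldest) item indices.
    """
    return sorted(
        (n for n, (hi, lo) in chunk_list.items() if lo <= newest and hi >= oldest),
        reverse=True,
    )


# Chunk list as returned by the first request in the walk-through.
chunk_list = {5: (238, 200), 4: (199, 150), 3: (149, 100), 2: (99, 50), 1: (49, 0)}

cache = set()          # numbers of the chunks already held locally
fetched_per_page = []  # chunks actually requested for each displayed page
for newest, oldest in [(238, 189), (188, 139), (138, 89)]:
    needed = chunks_for_range(chunk_list, newest, oldest)
    missing = [n for n in needed if n not in cache]  # only these hit the network
    cache.update(missing)
    fetched_per_page.append(missing)

print(fetched_per_page)
```

Each page of 50 items spans two chunks, but after the first page only one 
new chunk has to be requested per page.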

If the user asks to refresh, the client would only request the chunk 
list again. If the only chunk that changed was the first one in the 
list, it would request only that chunk and update its local pages 
accordingly.
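
A refresh then boils down to comparing validators. The sketch below 
assumes the chunk list carries one etag per chunk URI; the function name 
`chunks_to_refetch` and the value `"hypothetical-new-etag"` are made up 
for illustration:

```python
def chunks_to_refetch(cached_etags, fresh_chunk_list):
    """Compare cached etags with a freshly fetched chunk list.

    Both arguments map chunk URI -> etag string. Returns the URIs that
    must be requested again (unknown chunks or changed etags).
    """
    return [
        uri
        for uri, etag in fresh_chunk_list.items()
        if cached_etags.get(uri) != etag
    ]


base = "http://api.example.com/an-issue/comments"
cached = {
    base + "?chunk=3": "6VvMb6b1ZhY=",
    base + "?chunk=2": "dcoROfsQdf4=",
    base + "?chunk=1": "fl1RAAAXFE0=",
}
# Fresh chunk list: only the most recent chunk has a new etag (hypothetical value).
fresh = dict(cached)
fresh[base + "?chunk=3"] = "hypothetical-new-etag"

stale = chunks_to_refetch(cached, fresh)
print(stale)
```

Only the changed chunk is requested; the others are served from the 
local cache without any validation request.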

My last use case was a back-office dashboard showing hundreds of customer 
files. Old files rarely changed, but they sometimes did, so in that case 
not only the most recent chunks would change; older ones sometimes would 
too. This scheme is pretty flexible: immutability of some of the chunks 
is only a possibility, not a requirement.


  Proposal: Hydra ChunkedCollectionView

I propose to add a chunked view to collections. I'm wondering how it 
would best be exposed to clients. The goal is to minimize network usage, 
so it might be a problem that the default view of a collection is the 
entirety of its members. I see at least two solutions:

 1. In resources linking to the collection, provide both views. Clients
    that are equipped to deal with chunks should prefer them when their
    use is relevant; other clients could just go through the normal
    direct view.
 2. Make the chunked view the default view, providing a link to the
    direct view.

{
   "@context":"http://www.w3.org/ns/hydra/context.jsonld",
   "@id":"http://api.example.com/an-issue/comments",
   "@type": "Collection",
   "totalItems": 4980,
   "chunks": [
     {
       "@id":"http://api.example.com/an-issue/comments?chunk=3",
       "@type": "CollectionChunk",
       "totalItems": 23,
       "etag": "6VvMb6b1ZhY="
     },
     {
       "@id":"http://api.example.com/an-issue/comments?chunk=2",
       "@type": "CollectionChunk",
       "totalItems": 50,
       "etag": "dcoROfsQdf4="
     },
     {
       "@id":"http://api.example.com/an-issue/comments?chunk=1",
       "@type": "CollectionChunk",
       "totalItems": 50,
       "etag": "fl1RAAAXFE0="
     }
   ],
   "view": {
     "@id":"http://api.example.com/an-issue/comments/chunks",
     "@type": "ChunkedCollectionView"
   }
}
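
A client could read such a response with nothing more than a JSON 
parser. The sketch below assumes the property names `chunks`, 
`totalItems` and `etag` exactly as proposed in the example above; the 
function name `chunk_index` is hypothetical and the embedded document is 
abbreviated:

```python
import json

# Abbreviated copy of the proposed response.
response_body = """
{
  "@id": "http://api.example.com/an-issue/comments",
  "@type": "Collection",
  "chunks": [
    {"@id": "http://api.example.com/an-issue/comments?chunk=3",
     "@type": "CollectionChunk", "totalItems": 23, "etag": "6VvMb6b1ZhY="},
    {"@id": "http://api.example.com/an-issue/comments?chunk=2",
     "@type": "CollectionChunk", "totalItems": 50, "etag": "dcoROfsQdf4="},
    {"@id": "http://api.example.com/an-issue/comments?chunk=1",
     "@type": "CollectionChunk", "totalItems": 50, "etag": "fl1RAAAXFE0="}
  ]
}
"""


def chunk_index(body):
    """Map each chunk URI to its (totalItems, etag) pair."""
    document = json.loads(body)
    return {
        chunk["@id"]: (chunk["totalItems"], chunk["etag"])
        for chunk in document.get("chunks", [])
    }


index = chunk_index(response_body)
print(index)
```

The resulting mapping is all a client needs to decide which chunks to 
request and which cached copies are still valid.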

Curiously,
Pierre Thierry
-- 

pierre@nothos.net
0xD9D50D8A
Received on Saturday, 19 July 2025 12:37:22 UTC