Pagination Strategies in Elasticsearch


by Thomas Tran



In this article, we’ll talk about pagination in Elasticsearch (ES) and strategies for paging deep into large result sets, beyond the default 10,000-document limit that ES imposes.

From/Size Pagination

If you need to paginate through fewer than 10,000 documents, the standard from/size pagination strategy is all you need. Simply pass the from and size parameters in the request body of a Search API call:

Request URL: POST /index-name/_search

Request Body:

{
  "query": { },
  "size": 10,
  "from": 0
}

This will return 10 results from the beginning.
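The from parameter is just the number of hits to skip, so fetching a later page is only a matter of bumping the offset. For example, to get the second page of 10 results:

Request URL: POST /index-name/_search

Request Body:

{
  "query": { },
  "size": 10,
  "from": 10
}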

You can use the from/size pagination strategy as long as from + size <= 10000. However, if you go above it, like in the example below:

{
  "query": { },
  "size": 10,
  "from": 9991
}

… you will get the following error:

Result window is too large, from + size must be less than or equal to: [10000] but was [10001].

You can override this limit by changing the index.max_result_window index-level setting. But beware: the limit is there for a reason. For every page, Elasticsearch has to parse the query, build the search context, distribute the query to the applicable shards, collate the results, skip past the first from hits, read out the next size hits, and then destroy the search context. The deeper you paginate, the more expensive each page becomes compared to the one before it, which is why you generally should not change index.max_result_window from its default.
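For completeness, raising the limit is a single settings update (the 20,000 below is an arbitrary value for illustration), but one of the strategies described next is almost always the better option:

Request URL: PUT /index-name/_settings

Request Body:

{
  "index": {
    "max_result_window": 20000
  }
}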

Deep Pagination using the Scroll API

One strategy to page beyond 10,000 results is to use the Scroll API, which returns one page at a time. Each time you call the Scroll API, it will retrieve the next page in your results. To implement deep paging, we will repeatedly “scroll” until we reach the page that the user wants and then return that page.

To do this, you’ll need to make an initial call to the Search API with the query parameter scroll=1m, which keeps the scroll context alive for one minute between requests and tells the API to return a _scroll_id that you can use to retrieve the next batch of results.

Request URL: POST /index-name/_search?scroll=1m

Request Body:

{
  "query": { },
  "size": 10
}

Notice that the from parameter is not allowed in a scroll context.

If you want the next batch of results, then all you have to do is pass in the _scroll_id that you received into the Scroll API:

Request URL: POST /_search/scroll

Request Body:

{
  "scroll": "1m",
  "scroll_id": "..."
}

This will return the next page. Keep calling the Scroll API until you get to your desired page.
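A scroll context holds resources on the cluster until its keep-alive expires, so once you have the page you need, it’s good practice to clear the scroll explicitly instead of waiting for the 1m timeout:

Request URL: DELETE /_search/scroll

Request Body:

{
  "scroll_id": "..."
}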

Deep Pagination Using search_after

The search_after parameter is a newer way to paginate through large result sets, and it sidesteps the main drawback of the Scroll API: scroll contexts are stateful and can consume significant heap memory and time. It works similarly to the Scroll API, but instead of keeping a server-side cursor, search_after resumes from the sort values of the last hit on the previous page, which means every request must use the same sort.

To start, open a point in time (PIT) by calling the _pit endpoint with a keep_alive value:
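Request URL: POST /index-name/_pit?keep_alive=1m

Response Body (trimmed to the relevant field):

{
  "id": "..."
}

Include this PIT ID in your search request. Note that a search that uses a PIT does not take an index name in the URL, since the PIT already pins the target indices: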

Request URL: POST /_search

Request Body:

{
  "query": { },
  "size": 10,
  "pit": {
    "id": "...",
    "keep_alive": "1m"
  },
  "sort": [
    {"@timestamp": { "order": "asc" }
  ]
}

Your sort needs to produce a unique value for every document, so you should include a tiebreaker field (conventionally the last entry in the “sort” array) that is unique per document. If you do not provide a tiebreaker, your paged results could miss or duplicate hits.

By default, all PIT search requests add an implicit sort tiebreaker field called _shard_doc.
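If you prefer to make the tiebreaker explicit, you can sort on _shard_doc yourself (it is only sortable inside a PIT context), so the sort portion of the request body would look something like this:

"sort": [
  {"@timestamp": { "order": "asc" }},
  {"_shard_doc": "asc"}
]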

After the initial call, you should make subsequent calls to get the next page of results:

Request URL: GET /_search

Request Body:

{
  "query": { },
  "pit": {
    "id": "...",
    "keep_alive": "1m"
  },
  "search_after": [
    "2021-01-01T00:00:00.000Z",
    4294967298
  ],
  "sort": [
    {"@timestamp": {"order": "asc" }
  ]
}

The search_after array contains the sort values of the last hit from the previous response. Repeat this request, updating search_after with the latest sort values each time, until you reach your desired page.
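When you’re done paging, it’s good practice to close the PIT so the cluster can release the resources backing it rather than waiting for the keep_alive to lapse:

Request URL: DELETE /_pit

Request Body:

{
  "id": "..."
}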