Kubernetes API Priority & Fairness
With growing adoption of Kubernetes in the industry, there has been a steep rise in the number of custom controllers deployed to clusters. A lack of proper resiliency measures in these custom controllers can lead to unregulated traffic and, in turn, cluster instability.
There was a need to prioritise a given set of traffic across all the different clients without starving critical control-plane traffic.
This GitHub issue essentially captures the situation described above.
Typical traffic handled by a Kubernetes API server
- The Kubernetes API server issues requests to itself even when there are no client requests at all, for example when it has to fetch the status of the cluster.
- Kubernetes provides extension knobs that kick in while the API server is serving an incoming request. Such webhooks (admission/mutating) can spawn requests back to the kube API server.
- Controller bugs (unpaginated list requests, or retry storms against failed requests) can result in request loops against the API server.
- Node daemons, such as the kubelet running on every node, which continuously talk to the API server.
Any of these has tremendous potential to overwhelm the API server, which can lead to cluster instability.
What are the available options to tackle this situation?
- At the source (client side) — Kubernetes client-go provides a rate limiter to tackle the situation; however, client-side rate limiting has an inherent shortcoming: a client can simply choose to opt out of it.
- At the gateway (server side) — Kubernetes releases prior to 1.20 had a simple mechanism for protecting the API server against CPU & memory overloads -
- --max-requests-inflight (default: 400) — maximum number of non-mutating requests in flight at a given time.
- --max-mutating-requests-inflight (default: 200) — maximum number of mutating requests in flight at a given time.
- --min-request-timeout (default: 1800) — minimum number of seconds a handler must keep a request open before timing it out (currently only honoured by the watch request handler).
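As an illustration, these limits are set as flags on the kube-apiserver binary (the values shown here are just the defaults):

```
kube-apiserver \
  --max-requests-inflight=400 \
  --max-mutating-requests-inflight=200 \
  --min-request-timeout=1800 \
  ...
```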
These command-line flags regulated the total volume of inbound requests. Apart from the distinction between mutating and read-only, no other distinctions were made among requests; consequently, one subset of the request load could crowd out another, still overwhelming the server.
So, clearly, these guard rails were not enough!
API Priority & Fairness Introduction
Due to the shortcomings explained in the previous sections, APF (API Priority & Fairness) was eventually enabled by default in Kubernetes with release 1.20. It introduces a flow-control mechanism that allows platform owners to define API-level policies to regulate inbound requests to the API server.
By classifying API requests into flows & priorities, APF manages and throttles all inbound requests in a prioritised and fair manner.
Every request is matched against exactly one flow schema, which assigns the request to a priority level.
- When requests of a priority level are being throttled, requests of other priority levels remain unaffected.
- To further enforce fairness among requests of a priority level, the matching flow schema associates requests with flows, where requests originating from the same source are assigned the same flow distinguisher.
Below is an excerpt of the defaults available -
Among others, there is a system-leader-election FlowSchema, which is associated with the leader-election PriorityLevelConfiguration.
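For illustration, here is a trimmed-down sketch of what the system-leader-election FlowSchema looks like (fields abbreviated; run `kubectl get flowschema system-leader-election -o yaml` on your cluster for the exact definition):

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
kind: FlowSchema
metadata:
  name: system-leader-election
spec:
  distinguisherMethod:
    type: ByUser                  # requests from the same user share a flow
  matchingPrecedence: 100         # lower value = matched first
  priorityLevelConfiguration:
    name: leader-election         # the priority level this schema feeds into
  rules:
  - subjects:                     # who this schema applies to
    - kind: User
      user:
        name: system:kube-controller-manager
    - kind: User
      user:
        name: system:kube-scheduler
    resourceRules:                # which requests it matches
    - apiGroups: ["coordination.k8s.io"]
      resources: ["leases"]
      namespaces: ["*"]
      verbs: ["get", "create", "update"]
```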
The rules describe the list of criteria used to identify matching requests. The flow schema matches a request if and only if:
- At least one of its subjects matches the subject making the request and
- At least one of its resourceRules or nonResourceRules matches the verb and (non-)resource being requested
The distinguisherMethod defines how the flow distinguishers are computed:
- ByUser, where requests originating from the same subject are grouped into the same flow so that users can’t overwhelm each other
- ByNamespace, where requests originating from the same namespace are grouped into the same flow so that workloads in one namespace can’t overwhelm those in other namespaces
- An empty (unset) value, where all requests are grouped into a single flow
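The three cases above can be sketched conceptually as follows (this is illustrative Python, not the actual apiserver code):

```python
def flow_distinguisher(method: str, username: str, namespace: str) -> str:
    """Conceptual sketch of deriving a flow distinguisher from a request,
    per the distinguisherMethod of the matched flow schema."""
    if method == "ByUser":
        return username    # all of this user's requests share one flow
    if method == "ByNamespace":
        return namespace   # all requests from this namespace share one flow
    return ""              # unset: every request falls into a single flow

# e.g. two controllers in different namespaces land in different flows
print(flow_distinguisher("ByNamespace", "system:serviceaccount:a:ctl", "team-a"))
```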
When matching requests, a flow schema with a lower matchingPrecedence has higher precedence than one with a higher matchingPrecedence.
The priorityLevelConfiguration refers to the priority level configuration resource that specifies the flow control attributes -
- The limited.assuredConcurrencyShares defines the concurrency shares used to calculate the assured concurrency value
- The limited.limitResponse defines the strategy to handle requests that can’t be executed immediately.
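Per the APF design, a priority level's assured concurrency value is its proportional share of the server's total concurrency limit (the sum of the two inflight flags described earlier). A small sketch, assuming a 600-request server limit and three hypothetical levels:

```python
import math

def assured_concurrency(shares: int, all_shares: list[int], server_limit: int) -> int:
    """Sketch of the APF formula: each priority level gets
    ceil(server_limit * its_shares / sum_of_all_levels_shares)."""
    return math.ceil(server_limit * shares / sum(all_shares))

# server_limit = 400 (read-only) + 200 (mutating) = 600
levels = [100, 30, 10]  # hypothetical assuredConcurrencyShares of three levels
print(assured_concurrency(100, levels, 600))  # -> 429
```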
limited.limitResponse.type supports two values:
- Queue, where requests are queued. The queueing behaviour can be further configured by adjusting the properties under limited.limitResponse.queuing (queues, queueLengthLimit, handSize).
- Reject where requests are dropped with an HTTP 429 error
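Putting it together, a PriorityLevelConfiguration with a Queue limit response could look like the following sketch (the name and values here are hypothetical, chosen for illustration):

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
kind: PriorityLevelConfiguration
metadata:
  name: example-level            # hypothetical name
spec:
  type: Limited                  # subject to APF limits (as opposed to Exempt)
  limited:
    assuredConcurrencyShares: 30 # share of the server's concurrency limit
    limitResponse:
      type: Queue                # queue excess requests instead of rejecting
      queuing:
        queues: 64               # number of queues for this priority level
        queueLengthLimit: 50     # max waiting requests per queue
        handSize: 8              # candidate queues per flow (shuffle sharding)
```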