Kubernetes API Priority & Fairness

Typical traffic handled by a Kubernetes API server

  1. The Kubernetes API server sends requests to itself even when there are no client requests at all, for example when it has to fetch the status of the cluster.
  2. Kubernetes provides extension points that kick in while the API server is serving an incoming request. Such admission webhooks (validating or mutating) can send requests back to the kube-apiserver.
  3. Controller bugs (unpaginated list requests, or retry storms after failed requests) can result in a loop of requests to the API servers.
  4. Daemons
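
To illustrate the unpaginated-list problem from item 3, here is a minimal sketch of fetching a large collection in bounded pages rather than in one request. The function fake_list is a hypothetical in-memory stand-in for a list endpoint, and the integer token is a simplification (the real API returns an opaque continue token alongside a limit parameter).

```python
def fake_list(items, limit, continue_token=0):
    """Hypothetical stand-in for a list endpoint: returns one page and the next token."""
    page = items[continue_token:continue_token + limit]
    next_token = continue_token + limit
    # None signals that there are no further pages.
    return page, (next_token if next_token < len(items) else None)


def list_all(items, limit):
    """Fetch everything in pages of `limit` instead of one unbounded request."""
    collected, token, calls = [], 0, 0
    while token is not None:
        page, token = fake_list(items, limit, token)
        collected.extend(page)
        calls += 1
    return collected, calls


pods, calls = list_all(list(range(10)), limit=4)
```

Each call asks the server for at most `limit` objects, so a single buggy controller cannot force the server to serialize the whole collection at once.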

What are the available options to tackle this situation?

  1. At the source (client side): Kubernetes client-go provides a rate limiter for this. However, client-side rate limiting has a shortcoming: a client can choose to opt out of it.
  2. At the gateway (server side): Kubernetes releases prior to 1.20 had a simple mechanism for protecting the API server against CPU and memory overload:
  • max-requests-inflight (default: 400): maximum number of non-mutating requests in flight at a given time.
  • max-mutating-requests-inflight (default: 200): maximum number of mutating requests in flight at a given time.
  • min-request-timeout (default: 1800): minimum number of seconds a handler must keep a request open before timing it out.
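
The max-in-flight mechanism can be pictured as a pair of counters, one for read-only and one for mutating requests. The following is a minimal sketch of that idea, not the API server's actual implementation; the class and function names are made up for illustration.

```python
import threading


class InFlightLimiter:
    """Caps the number of concurrently executing requests (illustrative only)."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self):
        # Non-blocking: returns False immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


def handle(limiter, work):
    """Run work() if a slot is free, otherwise answer with HTTP 429."""
    if not limiter.try_acquire():
        return 429
    try:
        work()
        return 200
    finally:
        limiter.release()


# Separate limits for read-only and mutating requests, using the defaults quoted above.
read_limiter = InFlightLimiter(400)
write_limiter = InFlightLimiter(200)
```

Note that such a limiter treats all requests of a kind alike: one misbehaving client can occupy every slot and starve everyone else.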

API Priority & Fairness Introduction

API request flows
  • When requests of one priority level are being throttled, requests of other priority levels remain unaffected.
  • To further enforce fairness among requests of a priority level, the matching flow schema associates requests with flows, where requests originating from the same source are assigned the same flow distinguisher.
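
As a rough illustration of fairness among flows within one priority level, the sketch below dispatches queued requests round-robin across flows. This is a simplification under stated assumptions: the real implementation uses a fair-queuing algorithm with shuffle sharding, and the FairQueue class here is hypothetical.

```python
from collections import OrderedDict, deque


class FairQueue:
    """Round-robin dispatch across flows within one priority level (illustrative)."""

    def __init__(self):
        # flow distinguisher -> pending requests for that flow
        self._flows = OrderedDict()

    def enqueue(self, flow, request):
        self._flows.setdefault(flow, deque()).append(request)

    def dequeue(self):
        """Pop one request from the flow at the head, then move that flow to the back."""
        if not self._flows:
            return None
        flow, pending = next(iter(self._flows.items()))
        request = pending.popleft()
        del self._flows[flow]
        if pending:
            # Re-append so that other flows get a turn before this one again.
            self._flows[flow] = pending
        return request


q = FairQueue()
for name in ("n1", "n2", "n3"):
    q.enqueue("noisy-user", name)
q.enqueue("quiet-user", "q1")
order = [q.dequeue() for _ in range(4)]  # ['n1', 'q1', 'n2', 'n3']
```

The quiet flow's single request is served second, even though the noisy flow queued three requests ahead of it.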
Default Kubernetes FlowSchemas & PriorityLevelConfigurations
Details of system-leader-election FlowSchema
A rule in a FlowSchema matches a request if both of the following hold:
  • At least one of its subjects matches the subject making the request, and
  • At least one of its resourceRules or nonResourceRules matches the verb and (non-)resource being requested
The distinguisherMethod type then determines how matching requests are grouped into flows:
  • ByUser, where requests originating from the same subject are grouped into the same flow so that users can’t overwhelm each other
  • ByNamespace, where requests originating from the same namespace are grouped into the same flow so that workloads in one namespace can’t overwhelm those in other namespaces
  • An empty string, where all requests are grouped into a single flow
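
The matching conditions and flow grouping above can be sketched in a few lines. This is a simplified model, not the API server's code: Request and Rule are hypothetical types, subjects are reduced to plain user names, and nonResourceRules are left out.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user: str
    namespace: str
    verb: str
    resource: str


@dataclass
class Rule:
    subjects: set         # user names the rule applies to; "*" matches anyone
    resource_rules: list  # (verbs, resources) pairs; "*" matches anything


def _allowed(value, allowed):
    return "*" in allowed or value in allowed


def rule_matches(rule, req):
    """True if a subject matches AND at least one resource rule matches."""
    subject_ok = _allowed(req.user, rule.subjects)
    resource_ok = any(
        _allowed(req.verb, verbs) and _allowed(req.resource, resources)
        for verbs, resources in rule.resource_rules
    )
    return subject_ok and resource_ok


def flow_distinguisher(method, req):
    """Map a request to its flow according to the distinguisher method."""
    if method == "ByUser":
        return req.user
    if method == "ByNamespace":
        return req.namespace
    return ""  # empty method: every request shares a single flow


req = Request("system:kube-controller-manager", "kube-system", "update", "endpoints")
rule = Rule({"system:kube-controller-manager"}, [({"get", "update"}, {"endpoints"})])
matched = rule_matches(rule, req)
```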
Details of leader-election PriorityLevelConfiguration
  • limited.assuredConcurrencyShares defines the concurrency shares used to calculate the assured concurrency value.
  • limited.limitResponse defines the strategy for handling requests that can’t be executed immediately. Its type is one of:
    • Queue, where requests are queued. With this type, the queueing behaviour can be further configured through the properties of limited.limitResponse.queuing.
    • Reject, where requests are dropped with an HTTP 429 error
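
To make the assured concurrency value concrete: each priority level receives a slice of the server's total concurrency limit (the sum of max-requests-inflight and max-mutating-requests-inflight) in proportion to its shares, rounded up. The sketch below works through that arithmetic and a simplified version of the two limit responses. The share values and the "other-a"/"other-b" level names are invented for illustration, and a single queue stands in for the real queue set.

```python
def assured_concurrency_values(shares, server_concurrency_limit):
    """ACV(l) = ceil(SCL * shares(l) / sum of all shares)."""
    total = sum(shares.values())
    # Integer ceiling division avoids floating-point rounding surprises.
    return {
        level: -(-server_concurrency_limit * s // total)
        for level, s in shares.items()
    }


def limit_response(executing, acv, response_type, queued=0, queue_length_limit=0):
    """Decide what happens to a new request at one priority level (simplified)."""
    if executing < acv:
        return "execute"
    if response_type == "Queue" and queued < queue_length_limit:
        return "queue"
    return "reject"  # surfaced to the client as HTTP 429


# Illustrative shares only; 600 = 400 + 200, the two default in-flight limits.
shares = {"leader-election": 10, "other-a": 30, "other-b": 100}
acv = assured_concurrency_values(shares, 400 + 200)
# acv == {'leader-election': 43, 'other-a': 129, 'other-b': 429}
```

Because each value is rounded up, the values can add up to slightly more than the server limit (601 here).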


