fix: readiness needs to be like liveness (#9941)
Readiness has no reason to be cluster scoped, because that is not how Kubernetes networking works for pods: the pods of a deployment do not share the network as a singleton. Instead each pod runs in its own local scope, and when readiness fails the pod is potentially taken out of the network and is no longer resolvable - this affects a distributed setup in myriad ways. Readiness should therefore behave like liveness, with local scope alone, and should be a dummy implementation. With this PR the startup times, and the overall k8s startup time, improve dramatically. Added another handler, `/minio/health/cluster`, to report cluster-scope health.
parent
27a1f3ed2b
commit
c0ac25bfff
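Before the documentation diff, a rough illustration of the behavior the commit message describes: a minimal Go sketch (not part of this PR) that polls the two local-scope probes and the new cluster-scope probe. Only the three endpoint paths come from the change itself; the address, port and timeout are assumptions.

```go
// probecheck.go - illustrative only. Assumes a MinIO server listening on
// localhost:9000; adjust the address for your deployment.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: time.Second}

	// Endpoint paths are the ones named in this change; everything else is assumed.
	endpoints := []string{
		"/minio/health/live",    // local scope: 200 OK whenever the process is up
		"/minio/health/ready",   // local scope: now a dummy check, behaves like liveness
		"/minio/health/cluster", // cluster scope: 200 OK only when quorum is available
	}

	for _, p := range endpoints {
		resp, err := client.Get("http://localhost:9000" + p)
		if err != nil {
			fmt.Printf("%-24s unreachable: %v\n", p, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%-24s %s\n", p, resp.Status)
	}
}
```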
@@ -1,35 +1,42 @@
 ## MinIO Healthcheck
 
-MinIO server exposes two un-authenticated, healthcheck endpoints - liveness probe and readiness probe at `/minio/health/live` and `/minio/health/ready` respectively.
+MinIO server exposes three un-authenticated healthcheck endpoints - a liveness probe, a readiness probe and a cluster probe at `/minio/health/live`, `/minio/health/ready` and `/minio/health/cluster` respectively.
 
 ### Liveness probe
 
-This probe is used to identify situations where the server is running but may not behave optimally, i.e. sluggish response or corrupt back-end. Such problems can be *only* fixed by a restart.
-
-Internally, MinIO liveness probe handler checks if backend is alive and in read quorum to take requests.
-
-When liveness probe fails, Kubernetes like platforms restart the container.
+This probe always responds with '200 OK'. When the liveness probe fails, Kubernetes-like platforms restart the container.
+
+```
+livenessProbe:
+  httpGet:
+    path: /minio/health/live
+    port: 9000
+    scheme: HTTP
+  initialDelaySeconds: 3
+  periodSeconds: 1
+  timeoutSeconds: 1
+  successThreshold: 1
+  failureThreshold: 3
+```
 
 ### Readiness probe
 
-This probe is used to identify situations where the server is not ready to accept requests yet. In most cases, such conditions recover in some time such as quorum not available on drives due to load.
-
-Internally, MinIO readiness probe handler checks for backend is alive and in read quorum then the server returns 200 OK, otherwise 503 Service Unavailable.
-
-Platforms like Kubernetes *do not* forward traffic to a pod until its readiness probe is successful.
-
-### Configuration example
-
-Sample `liveness` and `readiness` probe configuration in a Kubernetes `yaml` file can be found [here](https://github.com/minio/minio/blob/master/docs/orchestration/kubernetes/minio-standalone-deployment.yaml).
-
-### Configure readiness deadline
-Readiness checks need to respond faster in orchestrated environments, to facilitate this you can use the following environment variable before starting MinIO
-
-```
-MINIO_API_READY_DEADLINE (duration) set the deadline for health check API /minio/health/ready e.g. "1m"
-```
-
-Set a *5s* deadline for MinIO to ensure readiness handler responds with-in 5seconds.
-
-```
-export MINIO_API_READY_DEADLINE=5s
-```
+This probe always responds with '200 OK'. When the readiness probe fails, Kubernetes-like platforms *do not* forward traffic to the pod.
+
+```
+readinessProbe:
+  httpGet:
+    path: /minio/health/ready
+    port: 9000
+    scheme: HTTP
+  initialDelaySeconds: 3
+  periodSeconds: 1
+  timeoutSeconds: 1
+  successThreshold: 1
+  failureThreshold: 3
+```
+
+### Cluster probe
+
+This probe is not needed in almost all cases; it is meant for administrators to see whether quorum is available in a given cluster. The reply is '200 OK' if the cluster has quorum, otherwise '503 Service Unavailable'.
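Since the cluster probe is aimed at administrators rather than at the kubelet, a common pattern would be to wait for quorum before kicking off cluster-wide maintenance. The sketch below is illustrative only: the 200/503 semantics come from the README above, while the address, deadline and poll interval are assumptions.

```go
// waitforquorum.go - hypothetical helper that blocks until the cluster probe
// reports quorum, then exits 0; it exits non-zero on timeout.
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	// Assumed values: adjust the address and deadline for your deployment.
	const url = "http://localhost:9000/minio/health/cluster"
	client := &http.Client{Timeout: 2 * time.Second}
	deadline := time.Now().Add(2 * time.Minute)

	for time.Now().Before(deadline) {
		resp, err := client.Get(url)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				// 200 OK: the cluster has quorum.
				fmt.Println("cluster has quorum")
				return
			}
			// 503 Service Unavailable: quorum not available yet, keep polling.
		}
		time.Sleep(3 * time.Second)
	}
	fmt.Fprintln(os.Stderr, "timed out waiting for cluster quorum")
	os.Exit(1)
}
```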