Added kernel tuning docs (#3921)
parent
a099319e66
commit
a9c0f1e0a4
@ -0,0 +1,71 @@ |
# Kernel Tuning for Minio Production Deployment on Linux Servers [![Slack](https://slack.minio.io/slack?type=svg)](https://slack.minio.io) [![Go Report Card](https://goreportcard.com/badge/minio/minio)](https://goreportcard.com/report/minio/minio) [![Docker Pulls](https://img.shields.io/docker/pulls/minio/minio.svg?maxAge=604800)](https://hub.docker.com/r/minio/minio/) [![codecov](https://codecov.io/gh/minio/minio/branch/master/graph/badge.svg)](https://codecov.io/gh/minio/minio)

## Tuning Network Parameters

The following network parameter settings can help ensure optimal Minio server performance on production workloads.

- *`tcp_fin_timeout`*: A socket left in memory takes approximately 1.5 KB of memory. It makes sense to close unused sockets preemptively so that they do not leak memory. This way, even if a peer doesn't close the socket for some reason, the system itself closes it after a timeout. The `tcp_fin_timeout` variable defines this timeout and tells the kernel how long to keep sockets in the FIN-WAIT-2 state. We recommend setting it to 30 seconds. You can set it as shown below:

```sh
sysctl -w net.ipv4.tcp_fin_timeout=30
```

- *`tcp_keepalive_probes`*: This variable defines the number of unacknowledged keepalive probes to send before considering a connection dead. You can set it as shown below:

```sh
sysctl -w net.ipv4.tcp_keepalive_probes=5
```

- *`wmem_max`*: This parameter sets the maximum OS send buffer size, in bytes, for all types of connections.

```sh
sysctl -w net.core.wmem_max=540000
```

- *`rmem_max`*: This parameter sets the maximum OS receive buffer size, in bytes, for all types of connections.

```sh
sysctl -w net.core.rmem_max=540000
```
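
Values set with `sysctl -w` last only until the next reboot. The sketch below shows one way to persist the network settings above, assuming a distribution that reads drop-in files from `/etc/sysctl.d`. The function name `write_network_sysctl` and the file name `90-minio-network.conf` are hypothetical and not part of Minio.

```sh
#!/bin/sh
# Sketch: write the recommended network settings to a sysctl drop-in file.
# write_network_sysctl and the file name are illustrative assumptions.
write_network_sysctl() {
    # $1: target directory, e.g. /etc/sysctl.d (must already exist)
    conf="$1/90-minio-network.conf"
    cat > "$conf" <<'EOF'
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_probes = 5
net.core.wmem_max = 540000
net.core.rmem_max = 540000
EOF
    # Print the path so the caller can load the file right away.
    echo "$conf"
}
```

Run as root, `sysctl -p "$(write_network_sysctl /etc/sysctl.d)"` would write the file and load it immediately. Keeping the values in a file also makes them easier to review than a series of one-off commands.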

## Tuning Virtual Memory

The recommended virtual memory settings are as follows.

- *`swappiness`*: This parameter controls the relative weight given to swapping out runtime memory, as opposed to dropping pages from the system page cache. It takes values from 0 to 100, both inclusive. We recommend setting it to 10.

```sh
sysctl -w vm.swappiness=10
```

- *`dirty_background_ratio`*: This is the percentage of system memory that can be filled with `dirty` pages, i.e. memory pages that still need to be written to disk, before the kernel starts writing them out in the background. We recommend writing the data to the disk as soon as possible. To do this, set `dirty_background_ratio` to 1.

```sh
sysctl -w vm.dirty_background_ratio=1
```

- *`dirty_ratio`*: This defines the absolute maximum amount of system memory, as a percentage, that can be filled with dirty pages before everything must get committed to disk.

```sh
sysctl -w vm.dirty_ratio=5
```
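
To get a feel for what these percentages mean in practice, the thresholds can be estimated from the memory size. The sketch below is only an approximation: the kernel applies the ratios to the memory it considers available (free plus reclaimable pages) rather than to total RAM, so the real thresholds are somewhat lower. `dirty_threshold_kb` is a hypothetical helper, and the 64 GiB host is an assumed example.

```sh
#!/bin/sh
# dirty_threshold_kb RATIO MEM_KB
# Prints RATIO percent of MEM_KB, in kB, using integer arithmetic.
dirty_threshold_kb() {
    echo $(( $2 * $1 / 100 ))
}

# Assumed example: a host with 64 GiB of RAM (67108864 kB).
mem_kb=67108864
echo "dirty_background_ratio=1 -> about $(dirty_threshold_kb 1 "$mem_kb") kB"
echo "dirty_ratio=5            -> about $(dirty_threshold_kb 5 "$mem_kb") kB"
```

On a real host, the total memory in kB can be read with `awk '/^MemTotal:/ {print $2}' /proc/meminfo` and passed as the second argument.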

## Tuning Scheduler

Proper scheduler configuration makes sure the Minio process gets adequate CPU time. Here are the recommended scheduler settings:

- *`sched_min_granularity_ns`*: This parameter decides the minimum time a task will be allowed to run on the CPU before being preempted. The value is given in nanoseconds. We recommend setting it to 10 ms, i.e. 10000000 ns.

```sh
sysctl -w kernel.sched_min_granularity_ns=10000000
```

- *`sched_wakeup_granularity_ns`*: Lowering this parameter improves wake-up latency and throughput for latency-critical tasks, particularly when a short duty cycle load component must compete with CPU-bound components.

```sh
sysctl -w kernel.sched_wakeup_granularity_ns=15
```
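
Both scheduler parameter names end in `_ns`, so the kernel expects values in nanoseconds. The sketch below converts milliseconds before applying a value. `ms_to_ns` and `set_sched_param` are hypothetical helpers, and the writability check is there on the assumption that not every kernel exposes these tunables under `/proc/sys/kernel`.

```sh
#!/bin/sh
# ms_to_ns MS: convert whole milliseconds to nanoseconds.
ms_to_ns() {
    echo $(( $1 * 1000000 ))
}

# set_sched_param NAME NS [PROC_ROOT]
# Writes NS to PROC_ROOT/kernel/NAME (default root: /proc/sys), which has the
# same effect as `sysctl -w kernel.NAME=NS`. Returns 1 if the file is missing
# or not writable instead of failing with an obscure error.
set_sched_param() {
    file="${3:-/proc/sys}/kernel/$1"
    if [ -w "$file" ]; then
        echo "$2" > "$file"
    else
        echo "skipping $1: $file is not writable" >&2
        return 1
    fi
}
```

For example, `set_sched_param sched_min_granularity_ns "$(ms_to_ns 10)"`, run as root, would apply the 10 ms value. The optional third argument exists only so the helper can be tried against a scratch directory.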

## Tuning Disks

The recommendations for disk tuning are conveniently packaged in a well-commented [shell script](https://github.com/minio/minio/blob/master/docs/deployment/kernel-tuning/disk-tuning.sh). Please review the shell script for our recommendations.
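
Before changing anything, it can help to see what each disk currently uses. The following is a read-only sketch, assuming the standard sysfs layout under `/sys/block`. `show_queue_settings` is a hypothetical helper; it accepts an alternative root directory only so that it can be tried against a test tree.

```sh
#!/bin/sh
# show_queue_settings [SYS_BLOCK_DIR]
# Prints the I/O scheduler and nr_requests for every block device queue.
show_queue_settings() {
    root="${1:-/sys/block}"
    for queue in "$root"/*/queue; do
        # Skip the unexpanded pattern when no device is present.
        [ -d "$queue" ] || continue
        disk=$(basename "$(dirname "$queue")")
        for attr in scheduler nr_requests; do
            [ -r "$queue/$attr" ] && echo "$disk $attr: $(cat "$queue/$attr")"
        done
    done
    return 0
}
```

Calling `show_queue_settings` without arguments lists the live settings. In the `scheduler` output, the kernel marks the active scheduler with square brackets, for example `noop [deadline] cfq`.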
@ -0,0 +1,48 @@ |
#!/bin/bash

## Minio Cloud Storage, (C) 2017 Minio, Inc.
##
## Licensed under the Apache License, Version 2.0 (the "License");
## you may not use this file except in compliance with the License.
## You may obtain a copy of the License at
##
##     http://www.apache.org/licenses/LICENSE-2.0
##
## Unless required by applicable law or agreed to in writing, software
## distributed under the License is distributed on an "AS IS" BASIS,
## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
## See the License for the specific language governing permissions and
## limitations under the License.

for iosched_dir in /sys/block/*/queue/iosched; do
    ## Skip the unexpanded pattern when no block device
    ## exposes an iosched directory.
    [ -d "$iosched_dir" ] || continue

    ## Change each disk ioscheduler to be "deadline"
    ## Deadline dispatches I/Os in batches. A batch is a
    ## sequence of either read or write I/Os which are in
    ## increasing LBA order (the one-way elevator). After
    ## processing each batch, the I/O scheduler checks to
    ## see whether write requests have been starved for too
    ## long, and then decides whether to start a new batch
    ## of reads or writes
    path=$(dirname "$iosched_dir")
    [ -f "$path/scheduler" ] && {
        echo "deadline" > "$path/scheduler"
    }

    ## This controls how many requests may be allocated
    ## in the block layer for read or write requests.
    ## Note that the total allocated number may be twice
    ## this amount, since it applies only to reads or
    ## writes (not the accumulated sum).
    [ -f "$path/nr_requests" ] && {
        echo "256" > "$path/nr_requests"
    }

    ## This is the maximum number of kilobytes the block
    ## layer will allow in a single data transfer. The
    ## hardware limit, max_hw_sectors_kb, is read-only,
    ## so the writable max_sectors_kb is set instead.
    [ -f "$path/max_sectors_kb" ] && {
        echo "1024" > "$path/max_sectors_kb"
    }
done