This improves the startup time significantly
for clusters which have lot of buckets.
Also fixes a bug where `.minio.sys` is created
on disks which do not have `format.json`
* Implement heal format REST API handler
* Implement admin peer rpc handler to re-initialize storage
* Implement HealFormat API in pkg/madmin
* Update pkg/madmin API.md to incl. HealFormat
* Added unit tests for ReInitDisks rpc handler and HealFormatHandler
This change brings in changes at multiple places
- Reuse buffers at almost all locations ranging
from rpc, fs, xl, checksum etc.
- Change caching behavior to disable itself
under low memory conditions i.e < 8GB of RAM.
- Only objects cached are of size 1/10th the size
of the cache for example if 4GB is the cache size
the maximum object size which will be cached
is going to be 400MB. This change is an
optimization to cache more objects rather
than few larger objects.
- If object cache is enabled default GC
percent has been reduced to 20% in lieu
with newly found behavior of GC. If the cache
utilization reaches 75% of the maximum value
GC percent is reduced to 10% to make GC
more aggressive.
- Do not use *bytes.Buffer* due to its growth
requirements. For every allocation *bytes.Buffer*
allocates an additional buffer for its internal
purposes. This is undesirable for us, so
implemented a new cappedWriter which is capped to a
desired size, beyond this all writes rejected.
Possible fix for #3403.
This is needed as explained by @krisis
Lets say we have following errors.
```
[]error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccesDenied}
```
Since the last two errors are filtered, the maximum is nil,
depending on map order.
Let's say we get nil from reduceErr. Clearly at this point
we don't have quorum nodes agreeing about the data and since
GetObject only requires N/2 (Read quorum) and isDiskQuorum
would have returned true. This is problematic and can lead to
undersiable consequences.
Fixes#3298
Disks when are offline for a long period of time, we should
ignore the disk after trying Login upto 5 times.
This is to reduce the network chattiness, this also reduces
the overall time spent on `net.Dial`.
Fixes#3286
Ref #3229
After review with @abperiasamy we decided to remove all the unnecessary options
- MINIO_BROWSER (Implemented as a security feature but now deemed obsolete
since even if blocking access to MINIO_BROWSER, s3 API port is open)
- MINIO_CACHE_EXPIRY (Defaults to 72h)
- MINIO_MAXCONN (No one used this option and we don't test this)
- MINIO_ENABLE_FSMETA (Enable FSMETA all the time)
Remove --ignore-disks option - this option was implemented when XL layer
would initialize the backend disks and heal them automatically to disallow
XL accidentally using the root partition itself this option was introduced.
This behavior has been changed XL no longer automatically initializes
`format.json` a HEAL is controlled activity, so ignore-disks is not
useful anymore. This change also addresses the problems of our documentation
going forward and keeps things simple. This patch brings in reduction of
options and defaulting them to a valid known inputs. This patch also
serves as a guideline of limiting many ways to do the same thing.
In a distributed setup that the server should not perform any operation
on the storage layer after it is exported via RPC. e.g, cleaning up of
temporary directories under .minio.sys/tmp may interfere with ongoing
PUT objects being served by the distributed setup.
These messages based on our prep stage during XL
and prints more informative message regarding
drive information.
This change also does a much needed refactoring.
* Return negative values of Total and Free in StorageInfo() when we fail to get disk info
* Return consistent messages in web handlers when the server is not initialized
- Instrumentation for locks.
- Detailed test coverage.
- Adding RPC control handler to fetch lock instrumentation.
- RPC control handlers suite tests with a test RPC server.
This change initializes rpc servers associated with disks that are
local. It makes object layer initialization on demand, namely on the
first request to the object layer.
Also adds lock RPC service vendorized minio/dsync
Currently `xl.json` saves algorithm information for bit-rot
verification. Since the bit-rot algo's can change in the
future make sure the erasureReadFile doesn't default to
a particular algo. Instead use the checkSumInfo.
Fresh disks can be provided in any order, we need to make sure
to preserve existing disk order and populate the fresh disks
in new positions.
Thanks for Anis Elleuch <vadmeste@gmail.com> for finding this issue.
By default server heals/creates missing directories and re-populates
`format.json`, in some scenarios when disk is down for maintainenance
it would be beneficial for users to ignore such disks rather than
mistakenly using `root` partition.
Fixes#2128
The object cache implementation is XL cache, which defaults
to 8GB worth of read cache. Currently GetObject() transparently
writes to this cache upon first client read and then subsequently
serves reads from the same cache.
Currently expiration is not implemented.
This change co-incides with another patch set which
reduces the writeQuorum requirement. With the
write quorum change it is now possible to support
6 disk configuration.
This patch also supports writing to a temporary file and renaming
rather than appending to an existing file. This helps in avoiding
inconsistent files.
Previously write quorum was set to (no. of disk / 2) + 3. As per new
change, the write quorum is set to (no. of disk / 2) + 2. This helps
to accommodate one more failure of disk.