xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in
xl.storageDisks are set to nil when they encounter some errors. This means all
elements in xl.storageDisks will be nil after some time which lead to an unusable XL.
This is an enhancement to the XL/distributed-XL mode. FS mode is
unaffected.
The ReadFileWithVerify storage-layer call is similar to ReadFile with
the additional functionality of performing bit-rot checking. It
accepts additional parameters for a hashing algorithm to use and the
expected hex-encoded hash string.
This patch provides significant performance improvement because:
1. combines the step of reading the file (during
erasure-decoding/reconstruction) with bit-rot verification;
2. limits the number of file-reads; and
3. avoids transferring the file over the network for bit-rot
verification.
ReadFile API is implemented as ReadFileWithVerify with empty hashing
arguments.
Credits to AB and Harsha for the algorithmic improvement.
Fixes#4236.
This PR also does backend format change to 1.0.1
from 1.0.0. Backward compatible changes are still
kept to read the 'md5Sum' key. But all new objects
will be stored with the same details under 'etag'.
Fixes#4312
Such that in a situation where all errors were
ignored we need to reduce the errors using
readQuorum to get a consistent error value.
Without this change errors generated will
never be consistent with for an expected scenario.
For example in a 6 disk setup 1 disk is missing
and 5 do not have the volume (testbucket)
Without this change Stat() would result in different
errors depending on which disk died. Can cause
confusion to S3 client application.
This change addresses need to track type of
errors we ignored and bring readQuorum to
choose the maximally occuring as the value
of truth.
Ignore a disk which wasn't able to successfully perform an action to
avoid eventual perturbations when the disk comes back in the middle
of write change.
Current implementation didn't honor quorum properly and didn't
handle the errors generated properly. This patch addresses that
and also moves common code `cleanupMultipartUploads` into xl
specific private function.
Fixes#3665
This is a consolidation effort, avoiding usage
of naked strings in codebase. Whenever possible
use constants which can be repurposed elsewhere.
This also fixes `goconst ./...` reported issues.
This is written so that to simplify our handler code
and provide a way to only update metadata instead of
the data when source and destination in CopyObject
request are same.
Fixes#3316
This is to utilize an optimized version of
sha256 checksum which @fwessels implemented.
blake2b lacks such optimizations on ARM platform,
this can provide us significant boost in performance.
blake2b on ARM64 as expected would be slower.
```
BenchmarkSize1K-4 30000 44015 ns/op 23.26 MB/s
BenchmarkSize8K-4 5000 335448 ns/op 24.42 MB/s
BenchmarkSize32K-4 1000 1333960 ns/op 24.56 MB/s
BenchmarkSize128K-4 300 5328286 ns/op 24.60 MB/s
```
sha256 on ARM64 is faster by orders of magnitude giving close to
AVX performance of blake2b.
```
BenchmarkHash8Bytes-4 1000000 1446 ns/op 5.53 MB/s
BenchmarkHash1K-4 500000 3229 ns/op 317.12 MB/s
BenchmarkHash8K-4 100000 14430 ns/op 567.69 MB/s
BenchmarkHash1M-4 1000 1640126 ns/op 639.33 MB/s
```
This is needed as explained by @krisis
Lets say we have following errors.
```
[]error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccesDenied}
```
Since the last two errors are filtered, the maximum is nil,
depending on map order.
Let's say we get nil from reduceErr. Clearly at this point
we don't have quorum nodes agreeing about the data and since
GetObject only requires N/2 (Read quorum) and isDiskQuorum
would have returned true. This is problematic and can lead to
undersiable consequences.
Fixes#3298
Disks when are offline for a long period of time, we should
ignore the disk after trying Login upto 5 times.
This is to reduce the network chattiness, this also reduces
the overall time spent on `net.Dial`.
Fixes#3286
In a multipart upload scenario disks going down and coming backup
can lead to certain parts missing on the disk/server which was
going down. This is a valid case since these blocks can be
missing and should be healed through heal operation. But we are
not supposed to fail prematurely since we have enough data on
the other disks as well within read-quorum.
This fix relaxes previous assumption, fixes a major corruption
issue reproduced by @vadmeste.
Fixes#2976
- Using gjson for constructing xlMetaV1{} in realXLMeta.
- Test for parsing constructing xlMetaV1{} using gjson.
- Changes made since benchmarks showed 30-40% improvement in speed.
- Follow up comments in issue https://github.com/minio/minio/issues/2208
for more details.
- gjson parsing of parts from xl.json for listParts.
- gjson parsing of statInfo from xl.json for getObjectInfo.
- Vendorizing gjson dependency.
Currently `xl.json` saves algorithm information for bit-rot
verification. Since the bit-rot algo's can change in the
future make sure the erasureReadFile doesn't default to
a particular algo. Instead use the checkSumInfo.
The reason is any function relying on `getLoadBalancedQuorumDisks`
cannot possibly have an idempotent behavior.
The problem comes from given a set of N disks returning just a
shuffled N/2 disks. In case of a scenario where we have N/2
number of failed disks, the returned value of `getLoadBalancedQuorumDisks`
is not equal to the same failed disks so essentially calls using such
disks might succeed or fail randomly at different intervals in time.
This proposal change is we move to `getLoadBalancedDisks()`
and use the shuffled N disks as a whole. Since most of the time we might
hit a good disk since we are not reducing our solution space. This
also provides consistent behavior for all the functions which rely
on shuffled disks.
Fixes#2242
* XL: Refactor of xl.PutObjectPart and erasureCreateFile.
* GetCheckSum and AddCheckSum methods for xlMetaV1
* Simple unit test case for erasureCreateFile()
This change is needed to make reading from objects future proof
in-terms of handling online disks. Our current counter is not
based on affirmative knowledge and relies on arithmetic sequence
which can lead to bugs.
Using modTime simplifies the understanding of `xl.json` and future
tooling / debugging of the format.
Each metadata ops have a list of errors which can be
ignored, this is essentially needed when
- disks are not found
- disks are found but cannot be accessed (permission denied)
- disks are there but fresh disks were added
This is needed since we don't have healing code in place where
it would have healed the fresh disks added.
Fixes#2072
AppendFile ensures that it appends the entire buffer. Returns
an error otherwise, this patch removes the necessity for the
caller to look for 'n' return on short writes.
Ref #1893