Other listing optimizations include
- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
Also add a cross compile script to test always cross
compilation for some well known platforms and architectures
, we support out of box compilation of these platforms even
if we don't make an official release build.
This script is to avoid regressions in this area when we
add platform dependent code.
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).
If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.
When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.
Fixes#6323, #6696
xl.json is the source of truth for all erasure
coded objects, without which we won't be able to
read the objects properly. This PR enables sync
mode for writing `xl.json` such all writes go hit
the disk and are persistent under situations such
as abrupt power failures on servers running Minio.
This commit will print connection failures to other disks in other nodes
after 5 retries. It is useful for users to understand why the
distribued cluster fails to boot up.
Continuing from PR 157ed65c35
Our posix.go implementation did not handle I/O errors
properly on the disks, this led to situations where
top-level callers such as ListObjects might return early
without even verifying all the available disks.
This commit tries to address this in Kubernetes, drbd/nbd based
persistent volumes which can disconnect under load and
result in the situations with disks return I/O errors.
This commit also simplifies listing operation, listing
never returns any error. We can avoid this since we pretty
much ignore most of the errors anyways. When objects are
accessed directly we return proper errors.
This PR simplifies the code to avoid tracking
any running usage events. This PR also brings
in an upper threshold of upto 1 minute suspend
the usage function after which the usage would
proceed without waiting any longer.
disk usage crawling is not needed when a tenant
is not sharing the same disk for multiple other
tenants. This PR adds an optimization when we
see a setup uses entire disk, we simply rely on
statvfs() to give us total usage.
This PR also additionally adds low priority
scheduling for usage check routine, such that
other go-routines blocked will be automatically
unblocked and prioritized before usage.
Added support for new RPC support using HTTP POST. RPC's
arguments and reply are Gob encoded and sent as HTTP
request/response body.
This patch also removes Go RPC based implementation.
Better support of HEAD and listing of zero sized objects with trailing
slash (a.k.a empty directory). For that, isLeafDir function is added
to indicate if the specified object is an empty directory or not. Each
backend (xl, fs) has the responsibility to store that information.
Currently, in both of XL & FS, an empty directory is represented by
an empty directory in the backend.
isLeafDir() checks if the given path is an empty directory or not,
since dir listing is costly if the latter contains too many objects,
readDirN() is added in this PR to list only N number of entries.
In isLeadDir(), we will only list one entry to check if a directory
is empty or not.
- getBucketLocation
- headBucket
- deleteBucket
Should return 404 or NoSuchBucket even for invalid bucket names, invalid
bucket names are only validated during MakeBucket operation
Also make sure to not modify the underlying errors from
layers, we should return the error as is and one object
layer should translate the errors.
Fixes#5797
- Data from disk was being read after bitrot verification to return
data for GetObject. Strictly speaking this does not guarantee bitrot
protection, as disks may return bad data even temporarily.
- This fix reads data from disk, verifies data for bitrot and then
returns data to the client directly.
Delete & Multi Delete API should not try to remove the directory content.
The only permitted case is with zero size object with a trailing slash
in its name.
Overwriting files is allowed, but since the introduction of
the object directory, we will aslo need to allow overwriting
an empty directory. Putting twice the same object directory
won't fail with 403 error anymore.
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.
This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.
Some design details and restrictions:
- Objects are distributed using consistent ordering
to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
requirement, you can start with multiple
such sets statically.
- Static sets set of disks and cannot be
changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
slower since List happens on all servers,
and is merged at this sets layer.
Fixes#5465Fixes#5464Fixes#5461Fixes#5460Fixes#5459Fixes#5458Fixes#5460Fixes#5488Fixes#5489Fixes#5497Fixes#5496
Under any concurrent removeObjects in progress
might have removed the parents of the same prefix
for which there is an ongoing putObject request.
An inconsistent situation may arise as explained
below even under sufficient locking.
PutObject is almost successful at the last stage when
a temporary file is renamed to its actual namespace
at `a/b/c/object1`. Concurrently a RemoveObject is
also in progress at the same prefix for an `a/b/c/object2`.
To create the object1 at location `a/b/c` PutObject has
to create all the parents recursively.
```
a/b/c - os.MkdirAll loops through has now created
'a/' and 'b/' about to create 'c/'
a/b/c/object2 - at this point 'c/' and 'object2'
are deleted about to delete b/
```
Now for os.MkdirAll loop the expected situation is
that top level parent 'a/b/' exists which it created
, such that it can create 'c/' - since removeObject
and putObject do not compete for lock due to holding
locks at different resources. removeObject proceeds
to delete parent 'b/' since 'c/' is not yet present,
once deleted 'os.MkdirAll' would receive an error as
syscall.ENOENT which would fail the putObject request.
This PR tries to address this issue by implementing
a safer/guarded approach where we would retry an operation
such as `os.MkdirAll` and `os.Rename` if both operations
observe syscall.ENOENT.
Fixes#5254
This change removes the ReadFileWithVerify function from the
StorageAPI. The ReadFile was basically a redirection to ReadFileWithVerify.
This change removes the redirection and moves the logic of
ReadFileWithVerify directly into ReadFile.
This removes a lot of unnecessary code in all StorageAPI implementations.
Fixes#4946
* review: fix doc and typos
This change provides new implementations of the XL backend operations:
- create file
- read file
- heal file
Further this change adds table based tests for all three operations.
This affects also the bitrot algorithm integration. Algorithms are now
integrated in an idiomatic way (like crypto.Hash).
Fixes#4696Fixes#4649Fixes#4359
Since go1.8 os.RemoveAll and os.MkdirAll both support long
path names i.e UNC path on windows. The code we are carrying
was directly borrowed from `pkg/os` package and doesn't need
to be in our repo anymore. As a side affect this also
addresses our codecoverage issue.
Refer #4658
This commit changes posix's deleteFile() to not upstream errors from
removing parent directories. This fixes a race condition.
The race condition occurs when multiple deleteFile()s are called on the
same parent directory, but different child files. Because deleteFile()
recursively removes parent directories if they are empty, but
deleteFile() errors if the selected deletePath does not exist, there was
an opportunity for a race condition. The two processes would remove the
child directories successfully, then depend on the parent directory
still existing. In some cases this is an invalid assumption, because
other processes can remove the parent directory beforehand. This commit
changes deleteFile() to not upstream an error if one occurs, because the
only required error should be from the immediate deletePath, not from a
parent path.
In the specific bug report, multiple CompleteMultipartUpload requests
would launch multiple deleteFile() requests. Because they chain up on
parent directories, ultimately at the end, there would be multiple
remove files for the ultimate parent directory,
.minio.sys/multipart/{bucket}. Because only one will succeed and one
will fail, an error would be upstreamed saying that the file does not
exist, and the CompleteMultipartUpload code interpreted this as
NoSuchKey, or that the object/part id doesn't exist. This was faulty
behavior and is now fixed.
The added test fails before this change and passes after this change.
Fixes: https://github.com/minio/minio/issues/4727