Currently, listing directories on HDFS incurs a per-entry remote Stat() call
penalty, the cost of which can really blow up on directories with many
entries (+1,000) especially when considered in addition to peripheral
calls (such as validation) and the fact that minio is an intermediary to the
client (whereas other clients listed below can query HDFS directly).
Because listing directories this way is expensive, the Golang HDFS library
provides the [`Client.Open()`] function which creates a [`FileReader`] that is
able to batch multiple calls together through the [`Readdir()`] function.
This is substantially more efficient for very large directories.
In one case we were witnessing about +20 seconds to list a directory with 1,500
entries, admittedly large, but the Java hdfs ls utility as well as the HDFS
library sample ls utility were much faster.
Hadoop HDFS DFS (4.02s):
λ ~/code/minio → use-readdir
» time hdfs dfs -ls /directory/with/1500/entries/
…
hdfs dfs -ls 5.81s user 0.49s system 156% cpu 4.020 total
Golang HDFS library (0.47s):
λ ~/code/hdfs → master
» time ./hdfs ls -lh /directory/with/1500/entries/
…
./hdfs ls -lh 0.13s user 0.14s system 56% cpu 0.478 total
mc and minio **without** optimization (16.96s):
λ ~/code/minio → master
» time mc ls myhdfs/directory/with/1500/entries/
…
./mc ls 0.22s user 0.29s system 3% cpu 16.968 total
mc and minio **with** optimization (0.40s):
λ ~/code/minio → use-readdir
» time mc ls myhdfs/directory/with/1500/entries/
…
./mc ls 0.13s user 0.28s system 102% cpu 0.403 total
[`Client.Open()`]: https://godoc.org/github.com/colinmarc/hdfs#Client.Open
[`FileReader`]: https://godoc.org/github.com/colinmarc/hdfs#FileReader
[`Readdir()`]: https://godoc.org/github.com/colinmarc/hdfs#FileReader.Readdir
for users who don't have access to HDFS rootPath '/'
can optionally specify `minio gateway hdfs hdfs://namenode:8200/path`
for which they have access to, allowing all writes to be
performed at `/path`.
NOTE: once configured in this manner you need to make
sure command line is correctly specified, otherwise
your data might not be visible
closes#10011
- admin info node offline check is now quicker
- admin info now doesn't duplicate the code
across doing the same checks for disks
- rely on StorageInfo to return appropriate errors
instead of calling locally.
- diskID checks now return proper errors when
disk not found v/s format.json missing.
- add more disk states for more clarity on the
underlying disk errors.
Without instantiating a new rest client we can
have a recursive error which can lead to
healthcheck returning always offline, this can
prematurely take the servers offline.
- Implement a new xl.json 2.0.0 format to support,
this moves the entire marshaling logic to POSIX
layer, top layer always consumes a common FileInfo
construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
for object placements.
Fixes#2111
Advantages avoids 100's of stats which are needed for each
upload operation in FS/NAS gateway mode when uploading a large
multipart object, dramatically increases performance for
multipart uploads by avoiding recursive calls.
For other gateway's simplifies the approach since
azure, gcs, hdfs gateway's don't capture any specific
metadata during upload which needs handler validation
for encryption/compression.
Erasure coding was already optimized, additionally
just avoids small allocations of large data structure.
Fixes#7206
enable linter using golangci-lint across
codebase to run a bunch of linters together,
we shall enable new linters as we fix more
things the codebase.
This PR fixes the first stage of this
cleanup.
This PR is to ensure that we call the relevant object
layer APIs for necessary S3 API level functionalities
allowing gateway implementations to return proper
errors as NotImplemented{}
This allows for all our tests in mint to behave
appropriately and can be handled appropriately as
well.
S3 is now natively supported by B2 cloud storage provider
there is no reason to use specialized gateway for B2 anymore,
our current S3 gateway with caching would work with B2.
Resolves#8584
OSS go sdk lacks licensing terms in their
repository, and there has been no activity
On the issue here https://github.com/aliyun/aliyun-oss-go-sdk/issues/245
This PR is to ensure we remove any dependency code which
lacks explicit license file in their repo.
if needed use --no-compat to disable md5sum while
verifying any performance numbers.
bring back --compat behavior as default to avoid
additional documentation and confusing behavior,
as we are working towards improving md5sum to
be faster on AVX instructions, enabling this
should be hardly a problem in future versions
of MinIO.
fixes#8012fixes#7859fixes#7642
- B2 does actually return an MD5 hash for newly uploaded objects
so we can use it to provide better compatibility with S3 client
libraries that assume the ETag is the MD5 hash such as boto.
- depends on change in blazer library.
- new behaviour is only enabled if MinIO's --compat mode is active.
- behaviour for multipart uploads is unchanged (works fine as is).
- Add conservative timeouts upto 3 minutes
for internode communication
- Add aggressive timeouts of 30 seconds
for gateway communication
Fixes#9105Fixes#8732Fixes#8881Fixes#8376Fixes#9028
This is a precursor change before versioning,
removes/deprecates the requirement of remembering
partName and partETag which are not useful after
a multipart transaction has finished.
This PR reduces the overall size of the backend
JSON for large file uploads.
To allow better control the cache eviction process.
Introduce MINIO_CACHE_WATERMARK_LOW and
MINIO_CACHE_WATERMARK_HIGH env. variables to specify
when to stop/start cache eviction process.
Deprecate MINIO_CACHE_EXPIRY environment variable. Cache
gc sweeps at 30 minute intervals whenever high watermark is
reached to clear least recently accessed entries in the cache
until sufficient space is cleared to reach the low watermark.
Garbage collection uses an adaptive file scoring approach based
on last access time, with greater weights assigned to larger
objects and those with more hits to find the candidates for eviction.
Thanks to @klauspost for this file scoring algorithm
Co-authored-by: Klaus Post <klauspost@minio.io>
Metrics used to have its own code to calculate offline disks.
StorageInfo() was avoided because it is an expensive operation
by sending calls to all nodes.
To make metrics & server info share the same code, a new
argument `local` is added to StorageInfo() so it will only
query local disks when needed.
Metrics now calls StorageInfo() as server info handler does
but with the local flag set to false.
Co-authored-by: Praveen raj Mani <praveen@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
We added support for caching and S3 related metrics in #8591. As
a continuation, it would be helpful to add support for Azure & GCS
gateway related metrics as well.