minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	2dc46cb153	Report correct error when O_DIRECT is not supported (#9545 ) fixes #9537	5 years ago
Harshavardhana	fea4a1e68e	fix logical error in path length handling for windows (#9520 ) fixes #9515	5 years ago
Anis Elleuch	3e063cca5c	Show the cause error in startup when directio is not supported (#9497 ) This commit tries to create a file using direct i/o in the startup so the server returns quickly and avoid cryptic other errors.	5 years ago
Harshavardhana	ab77b216d1	fix: remove restrictions on windows for NAME_MAX (#9469 ) Fixes #9393	5 years ago
Harshavardhana	498389123e	avoid unnecessary logging on fresh/newly replaced drives (#9470 ) data usage tracker and crawler seem to be logging non-actionable information on console, which is not useful and is fixed on its own in almost all deployments, lets keep this logging to minimal.	5 years ago
Anis Elleuch	c434dff0a4	posix: Add missing error return in RenameFile() (#9319 ) Although it should not happen in most cases.	5 years ago
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	5 years ago
Harshavardhana	e20e08d700	fix: remove the sleep from listing operations (#9287 ) make rest of the Walk() function more predictable, it was observed that in nominal deployments even without much workload the drives are generally slow for respond for readdir operations, for the sleepDuration factor of 10 this can cause unexpected slowness in the Listing calls, while it is good for all other I/O, it may simply slow down Listing immensely which is not useful. fixes #9261	5 years ago
Anis Elleuch	e51e465543	delete: Use physical Dir() for proper prefix cleanup in Windows (#9297 ) In FS mode under Windows, removing an object will not automatically. remove parent empty prefixes. The reason is that path.Dir() was used, however filepath.Dir() is more appropriate since filepath is physical (meaning it operates on OS filesystem paths) This is not caught because failure for Windows CI is not caught.	5 years ago
Bala FA	2c3e34f001	add force delete option of non-empty bucket (#9166 ) passing HTTP header `x-minio-force-delete: true` would allow standard S3 API DeleteBucket to delete a non-empty bucket forcefully.	5 years ago
Harshavardhana	6f992134a2	fix: startup load time by reusing storageDisks (#9210 )	5 years ago
Krishna Srinivas	45b1c66195	fix: implement splunk specific listObjects when delimiter=guidSplunk (#9186 )	5 years ago
Harshavardhana	cfc9cfd84a	fix: various optimizations, idiomatic changes (#9179 ) - acquire since leader lock for all background operations - healing, crawling and applying lifecycle policies. - simplify lifecyle to avoid network calls, which was a bug in implementation - we should hold a leader and do everything from there, we have access to entire name space. - make listing, walking not interfere by slowing itself down like the crawler. - effectively use global context everywhere to ensure proper shutdown, in cache, lifecycle, healing - don't read `format.json` for prometheus metrics in StorageInfo() call.	5 years ago
Klaus Post	8d98662633	re-implement data usage crawler to be more efficient (#9075 ) Implementation overview: https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b	5 years ago
Krishna Srinivas	2e9fed1a14	non-empty dirs should not be listed as objects (#9129 )	5 years ago
Kody A Kantor	06e30b5aa1	Skip building directio on platforms that don't support Direct IO (#9059 )	5 years ago
Anis Elleuch	0af62d35a0	xl: Implement posix.DeletePrefixes to enhance delete perf (#9100 ) Bulk delete API was using cleanupObjectsBulk() which calls posix listing and delete API to remove objects internal files in the backend (xl.json and parts) one by one. Add DeletePrefixes in the storage API to remove the content of a directory in a single call. Also use a remove goroutine for each disk to accelerate removal.	5 years ago
Harshavardhana	88ae0f1196	Improve delete performance by reducing the number of calls (#9092 ) - Remove the requirement to honor storage class for deletes - Improve `posix.DeleteFileBulk` code to Stat the volumeDir only once per call, rather than for all object paths.	5 years ago
Harshavardhana	23a8411732	Add a generic Walk()'er to list a bucket, optinally prefix (#9026 ) This generic Walk() is used by likes of Lifecyle, or KMS to rotate keys or any other functionality which relies on this functionality.	5 years ago
Anis Elleuch	d4dcf1d722	metrics: Use StorageInfo() instead to have consistent info (#9006 ) Metrics used to have its own code to calculate offline disks. StorageInfo() was avoided because it is an expensive operation by sending calls to all nodes. To make metrics & server info share the same code, a new argument `local` is added to StorageInfo() so it will only query local disks when needed. Metrics now calls StorageInfo() as server info handler does but with the local flag set to false. Co-authored-by: Praveen raj Mani <praveen@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	5 years ago
Klaus Post	d0cea7adea	Fix stream read IO count (#8961 ) Streams are returning a readcloser and returning would decrement io count instantly, fix it. change maxActiveIOCount to 3, meaning it will pause crawling if 3 operations are running.	5 years ago
Harshavardhana	2d295a31de	Avoid select inside a recursive function to avoid CPU spikes (#8923 ) Additionally also allow configurable go-routines	5 years ago
Harshavardhana	f14f60a487	fix: Avoid double usage calculation on every restart (#8856 ) On every restart of the server, usage was being calculated which is not useful instead wait for sufficient time to start the crawling routine. This PR also avoids lots of double allocations through strings, optimizes usage of string builders and also avoids crawling through symbolic links. Fixes #8844	5 years ago
Harshavardhana	fc5213258e	posix: Do not take disk offline on I/O errors (#8836 ) Choosing maxAllowedIOError is arbitrary and prone to errors, when drives might be perfectly capable of taking I/O with only few locations return I/O error. This is a hindrance of sort where backend filesystems like ZFS can automatically fix and handle these scenarios. The added problem with current approach that we take the drive offline, making it virtually impossible to bring it online without restart the server which is not desirable on a busy cluster. Remove this state such that let the backend return error appropriately to caller and let the caller decide what to do with the error.	5 years ago
Anis Elleuch	c18fbdb29a	posix: Remove a non needed nil check in DiskInfo() (#8830 ) posix.DiskInfo() returns errFaultyDisk when posix is nil, but there is no way that this would happen any time, therefore removing un-needed code.	5 years ago
Harshavardhana	0879a4f743	rest/storage: Remove racy LastError usage (#8817 ) instead perform a liveness check call to verify if server is online and print relevant errors. Also introduce a StorageErr string error type instead of errors.New() deprecate usage of VerifyFileError, DeleteFileError for gob, change in datastructure also requires bump in storage REST version to v13. Fixes #8811	5 years ago
Klaus Post	37b32199e3	Validate XL sets on format (#8779 ) When formatting a set validate if a host failure will likely lead to data loss. While we don't know what config will be set in the future evaluate to our best knowledge, assuming default settings.	5 years ago
Harshavardhana	5aa5dcdc6d	lock: improve locker initialization at init (#8776 ) Use reference format to initialize lockers during startup, also handle `nil` for NetLocker in dsync and remove errorLocker implementation Add further tuning parameters such as - DialTimeout is now 15 seconds from 30 seconds - KeepAliveTimeout is not 20 seconds, 5 seconds more than default 15 seconds - ResponseHeaderTimeout to 10 seconds - ExpectContinueTimeout is reduced to 3 seconds - DualStack is enabled by default remove setting it to `true` - Reduce IdleConnTimeout to 30 seconds from 1 minute to avoid idleConn build up Fixes #8773	5 years ago
Harshavardhana	f68a7005c0	Improve disk formatting stage for large disk sets (#8690 )	5 years ago
Anis Elleuch	555969ee42	Add data usage collect with its new admin API (#8553 ) Admin data usage info API returns the following (Only FS & XL, for now) - Number of buckets - Number of objects - The total size of objects - Objects histogram - Bucket sizes	5 years ago
Nitish Tiwari	3df7285c3c	Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591 ) This PR adds support below metrics - Cache Hit Count - Cache Miss Count - Data served from Cache (in Bytes) - Bytes received from AWS S3 - Bytes sent to AWS S3 - Number of requests sent to AWS S3 Fixes #8549	5 years ago
Harshavardhana	2ab8d5e47f	Enable build verification with race (#8583 )	5 years ago
Klaus Post	c7844fb1fb	posix: cache disk ID for a short while (#8564 ) `posix.getDiskID()` takes up to 30% of all CPU due to the `os.Stat` call on `GET` calls. Before: ``` Operation: GET - Concurrency: 12 Average: 1333.97 MB/s, 1365.99 obj/s, 1365.98 ops ended/s (4m59.975s) * First Byte: Average: 7.801487ms, Median: 7.9974ms, Best: 1.9822ms, Worst: 110.0021ms Aggregated, split into 299 x 1s time segments: * Fastest: 1453.50 MB/s, 1488.38 obj/s, 1492.00 ops ended/s (1s) * 50% Median: 1360.47 MB/s, 1393.12 obj/s, 1393.00 ops ended/s (1s) * Slowest: 978.68 MB/s, 1002.17 obj/s, 1004.00 ops ended/s (1s) ``` After: ``` Operation: GET - Concurrency: 12 * Average: 1706.07 MB/s, 1747.02 obj/s, 1747.01 ops ended/s (4m59.985s) * First Byte: Average: 5.797886ms, Median: 5.9959ms, Best: 996.3µs, Worst: 84.0007ms Aggregated, split into 299 x 1s time segments: * Fastest: 1830.03 MB/s, 1873.96 obj/s, 1872.00 ops ended/s (1s) * 50% Median: 1735.04 MB/s, 1776.68 obj/s, 1776.00 ops ended/s (1s) * Slowest: 994.94 MB/s, 1018.82 obj/s, 1018.00 ops ended/s (1s) ``` TLDR; `os.Stat` is not free.	5 years ago
Klaus Post	890b493a2e	Use random file name for write check (#8563 ) Since there may be multiple writes going on concurrently Use a random file name for the write check to avoid collisions.	5 years ago
Klaus Post	1dd38750f7	Remove read-ahead for small files (#8522 ) We should only read ahead if we are reading big files. We enable it for files >= 16MB. Benchmark on 64KB objects. Before: ``` Operation: GET Errors: 0 Average: 59.976s, 87.13 MB/s, 1394.07 ops ended/s. Fastest: 1s, 90.99 MB/s, 1455.00 ops ended/s. 50% Median: 1s, 87.53 MB/s, 1401.00 ops ended/s. Slowest: 1s, 81.39 MB/s, 1301.00 ops ended/s. ``` After: ``` Operation: GET Errors: 0 Average: 59.992s, 207.99 MB/s, 3327.85 ops ended/s. Fastest: 1s, 219.20 MB/s, 3507.00 ops ended/s. 50% Median: 1s, 210.54 MB/s, 3368.00 ops ended/s. Slowest: 1s, 179.14 MB/s, 2865.00 ops ended/s. ``` The 64KB buffer is actually a small disadvantage for this case, but I believe it will be better in general than no buffer.	5 years ago
Praveen raj Mani	fa325665b1	Do not append the endpoint for fs/xl disks in StorageInfo (#8472 )	5 years ago
Krishna Srinivas	980bf78b4d	Detect underlying disk mount/unmount (#8408 )	5 years ago
Praveen raj Mani	8836d57e3c	The prometheus metrics refractoring (#8003 ) The measures are consolidated to the following metrics - `disk_storage_used` : Disk space used by the disk. - `disk_storage_available`: Available disk space left on the disk. - `disk_storage_total`: Total disk space on the disk. - `disks_offline`: Total number of offline disks in current MinIO instance. - `disks_total`: Total number of disks in current MinIO instance. - `s3_requests_total`: Total number of s3 requests in current MinIO instance. - `s3_errors_total`: Total number of errors in s3 requests in current MinIO instance. - `s3_requests_current`: Total number of active s3 requests in current MinIO instance. - `internode_rx_bytes_total`: Total number of internode bytes received by current MinIO server instance. - `internode_tx_bytes_total`: Total number of bytes sent to the other nodes by current MinIO server instance. - `s3_rx_bytes_total`: Total number of s3 bytes received by current MinIO server instance. - `s3_tx_bytes_total`: Total number of s3 bytes sent by current MinIO server instance. - `minio_version_info`: Current MinIO version with commit-id. - `s3_ttfb_seconds_bucket`: Histogram that holds the latency information of the requests. And this PR also modifies the current StorageInfo queries - Decouples StorageInfo from ServerInfo . - StorageInfo is enhanced to give endpoint information. NOTE: ADMIN API VERSION IS BUMPED UP IN THIS PR Fixes #7873	5 years ago
Harshavardhana	ff5bf51952	admin/heal: Fix deep healing to heal objects under more conditions (#8321 ) - Heal if the part.1 is truncated from its original size - Heal if the part.1 fails while being verified in between - Heal if the part.1 fails while being at a certain offset Other cleanups include make sure to flush the HTTP responses properly from storage-rest-server, avoid using 'defer' to improve call latency. 'defer' incurs latency avoid them in our hot-paths such as storage-rest handlers. Fixes #8319	5 years ago
Harshavardhana	975134e42b	Add checks in DiskInfo() to protect against changing mounts (#8286 )	5 years ago
Anis Elleuch	3f258062d8	bitrot: Verify file size inside storage interface (#7932 )	5 years ago
Harshavardhana	a7be313230	Start using new errors package (#8207 )	5 years ago
Krishna Srinivas	c38ada1a26	write() to disk in 4MB blocks for better performance (#7888 )	5 years ago
Harshavardhana	e6d8e272ce	Use const slashSeparator instead of "/" everywhere (#8028 )	5 years ago
Harshavardhana	e40c29e834	Fail appropriately if the disk has I/O errors (#7972 ) If the disk has I/O errors, we should simply ignore such a disk and not be bothered about it - until it is replaced.	5 years ago
Anis Elleuch	000a60f238	xl: Heal empty parts (#7860 ) posix.VerifyFile() doesn't know how to check if a file is corrupted if that file is empty. We do have the part size in xl.json so we pass it to VerifyFile to return an error so healing empty parts can work properly.	5 years ago
Krishna Srinivas	58d90ed73c	Avoid network transfer for bitrot verification during healing (#7375 )	5 years ago
Harshavardhana	39b3e4f9b3	Avoid using io.ReadFull() for WriteAll and CreateFile (#7676 ) With these changes we are now able to peak performances for all Write() operations across disks HDD and NVMe. Also adds readahead for disk reads, which also increases performance for reads by 3x.	6 years ago
Krishnan Parthasarathi	c871456269	File must be sync'd before closing (#7657 ) - group sync and close action into a single defer statement to avoid evaluation order related bugs in future.	6 years ago
Harshavardhana	b3f22eac56	Offload listing to posix layer (#7611 ) This PR adds one API WalkCh which sorts and sends list over the network Each disk walks independently in a sorted manner.	6 years ago

1 2 3

141 Commits (5bf3eeaa77224c2c2771f28afaa1784cb6afb377)